Whole Exome Sequencing (WES): Definition & Applications

Table of contents

What is Whole Exome Sequencing?
Understanding the Exome
WES vs. WGS: Which to Choose?
Step-by-Step WES Workflow
WES Bioinformatics Pipeline
Applications of WES in Life Sciences
WES in Rare & Undiagnosed Disease
WES in Oncology & Biomarker Discovery
WES & Pharmacogenomics
Limitations & Considerations
How Excelra Supports WES Projects
Conclusion
Frequently Asked Questions (FAQ)

QUICK DEFINITION

Whole Exome Sequencing (WES) is a target-enriched next-generation sequencing (NGS) method that selectively captures and sequences all protein-coding regions of the genome, collectively known as the exome. While the exome comprises roughly 180,000 exons representing only 1–2% of the total human genome (~30–50 megabases), it uniquely harbors approximately 85% of all known disease-causing mutations.

Key takeaways

Targeted Efficiency: WES isolates functional coding regions by utilizing specific oligonucleotide capture chemistry before sequencing, optimizing cost and runtime.
High Diagnostic Yield: By focusing where pathogenic variations are densest, WES delivers an impressive 25–35% clinical diagnostic yield for rare and Mendelian genetic conditions.
Variant Coverage: It cleanly maps out coding single nucleotide variants (SNVs), small frameshift indels, splice-site variants, and targeted exonic copy number variations (CNVs).
Cost-Effective Scaling: WES runs roughly 3–5× cheaper than equivalent WGS workflows, producing tighter, highly manageable data payloads (~8–15 GB per sample).
Precision Oncology Value: Somatic cancer evaluations routinely employ WES at 150–200× depth to calculate Tumor Mutational Burden (TMB) and reveal actionable drug targets.
Structural Limitations: WES cannot reliably identify structural rearrangements, deep intronic mutations, or variants located in non-coding regulatory machinery.

What is whole exome sequencing (WES)?

Whole exome sequencing (WES) is a next-generation sequencing (NGS) technique that selectively captures and sequences all protein-coding regions of the genome — collectively known as the exome. Whole exome sequencing covers approximately 180,000 exons across ~20,000 protein-coding genes, representing roughly 1–2% of the total human genome (around 30–50 megabases of sequence). Despite this small footprint, the exome harbors approximately 85% of all known disease-causing mutations — making whole exome sequencing one of the most efficient and cost-effective strategies for identifying pathogenic variants in clinical research and drug discovery.

WES is enabled by Next Generation Sequencing (NGS) platforms combined with targeted capture chemistry: oligonucleotide probes complementary to all known exonic sequences selectively hybridize to and enrich exonic DNA from a total genomic DNA library, and the enriched fraction is sequenced at high depth. This target enrichment step distinguishes whole exome sequencing from Whole Genome Sequencing (WGS), which sequences the entire genome, and from targeted gene panels, which cover only a pre-defined set of genes.

WES detects a range of variant types, including:

Single Nucleotide Variants (SNVs) — missense, nonsense, and synonymous substitutions in coding regions
Small insertions and deletions (indels) — frameshift mutations affecting protein function
Splice-site variants — mutations at exon-intron boundaries affecting RNA splicing
Copy Number Variants (CNVs) — exonic duplications and deletions (with dedicated CNV callers)
In-frame indels — insertions/deletions that do not disrupt the reading frame but alter protein sequence

Understanding the exome in whole exome sequencing

The human genome is approximately 3.2 billion base pairs in length, but only a small fraction of it directly encodes proteins. Genes are organized into exons (protein-coding sequences) and introns (non-coding intervening sequences). When a gene is transcribed into pre-mRNA, the introns are spliced out and the exons are joined to form mature mRNA, which is then translated into protein.

The exome is the collective term for all exonic sequences across the genome. In humans, this totals approximately 30–50 megabases, roughly 1–2% of the genome. Despite this small footprint, the exome is where the genetic instructions for virtually every protein in the body are encoded, making it the most functionally dense region of the genome.
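The percentage quoted for the exome follows directly from the two sizes given above. A minimal sketch of that arithmetic, using only the approximate figures from this section (a 3.2 billion bp genome and a 30–50 Mb exome):

```python
GENOME_BP = 3.2e9  # approximate human genome length quoted above


def exome_fraction(exome_mb: float, genome_bp: float = GENOME_BP) -> float:
    """Return the exome size as a fraction of the whole genome."""
    return exome_mb * 1e6 / genome_bp


# Lower and upper bounds of the exome size range given in the text.
for exome_mb in (30, 50):
    print(f"{exome_mb} Mb exome = {exome_fraction(exome_mb):.2%} of the genome")
```

The exact value depends on which annotation and capture design is used to define the exome, which is why the figure is usually quoted as a range.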

The high concentration of disease-causing mutations in the exome reflects the fact that protein-coding variants — particularly loss-of-function mutations, dominant-negative variants, and gain-of-function missense mutations — are the primary molecular mechanisms underlying most Mendelian genetic disorders and a large proportion of cancer driver mutations. Non-coding regulatory variants, which WES misses, also contribute to disease but are typically smaller in effect size and harder to interpret functionally.

WES vs. WGS: Which to choose?

The choice between WES and WGS is one of the most important decisions in genomics study design. The right answer depends on research objectives, sample type, budget, and the variant types of interest.

WES vs. WGS — Decision Guide
Factor	Choose WES	Choose WGS
Primary research question	Protein-altering coding variants; rare disease; clinical diagnostics	Novel variant discovery; regulatory variants; structural variants; metagenomics
Cost per sample	Lower (~$200–500 depending on scale)	Higher (~$600–1,500 depending on depth)
Data size per sample	~8–15 GB (100× depth)	~100–150 GB (30× depth)
Non-coding region coverage	Minimal (only adjacent UTR/splice sites)	Complete — all introns, regulatory, and intergenic regions
Structural variant detection	Limited — exonic CNVs only	Comprehensive — SVs, CNVs, translocations, inversions
Sequencing depth required	80–100× (germline); 150–200× (somatic)	30× (germline); 60–100× (somatic)
Established clinical use	Yes — widely validated for rare disease and cancer	Growing — clinical WGS programs expanding globally
FFPE sample compatibility	Good (well-optimized for degraded DNA)	Moderate (requires higher-quality input for SV detection)

For many clinical and research applications, WES represents the optimal balance of cost, data manageability, and diagnostic yield. As sequencing costs continue to fall, WGS is increasingly viable even in clinical settings — but WES remains the workhorse of rare disease diagnostics and cancer biomarker programs globally.
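The difference in data volume between the two approaches can be approximated from target size and depth. The sketch below estimates raw sequenced bases, not file size on disk. The 40 Mb target, 75% on-target rate, and 10% duplicate rate are illustrative assumptions, not values from any specific kit.

```python
import math


def required_raw_bases(target_bp: float, mean_depth: float,
                       on_target_rate: float = 1.0,
                       duplicate_rate: float = 0.0) -> float:
    """Estimate raw bases to sequence for a desired mean depth on target.

    Off-target reads and duplicates do not contribute to usable depth,
    so the raw yield is inflated by the usable fraction.
    """
    usable = on_target_rate * (1.0 - duplicate_rate)
    if not 0.0 < usable <= 1.0:
        raise ValueError("usable fraction must be in (0, 1]")
    return target_bp * mean_depth / usable


def read_pairs_needed(raw_bases: float, read_length: int = 150) -> int:
    """Convert a raw base count into paired-end read pairs."""
    return math.ceil(raw_bases / (2 * read_length))


# Assumed example: 40 Mb exome at 100x vs. 3.2 Gb genome at 30x.
wes = required_raw_bases(40e6, 100, on_target_rate=0.75, duplicate_rate=0.10)
wgs = required_raw_bases(3.2e9, 30)
print(f"WES: {wes / 1e9:.1f} Gb raw, {read_pairs_needed(wes):,} read pairs")
print(f"WGS: {wgs / 1e9:.1f} Gb raw, {read_pairs_needed(wgs):,} read pairs")
```

Under these assumptions WGS needs roughly an order of magnitude more raw sequence than WES, which is consistent in direction with the per-sample data sizes in the table.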

Step-by-Step whole exome sequencing workflow

The WES wet-lab workflow proceeds from biological sample to sequencing-ready library through a series of carefully controlled steps. Each phase introduces potential sources of bias or error that must be monitored through rigorous quality control.

1. Sample collection & DNA extraction

Genomic DNA is extracted from the biological sample of interest — most commonly peripheral blood (for germline studies), fresh-frozen tumor tissue (for somatic cancer studies), or FFPE tissue blocks (for retrospective clinical studies). DNA quality is assessed by Qubit fluorometry (concentration), Bioanalyzer or TapeStation (fragment integrity), and A260/A280 spectrophotometry (purity). FFPE samples require additional consideration due to DNA fragmentation and formalin-induced crosslinking, which can introduce artefactual variants if not properly accounted for in the bioinformatics pipeline.

2. Library preparation

Extracted DNA is fragmented (to ~150–250 bp by sonication or enzymatic methods), end-repaired, A-tailed, and ligated with indexed sequencing adapters. Unique Molecular Identifiers (UMIs) may be incorporated at this stage to enable accurate deduplication and error correction, particularly important for somatic WES where low-allele-frequency variants must be distinguished from sequencing errors. The adapter-ligated library undergoes initial amplification to generate sufficient material for the capture step.

3. Exome capture (Target enrichment)

This is the step that distinguishes WES from WGS. The genomic DNA library is hybridized with biotinylated RNA or DNA oligonucleotide probes that are complementary to all known exonic sequences (based on reference annotations such as GENCODE or RefSeq). These probes selectively capture exonic fragments, which are then pulled down using streptavidin-coated magnetic beads and released. The most widely used commercial capture kits include Illumina’s Nextera Exome, Agilent’s SureSelect, and Twist Bioscience’s Human Core Exome — each covering slightly different genomic territories and numbers of exons.

4. Post-Capture amplification & quality control

The captured exonic library is amplified by PCR and assessed for quality: fragment size distribution (Bioanalyzer), library concentration (Qubit), and exome content enrichment (qPCR of known target regions). A well-prepared WES library should show a tight fragment size distribution and high on-target enrichment rate (typically >70–80% of reads mapping to the capture bait regions).

5. High-Throughput sequencing

The final library is loaded onto an NGS sequencer — most commonly an Illumina platform (NovaSeq 6000/X, NextSeq) — and sequenced using paired-end reads. Standard germline WES uses 2×100 bp or 2×150 bp paired-end reads at 80–100× mean target coverage. Somatic tumor-normal WES typically uses 150–200× or higher. For clinical diagnostic WES, quality metrics such as percent of target bases at ≥20× coverage and uniformity across capture regions are carefully monitored against pre-defined thresholds.
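The two coverage metrics mentioned above, mean target depth and percent of target bases at ≥20×, can be computed from per-base depth values. This is a minimal sketch that assumes the depths have already been extracted for every target base (for example from a depth tool's output). The function names and the 95% pass threshold default are illustrative; a laboratory would use its own validated thresholds.

```python
from statistics import mean


def coverage_summary(depths: list[int], threshold: int = 20) -> dict:
    """Summarize per-base depths over the capture target."""
    if not depths:
        raise ValueError("no target bases supplied")
    covered = sum(1 for d in depths if d >= threshold)
    return {
        "mean_depth": mean(depths),
        "pct_at_threshold": 100.0 * covered / len(depths),
    }


def passes_coverage_qc(summary: dict, min_pct: float = 95.0) -> bool:
    """Check the percent-of-bases-at-threshold metric against a cutoff."""
    return summary["pct_at_threshold"] >= min_pct


# Toy target of 100 bases: 96 well covered, 4 poorly covered.
toy_depths = [100] * 96 + [10] * 4
summary = coverage_summary(toy_depths)
print(summary, passes_coverage_qc(summary))
```

Note that a high mean depth alone does not guarantee a pass: a sample can average well above 80× and still fail if coverage is unevenly distributed.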

6. Bioinformatics analysis & interpretation

Raw sequencing reads are processed through a bioinformatics pipeline for alignment, variant calling, annotation, and interpretation. This is described in detail in the WES Bioinformatics Pipeline section below.

Whole exome sequencing bioinformatics pipeline

WES bioinformatics analysis transforms raw sequencing reads into a list of candidate genetic variants with functional and clinical context. Excelra’s bioinformatics team and OP² Online Pipeline Platform deliver validated, scalable WES pipelines aligned with GATK best practices and configurable for both research and clinical applications.

Read quality control & trimming

Raw FASTQ reads are assessed using FastQC for per-base quality scores, adapter contamination, GC content bias, and duplication rates. Trimming tools (Trimmomatic, fastp) remove adapter sequences and low-quality bases. Particular attention is paid to capture-related artefacts: over-represented sequences near probe junctions, and GC-content bias introduced by differential hybridization efficiency across GC-rich and GC-poor exons.

Alignment to reference genome

Trimmed reads are aligned to the human reference genome (GRCh38) using BWA-MEM, which produces the gapped alignments needed to correctly map reads containing short indels. Because WES libraries are built from genomic DNA rather than RNA, a splice-aware aligner is not required. Alignment is followed by coordinate-sorted BAM file generation using samtools. On-target alignment statistics, including the percentage of reads mapping to capture bait regions, mean target depth, and uniformity metrics (e.g., fold-80 penalty), are calculated at this stage.
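The fold-80 penalty is commonly described as the mean target coverage divided by the coverage level that 80% of target bases reach or exceed (the 20th percentile), so a perfectly uniform capture scores 1.0. The sketch below follows that description with a simple nearest-rank percentile over non-zero depths. It is an approximation for illustration and may differ in detail from how a given metrics tool computes the value.

```python
import math


def fold_80_penalty(depths: list[int]) -> float:
    """Approximate fold-80 penalty: mean depth / 20th-percentile depth."""
    nonzero = sorted(d for d in depths if d > 0)
    if not nonzero:
        raise ValueError("no covered bases supplied")
    # Nearest-rank 20th percentile: at least 80% of bases are at or above it.
    rank = max(1, math.ceil(0.20 * len(nonzero)))
    p20 = nonzero[rank - 1]
    return (sum(nonzero) / len(nonzero)) / p20


print(fold_80_penalty([100] * 10))           # uniform coverage
print(fold_80_penalty([50] * 2 + [100] * 8))  # two under-covered bases
```

Higher values mean more additional sequencing is needed to bring poorly captured regions up to the mean, which is why the metric is tracked alongside mean depth.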

Duplicate marking & BQSR

PCR duplicates are flagged using Picard MarkDuplicates or samtools markdup. For UMI-containing libraries, UMI-aware deduplication tools (fgbio, UMI-tools) are used instead. Base quality score recalibration (BQSR) using GATK corrects systematic biases in the quality scores assigned by the sequencer — improving the accuracy of downstream variant calling, particularly for low-allele-frequency somatic variants.
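The difference between coordinate-based and UMI-aware duplicate marking can be illustrated with a toy grouping. This is a deliberately simplified sketch: the `Read` record and its fields are hypothetical, and real tools also handle paired-end coordinates, soft-clipping, and sequencing errors within the UMI itself.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    name: str
    chrom: str
    start: int
    strand: str
    umi: str
    mean_qual: float


def mark_duplicates(reads: list[Read], use_umi: bool = True) -> set[str]:
    """Return names of reads flagged as duplicates.

    Reads sharing position and strand (and UMI, if enabled) form one group;
    the highest-quality read in each group is kept as the representative.
    """
    groups = defaultdict(list)
    for read in reads:
        key = (read.chrom, read.start, read.strand,
               read.umi if use_umi else None)
        groups[key].append(read)

    duplicates = set()
    for members in groups.values():
        best = max(members, key=lambda r: r.mean_qual)
        duplicates.update(m.name for m in members if m.name != best.name)
    return duplicates


reads = [
    Read("r1", "chr1", 100, "+", "AAC", 35.0),
    Read("r2", "chr1", 100, "+", "AAC", 30.0),  # same molecule as r1
    Read("r3", "chr1", 100, "+", "GGT", 33.0),  # different molecule
]
print(mark_duplicates(reads, use_umi=True))
print(mark_duplicates(reads, use_umi=False))
```

With UMIs, `r3` is retained as an independent molecule. Without them it is discarded as a duplicate, which is the kind of lost evidence that matters when calling low-allele-frequency somatic variants.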

Variant calling

Variant calling is the cornerstone of the WES bioinformatics pipeline. For germline variant calling, GATK HaplotypeCaller is the gold standard: it reassembles reads in each active genomic region using a local de Bruijn-like graph to identify SNVs and indels with high sensitivity and specificity. For somatic variant calling in tumor-normal pairs, Mutect2 (GATK) or Strelka2 are the preferred tools, incorporating tumor-specific allele frequency models and artifact filters. This connects directly to Excelra’s expertise in NGS data analysis and variant calling.
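The alignment, duplicate marking, BQSR, and germline calling steps described so far can be sketched as an ordered list of command lines. The helper below only builds the commands and does not run them. All file names are hypothetical placeholders, the thread count is arbitrary, and a production pipeline would add logging, resource settings, and a workflow manager. It is an illustration, not Excelra's pipeline.

```python
import shlex


def germline_commands(sample: str, ref: str, fq1: str, fq2: str,
                      targets: str, known_sites: str,
                      threads: int = 8) -> list[list[str]]:
    """Build argv lists for a basic germline WES workflow (not executed)."""
    # bwa mem expects the tab separators in the read group as literal "\t".
    read_group = rf"@RG\tID:{sample}\tSM:{sample}\tPL:ILLUMINA"
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    recal_table = f"{sample}.recal.table"
    recal_bam = f"{sample}.recal.bam"
    return [
        # Step 1: align; in practice this output is piped into samtools sort.
        ["bwa", "mem", "-t", str(threads), "-R", read_group, ref, fq1, fq2],
        ["samtools", "sort", "-o", sorted_bam, "-"],
        # Step 2: flag PCR duplicates.
        ["gatk", "MarkDuplicates", "-I", sorted_bam, "-O", dedup_bam,
         "-M", f"{sample}.dup_metrics.txt"],
        # Step 3: base quality score recalibration.
        ["gatk", "BaseRecalibrator", "-R", ref, "-I", dedup_bam,
         "--known-sites", known_sites, "-O", recal_table],
        ["gatk", "ApplyBQSR", "-R", ref, "-I", dedup_bam,
         "--bqsr-recal-file", recal_table, "-O", recal_bam],
        # Step 4: germline calling restricted to the capture targets.
        ["gatk", "HaplotypeCaller", "-R", ref, "-I", recal_bam,
         "-L", targets, "-O", f"{sample}.g.vcf.gz", "-ERC", "GVCF"],
    ]


for cmd in germline_commands("S1", "GRCh38.fa", "S1_R1.fq.gz", "S1_R2.fq.gz",
                             "targets.interval_list", "dbsnp.vcf.gz"):
    print(shlex.join(cmd))
```

Restricting the caller to the capture intervals with `-L` keeps runtime proportional to the exome rather than the whole genome.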

Variant annotation

Called variants are annotated with biological and clinical context using VEP (Ensembl Variant Effect Predictor), ANNOVAR, or SnpEff. Annotation layers include:

Functional impact predictions (synonymous, missense, frameshift, splice-altering, stop-gain)
In silico pathogenicity scores (CADD, REVEL, SIFT, PolyPhen-2)
Population allele frequencies (gnomAD v4, 1000 Genomes)
Clinical significance classifications (ClinVar pathogenic/likely pathogenic/VUS)
Cancer-specific databases (COSMIC, OncoKB, CGI)
Pharmacogenomic databases (PharmGKB, CPIC)

Variant filtering & prioritization

Raw WES typically calls 20,000–100,000 variants per sample. Systematic filtering reduces this to a manageable candidate list. Common filters include: VQSR (Variant Quality Score Recalibration) or hard-filter thresholds; minor allele frequency cutoffs from gnomAD (e.g., MAF <0.01 for rare disease); functional impact filters (retaining only protein-altering variants); and disease-specific phenotype-to-genotype matching using tools like PhenIX or LIRICAL. For somatic studies, tumor-specific filters remove germline polymorphisms and sequencing artefacts.
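The filters listed above can be chained in a few lines once variants are annotated. In this sketch each variant is a plain dictionary. The field names (`filter`, `gnomad_af`, `consequence`) are hypothetical stand-ins for whatever the annotation step emits, and the consequence labels follow Sequence Ontology terms as reported by annotators such as VEP.

```python
PROTEIN_ALTERING = {
    "missense_variant", "stop_gained", "frameshift_variant",
    "splice_donor_variant", "splice_acceptor_variant",
    "inframe_insertion", "inframe_deletion",
}


def prioritize(variants: list[dict], max_af: float = 0.01) -> list[dict]:
    """Keep PASS, rare, protein-altering variants."""
    kept = []
    for v in variants:
        if v.get("filter") != "PASS":
            continue  # failed quality filtering
        af = v.get("gnomad_af")  # None means not observed in the database
        if af is not None and af >= max_af:
            continue  # too common for a rare-disease candidate
        if v["consequence"] not in PROTEIN_ALTERING:
            continue  # not expected to change the protein
        kept.append(v)
    return kept


variants = [
    {"id": "v1", "filter": "PASS", "gnomad_af": 0.0001, "consequence": "missense_variant"},
    {"id": "v2", "filter": "PASS", "gnomad_af": 0.25, "consequence": "missense_variant"},
    {"id": "v3", "filter": "PASS", "gnomad_af": None, "consequence": "synonymous_variant"},
    {"id": "v4", "filter": "LowQual", "gnomad_af": None, "consequence": "stop_gained"},
    {"id": "v5", "filter": "PASS", "gnomad_af": None, "consequence": "frameshift_variant"},
]
print([v["id"] for v in prioritize(variants)])
```

Treating a missing population frequency as "rare" is a design choice that favors sensitivity. A stricter workflow might also require adequate gnomAD coverage at the site before accepting absence as evidence of rarity.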

Downstream analysis & reporting

Prioritized variants are reviewed in clinical or research context: manual inspection in IGV (Integrative Genomics Viewer), orthogonal validation planning (Sanger sequencing, ddPCR), and clinical report generation for diagnostic WES. For research WES, downstream analyses may include burden testing across cohorts, gene-set enrichment analysis, somatic signature extraction, and integration with transcriptomic or clinical data through multi-omics analysis frameworks.

Key applications of whole exome sequencing in life sciences

WES has become one of the most widely deployed genomics technologies in both research and clinical settings. Its combination of comprehensive coding variant detection, manageable data volumes, and relatively accessible cost has enabled its adoption across a wide spectrum of applications.

Mendelian & rare genetic disease diagnosis

Rare disease diagnosis is the most impactful clinical application of exome sequencing. For patients with suspected Mendelian (single-gene) disorders who have not received a molecular diagnosis through standard genetic testing, WES achieves diagnostic yields of 25–35%, and higher in pediatric neurology and metabolic disease cohorts. Because WES interrogates all ~20,000 protein-coding genes simultaneously, it can identify causative variants in novel or unexpected genes that would not be included in targeted panels.

Cancer genomics

In oncology, WES of matched tumor-normal pairs identifies somatic driver mutations, tumor mutational burden (TMB), microsatellite instability (MSI), and clonal evolution patterns. Actionable somatic mutations identified through WES directly inform treatment decisions — including the selection of targeted therapies (e.g., BRAF inhibitors for BRAF V600E-mutant tumors) and eligibility for immune checkpoint inhibitor therapy based on TMB-high or MSI-high status.

Population genetics

Large-scale sequencing cohorts, including the UK Biobank exome sequencing effort (approximately 500,000 participants) and the NHLBI TOPMed program (which relies primarily on whole genome sequencing), provide unprecedented statistical power to identify rare coding variants associated with common complex diseases. These population datasets are transforming drug target identification by linking human genetic loss-of-function evidence to disease phenotypes in a way that predicts therapeutic benefit.

Infectious disease

Host WES is used to identify germline variants that affect susceptibility to infectious diseases — including primary immunodeficiency variants that predispose to severe bacterial or viral infections. During the COVID-19 pandemic, WES of severe COVID-19 patients identified loss-of-function variants in innate immune pathway genes (TLR7, IRF7) as risk factors for life-threatening disease, opening avenues for targeted therapeutic intervention.

Whole exome sequencing in rare & undiagnosed disease

Rare diseases collectively affect approximately 300 million people worldwide, with most cases caused by mutations in protein-coding genes. WES has fundamentally changed the diagnostic trajectory for rare disease patients — compressing what was previously a “diagnostic odyssey” spanning years and dozens of inconclusive tests into a single genomic test that can deliver a molecular diagnosis within weeks.

Trio WES (Proband + Parents)

Trio WES — sequencing the affected child (proband) together with both biological parents — is the most diagnostically powerful WES design for pediatric rare disease. By comparing the child’s variants to the parental genotypes, de novo mutations (which arise spontaneously in the proband and are absent in both parents) can be identified and prioritized. De novo mutations in developmentally critical genes are a major cause of severe pediatric neurodevelopmental disorders, intellectual disability, and congenital anomalies.
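The trio comparison described above reduces to a simple genotype rule: the proband carries an alternate allele that neither parent carries. The sketch below applies that rule to one autosomal site. The per-sample dictionaries with `gt` and `dp` keys and the minimum depth of 20 are illustrative assumptions, and real pipelines add allele-balance and quality checks.

```python
def is_de_novo_candidate(proband: dict, mother: dict, father: dict,
                         min_depth: int = 20) -> bool:
    """Flag a heterozygous proband call that is absent from both parents."""
    trio = (proband, mother, father)
    # Low parental depth can hide an inherited allele, so require coverage
    # in all three samples before trusting the comparison.
    if any(sample["dp"] < min_depth for sample in trio):
        return False
    return (proband["gt"] == "0/1"
            and mother["gt"] == "0/0"
            and father["gt"] == "0/0")


print(is_de_novo_candidate({"gt": "0/1", "dp": 60},
                           {"gt": "0/0", "dp": 55},
                           {"gt": "0/0", "dp": 48}))
```

The depth requirement matters because an apparent de novo call is often an inherited variant that was simply missed in an under-covered parent.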

Variant classification & VUS resolution

One of the most clinically challenging aspects of diagnostic WES is the classification of variants of uncertain significance (VUS). Rigorous VUS classification uses the ACMG/AMP five-tier framework (pathogenic, likely pathogenic, VUS, likely benign, benign), incorporating: population frequency data from gnomAD; functional evidence from protein structure predictions and in vitro assays; segregation data from family members; and disease-specific literature curation. Excelra’s expert data curation team supports systematic variant evidence collection and classification.

Novel gene discovery

WES of large rare disease cohorts enables the discovery of novel disease genes — genes not previously associated with any human disorder. Computational approaches such as burden testing (testing whether a specific gene has a statistically significant excess of rare damaging variants in patients vs. controls) and gene network analysis accelerate novel gene discovery. Each new gene-disease association expands the diagnostic reach of WES in future patients and may reveal new therapeutic targets.
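A gene-level burden test of the kind described above can be illustrated with a one-sided Fisher's exact test on carrier counts. This is only one of several burden-testing approaches and is shown here as a minimal sketch using the standard library. The function name is hypothetical, and real analyses also adjust for ancestry, coverage differences, and multiple testing across genes.

```python
from math import comb


def burden_p_value(case_carriers: int, case_total: int,
                   control_carriers: int, control_total: int) -> float:
    """One-sided Fisher's exact test for carrier enrichment in cases.

    Returns P(X >= case_carriers) under the hypergeometric null, where X is
    the number of carriers that fall among the cases by chance.
    """
    total = case_total + control_total
    carriers = case_carriers + control_carriers
    upper = min(carriers, case_total)
    numerator = sum(
        comb(carriers, k) * comb(total - carriers, case_total - k)
        for k in range(case_carriers, upper + 1)
    )
    return numerator / comb(total, case_total)


# Toy example: 12 of 500 patients vs. 3 of 5,000 controls carry rare
# damaging variants in the same gene.
print(f"p = {burden_p_value(12, 500, 3, 5000):.3g}")
```

Counting carriers per gene, rather than testing each variant individually, is what gives the approach power when every individual variant is too rare to test alone.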

Whole exome sequencing in oncology & biomarker discovery

Cancer is fundamentally a disease of the genome, driven by the accumulation of somatic mutations across a cell’s lifetime. WES provides a comprehensive view of the exome-wide somatic mutation landscape in tumor tissue — delivering clinically actionable information for treatment selection, prognosis, and clinical trial enrollment.

Somatic mutation profiling

WES of tumor-normal matched pairs identifies the complete landscape of somatic coding mutations: missense mutations (amino acid substitutions), nonsense mutations (premature stop codons), frameshift indels, and splice-site variants. Driver mutations in genes such as TP53, KRAS, PIK3CA, EGFR, BRAF, and BRCA1/2 are identified and classified against curated cancer knowledge bases (OncoKB, CGI, COSMIC) to assign therapeutic relevance.

Tumor mutational burden (TMB)

TMB — defined as the number of somatic mutations per megabase of sequenced genome — is a clinically validated biomarker for response to immune checkpoint inhibitors (PD-1/PD-L1 blockers). WES provides a more comprehensive and accurate TMB calculation than targeted panel sequencing, covering more of the genome and reducing panel-specific biases. FDA-approved companion diagnostics for pembrolizumab (Keytruda) use TMB-high as a tumor-agnostic biomarker for immunotherapy eligibility.
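The TMB definition above is a simple ratio. In the sketch below, the denominator is the number of bases with sufficient coverage to call variants in both tumor and normal, which is an assumption about how the callable region is defined. The default cutoff of 10 mutations/Mb reflects the threshold used for the panel-based tumor-agnostic pembrolizumab indication and is not necessarily appropriate for WES-derived values without calibration.

```python
def tumor_mutational_burden(n_somatic_mutations: int,
                            callable_bp: float) -> float:
    """Return somatic mutations per megabase of callable sequence."""
    if callable_bp <= 0:
        raise ValueError("callable region must be positive")
    return n_somatic_mutations / (callable_bp / 1e6)


def is_tmb_high(tmb: float, cutoff: float = 10.0) -> bool:
    """Compare a TMB value against a cutoff (assay-specific in practice)."""
    return tmb >= cutoff


# Toy example: 350 eligible somatic mutations over 35 Mb of callable exome.
tmb = tumor_mutational_burden(350, 35e6)
print(f"TMB = {tmb:.1f} mutations/Mb, high = {is_tmb_high(tmb)}")
```

Which mutations count toward the numerator (for example, whether synonymous variants are included) varies between assays, so values are only comparable when the counting rules match.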

Microsatellite instability (MSI)

MSI, the hypermutation phenotype caused by DNA mismatch repair deficiency, can be detected from WES data using tools such as MSIsensor or MANTIS. MSI-high status is a strong predictor of response to immune checkpoint inhibition across tumor types, and WES-based MSI detection is an alternative to PCR-based or immunohistochemistry-based MSI testing. Excelra’s biomarker capabilities are described in detail on our biomarker discovery page, and an example application is demonstrated in our WES biomarker signature case study.

HRD scoring & PARP inhibitor sensitivity

Homologous recombination deficiency (HRD) — caused by BRCA1/2 mutations or other HR pathway defects — creates a genomic instability signature detectable in WES data through large-scale state transitions, telomeric allelic imbalance, and loss of heterozygosity patterns. HRD scoring from WES data predicts sensitivity to PARP inhibitors and platinum-based chemotherapy, particularly in ovarian and breast cancers.

Whole exome sequencing & pharmacogenomics

WES provides comprehensive coverage of pharmacogenomically relevant coding variants — including those in cytochrome P450 (CYP) enzyme genes (CYP2D6, CYP2C19, CYP3A4), drug transporter genes (ABCB1, SLCO1B1), drug target genes, and HLA alleles associated with adverse drug reactions. This makes WES a powerful tool for pharmacogenomics research and clinical implementation.

By identifying metabolizer status (poor, intermediate, extensive, ultra-rapid) and HLA-based adverse reaction risk at the exome-wide level, WES enables precision prescribing decisions: matching the right drug at the right dose to the right patient. In oncology, pharmacogenomic insights from WES inform dosing of chemotherapeutic agents with narrow therapeutic windows and high inter-individual variability (e.g., 5-fluorouracil toxicity risk from DPYD variants; irinotecan toxicity from UGT1A1 status).

Limitations & considerations in WES

While WES is a powerful and versatile tool, it has important limitations that must be factored into study design and result interpretation.

Non-coding region blindness

WES misses all variants in non-coding regions — introns, regulatory elements (promoters, enhancers, silencers), long non-coding RNAs, and intergenic sequences. It is estimated that 15–30% of disease-causing variants may lie in non-coding regulatory sequences that WES does not capture. For diseases where non-coding variants are likely contributors, WGS is the more appropriate technology.

Uneven capture efficiency

Exome capture efficiency is not uniform across all exons. Regions with extreme GC content (very high or very low) are systematically under-represented in WES data due to differential hybridization efficiency and PCR amplification bias. This can result in inadequate coverage of clinically important exons, necessitating careful quality monitoring and potential orthogonal testing of uncovered regions.

Structural variant limitations

Large structural variants — including chromosomal translocations, large inversions, and events primarily in non-coding regions — are largely invisible to WES. Even for exonic copy number variants, WES-based CNV detection has lower resolution and sensitivity than array CGH or WGS-based approaches. For applications where structural variants are clinically or biologically important (e.g., leukemia, sarcoma, developmental disorders caused by CNVs), WGS or orthogonal SV detection methods are required.

FFPE Sample challenges

Formalin fixation chemically modifies DNA, causing deamination of cytosines (producing C→T artefacts) and cross-linking that degrades DNA integrity. While WES can be performed on FFPE-derived DNA with appropriate protocols and bioinformatic artefact filters, FFPE samples consistently yield lower library complexity, higher duplication rates, and more sequencing artefacts than fresh-frozen or blood-derived DNA. Careful pre-analytical assessment of FFPE DNA quality and specialized artefact correction tools are essential.
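One simple heuristic for spotting the deamination artefacts described above is to look for an excess of C→T (and the reverse-strand equivalent G→A) substitutions at low variant allele fraction. The sketch below flags such calls for review. The 10% allele-fraction cutoff and the tuple layout are illustrative assumptions, and this is not a substitute for dedicated artefact-filtering tools or orientation-bias models.

```python
DEAMINATION_CHANGES = {("C", "T"), ("G", "A")}


def is_possible_ffpe_artefact(ref: str, alt: str, vaf: float,
                              max_vaf: float = 0.10) -> bool:
    """Flag low-VAF C>T / G>A substitutions as possible FFPE artefacts."""
    return (ref.upper(), alt.upper()) in DEAMINATION_CHANGES and vaf < max_vaf


def ffpe_artefact_fraction(variants: list[tuple[str, str, float]]) -> float:
    """Fraction of calls flagged; a high value suggests FFPE damage."""
    if not variants:
        return 0.0
    flagged = sum(is_possible_ffpe_artefact(r, a, v) for r, a, v in variants)
    return flagged / len(variants)


calls = [("C", "T", 0.04), ("G", "A", 0.03), ("A", "G", 0.05), ("C", "T", 0.45)]
print(f"{ffpe_artefact_fraction(calls):.0%} of calls flagged for review")
```

A flagged call is not necessarily false: genuine subclonal C→T mutations exist, so flagged variants are candidates for closer inspection rather than automatic removal.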

Data governance & compliance

WES data contains highly sensitive genetic information. Storage, sharing, and analysis must comply with applicable data protection regulations (GDPR, HIPAA) and informed consent requirements. FAIR data principles and robust scientific data management systems (SDMS) are essential for responsible WES data stewardship in both research and clinical contexts.

How Excelra supports WES projects

Excelra delivers comprehensive WES bioinformatics capabilities — from raw data processing through clinical-grade variant reporting — supported by validated pipelines, expert computational biologists, and scalable cloud infrastructure.

End-to-End WES Pipeline Development — custom germline and somatic WES pipelines aligned to GATK best practices, deployable via the OP² Online Pipeline Platform on AWS, Azure, or GCP
Variant Calling & Multi-Database Annotation — GATK HaplotypeCaller, Mutect2, and Strelka2 with annotation against ClinVar, gnomAD, COSMIC, OncoKB, and PharmGKB
Biomarker Discovery — WES-based TMB calculation, MSI detection, HRD scoring, and somatic signature analysis for oncology biomarker programs; explore our WES biomarker case study
Rare Disease Diagnostics Support — trio WES analysis, variant prioritization using HPO phenotype matching, and ACMG/AMP-aligned VUS classification
Multi-Omics Integration — integration of WES data with RNA-seq, proteomics, and clinical data for comprehensive multi-omics analysis
FAIR-Compliant Data Management — SDMS integration and genomic data lake design for WES datasets at scale
Cloud-Native Scalability — cloud enablement for cost-efficient WES analysis across hundreds to thousands of samples

See our capabilities in action: cloud deployment of a hospital-optimised WES pipeline and identification of genomic biomarkers for cell line differentiation.

Conclusion

Whole exome sequencing has established itself as one of the most powerful and practical tools in modern genomics — offering a cost-efficient, high-resolution view of the protein-coding genome that drives discovery in rare disease diagnostics, cancer genomics, pharmacogenomics, and population-scale research alike.

By focusing sequencing depth on the 1–2% of the genome that harbors ~85% of known disease-causing mutations, WES delivers maximum biological and clinical yield per sequencing dollar — making it the technology of choice for clinical diagnostic laboratories, translational research programs, and pharmaceutical drug discovery teams worldwide.

As sequencing technology continues to evolve, whole exome sequencing is not being replaced — it is being extended. Integration with RNA-seq, proteomics, and clinical phenotype data through multi-omics frameworks is unlocking new layers of biological insight from exome datasets. AI-powered variant interpretation tools are accelerating the resolution of variants of uncertain significance (VUS). And the emergence of long-read WES protocols is beginning to address long-standing limitations around complex structural regions and phasing.

For life sciences organizations managing large-scale WES programs — from raw data processing through clinical-grade variant reporting — having the right bioinformatics infrastructure, validated pipelines, and expert analytical support is critical to realizing the full value of exome sequencing data. Excelra’s end-to-end whole exome sequencing bioinformatics capabilities, delivered through the OP² Online Pipeline Platform and a team of expert computational biologists, are designed to meet exactly that need — at any scale, on any cloud, and for any application.

Frequently Asked Questions (FAQ)

What is Whole Exome Sequencing (WES)?

Whole Exome Sequencing (WES) is an NGS technique that selectively captures and sequences all protein-coding exons in the genome (the exome), which comprise ~1–2% of total genomic DNA but contain roughly 85% of known disease-causing mutations. It is widely used in rare disease diagnosis, cancer genomics, and biomarker discovery.

How does WES differ from Whole Genome Sequencing (WGS)?

WES targets only the ~1–2% of the genome that encodes proteins, while WGS sequences the entire genome including introns, regulatory regions, and non-coding DNA. WES is approximately 3–5× less expensive and generates smaller datasets, but misses structural variants and non-coding regulatory mutations that WGS captures. WES is preferred for clinical diagnostics and rare disease; WGS for comprehensive discovery research.

What is the WES bioinformatics pipeline?

The WES bioinformatics pipeline includes: quality control and read trimming; alignment to the reference genome using BWA-MEM; duplicate marking and BQSR; variant calling (GATK HaplotypeCaller for germline; Mutect2 for somatic); multi-database annotation (ClinVar, gnomAD, COSMIC); and variant filtering and prioritization for biological or clinical interpretation.

What are the main applications of WES in life sciences?

WES is applied in: rare and Mendelian disease diagnosis (25–35% diagnostic yield); cancer genomics (somatic driver mutations, TMB, MSI, HRD); biomarker discovery for patient stratification; pharmacogenomics for drug metabolism and adverse reaction risk prediction; and population genetics studies.

What sequencing depth is needed for WES?

Germline WES typically requires 80–100× mean depth; somatic tumor-normal WES uses 150–200× or higher; clinical diagnostic WES mandates ≥95% of target bases covered at ≥20×.

Can WES detect structural variants?

WES has limited structural variant detection — it can identify exonic CNVs using dedicated tools but largely misses large chromosomal rearrangements, intronic SVs, and events in non-coding regions. For comprehensive SV detection, Whole Genome Sequencing (WGS) is required.

How is WES used for biomarker discovery?

WES identifies protein-coding mutations associated with disease phenotypes, drug response, or patient outcomes. In oncology, WES-derived biomarkers include somatic driver mutations, TMB, MSI status, and HRD scores. In rare disease, WES identifies causative germline variants serving as diagnostic biomarkers. Patient stratification based on WES-derived genomic signatures is increasingly embedded in clinical trial design.
