What is viral genome reconstruction from RNA-Seq data?

Viral genome reconstruction from RNA-Seq data involves assembling viral genomes from host transcriptomic sequencing using quality filtering, assembly tools, and taxonomic classification. It enables detection, analysis, and monitoring of viral species, including novel or zoonotic viruses.

Why is taxonomic classification important in viral genomics?

Taxonomic classification helps determine the identity, lineage, and abundance of viral species in RNA-Seq datasets. It supports epidemiological surveillance, zoonotic threat detection, and viral evolution tracking.

How did Excelra achieve over 98% classification accuracy in this case study?

Excelra used a dual-path assembly strategy with MEGAHIT and coronaSPAdes, followed by taxonomic classification using Kraken2 and BLAST-based identity validation. This improved sensitivity, precision, and biological relevance, resulting in >98% classification accuracy.

How does Excelra support early detection of zoonotic threats?

Excelra’s automated pipeline identifies viral sequences from host-derived RNA-Seq data, detects potential zoonotic species, and provides high-quality annotated genome assemblies, supporting surveillance and risk mitigation with a lead time of 6–12 months.

How can Excelra help accelerate virology research and vaccine development?

Excelra enables rapid viral genome reconstruction, taxonomic classification, and delivery of biologically validated contigs for downstream research, vaccine target identification, antiviral development, and regulatory submissions.

What scientific informatics services does Excelra offer for viral genomics?

Excelra provides viral genome assembly, metagenomic analysis, RNA-Seq processing, high-throughput classification, BLAST validation, and scalable pipeline automation for epidemiology, virology, and precision medicine applications.

Case studies

Development of Optimized Workflow for Viral Genome Reconstruction and Taxonomic Classification from RNA-Seq Data

Overview

We partnered with a pioneering biotech company focused on bat genomes to develop an automated, scalable workflow for viral genome assembly and taxonomic classification from Illumina RNA-Seq data. By integrating advanced bioinformatics workflow managers and a dual-path assembly strategy, the pipeline achieved over 95% sequence identity to reference genomes and over 98% classification accuracy. This solution accelerated virus detection and genome reconstruction and also enhanced zoonotic surveillance, supporting early identification of potential spillover threats. The project underscores Excelra’s commitment to enabling faster, data-driven decisions in virology, public health, and vaccine development, in line with AI-driven data analysis for precision medicine.

Our client

Our client is a biotech innovator at the forefront of bat genome research, studying evolutionary adaptations. The client needed Excelra’s expertise to assemble viral genomes embedded within host transcriptomic RNA-Seq data and perform accurate taxonomic classification to generate high-quality reference sequences.

Client’s challenge

The early-stage biotech company faced challenges in automating the assembly of Illumina RNA-Seq reads from host species to construct reliable virus genome assemblies and perform taxonomic classification. Species like bats and swine act as natural reservoirs for diverse viruses, some with spillover potential, as highlighted during pandemics like COVID-19. RNA-Seq data from these hosts is a crucial resource for early virus identification and genomic data management.

Client’s goals

By analyzing RNA-Seq data across species, researchers can gain insights into which viruses may pose a risk of spillover, helping pharmaceutical companies and public health organizations prepare for emerging threats.

The client sought to feed this RNA-Seq data into an automated, scalable virus genome assembly pipeline. The resulting viral genome assemblies would support their current research on taxonomic classification and also serve as high-quality reference sequences for future studies. The client’s goal was to study virus integration in bats by constructing reliable virus genome assemblies from host-derived RNA-Seq data.

Our approach

Excelra designed a robust and automated pipeline to process Illumina RNA-Seq reads and assemble viral genomes. The workflow leveraged multiple bioinformatics tools and databases to streamline assembly and taxonomic analysis:

Pre-processing

Raw sequencing reads were subjected to quality trimming.
This step removes low-quality bases and adapter contamination, ensuring that only high-quality, clean reads are used for downstream analysis.
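
The trimming step can be illustrated with a minimal Python sketch. The case study does not name the trimming tool, so this is not the pipeline's implementation: the function name `trim_read`, the adapter prefix, the quality threshold of 20, and the Phred+33 encoding are all assumptions chosen for illustration. A production workflow would rely on a dedicated trimmer rather than hand-written code.

```python
def trim_read(seq, qual, adapter="AGATCGGAAGAGC", min_q=20, phred_offset=33):
    """Return (seq, qual) with the adapter and the low-quality 3' tail removed.

    All defaults are illustrative assumptions, not values from the case study.
    """
    # Cut the read at the first exact adapter match, if there is one.
    pos = seq.find(adapter)
    if pos != -1:
        seq, qual = seq[:pos], qual[:pos]

    # Walk back from the 3' end while the base quality is below the threshold.
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - phred_offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


if __name__ == "__main__":
    read = "ACGTACGT" + "AGATCGGAAGAGC" + "TT"
    quality = "IIIIII##" + "I" * 15
    print(trim_read(read, quality))
```

In this toy read the adapter and everything after it are cut first, and the two trailing low-quality bases (encoded as `#`) are then removed.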

Assembly

The cleaned reads were processed through two complementary assembly strategies to maximize the detection and reconstruction of viral genomes:

A general-purpose metagenome assembler was employed, ideal for generating a broad and comprehensive assembly of all organisms present in the sample.
A virus-focused assembler optimized for RNA viruses was used to reconstruct viral genomes with higher sensitivity, particularly those related to coronaviruses and similar viral pathogens.

Annotation & taxonomic classification

A fast and accurate taxonomic classifier was used to identify and classify the microbial and viral components present in the sample, providing taxonomic labels for the assembled sequences.
BLAST analysis was then carried out on the assembled contigs from both assembly paths to confirm their taxonomic identities and to validate the presence of known or novel viral species through sequence similarity against reference databases.

Quality analysis & interpretation

The final assemblies were evaluated for:

Taxonomic composition—to determine the diversity and abundance of organisms present.
Viral genome distribution—to assess the breadth of viral detection across different taxa.
Contig assembly quality—to ensure the assembled viral genomes are contiguous, complete, and biologically meaningful.

These evaluations helped to ensure confidence in the detection, classification, and downstream interpretation of viral and microbial sequences within the sample.
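
Contig assembly quality is often summarized with contig counts, total length, and N50. The case study does not say which metrics were used, so the following Python sketch is only one plausible way to compute such a summary from a FASTA file of contigs. The function names are hypothetical.

```python
def read_fasta_lengths(lines):
    """Yield (contig_id, length) pairs from an iterable of FASTA lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            parts = line[1:].split()
            # Use the first word of the header as the contig identifier.
            name, length = (parts[0] if parts else ""), 0
        elif line:
            length += len(line)
    if name is not None:
        yield name, length


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # no contigs


if __name__ == "__main__":
    fasta = [">c1 example", "ACGT", "AC", ">c2", "GG"]
    lengths = [n for _, n in read_fasta_lengths(fasta)]
    print(len(lengths), sum(lengths), n50(lengths))
```

N50 alone says nothing about completeness or correctness, so it would normally be read alongside the taxonomic and identity checks described above.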

Our solution

Using this workflow on Illumina RNA-Seq reads, Excelra assembled virus genomes with high percent identity to reference sequences. This robust workflow enables comprehensive detection and reconstruction of viral sequences from complex sequencing datasets, ensuring both accuracy and biological relevance. The key components of the pipeline include:

High identity assemblies

Successfully reconstructed viral genomes with high sequence identity to known references using Illumina RNA-Seq data.

Comprehensive detection & reconstruction

Workflow ensures accurate identification and biologically relevant reconstruction of viral sequences from complex datasets.

Quality trimming & pre-processing

Applied stringent quality control.
Removed low-quality bases and adapter sequences using standard tools.

Dual-path assembly strategy

General-purpose assembler (e.g., MEGAHIT): Captured broad sequence diversity.
Virus-specific assembler (e.g., coronaSPAdes): Optimized for sensitive reconstruction of RNA viral genomes.
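
As a rough illustration of the two assembly paths, the Python sketch below builds one command line per assembler for a paired-end sample. It assumes MEGAHIT and a SPAdes release that ships the `coronaspades.py` entry point are installed; the file names, output layout, thread count, and the helper name `assembly_commands` are hypothetical, and the actual parameters used in the project are not given in the case study. The commands are only constructed here, not executed.

```python
import shlex


def assembly_commands(r1, r2, out_dir, threads=8):
    """Build command lines for both assembly paths on trimmed paired-end reads."""
    # Path 1: general-purpose metagenome assembly.
    megahit = ["megahit", "-1", r1, "-2", r2,
               "-o", f"{out_dir}/megahit", "-t", str(threads)]
    # Path 2: virus-focused assembly.
    corona = ["coronaspades.py", "-1", r1, "-2", r2,
              "-o", f"{out_dir}/coronaspades", "-t", str(threads)]
    return {"megahit": megahit, "coronaspades": corona}


if __name__ == "__main__":
    cmds = assembly_commands("sample_R1.trimmed.fq.gz",
                             "sample_R2.trimmed.fq.gz", "assemblies")
    for name, cmd in cmds.items():
        # Print instead of running; pass cmd to subprocess.run to execute.
        print(name, "->", shlex.join(cmd))
```

Keeping each command as a list of arguments avoids shell quoting problems if the commands are later run with `subprocess.run` or handed to a workflow manager.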

Taxonomic classification via Kraken2

High-performance k-mer-based classification of contigs.
Enabled detection of both known and novel viral species.
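
To show how contig-level labels might be summarized, the following Python sketch tallies Kraken2's standard per-sequence output, which is tab-separated with the classification status (`C` or `U`), the sequence ID, and the assigned taxonomy ID in the first three columns. It assumes the default numeric taxid column (that is, Kraken2 run without `--use-names`). The contig names and taxids in the example are purely illustrative, and `tally_kraken2` is a hypothetical helper.

```python
from collections import Counter


def tally_kraken2(lines):
    """Return (Counter of classified contigs per taxid, number unclassified)."""
    per_taxid = Counter()
    unclassified = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        status, taxid = fields[0], fields[2]
        if status == "C":
            per_taxid[taxid] += 1
        elif status == "U":
            unclassified += 1
    return per_taxid, unclassified


if __name__ == "__main__":
    example = [
        "C\tcontig_1\t694009\t1500\t694009:50 0:3",
        "C\tcontig_2\t694009\t900\t694009:20",
        "U\tcontig_3\t0\t300\t0:10",
        "C\tcontig_4\t11320\t700\t11320:12",
    ]
    counts, unclassified = tally_kraken2(example)
    print(counts.most_common(), unclassified)
```

Contigs left unclassified, or classified only at a high taxonomic rank, are natural candidates for the BLAST-based follow-up.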

Downstream analysis included

BLAST validation for confirming viral identities.
Distribution mapping to assess viral abundance and prevalence.
Species-level detection to resolve closely related strains.
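
BLAST validation can be sketched as a filter over BLAST tabular output (`-outfmt 6`), whose 12 default columns include the query ID, subject ID, percent identity, alignment length, and bit score. The Python example below keeps the highest-scoring hit per contig that passes an identity threshold. The 95% cutoff mirrors the identity figure reported in this case study, while the minimum alignment length, the subject IDs, and the function name `best_hits` are assumptions for illustration only.

```python
def best_hits(lines, min_identity=95.0, min_length=200):
    """Return {contig_id: (subject_id, pident, bitscore)} for the best passing hit."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.rstrip("\n").split("\t")
        qseqid, sseqid = f[0], f[1]
        pident, length, bitscore = float(f[2]), int(f[3]), float(f[11])
        if pident < min_identity or length < min_length:
            continue
        # Keep the hit with the highest bit score for each contig.
        if qseqid not in best or bitscore > best[qseqid][2]:
            best[qseqid] = (sseqid, pident, bitscore)
    return best


if __name__ == "__main__":
    rows = [
        "c1\tREF_A\t98.5\t1200\t18\t0\t1\t1200\t1\t1200\t0.0\t2100",
        "c1\tREF_B\t99.0\t300\t3\t0\t1\t300\t1\t300\t1e-150\t540",
        "c2\tREF_C\t88.0\t900\t108\t0\t1\t900\t1\t900\t0.0\t1000",
    ]
    print(best_hits(rows))
```

Contigs with no passing hit, such as `c2` here, are not necessarily artifacts; they may be divergent or novel sequences that need closer inspection.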

Final output

Delivered high-quality, assembled viral contigs with validated annotations and taxonomic labels—supporting virome studies, epidemiology, and functional research.

Key performance outcomes

>95% sequence identity achieved to known reference genomes, ensuring high-confidence detection of both known and novel viruses.

>98% taxonomic classification accuracy in identifying viral species and strains.

>3x improvement in assembly turnaround time compared to manual or semi-automated methods.

Early discovery of potential zoonotic threats, providing a lead time of 6–12 months for risk mitigation compared to traditional surveillance.

Up to 70% reduction in manual curation effort, freeing up bioinformatics experts for higher-value tasks.
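
The case study does not state how the classification accuracy figure above was measured. As one plausible reading, accuracy can be taken as the fraction of contigs whose classifier label agrees with a validated label, for example a BLAST-confirmed one. The Python sketch below computes that fraction; the function name and the toy labels are hypothetical and do not reproduce the project's data.

```python
def classification_accuracy(predicted, validated):
    """Fraction of shared contigs whose predicted label equals the validated label."""
    shared = predicted.keys() & validated.keys()
    if not shared:
        raise ValueError("no contigs in common")
    correct = sum(1 for contig in shared if predicted[contig] == validated[contig])
    return correct / len(shared)


if __name__ == "__main__":
    predicted = {"c1": "virus_A", "c2": "virus_B", "c3": "virus_C", "c4": "virus_A"}
    validated = {"c1": "virus_A", "c2": "virus_B", "c3": "virus_A", "c4": "virus_A"}
    print(classification_accuracy(predicted, validated))
```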

Strategic business impacts

Enhanced pandemic preparedness through high-throughput zoonotic surveillance
Faster vaccine and antiviral development powered by high-quality reference genomes
Creation of proprietary genomic assets for R&D, licensing, and partnership opportunities
Improved regulatory support via quality-controlled genome assemblies for submissions and surveillance

Conclusion

Excelra delivered a customized, automated bioinformatics pipeline that enabled accurate and efficient assembly of viral genomes from host-derived RNA-Seq data. The solution significantly improved processing speed, reproducibility, and scalability, while achieving >95% sequence identity to reference genomes and >98% taxonomic classification accuracy. Designed for modular integration, the pipeline is capable of processing >10,000 datasets annually with up to 70% reduction in manual curation. This empowers large-scale viral genomics efforts, particularly in zoonotic reservoirs like bats, and supports early detection of emerging threats with a lead time of 6–12 months. Additional value delivered to the client includes accelerated vaccine and antiviral target discovery, proprietary data asset creation, and robust support for regulatory submissions.
