How does Excelra ensure high accuracy in genome assembly?

Excelra integrates PacBio HiFi long-read sequencing and Hi-C scaffolding, followed by BUSCO analysis and k-mer validation to achieve >99.9% base-level accuracy and >98% genome completeness.

What is the role of Hi-C sequencing in genome assembly?

Hi-C sequencing captures chromatin contact maps, which help assemble contigs into chromosome-scale scaffolds, improving genome accuracy, structure, and continuity.

How did Excelra reduce manual effort by 75% in this case study?

Excelra implemented a modular and automated genome assembly pipeline using PacBio HiFi and Hi-C data integration, reducing manual intervention and accelerating early discovery workflows.

Can this genome assembly framework be adapted for viral genome analysis?

Yes, the pipeline is modular and supports viral genome assembly from Illumina RNA-Seq data, enabling rapid reference development for pathogens and zoonotic viruses.

What scientific informatics services does Excelra provide for genomics?

Excelra offers genome assembly, bioinformatics automation, multi-omics integration, validation-ready outputs, and scalable workflow development for genomic research.

How does Excelra support precision medicine through genomic data?

Excelra enables precision medicine by delivering accurate genome assemblies, identifying therapeutic targets, and supporting translational research through scientific informatics solutions.

Case studies

Building a Scalable Reference Framework for a High-Quality Bat Genome Assembly

Overview

Excelra partnered with a biotech innovator to develop a scalable, automated pipeline for high-quality bat genome assembly. Utilizing PacBio HiFi and Hi-C sequencing data, the pipeline achieved >98% genome completeness and >99.9% base-level accuracy. It integrated long-read and Hi-C data for chromosome-scale assemblies and was modularly designed to support viral genome assembly as well. The solution reduced manual effort by 75% and accelerated early discovery and therapeutic target identification. With validation-ready outputs and cross-platform compatibility, Excelra’s framework empowered the client to scale genomic research efficiently, enhancing their competitive edge in bat immunogenomics.

Our client

A biotech innovator at the forefront of bat genome research, studying evolutionary adaptations. By leveraging bat genomics, the client aimed to develop therapeutic strategies, uncovering unique biological traits and immune pathways. The client, in early discovery stages, required technical acceleration through bioinformatics automation. Excelra was tasked with providing a framework to automate the assembly of PacBio HiFi sequencing reads plus Hi-C data to construct reliable bat genome assemblies.

Client’s challenge

An early-stage biotech company had generated large datasets from bat genome sequencing. Bats, with over 1,400 species, show significant genome divergence, complicating comparative genomics and reference-guided assembly. High-quality de novo genome assembly using long-read sequencing technologies was critical to accurately capture shared and species-specific genomic features.

Related case study: Scalable Bat Genome Framework

Client’s goals

Wanted to build an efficient high-quality workflow for automating the assembly process of reference genome assemblies from scratch. They required a robust, automated pipeline to:

Assemble PacBio HiFi reads
Integrate Hi-C data for scaffolding
Enable consistent and scalable genome assembly workflows
Extend the framework to support virus genome assemblies from Illumina RNA-Seq data

Our approach

Excelra designed and deployed an end-to-end genome assembly pipeline tailored to high-accuracy PacBio HiFi sequencing data and Hi-C scaffolding, also supporting viral genome assembly from short-read data. The engagement aimed to standardize the de novo assembly pipeline for bat genomes using PacBio, 10X, Bionano, and Hi-C technologies.

Genome assembly workflow

Input data

PacBio HiFi BAM files for high-accuracy long-read sequencing.
Hi-C fastq files for capturing chromatin interaction patterns.

HiFi assembly:

Performed initial genome assembly using a tool, optimized for PacBio HiFi reads.
Cleaned the assembly to remove haplotypic duplications and improve contiguity.

Hi-C integration

Aligned Hi-C reads to the draft assembly using a tool for accurate mapping.
Processed the mapped Hi-C data to prepare scaffolding input.
Scaffolded the assembly using a tool, which utilizes Hi-C contact data to order and orient contigs into chromosome-scale scaffolds.

Assembly quality assessment

Conducted a comprehensive evaluation of the genome assembly using multiple tools to ensure accuracy, completeness, and structural integrity:

Generated key assembly statistics such as N50, total assembly length, and number of contigs/scaffolds, providing a snapshot of assembly contiguity.
BUSCO (Benchmarking Universal Single-Copy Orthologs) analysis was performed using the laurasiatheria_odb10 lineage dataset to assess the completeness of the assembly in terms of conserved gene content.
Estimated base-level accuracy and completeness using a k-mer-based approach, comparing the assembly to raw sequencing reads for validation.

Final outputs

A scaffolded genome in FASTA format, representing the assembled and Hi-C–scaffolded genome sequence at chromosome-level resolution.
K-mer multiplicity plots, used to visualize and validate the consistency between the raw read data and the final assembly, helping to identify potential errors, duplications, or missing regions.

Related whitepaper: Omics Data: A Biomedical Asset Driving the Future of Drug Discovery

Our solution

Delivered a modular, automated pipeline for de novo assembly of bat genomes using PacBio HiFi reads and Illumina Hi-C reads , enabling:

High-quality, reproducible reference genomes
Smooth integration of long-read and Hi-C data
Validation-ready outputs for downstream research. The assembly metrics, reads k-mer validation, and orthologous genes validation show that we were able to assemble a correct genome from HiFi and Hi-C reads.

Key performance outcomes

>98% assembly completeness, confirmed the presence of most conserved genes in the assembled genome.

>99.9% base-level accuracy, ensured high fidelity of the assembly with minimal sequencing or structural errors.

Reduced manual effort and turnaround time by 75%, enabling rapid generation of reference-quality genomes for multiple bat species.

Cross-platform data integration, reduced integration complexity and data processing costs.

Modular design for reusability, reproducible and scalable genome assembly workflows for future species or projects, eliminating the need to re-engineer pipelines.

Strategic business impacts

Accelerated early discovery pipeline: High-quality assemblies enabled faster comparative and functional genomics research.
Enhanced therapeutic target discovery: Accurate genomes unlocked unique biological traits for novel therapeutic target identification.
Future-ready infrastructure: Modular pipeline supports future genome assemblies for bats and related pathogens.
Improved data confidence for downstream analysis: Validation-ready outputs boosted confidence in downstream analyses with minimal additional processing.
Strengthened competitive advantage: Early access to high-fidelity genomes positioned the client ahead in bat immunogenomics research.

Conclusion

Excelra delivered a scalable, automated genome assembly pipeline, enabling a biotech client to generate high-quality, chromosome-scale bat genome assemblies with >98% completeness and >99.9% accuracy. The modular, cross-platform design allowed rapid scaling of future genome projects without re-engineering. This solution accelerated early discovery, enhanced therapeutic target identification, and provided validation-ready outputs, strengthening the client’s competitive position in bat immunogenomics research.

Related internal links: Pipeline for Immune Repertoire Data Analysis, Bioinformatics for Drug Discovery, Data-Driven Drug Discovery.

Previous ProjectDesigning a Role-Based OMICs Data Lake: Scalable Metadata Architecture for Pharmaceutical R&D
Next ProjectDevelopment of Optimized Workflow for Viral Genome Reconstruction and Taxonomic Classification from RNA-Seq Data