
Pipeline development

In bioinformatics, omics data travels through a pipeline and emerges as insight. The time taken to complete this transformation has a substantial impact on the efficacy and efficiency of your research programs. Our scalable, custom-built pipelines optimize your analysis and accelerate your journey from high-quality data to valuable insight.

Pipelines greater than the sum of their parts

Bioinformatics unifies many distinct disciplines into a single analysis process. By bringing together biochemistry, data science, mathematics, and cloud computing, it becomes much more than the sum of its parts. But to ensure those constituent elements are optimized and work seamlessly together in the most appropriate sequence, you need an effective pipeline.

We’re true bioinformaticians. Our biologists and chemists work alongside engineers and developers, so we’re right there with you at the intersection of science and technology. Our uniquely interdisciplinary team builds custom, scalable omics pipelines to meet your exact data analysis requirements. And every pipeline we develop is easy to integrate, operate, and interact with.

Reduce costs. Save time. Improve quality.

Whatever your objectives, we develop custom pipelines to help you meet them. We build applets for individual analysis stages or entire workflows, and we can customize existing algorithms or custom-build them from the ground up.

Why choose our pipelines:

  • More economical than developing in-house
  • Scalable to the size and scope of your project
  • Customizable to suit your data type and analysis requirements
  • Easy to integrate with existing infrastructure
  • Quick to deploy, with minimal project downtime
  • Packed with powerful admin, tracking, and reporting functionality behind a clear, intuitive user interface
  • Built on DNAnexus, Seven Bridges, or any platform you choose
  • Deployable on AWS, Azure, or Google Cloud

Workflow standardization is essential for ensuring replicable and reproducible analyses. But omics research also requires flexibility to meet changing demands and incorporate new technologies.

We meet both demands by building pipelines with modular components. You can choose the most suitable containers, and we’ll connect and organize them to meet your requirements.
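
To illustrate the idea, here is a minimal sketch in Python of how modular steps can be composed; the step names and manifest fields are hypothetical, and in production each step would wrap a container rather than a plain function:

from typing import Callable, Dict, List

# Each pipeline step maps an input manifest to an output manifest. In
# production each step wraps a container; a plain callable is enough to
# show the composition pattern.
Step = Callable[[Dict], Dict]

def fastqc(manifest: Dict) -> Dict:       # hypothetical QC step
    manifest["qc_report"] = "qc.html"
    return manifest

def trim_reads(manifest: Dict) -> Dict:   # hypothetical trimming step
    manifest["reads"] = "trimmed_" + manifest["reads"]
    return manifest

def align(manifest: Dict) -> Dict:        # hypothetical alignment step
    manifest["bam"] = manifest["reads"].replace(".fastq.gz", ".bam")
    return manifest

def run_pipeline(steps: List[Step], manifest: Dict) -> Dict:
    """Run the chosen steps in order; swapping or omitting a step
    changes the pipeline without touching the other components."""
    for step in steps:
        manifest = step(manifest)
    return manifest

# The user picks only the steps the project needs.
result = run_pipeline([fastqc, trim_reads, align], {"reads": "sample1.fastq.gz"})
print(result)

Because every step shares the same interface, exchanging one tool for another changes a single entry in the step list, not the surrounding pipeline.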

Made-to-order omics platforms

You operate multiple concurrent research streams, so you need solutions capable of handling each of your data analysis requirements.

Our scientists, engineers, and biocurators develop unified pipeline platforms to meet all of your objectives:

Genomics and transcriptomics

  • RNA-seq
  • DNA-seq (WES/WGS)
  • scRNA-seq
  • miRNA-seq
  • ChIP-seq
  • ATAC-seq
  • Bulk RNA-seq TCR
  • MeDIP-seq
  • siRNA Off-Target
  • Metagenomics

Proteomics

  • Data-independent acquisition (DIA) data processing
  • Data-dependent acquisition (DDA) data processing
  • Liquid chromatography-mass spectrometry (LC-MS)
  • Matrix-assisted laser desorption/ionization (MALDI)

Metabolomics

  • Gas chromatography-mass spectrometry (GC-MS)
  • Lipidomics
  • Metabolite abundances

Powerful pipelines. Proven success.

The value of effective pipelines is measured by the accuracy they deliver, the time they save, and the costs they reduce. That’s the value we deliver to our clients. Our pipelines feature:

  • Automated scaling that works for all FASTQ file sizes without user intervention
  • Automated archiving of the entire run, with files saved to data storage infrastructure to create a clear audit trail
  • Re-run functionality to allow an immediate repeat of analysis using the archived data without requiring changes to the configuration files
  • Automated email notification on workflow completion or error (see the sketch after this list)
  • Centralized storage for reference genomes, removing the need to store them for individual projects
  • Strategic utilization of computing resources, leading to over 50% cost savings
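
As a simplified sketch of the notification step, using only Python's standard library; the SMTP host and addresses below are placeholders, not our infrastructure:

import smtplib
from email.message import EmailMessage

def notify(status: str, run_id: str, detail: str = "") -> None:
    """Send a completion or error notice for a pipeline run.
    Host and addresses are placeholder values for illustration."""
    msg = EmailMessage()
    msg["Subject"] = f"Pipeline run {run_id}: {status}"
    msg["From"] = "pipeline@example.org"
    msg["To"] = "analyst@example.org"
    msg.set_content(detail or f"Run {run_id} finished with status: {status}")
    with smtplib.SMTP("smtp.example.org") as server:
        server.send_message(msg)

# Called at the end of a workflow, or from an error handler:
# notify("completed", "run-0042")
# notify("error", "run-0042", detail="Alignment step failed on sample S3")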

Our pipeline-building experience spans a wide range of engineering and bioinformatics functions. Our pipelines have been effectively deployed by some of the world’s leading life sciences organizations to meet multiple scientific, managerial, and scalability requirements.

Highlights of our successfully delivered projects include:

Containerized app development on DNAnexus

We’re experts at working with RNA-seq data and have produced many pipelines to process and analyze it effectively. Because complexity varies from client to client and objective to objective, we containerize the individual elements of the pipeline using DNAnexus. With a containerized approach, the end user has the flexibility to choose which of the individual applications to run within the pipeline. This can dramatically improve processing times by skipping functions that are superfluous to the objective.

We work with our clients to identify their objectives and provide containerized processes to meet them flexibly. In an RNA-seq data analysis pipeline, for example, we develop containers for each individual process (see the sketch after this list):

  • Data ingestion
  • Reprocessing of raw datasets from a public repository
  • Cleansing, harmonizing, and uploading of clinical and non-clinical data
  • Tracking data movement
  • Centralizing clinical trial data based on collected metadata
  • Monitoring jobs with percentage progress, size, speed, errors, and success or failure status
  • User notification configuration
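
To show how such containerized applets are driven in practice, here is a minimal sketch using the DNAnexus Python SDK (dxpy); the applet, project, and file IDs are placeholders:

import dxpy

# Placeholder IDs: real values come from your DNAnexus project.
APPLET_ID = "applet-xxxx"
PROJECT_ID = "project-xxxx"

applet = dxpy.DXApplet(APPLET_ID)

# Each containerized step accepts its own inputs; only the steps the
# analysis actually needs are launched.
job = applet.run(
    {"reads": {"$dnanexus_link": "file-xxxx"}},  # applet input spec
    project=PROJECT_ID,
)
print("Launched job:", job.get_id())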

See fig.1 for an example of our RNA-seq data analysis pipeline. Each block represents a container (or applet) that delivers one specific step of the analysis.

Fig.2 shows a screenshot of the parameters provided for Spliced Transcripts Alignment to a Reference (STAR) mapping on the DNAnexus platform. Through a simple, intuitive interface, users can fine-tune the analysis parameters as required; a sketch of typical tunable parameters follows below.
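
As a hedged illustration of the kind of STAR parameters such an interface exposes (the values and paths below are examples, not recommendations):

import subprocess

# Example STAR parameters a user might fine-tune; the paths and values
# here are placeholders, not recommended defaults.
star_params = {
    "--runThreadN": "8",                       # CPU threads
    "--genomeDir": "/refs/GRCh38_star_index",  # prebuilt STAR index (placeholder path)
    "--readFilesIn": "R1.fastq.gz R2.fastq.gz",
    "--readFilesCommand": "zcat",              # decompress gzipped FASTQ on the fly
    "--outSAMtype": "BAM SortedByCoordinate",  # coordinate-sorted BAM output
    "--quantMode": "GeneCounts",               # per-gene read counts
    "--outFilterMultimapNmax": "10",           # discard reads mapping to >10 loci
}

# Assemble the command line; multi-token values are split into separate args.
cmd = ["STAR"]
for flag, value in star_params.items():
    cmd.append(flag)
    cmd.extend(value.split())

subprocess.run(cmd, check=True)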


Containerized wrapper pipelines as R packages for proteomics data analysis

Proteomics instruments generate huge amounts of data, but only a limited number of tools are available to analyze it. To meet the growing need, we build data analysis pipelines that can capably handle spectral proteomics data as well as MaxQuant and Spectronaut flat files. Fig.3 shows a successfully deployed pipeline built with R packages to ensure maximum flexibility. The pipeline normalizes data and sends it to interaction, PTM, or non-PTM models for further study. The user can then explore the output in downstream analysis with data visualization if required.
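
Although the deployed pipeline itself is built from R packages, the core normalization step can be sketched in Python; the file path is a placeholder, and the column naming follows MaxQuant's proteinGroups.txt convention:

import numpy as np
import pandas as pd

# Placeholder path; MaxQuant writes a tab-separated proteinGroups.txt.
pg = pd.read_csv("proteinGroups.txt", sep="\t")

# Intensity columns in proteinGroups.txt are named "Intensity <sample>".
intensity_cols = [c for c in pg.columns if c.startswith("Intensity ")]
intensities = pg[intensity_cols].replace(0, np.nan)  # zeros are missing values

# Log-transform, then median-center each sample so samples are comparable;
# this mirrors the normalization step performed before model fitting.
log_int = np.log2(intensities)
normalized = log_int - log_int.median()

print(normalized.describe())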

Bulk RNA-seq deconvolution methods to obtain cell type proportions

Many of our clients require the functionality to explore data biologically in their pipelines. One client, for example, asked us to include the ability to deconvolute bulk RNA-seq data for biological interpretation. We have a diverse team of scientific domain experts, so we were able to explore methods to meet this requirement and develop a validated pipeline solution.

Bulk RNA-seq deconvolution is a common technique used by scientific groups to alleviate the confounding effect on gene expression levels caused by varying proportions of cell types in samples of interest (fig.4). Our pipelines make deconvolution analysis affordable and deployable in large-scale set-ups.
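
To make the technique concrete, here is a minimal sketch of reference-based deconvolution posed as a non-negative least-squares problem on synthetic data; production methods such as MuSiC are considerably more sophisticated:

import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)

# Synthetic signature matrix: mean expression of 200 genes in 4 cell types.
n_genes, n_cell_types = 200, 4
signature = rng.gamma(shape=2.0, scale=50.0, size=(n_genes, n_cell_types))

# Simulate a bulk sample as a known mixture of the cell types plus noise.
true_props = np.array([0.5, 0.3, 0.15, 0.05])
bulk = signature @ true_props + rng.normal(0, 1.0, size=n_genes)

# Solve bulk ~= signature @ p subject to p >= 0, then rescale so the
# estimated cell type proportions sum to 1.
raw, _ = nnls(signature, bulk)
estimated = raw / raw.sum()

print("true:     ", true_props)
print("estimated:", estimated.round(3))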

The pipeline development strategy for deconvolution is shown in fig.5. It starts with a survey of publicly available analysis tools. Once the most appropriate tools are selected, concordant data sets are identified and benchmarked, and the best-performing sets are chosen for pipeline development.

Our pipelines produce effective deconvolution analyses, as shown in fig.6.

The output shows:

  • Single-cell RNA-seq and bulk RNA-seq data from human bone marrow
  • Clustered single-cell data annotated by cell type (CD16+ monocytes, CD4+ naïve T cells, etc.)
  • Deconvoluted pseudobulk mixtures created from the scRNA-seq data
  • The relationship between known and estimated cell type proportions, plotted by MuSiC

Ready to get more from data?

Tell us about your objectives. We’ll help get you there.

"*" indicates required fields

This field is for validation purposes and should be left unchanged.