Contributors: James Ashmore, Felix Seifert, and Margriet Middel
Date: July 2025
Excelra has helped teams across research, biotech, and pharma build and maintain bioinformatics pipelines using just about every major workflow manager out there. Along the way, we’ve learned that the best tool really depends on the context, and sometimes on what you’re willing to live with. For this post, we sat down with some of our senior consultants for a Q&A-style deep dive into the workflow manager they know best. They shared what it does well, where it struggles, and how they decide whether it’s the right fit for a given job.

Nextflow
Margriet is a bioinformatics workflow specialist with deep experience in pipeline development and DevOps. As a core developer of Excelra’s internal OP2 platform, built on Nextflow, she brings practical expertise in designing scalable, production-grade workflows for large-scale transcriptomics and multi-omics analysis.
Q: What kind of workflows were you working with before using Nextflow?
A: When I started out in bioinformatics, most workflows I saw were just loosely connected scripts tied together with bash. They worked for small-scale, one-off analyses but weren’t sustainable. These setups were hard to scale, difficult to maintain, and often tightly coupled to the specific environments they were written in. Everything had to be handled manually — data movement, logging, retries — it was fragile and didn’t scale well as projects or datasets grew.
Q: When did you first start working with Nextflow?
A: In 2019, I came across Nextflow — a workflow manager built on Groovy. At that time, it was still in its original form, called DSL-1. The entire pipeline had to be written in a single monolithic script. It supported Docker and Singularity, which was a step forward for portability and reproducibility, but container settings were defined globally. That made it hard to manage pipelines that used multiple tool versions, and maintenance could quickly become a burden.
Q: What changed with DSL-2?
A: DSL-2 was a major step forward. Its biggest improvement was support for modularization — something DSL-1 lacked entirely. With DSL-2, we could split the pipeline into reusable components: separate modules and sub-workflows. We could assign a different container to each step, which made it much easier to manage multiple tools and versions within the same pipeline. This approach also helped keep the Docker images smaller, since each container only had to include the tools needed for that specific step.
This wasn’t just a convenience — it fundamentally changed how we built workflows. DSL-2 made pipelines more maintainable, more testable, and easier to collaborate on. It helped turn pipelines into long-term, scalable solutions rather than throwaway scripts.
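As a rough sketch of what that modularity looks like in practice (the tool, container tag, and paths here are illustrative, not taken from our actual pipelines), a DSL-2 module is a self-contained process with its own container:

```groovy
// modules/fastqc.nf: a hypothetical DSL-2 module that pins its own container
process FASTQC {
    container 'quay.io/biocontainers/fastqc:0.12.1--hdfd78af_0'

    input:
    path reads

    output:
    path "*_fastqc.zip"

    script:
    """
    fastqc ${reads}
    """
}
```

The main workflow then imports it with `include { FASTQC } from './modules/fastqc'`, and because each module declares its own container, tool versions can be upgraded or swapped step by step without touching the rest of the pipeline.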
Q: Can you give an example of where this modularity really paid off?
A: Absolutely. We were working with a client who needed a small pipeline to process a handful of samples. We developed it using Nextflow DSL-2 and deployed it on a single EC2 instance. Later, the project expanded — more datasets, more analysis steps, more complexity. Because of the modular design we started with, we were able to extend the pipeline quickly and with minimal disruption.
That’s where Nextflow really stands out compared to simpler solutions like bash or even Snakemake. When pipelines need to grow or evolve, Nextflow’s structure holds up well.
Q: How did you handle the increased scale of that project?
A: We integrated AWS Batch into the Nextflow configuration, which allowed us to scale up easily and process large datasets in parallel. What stood out was how minimal the changes to the actual pipeline code were — Nextflow’s cloud backend support took care of most of the complexity. That kind of flexibility made it much easier to adapt to new demands without reengineering the entire workflow.
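For context, the switch to AWS Batch lives almost entirely in the configuration file rather than in the pipeline code. A minimal sketch, assuming placeholder names for the queue, S3 bucket, and region:

```groovy
// nextflow.config: hypothetical AWS Batch profile (queue, bucket, and region are placeholders)
profiles {
    awsbatch {
        process.executor = 'awsbatch'
        process.queue    = 'my-batch-queue'
        workDir          = 's3://my-bucket/nextflow-work'
        aws.region       = 'eu-west-1'
    }
}
```

With a profile like this in place, the same pipeline runs locally by default and on AWS Batch with `nextflow run main.nf -profile awsbatch`, which is why the pipeline code itself barely changes.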
Q: When would you recommend using Nextflow?
A: I’d recommend Nextflow when you need to combine custom scripts and existing tools into a reproducible, scalable, and maintainable pipeline. If your project involves iterative development, version control, or growing datasets, Nextflow is an excellent fit. It’s especially strong when you’re aiming for long-term maintainability and cloud scalability without sacrificing developer control.
Q: Final verdict and scorecard?
A: Nextflow has changed the way we approach pipeline development. Its modular architecture allows us to build workflows that are scalable, maintainable, and easier to extend over time. That’s been incredibly valuable, both internally for our own products and especially when supporting clients whose needs evolve or scale rapidly.
[Scorecard: Category / Score / Summary]
Connect with Margriet to learn how to leverage Nextflow for your workflows.

Snakemake
Felix is a bioinformatics expert with a PhD in plant molecular biology and a strong track record in multi-omics and pipeline design. With hands-on experience adapting workflow managers to large-scale agricultural datasets, he focuses on building robust, domain-specific solutions that drive research innovation.
Q: When did you first start working with workflow managers, and what prompted that?
A: My first exposure to workflow managers was back in 2015, during my time as a freelancer. I noticed a lot of redundancy in my daily tasks. Although bash scripts helped define sequences of commands, they weren’t ideal for reproducibility, error handling, or parallelization. That’s when I began looking into workflow managers more seriously.
Q: How did you decide which workflow manager to use?
A: I initially read a blog post that compared Nextflow and Snakemake. The author didn’t conclude which was better, but it piqued my interest. I tried Nextflow first, but I wasn’t comfortable with the Groovy-based syntax and found the concept of channels difficult to grasp. The documentation also didn’t feel very supportive, and the error messages were often cryptic. That led me to give Snakemake a try.
Q: What was your first impression of Snakemake?
A: Snakemake clicked with me immediately. Coming from a Python and bash-scripting background, its syntax felt intuitive. The documentation was thorough, and the project’s author, Johannes Köster, was actively maintaining and supporting the tool, which was reassuring.
Q: What features of Snakemake stood out to you early on?
A: There were a few things that really impressed me right from the start. First, the way Snakemake handles task chaining felt very intuitive. You can define rules using input and output files, and it just makes sense – especially with wildcards, which let you generalize rules across different datasets. It’s a very natural way to think about building a pipeline.
I also appreciated how easy it was to integrate Docker containers or Conda environments into each rule. That made it straightforward to ensure reproducibility, manage dependencies, and even allocate hardware resources to speed up computation and run tasks in parallel.
But the feature that really won me over was how gracefully Snakemake handles reruns after failures. You don’t have to manually clean up files or comment out parts of the code you already ran. Just fix the problem and re-run – it picks up where it left off. That kind of robustness saves a lot of time and reduces frustration, especially in longer workflows.
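The points above can be illustrated with a hypothetical rule (the aligner command, paths, and sample names are placeholders): it chains on input and output files, generalizes over samples with the `{sample}` wildcard, and pins its environment and thread count per rule:

```snakemake
# Snakefile: hypothetical alignment step; tool names and paths are placeholders
rule all:
    input:
        expand("aligned/{sample}.bam", sample=["sampleA", "sampleB"])

rule align:
    input:
        reads="fastq/{sample}.fastq.gz",
        index="reference/genome.idx"
    output:
        "aligned/{sample}.bam"
    conda:
        "envs/align.yaml"      # per-rule environment for reproducibility
    threads: 8
    shell:
        "aligner --threads {threads} --index {input.index} {input.reads} > {output}"
```

If a run fails partway through, fixing the problem and re-invoking `snakemake` reruns only the rules whose outputs are missing or out of date, which is the rerun behavior described above.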
Q: Did things change as your pipelines grew more complex?
A: They did. What started off simple and logical became more complicated. For example, some bioinformatics tools don’t let you specify output filenames, so I had to put extra effort into organizing directory structures. Using wildcards became tricky too, especially when filenames already contained underscores. And when I started using more advanced features like dynamic outputs, the documentation wasn’t quite as thorough anymore, and the learning curve steepened.
Q: Were there specific limitations that Snakemake couldn’t handle well?
A: I noticed issues with performance and stability when running complex pipelines, such as whole genome assembly and gene annotation across multiple genotypes. The DAG would grow too large and eventually cause Snakemake to crash. In these cases, I had to offload parts of the workflow into custom scripts. More recently, a customer asked me to build a modularized pipeline using Snakemake. While Snakemake does support modularization, the level of flexibility they expected required a lot of custom implementations. Having also used Nextflow at Excelra for other projects, I could clearly see that modular design is an area where Snakemake falls short.
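For reference, Snakemake’s module system (available since Snakemake 6) looks roughly like this; the module name and paths are illustrative:

```snakemake
# Snakefile: importing a sub-pipeline as a module (Snakemake >= 6)
module assembly:
    snakefile: "modules/assembly/Snakefile"
    config: config

# Re-export all rules from the module under a prefix to avoid name clashes
use rule * from assembly as assembly_*
```

This covers straightforward reuse, but as noted above, anything beyond prefixing and rule overrides tends to require custom implementation work.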
Q: Despite those challenges, do you still use Snakemake?
A: I do. Although it isn’t as flawless as it seemed in the beginning, the overall benefit still outweighs the troubleshooting. I also see Snakemake evolving, adding and changing features, although sometimes the documentation doesn’t keep up with those changes.
Q: Final verdict and scorecard?
A: Snakemake is a mature and performant workflow manager, and I would recommend it for pipelines with moderate complexity – particularly when modularization is not a key requirement.
[Scorecard: Category / Score / Summary]
Connect with Felix to learn how to automate your workflows using Snakemake.

Common Workflow Language
James is a computational biologist with over a decade of combined experience in research and bioinformatics consulting. His consulting work focuses on omics data analysis from pre-clinical and clinical trials, including pipeline development for large-scale sequencing studies. He has built workflows for clients using Snakemake, Nextflow, and CWL, and is a core contributor to the nf-core/rnasplice project.
Q: How did CWL first come onto your radar?
A: About five years ago, I was working with a client on the Seven Bridges platform, and CWL was the workflow language in use. At first, I didn’t find it intuitive — in fact, it felt a bit awkward. But something about it stood out. It wasn’t just another tool for running workflows — it was a standard for defining them. That conceptual shift made a big difference in how I approached it.
Q: What was your initial experience like working with CWL?
A: Honestly, it was frustrating at times. Writing CWL by hand was tedious. Everything had to be explicitly defined: tools, inputs, outputs, file types, compute resources — nothing was inferred. I kept wondering why it needed to be this hard. But over time, I came to appreciate that the goal isn’t speed of implementation — it’s correctness. CWL enforces discipline, and that structure leads to workflows that are reproducible, portable, and less brittle over time.
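A minimal sketch of what that explicitness looks like, using a hypothetical `samtools sort` wrapper (the container tag and resource figures are illustrative): nothing is inferred, so the container, compute resources, inputs, and outputs are all spelled out.

```yaml
# sort.cwl: hypothetical CommandLineTool wrapper; container tag and resources are placeholders
cwlVersion: v1.2
class: CommandLineTool
baseCommand: [samtools, sort]
requirements:
  DockerRequirement:
    dockerPull: quay.io/biocontainers/samtools:1.17--h00cdaf9_0
  ResourceRequirement:
    coresMin: 4
    ramMin: 8000
inputs:
  bam:
    type: File
    inputBinding:
      position: 1
outputs:
  sorted_bam:
    type: File
    outputBinding:
      glob: "*.sorted.bam"
arguments: ["-o", "$(inputs.bam.nameroot).sorted.bam"]
```

Verbose for a one-line command, yes, but every assumption the tool makes about its environment is now on the page, which is exactly the discipline that pays off later.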
Q: What does CWL do particularly well?
A: Its biggest strength is abstraction — the separation of workflow description from execution. You don’t tell it how to run; you describe what should happen. That means the same CWL workflow can run across platforms with little or no modification: I’ve used identical CWL files on DNAnexus, Seven Bridges, and AWS HealthOmics. Portability isn’t just a bonus — it’s built in.
It also leans heavily on containerization — like Docker and Singularity — and that helps ensure consistent environments and reproducibility. I think that’s why it’s found a home in clinical genomics and other compliance-heavy domains.
Q: And where does CWL fall short?
A: The learning curve is steep, and the syntax is… not forgiving. YAML might be human-readable in theory, but CWL’s verbosity makes editing by hand a chore. You end up writing embedded JavaScript expressions for what feel like simple tasks — like specifying the location and directory structure of the output files — and that can feel clunky.
More importantly, CWL has real limitations in dynamic behavior. Until recently, it didn’t support conditional execution, and it still doesn’t support loops or runtime-generated steps. If you need workflows that adapt based on input — say, generating a variable number of steps from a file list — CWL isn’t the best fit.
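For completeness, the conditional execution added in CWL v1.2 is expressed with a `when` clause on a workflow step; this sketch uses placeholder step and file names:

```yaml
# workflow.cwl: sketch of a CWL v1.2 conditional step; names are placeholders
cwlVersion: v1.2
class: Workflow
requirements:
  InlineJavascriptRequirement: {}
inputs:
  reads: File
  run_qc: boolean
outputs:
  qc_report:
    type: File?
    outputSource: qc/report
steps:
  qc:
    run: fastqc.cwl
    when: $(inputs.run_qc)
    in:
      run_qc: run_qc
      reads: reads
    out: [report]
```

Note that the output of a conditional step is nullable (`File?`), which is one more place where CWL forces you to be explicit about what can and cannot exist at runtime.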
Q: Are there cases where CWL feels like the right choice?
A: Yes — especially when you’re operating within a platform that supports it well. Tools like Seven Bridges and DNAnexus offer UIs that abstract away the complexity, letting non-programmers interact with CWL-defined workflows via drag-and-drop interfaces. That’s where CWL shines: as a common foundation that platforms can build on to make bioinformatics tools and workflows accessible to non-specialists.
Q: What’s the community support and ecosystem like?
A: This is one of the weaker spots. There are solid docs and tutorials, but the community itself feels scattered. Unlike Nextflow’s nf-core, there’s no central, well-curated library of community pipelines. Most CWL workflows live inside individual organizations, so it takes effort to find reusable components or shared best practices.
Q: So, when would you recommend using CWL?
A: If your priorities are long-term maintainability, reproducibility, and audit trails, CWL is worth serious consideration. It’s not the fastest path to a working pipeline, and you’ll have to invest in tooling and training. But for pipelines that need to outlast today’s platforms or funding cycles, it’s a very strong choice.
Q: Final verdict and scorecard?
A: CWL won’t win points for user-friendliness or flexibility, but it earns its place by doing things the right way. It’s strict, yes — but that rigor pays off when you need workflows to be portable, auditable, and future-proof.