The Pipeline

We have developed an automated pipeline for detecting genetic variants from High-throUghput GEnome SEQuencing data, termed HugeSeq. HugeSeq is a fully integrated system for genome analysis, from mapping reads to identifying and annotating all types of variants: SNPs, Indels and SVs. The complete variant detection and characterization workflow of the HugeSeq pipeline is depicted below. The pipeline consists of a modular framework on which common algorithms are implemented. It comprises three phases: 1) a mapping phase that prepares, aligns and formats reads; 2) a sorting phase that combines and sorts reads so that variant calling can be processed in parallel; and 3) a reduction phase that calls and annotates the different variants (SNPs, Indels and SVs). HugeSeq is based on a MapReduce approach and runs in a parallel computational environment, making it highly efficient and scalable. The detailed characteristics of HugeSeq, the individual algorithms it contains and its performance are described in our Nature Biotechnology paper.
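The three phases described above can be illustrated with a minimal Python sketch. This is not HugeSeq's code: the function names, the `chrom:pos:sequence` read format, the tiny `REFERENCE` dictionary and the mismatch-based "caller" are all hypothetical stand-ins, chosen only to show how a map, sort and reduce structure lets per-chromosome variant calling run in parallel. Alignment is assumed to have already happened, so the mapping step here only parses and formats reads.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Toy reference (assumption): a real pipeline would use an indexed genome.
REFERENCE = {"chr1": "ACGTACGT", "chr2": "TTGACCAA"}


def map_read(raw):
    """Mapping phase (toy): parse 'chrom:pos:sequence' into a tuple.

    Positions are 0-based. Real alignment is out of scope for this sketch.
    """
    chrom, pos, seq = raw.split(":")
    return chrom, int(pos), seq


def sort_phase(mapped):
    """Sorting phase: group reads by chromosome and sort each group by position."""
    buckets = defaultdict(list)
    for chrom, pos, seq in mapped:
        buckets[chrom].append((pos, seq))
    return {chrom: sorted(reads) for chrom, reads in buckets.items()}


def call_variants(chrom, reads):
    """Reduction phase (toy): report single-base mismatches against the reference."""
    ref = REFERENCE[chrom]
    variants = set()
    for pos, seq in reads:
        for offset, base in enumerate(seq):
            ref_base = ref[pos + offset]
            if base != ref_base:
                variants.add((chrom, pos + offset, ref_base, base))
    return sorted(variants)


def run_pipeline(raw_reads):
    """Run map -> sort -> reduce, parallelizing the map and reduce steps."""
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_read, raw_reads))
        buckets = sort_phase(mapped)
        # Each chromosome is an independent unit of work for the reducers.
        chunks = pool.map(lambda item: call_variants(*item), sorted(buckets.items()))
        return [variant for chunk in chunks for variant in chunk]


if __name__ == "__main__":
    reads = ["chr1:0:ACGA", "chr2:2:GACC", "chr1:2:GTAC", "chr2:4:CGAA"]
    for variant in run_pipeline(reads):
        print(variant)
```

The point of the sorting phase in this sketch is that, once reads are grouped by chromosome, each group can be handed to a separate worker without any shared state, which is the property that makes the reduction step easy to scale.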