The Pipeline

We have developed an automated pipeline, termed HugeSeq, for detecting genetic variants from High-throUghput GEnome SEQuencing data. HugeSeq is a fully integrated system for genome analysis, from mapping reads to identifying and annotating all types of variants: SNPs, Indels and SVs. The complete variant detection and characterization workflow of the HugeSeq pipeline is depicted below. The pipeline consists of a modular framework on which common algorithms are implemented. It comprises three phases: 1) a mapping phase that prepares, aligns and formats reads; 2) a sorting phase that combines and sorts reads for parallel processing during variant calling; and 3) a reduction phase that calls and annotates the different variants (SNPs, Indels and SVs). HugeSeq is based on a MapReduce approach and runs in a parallel computational environment, making it highly efficient and scalable. The detailed characteristics of HugeSeq, the individual algorithms it contains and its performance are described in our Nature Biotechnology paper.
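To make the three-phase structure concrete, here is a minimal toy sketch in Python. It is not HugeSeq code: the function names (`map_phase`, `sort_phase`, `reduce_phase`, `run_pipeline`), the read dictionaries and the mismatch-based "variant calling" are all hypothetical stand-ins chosen for illustration. In particular, the reads are assumed to arrive with coordinates already assigned, standing in for the output of a real aligner, and only single-base mismatches are reported.

```python
from collections import defaultdict


def map_phase(reads):
    """Toy mapping phase: key each read by (chromosome, position).

    Assumption: coordinates are already known; a real pipeline would
    obtain them by aligning the read to a reference genome.
    """
    return [((r["chrom"], r["pos"]), r["seq"]) for r in reads]


def sort_phase(mapped):
    """Toy sorting phase: combine reads per chromosome and sort by position,
    so that each chromosome can be processed independently (in parallel)."""
    by_chrom = defaultdict(list)
    for (chrom, pos), seq in mapped:
        by_chrom[chrom].append((pos, seq))
    return {chrom: sorted(items) for chrom, items in by_chrom.items()}


def reduce_phase(partitions, reference):
    """Toy reduction phase: report single-base mismatches against the
    reference as (chrom, position, ref_base, alt_base) tuples."""
    calls = set()
    for chrom, items in partitions.items():
        for pos, seq in items:
            for offset, base in enumerate(seq):
                ref_base = reference[chrom][pos + offset]
                if base != ref_base:
                    calls.add((chrom, pos + offset, ref_base, base))
    return sorted(calls)


def run_pipeline(reads, reference):
    """Chain the three phases: map, then sort, then reduce."""
    return reduce_phase(sort_phase(map_phase(reads)), reference)


# Hypothetical example data (0-based positions).
reference = {"chr1": "ACGTACGT", "chr2": "TTTTGGGG"}
reads = [
    {"chrom": "chr2", "pos": 2, "seq": "TTGA"},
    {"chrom": "chr1", "pos": 0, "seq": "ACGT"},
    {"chrom": "chr1", "pos": 2, "seq": "GAAC"},
]
print(run_pipeline(reads, reference))
# [('chr1', 3, 'T', 'A'), ('chr2', 5, 'G', 'A')]
```

Because the sorting phase groups reads by chromosome, each partition handed to the reduction phase is independent of the others. That independence is what would allow the per-chromosome work to be distributed across workers in a parallel environment; this sketch runs the partitions sequentially for simplicity.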

