Evaluation of Software for the Primary Analysis of NGS Data

At the turn of the millennium, the first decoding of the human genome required the efforts of laboratories worldwide and took more than a decade (Human Genome Project, 1990 – 2003). Second-generation sequencers (Next Generation Sequencing, NGS), in contrast, fit on a desktop and get the job done within a few days.


Sequencing by Synthesis

Much of the high-throughput technology in use relies on replicating an original section of DNA (deoxyribonucleic acid), which carries the hereditary information in the form of base sequences, base by base (Sequencing by Synthesis). These short DNA sections are first fixed on micron-sized beads and reproduced in a chemical process. The prepared beads are then placed on the surface of a flow cell, where they are exposed over multiple cycles to modified, fluorescently labeled DNA bases that bind specifically to the DNA on the beads.

Because a different fluorescent molecule is used for each of the four possible nucleotide bases, specific patterns of fluorescence emerge, which photo-optical sensors record through special color filters. The software then evaluates the recorded images to determine, for each bead, every attached base together with a quality measure.
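The step from per-bead fluorescence intensities to a base plus a quality measure can be illustrated with a deliberately simple sketch. It is not the method used in any particular sequencer software. The channel order in `BASES`, the "purity" heuristic, and the cap `max_quality` are assumptions made for illustration only. The sketch picks the brightest of the four channels and turns the fraction of signal in that channel into a Phred-style score, Q = -10 log10(p_error).

```python
import math

BASES = "ACGT"  # assumed channel order, illustrative only


def call_base(intensities, max_quality=40):
    """Return (base, quality) for one bead in one cycle.

    intensities: four background-corrected channel values.
    The error estimate 1 - purity is a hypothetical heuristic.
    """
    if len(intensities) != 4:
        raise ValueError("expected four channel intensities")
    # Negative values can appear after background subtraction; clip them.
    clipped = [max(0.0, x) for x in intensities]
    total = sum(clipped)
    if total == 0.0:
        return "N", 0  # no signal at all: no call
    best = max(range(4), key=lambda i: clipped[i])
    purity = clipped[best] / total  # share of signal in the brightest channel
    p_error = 1.0 - purity
    if p_error <= 0.0:
        return BASES[best], max_quality
    quality = min(max_quality, int(round(-10.0 * math.log10(p_error))))
    return BASES[best], quality


def call_read(cycles):
    """Call one bead across all cycles; returns (sequence, qualities)."""
    calls = [call_base(c) for c in cycles]
    return "".join(b for b, _ in calls), [q for _, q in calls]
```

For example, `call_base([0.9, 0.05, 0.03, 0.02])` returns `('A', 10)`: 90 percent of the signal lies in the first channel, so the assumed error probability is 0.1. A real base caller would first have to correct the intensities for the disturbing effects that act on the images.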


From Fluorescence Images to DNA Sequences

QIAGEN is a globally operating supplier of molecular biology testing technologies headquartered in Hilden, a city near Düsseldorf. QIAGEN’s recently developed sequencer (GeneReader NGS System) works on the principle of “Sequencing by Synthesis.” In a joint project with QIAGEN, we review and evaluate a part of the software developed at QIAGEN for the primary analysis of the fluorescence image data, including the recognition of the base sequences and the quality measures. We also submit specific recommendations for improvement that may be incorporated in a future version of the product.

The software must compensate for several effects:

  • shifts between the individual fluorescence images (the flow cells must be moved mechanically in each cycle)
  • uneven illumination of the images caused by the photo sensor optics
  • different optical properties of the four fluorescent molecules used and of the color filters
  • crosstalk between the color channels and between adjacent beads
  • degradation of the fluorescence signal caused by increasing autofluorescence and faulty incorporation of the modified bases (lead/lag effect)
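One item in this list, crosstalk of the color channels, is commonly modeled as a linear mixing: the observed four-channel intensity is a 4x4 matrix times the true dye signal, and the correction solves that linear system. The following is a minimal sketch under that assumption. The values in `CROSSTALK` are invented for illustration and are not measured instrument data; the helper names `mix` and `unmix` are hypothetical. Crosstalk between adjacent beads and the stochastic part of the signal are ignored here.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [list(map(float, row)) + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Pick the row with the largest entry in this column as pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("crosstalk matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


# Hypothetical matrix: entry [i][j] is the fraction of dye j's
# signal that shows up in color channel i.
CROSSTALK = [
    [1.00, 0.30, 0.00, 0.00],
    [0.10, 1.00, 0.00, 0.00],
    [0.00, 0.00, 1.00, 0.25],
    [0.00, 0.00, 0.15, 1.00],
]


def mix(true_signal, crosstalk=CROSSTALK):
    """Simulate the observed channel intensities for a true dye signal."""
    return [sum(m * s for m, s in zip(row, true_signal)) for row in crosstalk]


def unmix(observed, crosstalk=CROSSTALK):
    """Recover the dye signal from observed channel intensities."""
    return solve_linear(crosstalk, observed)
```

With these assumed values, a pure signal of the second dye, `[0, 1, 0, 0]`, is observed as `[0.3, 1.0, 0.0, 0.0]`, and `unmix` recovers the original signal up to rounding error. In practice the matrix would have to be estimated from the data, and the noise in the observations limits how well the inversion works.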

Each of these effects includes a stochastic component. We contribute to the project through our competence in complex stochastic modeling and algorithms.