EFFICIENT GENOMIC READ ALIGNMENT IN AN IN-MEMORY DATABASE
First Claim
1. A computer-based system for processing nucleotide sequence data, which are provided as reads, wherein the system has an interface for importing the nucleotide sequence data from a sequencer machine (M), comprising:
- a platform layer for holding process logic and an in-memory database system (IMDB) for processing nucleotide sequence data, wherein the platform layer comprises:
a worker framework with a plurality of workers, wherein each worker is running on a node of a cluster and wherein the workers are processing in parallel, wherein all results and intermediate results are stored in the in-memory database (IMDB), and with an alignment coordinator, which is adapted to provide the in-memory database system (IMDB) with a modified alignment functionality.
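The worker framework and alignment coordinator of claim 1 can be illustrated with a small in-process sketch. This is not the claimed implementation. It assumes that threads stand in for cluster nodes, that a hypothetical `InMemoryStore` class (a lock-protected dict) stands in for the IMDB, and that `align_fn` is any per-read alignment function the caller supplies. `worker_loop`, `AlignmentCoordinator`, and the `("intermediate", chunk_id)` / `("result", job_id)` keys are likewise hypothetical names chosen for illustration.

```python
import queue
import threading


class InMemoryStore:
    """Hypothetical stand-in for the IMDB: a thread-safe key/value table."""

    def __init__(self):
        self._lock = threading.Lock()
        self._rows = {}

    def put(self, key, value):
        with self._lock:
            self._rows[key] = value

    def get(self, key):
        with self._lock:
            return self._rows.get(key)


def worker_loop(tasks, store, align_fn):
    """One worker: pull (chunk_id, reads) tasks until a None sentinel arrives."""
    while True:
        task = tasks.get()
        if task is None:
            break
        chunk_id, reads = task
        # Intermediate results go to the store rather than back to the caller.
        store.put(("intermediate", chunk_id), [align_fn(r) for r in reads])


class AlignmentCoordinator:
    """Hypothetical coordinator: fans chunks out to workers, then stores the final result."""

    def __init__(self, store, align_fn, n_workers=4):
        self.store = store
        self.align_fn = align_fn
        self.n_workers = n_workers

    def run(self, job_id, chunks):
        tasks = queue.Queue()
        workers = [
            threading.Thread(target=worker_loop, args=(tasks, self.store, self.align_fn))
            for _ in range(self.n_workers)
        ]
        for w in workers:
            w.start()
        for chunk_id, reads in enumerate(chunks):
            tasks.put((chunk_id, reads))
        for _ in workers:
            tasks.put(None)  # one sentinel per worker, queued after all real tasks
        for w in workers:
            w.join()
        # Read the intermediate rows back from the store in chunk order.
        final = [hit for i in range(len(chunks)) for hit in self.store.get(("intermediate", i))]
        self.store.put(("result", job_id), final)
        return final
```

For example, with `ref = "ACGTACGG"` and `align_fn=ref.find` as a placeholder exact-match aligner, `AlignmentCoordinator(InMemoryStore(), ref.find, n_workers=2).run("job1", [["GTA", "CGG"], ["TTT"]])` returns `[2, 5, -1]`. Because sentinels are queued only after every task, each worker drains real work before it stops. In the claimed system the workers would run on separate cluster nodes, and the store would be the IMDB itself.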
1 Assignment
0 Petitions
Abstract
A high-performance, low-cost, gapped read alignment algorithm is disclosed that produces high-quality alignments of a complete human genome in a few minutes. Running on a low-cost workstation, the algorithm is more than an order of magnitude faster than previous approaches. These results are obtained through careful algorithm engineering of the seed-based approach. Using non-hashed seeds in combination with techniques from search engine ranking achieves fast, cache-efficient processing. The algorithm can also be parallelized efficiently. Integration into an in-memory database (IMDB) infrastructure keeps the overhead for data management and further analysis low.
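To make the idea of non-hashed seeds combined with ranking more concrete, here is a minimal sketch. It is an illustrative stand-in, not the disclosed algorithm. It assumes a sorted array of (k-mer, position) pairs that is searched by binary search instead of a hash table. It also assumes candidate alignment starts are scored by how many seeds vote for them, loosely analogous to a search engine scoring documents by matching query terms. The function names, the fixed seed length `k`, and the use of non-overlapping seeds are all assumptions. The gapped extension step that would follow candidate ranking is omitted.

```python
from bisect import bisect_left
from collections import Counter


def build_seed_index(reference, k):
    """Sorted array of (k-mer, position); lookups use binary search, not hashing."""
    return sorted((reference[i:i + k], i) for i in range(len(reference) - k + 1))


def lookup(index, seed):
    """All reference positions where `seed` occurs exactly, in ascending order."""
    i = bisect_left(index, (seed, -1))  # -1 sorts before any real position
    hits = []
    while i < len(index) and index[i][0] == seed:
        hits.append(index[i][1])
        i += 1
    return hits


def rank_candidates(index, read, k, top_n=3):
    """Score candidate alignment starts by the number of seeds voting for them."""
    votes = Counter()
    for offset in range(0, len(read) - k + 1, k):  # non-overlapping seeds
        for pos in lookup(index, read[offset:offset + k]):
            # A seed at read offset o matching reference position p votes for start p - o.
            votes[pos - offset] += 1
    return votes.most_common(top_n)
```

With `index = build_seed_index("ACGTACGTTAGCCGATAGGCTTAACG", 4)`, the read `"TAGCCGATAG"` yields `[(8, 2)]`. Both seeds agree on reference start 8. The read `"TAGCAGATAG"`, whose second seed contains a mismatch, still yields `[(8, 1)]`. A sorted array keeps equal k-mers contiguous in memory, which is one plausible reason a non-hashed layout can be cache-friendly. A production aligner would follow this step with a gapped extension, such as a banded dynamic-programming alignment, around the top candidates.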
70 Citations
16 Claims
1. (Text as under First Claim above.) Dependent claims: 2, 3, 4.
5. A computer-implemented method for processing human or non-human nucleotide sequence data with an in-memory database (IMDB), the method comprising:
providing a cluster with a set of computing nodes with multiple CPU cores, each implementing a worker for parallel data processing, providing nucleotide sequence data as reads in the in-memory database (IMDB) and performing data processing concurrently to sequencing, wherein the data processing comprises: aligning chunks of the read in parallel on the set of computing nodes and aggregating partial aligning results (AR) to an alignment result to be stored in the in-memory database (IMDB). Dependent claims: 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16.
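The chunk-and-aggregate step of claim 5 can be sketched as a small streaming pipeline. This sketch rests on several assumptions. Reads arrive as an iterator that simulates a sequencer's output, so chunks are submitted while later reads are still arriving. A thread pool stands in for the computing nodes. The hypothetical `align_chunk` function is a placeholder that only records each read's exact-match position (`-1` if absent). Partial results (AR) may complete in any order, so they are tagged with a chunk number and re-sorted before being merged into one alignment result.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from itertools import islice


def chunked(reads, size):
    """Group an incoming read stream into fixed-size chunks as reads arrive."""
    it = iter(reads)
    while chunk := list(islice(it, size)):
        yield chunk


def align_chunk(chunk_id, chunk, reference):
    """Placeholder aligner (hypothetical): exact-match position of each read, -1 if absent."""
    return chunk_id, [(read, reference.find(read)) for read in chunk]


def align_stream(read_stream, reference, chunk_size=1000, n_workers=4):
    """Align chunks in parallel and aggregate the partial results in chunk order."""
    partial = []
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Each chunk is submitted as soon as the stream yields it.
        futures = [
            pool.submit(align_chunk, i, chunk, reference)
            for i, chunk in enumerate(chunked(read_stream, chunk_size))
        ]
        for future in as_completed(futures):  # completion order is arbitrary
            partial.append(future.result())
    partial.sort(key=lambda p: p[0])  # restore the original chunk order
    return [hit for _, hits in partial for hit in hits]
```

For example, `align_stream(iter(["TAGC", "CGAT", "GGCT", "NNNN", "ACGT"]), "ACGTACGTTAGCCGATAGGCTTAACG", chunk_size=2)` returns `[("TAGC", 8), ("CGAT", 12), ("GGCT", 17), ("NNNN", -1), ("ACGT", 0)]`. CPython threads do not run CPU-bound Python code in parallel, so a real deployment would use processes or separate cluster nodes for the per-chunk alignment. The aggregation logic would stay the same.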
Specification