SYSTEMS AND METHODS FOR GROUPING AND COLLAPSING SEQUENCING READS
Abstract
Disclosed herein are systems and methods for collapsing sequencing reads and identifying similar sequencing reads. In one example, a method includes generating a plurality of first identifier subsequences from a first identifier sequence of each nucleotide sequencing read and generating a first signature for the nucleotide sequencing read by applying hashing to the plurality of first identifier subsequences. The method may include assigning the nucleotide sequencing read to a first particular bin of a first data structure based on the first signature and determining a nucleotide sequence for each first particular bin of the first data structure with one or more nucleotide sequencing reads assigned.
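The grouping step described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the identifier sequence is the first few bases of each read, that the identifier subsequences are overlapping k-mers, and that the signature is a MinHash-style tuple of minimum hash values; the abstract only says "applying hashing", so these choices, along with the names `kmers`, `signature`, `bin_reads`, `k`, `num_hashes` and `id_len`, are hypothetical.

```python
import hashlib
from collections import defaultdict


def kmers(seq, k=4):
    """Overlapping length-k subsequences of an identifier sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def stable_hash(text, seed):
    """Deterministic 64-bit hash of text, salted with an integer seed."""
    digest = hashlib.blake2b(
        text.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def signature(identifier, k=4, num_hashes=4):
    """MinHash-style signature (an assumed scheme): one minimum per seed.

    The identifier must be at least k bases long.
    """
    subs = kmers(identifier, k)
    return tuple(min(stable_hash(s, seed) for s in subs)
                 for seed in range(num_hashes))


def bin_reads(reads, id_len=8):
    """Assign each read to the bin keyed by its identifier's signature."""
    bins = defaultdict(list)
    for read in reads:
        identifier = read[:id_len]  # assumption: identifier is a read prefix
        bins[signature(identifier)].append(read)
    return bins


reads = ["ACGTACGTTTTT", "ACGTACGTGGGG", "TTGACCAATTTT"]
bins = bin_reads(reads)
```

Here the first two reads share the identifier `ACGTACGT` and therefore the same bin, while the third read has an identifier with no k-mers in common and falls into a different bin.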
29 Claims
1. A system for determining a nucleotide sequence from nucleotide sequencing reads, comprising:
a non-transitory memory configured to store executable instructions and a first hash data structure for storing nucleotide sequencing reads in a plurality of bins; and
a hardware processor programmed by the executable instructions to perform a method comprising:
receiving a plurality of first nucleotide sequencing reads;
for each first nucleotide sequencing read:
generating a plurality of first identifier subsequences from a first identifier sequence of the first nucleotide sequencing read;
generating a first signature for the first nucleotide sequencing read by applying hashing to the plurality of first identifier subsequences; and
assigning the first nucleotide sequencing read to at least one first particular bin of the first hash data structure based on the first signature; and
determining a nucleotide sequence for each first particular bin of the first hash data structure with one or more first nucleotide sequencing reads assigned.
Dependent claims: 2-15.
16. A computer-implemented method for determining a nucleotide sequence from nucleotide sequencing reads, comprising:
receiving a plurality of first nucleotide sequencing reads;
for each first nucleotide sequencing read:
generating a plurality of first identifier subsequences from a first identifier sequence of the first nucleotide sequencing read;
generating a first signature for the first nucleotide sequencing read by applying hashing to the plurality of first identifier subsequences; and
assigning the first nucleotide sequencing read to a first particular bin of a first data structure based on the first signature; and
determining a nucleotide sequence for each first particular bin of the first data structure with one or more first nucleotide sequencing reads assigned.
Dependent claims: 17-23.
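The final step of the method, determining a nucleotide sequence for each non-empty bin, could be realized in several ways. The sketch below assumes a simple per-position majority vote over reads that are already aligned and of comparable length; the claim does not prescribe this, and the names `consensus` and `collapse` are hypothetical.

```python
from collections import Counter


def consensus(reads):
    """Majority base at each position (assumes reads are already aligned).

    Positions beyond the shortest read are ignored.
    """
    length = min(len(r) for r in reads)
    return "".join(
        Counter(r[i] for r in reads).most_common(1)[0][0]
        for i in range(length)
    )


def collapse(bins):
    """Return one collapsed sequence per bin that has reads assigned."""
    return {key: consensus(members) for key, members in bins.items() if members}


example_bins = {"bin_a": ["ACGT", "ACGA", "ACGT"], "bin_b": []}
collapsed = collapse(example_bins)
```

In this example the single disagreeing base in the second read is outvoted, and the empty bin is skipped because no reads were assigned to it.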
24. A system for identifying similar nucleotide sequencing reads, comprising:
non-transitory memory configured to store: executable instructions, a first hash data structure and a second hash data structure for storing a plurality of pairs of sequencing reads; and
a hardware processor programmed by the executable instructions to perform a method comprising:
receiving a pair of a first query nucleotide sequencing read and a second query nucleotide sequencing read;
generating a plurality of first query identifier subsequences and a plurality of second query identifier subsequences from the first query nucleotide sequencing read and the second query nucleotide sequencing read, respectively;
generating a first query signature and a second query signature for the first nucleotide sequencing read and the second nucleotide sequencing read, respectively, by applying hashing to the plurality of first query identifier subsequences and the plurality of second query identifier subsequences, respectively;
retrieving one or more first stored pairs and one or more second stored pairs from the first hash data structure and the second hash data structure using the first query signature and the second query signature, respectively, wherein each of the first pairs and the second pairs comprises a first stored nucleotide sequencing read and a second stored nucleotide sequencing read; and
determining each pair of a first stored nucleotide sequencing read and a second stored nucleotide sequencing read present in both the first stored pairs and second stored pairs as a sequencing read 1 and sequencing read 2 similar to the query sequencing read 1 and the query sequencing read 2, respectively.
Dependent claims: 25-27.
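The paired lookup recited above amounts to querying two hash structures and keeping only the stored pairs found in both. The following sketch assumes a MinHash-style signature over k-mers as the hashing scheme and ordinary dictionaries as the two hash data structures; the class `PairIndex`, the function `signature`, and the parameters `k` and `num_hashes` are hypothetical names introduced for illustration.

```python
import hashlib
from collections import defaultdict


def signature(seq, k=4, num_hashes=4):
    """MinHash-style signature over k-mers (assumed scheme; needs len(seq) >= k)."""
    subs = [seq[i:i + k] for i in range(len(seq) - k + 1)]

    def h(text, seed):
        d = hashlib.blake2b(
            text.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(d, "little")

    return tuple(min(h(s, seed) for s in subs) for seed in range(num_hashes))


class PairIndex:
    """Two hash tables: one keyed by read 1 signatures, one by read 2 signatures."""

    def __init__(self):
        self.table1 = defaultdict(set)
        self.table2 = defaultdict(set)

    def add(self, read1, read2):
        pair = (read1, read2)
        self.table1[signature(read1)].add(pair)
        self.table2[signature(read2)].add(pair)

    def query(self, query1, query2):
        first = self.table1.get(signature(query1), set())
        second = self.table2.get(signature(query2), set())
        # A stored pair counts as similar only if both tables return it.
        return first & second


index = PairIndex()
index.add("ACGTACGTAC", "TTGACCAATG")
index.add("ACGTACGTAC", "GGGCCCGGGC")
matches = index.query("ACGTACGTAC", "TTGACCAATG")
```

Both stored pairs share the same read 1, so the first table returns both of them; the intersection with the second table leaves only the pair whose read 2 also matches the query.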
28. A method for identifying similar nucleotide sequencing reads, comprising:
receiving a first query nucleotide sequencing read;
generating a plurality of first query identifier subsequences from the first query nucleotide sequencing read;
generating a first query signature for the first nucleotide sequencing read by applying hashing to the plurality of first query identifier subsequences; and
retrieving one or more first stored nucleotide sequencing reads from a first hash data structure using the first query signature, wherein each of the first stored nucleotide sequencing reads is similar to the first query nucleotide sequencing read.
Dependent claims: 29.
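One way to make a single-read lookup tolerant of small differences is to split the signature into bands and store each read under every band, so that a query retrieves any read sharing at least one band. This banding approach is a common locality-sensitive hashing technique and is an assumption here, not something the claim specifies; `ReadIndex`, `minhash`, `band_size`, `k` and `num_hashes` are hypothetical names.

```python
import hashlib
from collections import defaultdict


def minhash(seq, num_hashes=6, k=4):
    """List of per-seed minimum k-mer hashes (assumed scheme; needs len(seq) >= k)."""
    subs = {seq[i:i + k] for i in range(len(seq) - k + 1)}

    def h(text, seed):
        d = hashlib.blake2b(
            text.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(d, "little")

    return [min(h(s, seed) for s in subs) for seed in range(num_hashes)]


class ReadIndex:
    """Hash table keyed by (band position, band values) of each read's signature."""

    def __init__(self, band_size=2):
        self.band_size = band_size
        self.buckets = defaultdict(set)

    def _keys(self, seq):
        sig = minhash(seq)
        return [(i, tuple(sig[i:i + self.band_size]))
                for i in range(0, len(sig), self.band_size)]

    def add(self, read):
        for key in self._keys(read):
            self.buckets[key].add(read)

    def query(self, read):
        hits = set()
        for key in self._keys(read):
            hits |= self.buckets.get(key, set())
        return hits


index = ReadIndex()
index.add("ACGTACGTACGT")
index.add("TTGACCAATGCA")
hits = index.query("ACGTACGTACGT")
```

Smaller bands make retrieval more permissive (more candidate reads per query), while larger bands make it stricter; a follow-up comparison of the query against each candidate would typically be used to confirm similarity.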
Specification