Methods and systems for storing sequence read data
Abstract
The present invention generally relates to storing sequence read data. The invention can involve obtaining a plurality of sequence reads from a sample, identifying one or more sets of duplicative sequence reads within the plurality of sequence reads, and storing only one of the sequence reads from each set of duplicative sequence reads in a text file using nucleotide characters.
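The deduplication step described in the abstract can be illustrated with a minimal sketch. It assumes that "duplicative" means exact string identity between reads, which the abstract does not specify, and the function names `deduplicate_reads` and `master_file_text` are hypothetical, chosen only for this illustration.

```python
def deduplicate_reads(reads):
    """Keep one copy of each distinct read, in first-seen order.

    Assumption: two reads are duplicates when their nucleotide strings
    are identical. Returns the unique reads plus, for every input read,
    the position of its representative in the unique list.
    """
    index_of = {}      # nucleotide string -> position in `unique`
    unique = []        # one representative per set of duplicates
    assignment = []    # for each input read, index into `unique`
    for seq in reads:
        if seq not in index_of:
            index_of[seq] = len(unique)
            unique.append(seq)
        assignment.append(index_of[seq])
    return unique, assignment


def master_file_text(unique_reads):
    """Render the unique reads as plain text, one read per line."""
    return "".join(seq + "\n" for seq in unique_reads)


reads = ["ACGT", "TTGA", "ACGT", "ACGT", "GGCC", "TTGA"]
unique, assignment = deduplicate_reads(reads)
text = master_file_text(unique)
```

Here six input reads collapse to three stored lines, and `assignment` records which stored line each original read maps to.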
12 Claims
1. A method for storing sequence read data, the method comprising:
- sequencing a nucleic acid from a sample to generate a plurality of sequence reads;
- identifying one or more sets of duplicative sequence reads within the plurality of sequence reads;
- storing only one sequence read from each of the one or more sets of duplicative sequence reads in a master read file;
- collecting meta information for each of the plurality of sequence reads, appending the meta information into a compressed file, and matching the meta information to a single read in the master read file; and
- later retrieving the plurality of sequence reads from the compressed file and the master read file.
2. A method for storing sequence read data, the method comprising:
- obtaining a plurality of sequence reads from a sample by sequencing a nucleic acid from the sample;
- identifying one or more sets of duplicative sequence reads within the plurality of sequence reads;
- storing in a master read file only one sequence read from each of the one or more sets of duplicative sequence reads; and
- separately retrieving the plurality of sequence reads from the master read file.
3. A method for using stored sequence read data, the method comprising:
- using a computer system comprising a memory coupled to a processor for:
- obtaining a master sequence read file that includes only one sequence read from each of one or more sets of duplicative sequence reads obtained from a sample and a compressed file that includes lines of metadata for the sequence reads obtained from the sample;
- for each line of metadata in the compressed file, retrieving an associated read from the master sequence read file and appending that line of metadata and the associated read to an output sequence read file, wherein the output sequence read file contains the sequence reads as originally obtained from the sample.
Dependent claims: 4, 5, 6, 7, 8, 9, 10, 11, 12
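The storage and retrieval steps recited in claims 1 and 3 can be sketched as a round trip. This is an illustration under stated assumptions, not the patented implementation: duplicates are taken to be exact string matches, the metadata is a read identifier plus a quality string, each metadata line is tab-separated and carries the index of its read in the master file, and gzip stands in for whatever compression the specification uses. The names `store` and `retrieve` are hypothetical.

```python
import gzip


def store(records):
    """Split (read_id, sequence, quality) records into two byte blobs.

    Returns (master_bytes, meta_bytes): the master blob holds each
    distinct sequence once, one per line; the gzip-compressed metadata
    blob holds one line per original read, pointing at its master line.
    Assumption: ids and quality strings contain no tabs or newlines.
    """
    index_of = {}
    master = []
    meta_lines = []
    for read_id, seq, qual in records:
        if seq not in index_of:
            index_of[seq] = len(master)
            master.append(seq)
        meta_lines.append(f"{read_id}\t{index_of[seq]}\t{qual}")
    master_bytes = "\n".join(master).encode("ascii")
    meta_bytes = gzip.compress("\n".join(meta_lines).encode("ascii"))
    return master_bytes, meta_bytes


def retrieve(master_bytes, meta_bytes):
    """Rebuild the original records, in order, from the two blobs."""
    master = master_bytes.decode("ascii").splitlines()
    output = []
    for line in gzip.decompress(meta_bytes).decode("ascii").splitlines():
        read_id, idx, qual = line.split("\t")
        # Look up the single stored copy of this read's sequence.
        output.append((read_id, master[int(idx)], qual))
    return output


records = [
    ("r1", "ACGT", "IIII"),
    ("r2", "ACGT", "FFFF"),
    ("r3", "TTGA", "II#I"),
]
master_bytes, meta_bytes = store(records)
restored = retrieve(master_bytes, meta_bytes)
```

Keeping one metadata line per original read is what lets the output preserve both the order and the per-read details of the input, even though each distinct sequence is stored only once.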
Specification