Processing and analysis of complex nucleic acid sequence data
First Claim
1. A method of analyzing genomic DNA of an organism to produce a phased sequence corresponding to at least a portion of a genome of the organism, the method comprising:
providing a plurality of aliquots of the genomic DNA of the organism;
tagging fragments of genomic DNA in each aliquot with a corresponding aliquot-specific tag sequence;
sequencing the tagged fragments of genomic DNA to obtain a plurality of reads;
receiving, at one or more computing devices, the plurality of reads corresponding to fragments of genomic DNA from the plurality of aliquots, each read comprising a sequence from a fragment of genomic DNA and an aliquot-specific tag sequence, wherein each aliquot contains less than a haploid genome equivalent of genomic DNA;
determining, with the one or more computing devices, the aliquots from which the plurality of reads originate by identifying the aliquot-specific tag sequences;
producing, with the one or more computing devices, the phased sequence from the reads by:
identifying a plurality of heterozygous loci corresponding to at least a portion of the genome of the organism based on numbers of reads having different alleles at each of the plurality of heterozygous loci; and
phasing the plurality of heterozygous loci to produce a first haplotype and a second haplotype, the phasing using the aliquots of origin for reads mapping to the plurality of heterozygous loci to determine which alleles at the heterozygous loci are on a same haplotype, wherein reads at different ones of the plurality of heterozygous loci and having the same aliquot of origin are determined to be from the same haplotype, the phased sequence corresponding to the first haplotype and the second haplotype of the at least a portion of the genome of the organism.
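The tag-identification and heterozygous-locus steps of this claim can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes, purely for illustration, that the aliquot-specific tag is a fixed-length prefix of each read (`TAG_LEN`), that reads have already been mapped so each observation is a `(locus, allele, aliquot)` triple, and that a locus counts as heterozygous when exactly two alleles each clear hypothetical read-count and read-fraction thresholds (`min_reads`, `min_fraction`). None of these names or thresholds come from the claim.

```python
from collections import Counter, defaultdict

TAG_LEN = 4  # hypothetical: the tag occupies the first 4 bases of each read


def split_tag(read, tag_to_aliquot):
    """Return (aliquot_id, genomic_sequence), or None if the tag is unknown."""
    tag, seq = read[:TAG_LEN], read[TAG_LEN:]
    aliquot = tag_to_aliquot.get(tag)
    return None if aliquot is None else (aliquot, seq)


def heterozygous_loci(observations, min_reads=2, min_fraction=0.2):
    """Call heterozygous loci from per-allele read counts.

    observations: iterable of (locus, allele, aliquot) triples.
    A locus is reported when exactly two alleles each have at least
    min_reads reads and at least min_fraction of the reads at that locus.
    Returns {locus: (allele_a, allele_b)} with the alleles sorted.
    """
    counts = defaultdict(Counter)
    for locus, allele, _aliquot in observations:
        counts[locus][allele] += 1

    het = {}
    for locus, allele_counts in counts.items():
        total = sum(allele_counts.values())
        supported = [
            allele
            for allele, n in allele_counts.items()
            if n >= min_reads and n / total >= min_fraction
        ]
        if len(supported) == 2:
            het[locus] = tuple(sorted(supported))
    return het
```

In this sketch a locus with three reads of `A` and three of `G` is reported as heterozygous, while a locus with five reads of `C` and a single `T` is not, because the lone `T` read falls below both thresholds.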
Abstract
The present invention is directed to logic for analysis of nucleic acid sequence data that employs algorithms that lead to a substantial improvement in sequence accuracy and that can be used to phase sequence variations, e.g., in connection with the use of the long fragment read (LFR) process.
39 Claims
1. Independent method claim, recited in full under "First Claim" above. Dependent claims: 2-25 and 27-34.
26. A computer-readable non-transitory storage medium storing instructions which, when executed by one or more computing devices, cause the one or more computing devices to analyze genomic DNA of an organism to produce a phased sequence corresponding to at least a portion of a genome of the organism, the instructions comprising:
receiving a plurality of reads corresponding to fragments of genomic DNA from a plurality of aliquots, each fragment of genomic DNA being tagged with an aliquot-specific tag sequence, and each read comprising sequence from a fragment of genomic DNA and an aliquot-specific tag sequence, wherein each aliquot contains less than a haploid genome equivalent of genomic DNA, wherein the plurality of reads are obtained by:
providing the plurality of aliquots of the genomic DNA of the organism;
tagging the fragments of genomic DNA in each aliquot with the corresponding aliquot-specific tag sequence; and
sequencing the tagged fragments of genomic DNA to obtain a plurality of reads;
determining the aliquots from which the plurality of reads originate by identifying the aliquot-specific tag sequences;
producing the phased sequence from the reads by:
identifying a plurality of heterozygous loci corresponding to at least a portion of the genome of the organism based on numbers of reads having different alleles at each of the plurality of heterozygous loci; and
phasing the plurality of heterozygous loci to produce a first haplotype and a second haplotype, the phasing using the aliquots of origin for reads mapping to the plurality of heterozygous loci to determine which alleles at the heterozygous loci are on a same haplotype, wherein reads at different ones of the plurality of heterozygous loci and having the same aliquot of origin are determined to be from the same haplotype, the phased sequence corresponding to the first haplotype and the second haplotype of the at least a portion of the genome of the organism.
Dependent claims: 35-39.
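The phasing step recited in the claims, in which reads at different heterozygous loci that share an aliquot of origin are treated as coming from the same haplotype, can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of the specification. It links only adjacent heterozygous loci, decides each link by a majority vote over aliquots, and starts a new phased block when the vote is tied or no aliquot covers both loci. The function name `phase` and the input layout (`het` as `{locus: (allele_a, allele_b)}`, observations as `(locus, allele, aliquot)` triples) are hypothetical.

```python
from collections import defaultdict


def phase(het, observations):
    """Phase heterozygous loci using shared aliquot of origin.

    het: {locus: (allele_a, allele_b)}
    observations: iterable of (locus, allele, aliquot) triples.
    Returns a list of blocks; each block is (hap1, hap2), two dicts
    mapping locus -> allele.
    """
    # Alleles each aliquot shows at each heterozygous locus.
    by_aliquot = defaultdict(dict)
    for locus, allele, aliquot in observations:
        if locus in het and allele in het[locus]:
            by_aliquot[aliquot].setdefault(locus, set()).add(allele)

    blocks = []
    hap1, hap2 = {}, {}
    prev = None
    for locus in sorted(het):
        a, b = het[locus]
        if prev is None:
            hap1, hap2 = {locus: a}, {locus: b}
        else:
            same = opposite = 0
            for seen in by_aliquot.values():
                p, q = seen.get(prev, set()), seen.get(locus, set())
                if len(p) != 1 or len(q) != 1:
                    continue  # aliquot is uninformative or ambiguous here
                prev_on_hap1 = next(iter(p)) == hap1[prev]
                cur_is_a = next(iter(q)) == a
                if prev_on_hap1 == cur_is_a:
                    same += 1      # evidence that allele a lies on hap1
                else:
                    opposite += 1  # evidence that allele b lies on hap1
            if same == opposite:
                # Tie or no linking aliquot: close the block, start a new one.
                blocks.append((hap1, hap2))
                hap1, hap2 = {locus: a}, {locus: b}
            elif same > opposite:
                hap1[locus], hap2[locus] = a, b
            else:
                hap1[locus], hap2[locus] = b, a
        prev = locus
    if hap1:
        blocks.append((hap1, hap2))
    return blocks
```

The same-aliquot rule is a reasonable vote because each aliquot holds less than a haploid genome equivalent, so two overlapping fragments from opposite parental copies rarely share an aliquot. Aliquots that nevertheless show both alleles at a locus are skipped rather than counted.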
Specification