Methods and compositions for long fragment read sequencing
Abstract
The present invention is directed to methods and compositions for long fragment read sequencing. The present invention encompasses methods and compositions for preparing long fragments of genomic DNA, for processing genomic DNA for long fragment read sequencing methods, as well as software and algorithms for processing and analyzing sequence data.
23 Claims
1. A method of obtaining sequence information from a genome, said method comprising:
(a) providing a population of first fragments of said genome;
(b) preparing emulsion droplets of said first fragments, such that each emulsion droplet comprises a subset of said population of first fragments;
(c) fragmenting said first fragments, thereby obtaining a population of second fragments within each emulsion droplet, such that said second fragments are shorter than said first fragments;
(d) combining individual emulsion droplets comprising said second fragments with individual emulsion droplets comprising adaptor tags or adaptor tag combinations, thereby forming fused droplets;
(e) ligating said second fragments with said adaptor tags or adaptor tag combinations within the fused droplets to form tagged fragments;
(f) combining the fused droplets to produce a mixture containing tagged fragments;
(g) obtaining sequence reads from tagged fragments in the mixture;
(h) assembling the sequence reads to produce assembled sequence information for the genome, wherein the assembled sequence information comprises heterozygous loci; and
(i) phasing the heterozygous loci using sequence information from the adaptor tags.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.
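As an illustration only, step (i) of claim 1 can be sketched in code. The claim does not specify a phasing algorithm; the sketch below assumes a simple majority vote between neighbouring heterozygous loci, and it assumes that reads sharing an adaptor tag come from the same parental haplotype (a simplification, since a droplet may hold more than one first fragment). The function names, the `(tag, locus, allele)` input layout, and the tie-breaking rule are all hypothetical.

```python
from collections import Counter, defaultdict


def alleles_by_tag(reads):
    """Group allele observations by adaptor tag.

    reads: iterable of (tag, locus, allele) tuples (hypothetical layout).
    Returns {tag: {locus: allele}}.
    """
    by_tag = defaultdict(dict)
    for tag, locus, allele in reads:
        by_tag[tag][locus] = allele
    return by_tag


def phase_adjacent(reads, het_loci):
    """Phase each heterozygous locus against the previous one by majority vote.

    het_loci: {locus: (allele0, allele1)}, with loci sortable by position.
    Returns a list of (locus, allele_on_haplotype_A, allele_on_haplotype_B).
    """
    by_tag = alleles_by_tag(reads)
    loci = sorted(het_loci)
    if not loci:
        return []

    first = loci[0]
    phased = [(first, het_loci[first][0], het_loci[first][1])]

    for prev, cur in zip(loci, loci[1:]):
        # Count allele pairs seen under the same tag at both loci.
        votes = Counter()
        for calls in by_tag.values():
            if prev in calls and cur in calls:
                votes[(calls[prev], calls[cur])] += 1

        _, prev_a, prev_b = phased[-1]
        a0, a1 = het_loci[cur]
        cis = votes[(prev_a, a0)] + votes[(prev_b, a1)]
        trans = votes[(prev_a, a1)] + votes[(prev_b, a0)]

        # Assumption: ties (including no linking tags at all) default to cis.
        if cis >= trans:
            phased.append((cur, a0, a1))
        else:
            phased.append((cur, a1, a0))
    return phased


example_reads = [
    ("T1", 100, "A"), ("T1", 200, "G"),
    ("T2", 100, "C"), ("T2", 200, "T"),
    ("T3", 200, "G"), ("T3", 300, "C"),
    ("T4", 200, "T"), ("T4", 300, "A"),
]
example_loci = {100: ("A", "C"), 200: ("T", "G"), 300: ("A", "C")}
example_phase = phase_adjacent(example_reads, example_loci)
```

In this toy input, tags T1 and T3 link A, G and C on one haplotype, while T2 and T4 link C, T and A on the other. A practical pipeline would also need to handle sequencing errors and tags that span more than one fragment, which this sketch ignores.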
13. A method of obtaining sequence information from a genome, said method comprising:
(a) providing a population of first fragments of said genome;
(b) preparing emulsion droplets of said first fragments, such that each emulsion droplet comprises a subset of said population of first fragments;
(c) fragmenting said first fragments, thereby obtaining a population of second fragments within each emulsion droplet, such that said second fragments are shorter than said first fragments;
(d) combining individual emulsion droplets comprising said second fragments with individual emulsion droplets comprising adaptor tags or adaptor tag combinations, thereby forming fused droplets;
(e) ligating said second fragments with said adaptor tags or adaptor tag combinations within the fused droplets to form tagged fragments;
(f) combining the fused droplets to produce a mixture containing tagged fragments;
(g) obtaining sequence reads from tagged fragments in the mixture;
(h) combining sequence reads from tagged fragments having the same adaptor tags to produce sequences of longer contiguous regions; and
(i) assembling the sequence reads into sequence information for the genome wherein the sequence information comprises said sequences of longer contiguous regions.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21, 22, 23.
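Step (h) of claim 13 groups reads by adaptor tag to recover longer contiguous regions. The following is a minimal sketch of that idea, not the patented procedure. It assumes each read has already been aligned to reference coordinates as a `(tag, start, end)` tuple, which the claim does not require, and the function name and the `max_gap` parameter are hypothetical.

```python
from collections import defaultdict


def contiguous_regions(reads, max_gap=0):
    """Merge same-tag reads into longer contiguous regions.

    reads: iterable of (tag, start, end) with start <= end (hypothetical layout).
    max_gap: largest uncovered distance still bridged within one region.
    Returns {tag: [(region_start, region_end), ...]} sorted by position.
    """
    by_tag = defaultdict(list)
    for tag, start, end in reads:
        by_tag[tag].append((start, end))

    regions = {}
    for tag, spans in by_tag.items():
        spans.sort()
        merged = [list(spans[0])]
        for start, end in spans[1:]:
            if start <= merged[-1][1] + max_gap:
                # Overlapping or close enough: extend the current region.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                # Too far away: start a new region for this tag.
                merged.append([start, end])
        regions[tag] = [tuple(region) for region in merged]
    return regions


example_reads = [
    ("T1", 0, 100), ("T1", 80, 180), ("T1", 500, 600),
    ("T2", 50, 150),
]
strict = contiguous_regions(example_reads)
bridged = contiguous_regions(example_reads, max_gap=400)
```

With the default `max_gap=0`, tag T1 yields two regions because nothing covers positions 180 to 500. Raising `max_gap` to 400 bridges that gap and reports a single region, which reflects the design choice of treating sparse coverage under one tag as evidence of one underlying long fragment.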
Specification