System and methods for detecting genetic variation
6 Assignments
0 Petitions
Abstract
The invention provides methods, apparatuses, and compositions for high-throughput amplification sequencing of specific target sequences in one or more samples. In some aspects, barcode-tagged polynucleotides are sequenced simultaneously and sample sources are identified on the basis of barcode sequences. In some aspects, sequencing data are used to determine one or more genotypes at one or more loci comprising a causal genetic variant. In some aspects, systems and methods of detecting genetic variation are provided.
62 Citations
103 Claims
1. A method of detecting genetic variation in a subject's genome comprising:
(a) providing a plurality of clusters of polynucleotides, wherein
(i) each cluster comprises multiple copies of a nucleic acid duplex attached to a support;
(ii) each duplex in a cluster comprises a first molecule comprising sequences A-B-G′-D′-C′ from 5′ to 3′ and a second molecule comprising sequences C-D-G-B′-A′ from 5′ to 3′;
(iii) sequence A′ is complementary to sequence A, sequence B′ is complementary to sequence B, sequence C′ is complementary to sequence C, sequence D′ is complementary to sequence D, and sequence G′ is complementary to sequence G;
(iv) sequence G is a portion of a target polynucleotide sequence from a subject and is different for each of a plurality of clusters;
(v) sequence B′ is located 5′ with respect to sequence G in the corresponding target polynucleotide sequence; and
(vi) each first molecule comprises a barcode sequence;
(b) sequencing sequence G′ by extension of a first primer comprising sequence D to produce an R1 sequence for each cluster;
(c) sequencing sequence B′ by extension of a second primer comprising sequence A to produce R2 sequence for each cluster;
(d) performing a first alignment using a first algorithm to align all R1 sequences to a first reference sequence;
(e) performing a second alignment using a second algorithm to locally align R1 sequences identified in said first alignment as likely to contain an insertion or deletion with respect to the first reference sequence, to produce a single consensus alignment for each insertion or deletion;
(f) performing an R2 alignment by aligning all R2 sequences to a second reference sequence;
(g) transmitting a report identifying sequence variation identified by steps (d) to (f) to a receiver; and
(h) hybridizing a third primer to sequence C′ and sequencing the barcode sequence by extension of the third primer to produce a barcode sequence for each cluster.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26
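The strand layout recited in claim 1 can be checked with a short Python sketch. It is illustrative only: the base sequences assigned to segments A, B, C, D and G are hypothetical placeholders (the claim gives none), and the helper names `revcomp`, `build_duplex` and `read_after_primer` are assumptions introduced here. The sketch shows that the two recited molecules are reverse complements of each other, and that a primer comprising D reads into G while a primer comprising A reads into B.

```python
# Hypothetical placeholder bases: the claim names segments A, B, C, D and G
# but does not give their sequences.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement, so the result is again written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_duplex(a, b, c, d, g):
    """Return (first, second) strands, each written 5' to 3'."""
    first = a + b + revcomp(g) + revcomp(d) + revcomp(c)   # A-B-G'-D'-C'
    second = c + d + g + revcomp(b) + revcomp(a)           # C-D-G-B'-A'
    return first, second


def read_after_primer(product_strand, primer, length):
    """Bases synthesized immediately 3' of the primer on the product strand."""
    start = product_strand.index(primer) + len(primer)
    return product_strand[start:start + length]


a, b, c, d, g = "AACC", "GGTA", "TTGC", "CATC", "ACGGTC"
first, second = build_duplex(a, b, c, d, g)

# The two molecules of the duplex are reverse complements of each other.
assert revcomp(first) == second

# A primer comprising D anneals to D' on the first molecule; the extension
# product matches the second molecule, so the read begins with G.
r1 = read_after_primer(second, d, len(g))
# A primer comprising A anneals to A' on the second molecule; the extension
# product matches the first molecule, so the read begins with B.
r2 = read_after_primer(first, a, len(b))
print(r1, r2)  # ACGGTC GGTA
```

Writing a primed segment as the reverse complement of its unprimed partner keeps every strand in 5′ to 3′ orientation, which is why a single `revcomp` call is enough to confirm the pairing.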
27. A method of detecting genetic variation in a subject's genome comprising:
(a) providing sequencing data for a plurality of clusters of polynucleotides, wherein
(i) each cluster comprised multiple copies of a nucleic acid duplex attached to a support;
(ii) each duplex in a cluster comprised a first molecule comprising sequences A-B-G′-D′-C′ from 5′ to 3′ and a second molecule comprising sequences C-D-G-B′-A′ from 5′ to 3′;
(iii) sequence A′ is complementary to sequence A, sequence B′ is complementary to sequence B, sequence C′ is complementary to sequence C, sequence D′ is complementary to sequence D, and sequence G′ is complementary to sequence G;
(iv) sequence G is a portion of a target polynucleotide sequence from a subject and is different for each of a plurality of clusters;
(v) sequence B′ is located 5′ with respect to sequence G in the corresponding target polynucleotide sequence;
(vi) the sequencing data comprise R1 sequences generated by extension of a first primer comprising sequence D;
(vii) the sequencing data comprise R2 sequences generated by extension of a second primer comprising sequence A;
(viii) each first molecule comprises a barcode sequence; and
(ix) the sequencing data comprise a barcode sequence for each cluster generated by extension of a third primer comprising sequence C;
(b) performing a first alignment using a first algorithm to align all R1 sequences to a first reference sequence;
(c) performing a second alignment using a second algorithm to locally align R1 sequences identified in said first alignment as likely to contain an insertion or deletion with respect to the first reference sequence, to produce a single consensus alignment for each insertion or deletion;
(d) performing an R2 alignment by aligning all R2 sequences to a second reference sequence; and
(e) transmitting a report identifying sequence variation identified by steps (b) to (d) to a receiver.
Dependent claims: 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52
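Steps (b) and (c) of claim 27 hand a subset of R1 reads from the first alignment to a second, local alignment. The claim does not say how reads are identified as likely to contain an insertion or deletion. The Python sketch below assumes, purely for illustration, that the first aligner reports SAM-style CIGAR strings and that any `I` or `D` operation marks a read for the second pass. The function names and the example reads are hypothetical.

```python
import re

# SAM CIGAR operations: a run length followed by one operation letter.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def has_indel(cigar):
    """True if the CIGAR string contains an insertion (I) or deletion (D)."""
    return any(op in "ID" for _length, op in _CIGAR_OP.findall(cigar))


def select_for_second_alignment(first_pass):
    """first_pass: iterable of (read_id, cigar) pairs from the first alignment.

    Returns the read ids that would be passed to the local realignment step.
    """
    return [read_id for read_id, cigar in first_pass if has_indel(cigar)]


# Hypothetical first-pass results.
first_pass = [
    ("read_1", "50M"),        # matches only
    ("read_2", "20M2I28M"),   # 2-base insertion
    ("read_3", "10M3D40M"),   # 3-base deletion
    ("read_4", "5S45M"),      # soft clip, no indel
]
print(select_for_second_alignment(first_pass))  # ['read_2', 'read_3']
```

A real pipeline might also flag soft-clipped or heavily mismatched reads, since an aligner can hide an indel behind either; that choice is outside what the claim specifies.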
53. A method of detecting genetic variation in a subject's genome comprising:
(a) providing a plurality of clusters of polynucleotides, wherein
(i) each cluster comprises multiple copies of a nucleic acid duplex attached to a support;
(ii) each duplex in a cluster comprises a first molecule comprising sequences A-B-G′-D′-C′ from 5′ to 3′ and a second molecule comprising sequences C-D-G-B′-A′ from 5′ to 3′,
(iii) sequence A′ is complementary to sequence A, sequence B′ is complementary to sequence B, sequence C′ is complementary to sequence C, sequence D′ is complementary to sequence D, and sequence G′ is complementary to sequence G;
(iv) sequence G is a portion of a target polynucleotide sequence from a subject and is different for each of a plurality of clusters;
(v) sequence B′ is located 5′ with respect to sequence G in the corresponding target polynucleotide sequence; and
(vi) each first molecule comprises a barcode sequence;
(b) sequencing sequence G′ by extension of a first primer comprising sequence D to produce an R1 sequence for each cluster;
(c) sequencing sequence B′ by extension of a second primer comprising sequence A to produce R2 sequence for each cluster;
(d) performing a first alignment using a first algorithm to align all R1 sequences to a first reference sequence;
(e) performing a second alignment using a second algorithm to locally align R1 sequences identified in said first alignment as likely to contain an insertion or deletion with respect to the first reference sequence, to produce a single consensus alignment for each insertion or deletion;
(f) performing an R2 alignment by aligning all R2 sequences to a second reference sequence;
(g) calculating a plurality of probabilities based on the R1 sequences for the subject and including the probabilities in a report identifying sequence variation identified by steps (d) to (f), wherein each probability is a probability of the subject or a subject's offspring having or developing a disease or trait;
(h) transmitting the report to a receiver; and
(i) hybridizing a third primer to sequence C′ and sequencing the barcode sequence by extension of the third primer to produce a barcode sequence for each cluster.
Dependent claims: 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76
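Step (g) of claim 53 recites calculating probabilities that the subject or the subject's offspring has or develops a disease or trait, but the claim gives no formula. As one hedged illustration, the Python sketch below assumes a simple autosomal recessive Mendelian model: each parent has some probability of being a heterozygous carrier (which would in practice be derived from the sequencing results), the two parents are independent, and a carrier transmits the variant allele with probability 1/2. The condition labels, the input probabilities and the function names are all hypothetical.

```python
def offspring_affected_probability(p_carrier_parent1, p_carrier_parent2):
    """Probability that a child is affected under an autosomal recessive model.

    Assumes independent parents who are each either a heterozygous carrier or
    a non-carrier; each carrier passes on the variant allele with chance 1/2.
    """
    for p in (p_carrier_parent1, p_carrier_parent2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_carrier_parent1 * p_carrier_parent2 * 0.25


def build_probability_report(carrier_probabilities):
    """Map each condition to the offspring probability for that condition.

    carrier_probabilities: dict of condition -> (p_parent1, p_parent2).
    """
    return {
        condition: offspring_affected_probability(p1, p2)
        for condition, (p1, p2) in carrier_probabilities.items()
    }


# Hypothetical inputs: both parents known carriers for condition_1; one known
# carrier and one partner with an assumed 4% carrier chance for condition_2.
report = build_probability_report({
    "condition_1": (1.0, 1.0),
    "condition_2": (1.0, 0.04),
})
for condition, probability in report.items():
    print(f"{condition}: {probability:.4f}")
```

Under these assumptions two known carriers give 1 × 1 × 1/4 = 0.25. Dominant, X-linked or polygenic traits would need a different model.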
77. A method of detecting genetic variation in a subject's genome comprising:
(a) providing a plurality of clusters of polynucleotides, wherein
(i) each cluster comprises multiple copies of a nucleic acid duplex attached to a support;
(ii) each duplex in a cluster comprises a first molecule comprising sequences A-B-G′-D′-C′ from 5′ to 3′ and a second molecule comprising sequences C-D-G-B′-A′ from 5′ to 3′;
(iii) sequence A′ is complementary to sequence A, sequence B′ is complementary to sequence B, sequence C′ is complementary to sequence C, sequence D′ is complementary to sequence D, and sequence G′ is complementary to sequence G;
(iv) sequence G is a portion of a target polynucleotide sequence from a subject and is different for each of a plurality of clusters;
(v) sequence B′ is located 5′ with respect to sequence G in the corresponding target polynucleotide sequence; and
(vi) each first molecule comprises a barcode sequence;
(b) sequencing sequence G′ by extension of a first primer comprising sequence D to produce an R1 sequence for each cluster;
(c) sequencing sequence B′ by extension of a second primer comprising sequence A to produce R2 sequence for each cluster;
(d) performing a first alignment using a first algorithm to align all R1 sequences to a first reference sequence;
(e) performing a second alignment using a second algorithm to locally align R1 sequences identified in said first alignment as likely to contain an insertion or deletion with respect to the first reference sequence, to produce a single consensus alignment for each insertion or deletion;
(f) performing an R2 alignment by aligning all R2 sequences to a second reference sequence;
(g) calculating a plurality of probabilities based on the R1 sequences for the subject and including the probabilities in a report identifying sequence variation identified by steps (d) to (f), wherein each probability is a probability of the subject or a subject's offspring having or developing a disease or trait;
(h) transmitting the report to a receiver;
(i) hybridizing a third primer to sequence C′ and sequencing the barcode sequence by extension of the third primer to produce a barcode sequence for each cluster; and
(j) grouping sequences from the clusters based on the barcode sequences.
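The grouping recited in step (j) of claim 77 amounts to bucketing each cluster's reads under that cluster's barcode read. The Python sketch below is a minimal illustration that assumes exact barcode matching; the `ClusterReads` record, its field names and the example sequences are hypothetical and not taken from the patent.

```python
from collections import defaultdict
from typing import NamedTuple


class ClusterReads(NamedTuple):
    """Reads obtained from one cluster (hypothetical record layout)."""
    barcode: str
    r1: str
    r2: str


def group_by_barcode(clusters):
    """Return a dict mapping each barcode to the list of clusters carrying it."""
    groups = defaultdict(list)
    for cluster in clusters:
        groups[cluster.barcode].append(cluster)
    return dict(groups)


# Hypothetical data: two barcodes, three clusters.
clusters = [
    ClusterReads(barcode="ACGT", r1="TTGACC", r2="GGA"),
    ClusterReads(barcode="TGCA", r1="TTGACC", r2="GGA"),
    ClusterReads(barcode="ACGT", r1="CCATGA", r2="TTC"),
]
groups = group_by_barcode(clusters)
print({barcode: len(members) for barcode, members in groups.items()})
# {'ACGT': 2, 'TGCA': 1}
```

Exact matching is the simplest choice; tolerating sequencing errors in the barcode read would require a distance-based lookup, which the claim neither requires nor excludes.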
78. A method of detecting genetic variation in a subject's genome comprising:
(a) providing a plurality of clusters of polynucleotides, wherein
(i) each cluster comprises multiple copies of a nucleic acid duplex attached to a support;
(ii) each duplex in a cluster comprises a first molecule comprising sequences A-B-G′-D′-C′ from 5′ to 3′ and a second molecule comprising sequences C-D-G-B′-A′ from 5′ to 3′;
(iii) sequence A′ is complementary to sequence A, sequence B′ is complementary to sequence B, sequence C′ is complementary to sequence C, sequence D′ is complementary to sequence D, and sequence G′ is complementary to sequence G;
(iv) sequence G is a portion of a target polynucleotide sequence from a subject and is different for each of a plurality of clusters;
(v) sequence B′ is located 5′ with respect to sequence G in the corresponding target polynucleotide sequence; and
(vi) each first molecule comprises a barcode sequence;
(b) sequencing sequence G′ by extension of a first primer comprising sequence D to produce an R1 sequence for each cluster;
(c) sequencing sequence B′ by extension of a second primer comprising sequence A to produce R2 sequence for each cluster;
(d) performing a first alignment using a first algorithm to align all R1 sequences to a first reference sequence;
(e) performing a second alignment using a second algorithm to locally align R1 sequences identified in said first alignment as likely to contain an insertion or deletion with respect to the first reference sequence, to produce a single consensus alignment for each insertion or deletion;
(f) performing an R2 alignment by aligning all R2 sequences to a second reference sequence;
(g) calculating a plurality of probabilities based on the R1 sequences for the subject and including the probabilities in a report identifying sequence variation identified by steps (d) to (f), wherein each probability is a probability of the subject or a subject's offspring having or developing a disease or trait;
(h) transmitting the report to a receiver;
(i) hybridizing a third primer to sequence C′ and sequencing the barcode sequence by extension of the third primer to produce a barcode sequence for each cluster;
(j) grouping sequences from the clusters based on the barcode sequences; and
(k) discarding all but one of a plurality of R1 sequences having the same sequence and alignment within a barcode sequence grouping.
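Step (k) of claim 78 removes duplicate R1 reads inside each barcode grouping. A minimal Python sketch follows, assuming that "the same sequence and alignment" can be represented as an identical read sequence plus an identical reference position and CIGAR string. That representation, the `AlignedR1` record and the example values are hypothetical.

```python
from typing import NamedTuple


class AlignedR1(NamedTuple):
    """One aligned R1 read (hypothetical record layout)."""
    barcode: str
    sequence: str
    ref_pos: int
    cigar: str


def discard_duplicates(reads):
    """Keep the first read for each (barcode, sequence, alignment) combination."""
    seen = set()
    kept = []
    for read in reads:
        key = (read.barcode, read.sequence, read.ref_pos, read.cigar)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


reads = [
    AlignedR1("ACGT", "TTGACC", 100, "6M"),
    AlignedR1("ACGT", "TTGACC", 100, "6M"),   # duplicate within barcode ACGT
    AlignedR1("TGCA", "TTGACC", 100, "6M"),   # same read, different barcode: kept
    AlignedR1("ACGT", "TTGACC", 250, "6M"),   # same read, different alignment: kept
]
kept = discard_duplicates(reads)
print(len(kept))  # 3
```

Including the barcode in the key is what confines the comparison to a single barcode grouping, so identical reads from different samples are not collapsed together.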
79. A method of detecting genetic variation in a subject's genome comprising:
(a) providing sequencing data for a plurality of clusters of polynucleotides, wherein
(i) each cluster comprised multiple copies of a nucleic acid duplex attached to a support;
(ii) each duplex in a cluster comprised a first molecule comprising sequences A-B-G′-D′-C′ from 5′ to 3′ and a second molecule comprising sequences C-D-G-B′-A′ from 5′ to 3′;
(iii) sequence A′ is complementary to sequence A, sequence B′ is complementary to sequence B, sequence C′ is complementary to sequence C, sequence D′ is complementary to sequence D, and sequence G′ is complementary to sequence G;
(iv) sequence G is a portion of a target polynucleotide sequence from a subject and is different for each of a plurality of clusters;
(v) sequence B′ is located 5′ with respect to sequence G in the corresponding target polynucleotide sequence;
(vi) the sequencing data comprise R1 sequences generated by extension of a first primer comprising sequence D;
(vii) the sequencing data comprise R2 sequences generated by extension of a second primer comprising sequence A,
(viii) each first molecule comprises a barcode sequence, and
(ix) wherein the sequencing data further comprises a barcode sequence for each cluster generated by extension of a third primer comprising sequence C;
(b) performing a first alignment using a first algorithm to align all R1 sequences to a first reference sequence;
(c) performing a second alignment using a second algorithm to locally align R1 sequences identified in said first alignment as likely to contain an insertion or deletion with respect to the first reference sequence, to produce a single consensus alignment for each insertion or deletion;
(d) performing an R2 alignment by aligning all R2 sequences to a second reference sequence;
(e) calculating a plurality of probabilities based on the R1 sequences for the subject and including the probabilities in a report identifying sequence variation identified by steps (b) to (d), wherein each probability is a probability of the subject or a subject's offspring having or developing a disease or trait; and
(f) transmitting the report to a receiver.
Dependent claims: 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101
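Step (c) of claim 79 calls for a second algorithm that locally aligns indel-containing R1 reads, without naming the algorithm. As an illustration only, the Python sketch below uses the textbook Smith-Waterman local alignment with a linear gap penalty. The scoring values, the function name and the example sequences are assumptions, and nothing here is presented as the method the patent actually uses.

```python
def smith_waterman(read, ref, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_read, aligned_ref) for a local alignment."""
    rows, cols = len(read) + 1, len(ref) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def sub(i, j):
        # Substitution score for aligning read[i-1] with ref[j-1].
        return match if read[i - 1] == ref[j - 1] else mismatch

    # Fill the matrix; local alignment never lets a cell drop below zero.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub(i, j),  # match or mismatch
                score[i - 1][j] + gap,            # extra base in the read
                score[i][j - 1] + gap,            # base missing from the read
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score reaches zero.
    i, j = best_pos
    out_read, out_ref = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_read.append(read[i - 1])
            out_ref.append(ref[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_read.append(read[i - 1])
            out_ref.append("-")
            i -= 1
        else:
            out_read.append("-")
            out_ref.append(ref[j - 1])
            j -= 1
    return best, "".join(reversed(out_read)), "".join(reversed(out_ref))


# Hypothetical read that lacks one T of the reference's TT run.
score_value, aligned_read, aligned_ref = smith_waterman("ACGTACGT", "ACGTTACGT")
print(score_value)
print(aligned_read)
print(aligned_ref)
```

In this example the gap can be placed against either T of the `TT` run with the same score. Equivalent placements of that kind are one reason reads supporting the same insertion or deletion may need to be reconciled to a single representation, as the claim's consensus step requires.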
102. A method of detecting genetic variation in a subject's genome comprising:
(a) providing sequencing data for a plurality of clusters of polynucleotides, wherein
(i) each cluster comprised multiple copies of a nucleic acid duplex attached to a support;
(ii) each duplex in a cluster comprised a first molecule comprising sequences A-B-G′-D′-C′ from 5′ to 3′ and a second molecule comprising sequences C-D-G-B′-A′ from 5′ to 3′;
(iii) sequence A′ is complementary to sequence A, sequence B′ is complementary to sequence B, sequence C′ is complementary to sequence C, sequence D′ is complementary to sequence D, and sequence G′ is complementary to sequence G;
(iv) sequence G is a portion of a target polynucleotide sequence from a subject and is different for each of a plurality of clusters;
(v) sequence B′ is located 5′ with respect to sequence G in the corresponding target polynucleotide sequence;
(vi) the sequencing data comprise R1 sequences generated by extension of a first primer comprising sequence D;
(vii) the sequencing data comprise R2 sequences generated by extension of a second primer comprising sequence A,
(viii) each first molecule comprises a barcode sequence,
(ix) wherein the sequencing data further comprises a barcode sequence for each cluster generated by extension of a third primer comprising sequence C; and
(x) grouping sequences from the clusters based on the barcode sequences;
(b) performing a first alignment using a first algorithm to align all R1 sequences to a first reference sequence;
(c) performing a second alignment using a second algorithm to locally align R1 sequences identified in said first alignment as likely to contain an insertion or deletion with respect to the first reference sequence, to produce a single consensus alignment for each insertion or deletion;
(d) performing an R2 alignment by aligning all R2 sequences to a second reference sequence;
(e) calculating a plurality of probabilities based on the R1 sequences for the subject and including the probabilities in a report identifying sequence variation identified by steps (b) to (d), wherein each probability is a probability of the subject or a subject's offspring having or developing a disease or trait; and
(f) transmitting the report to a receiver.
103. A method of detecting genetic variation in a subject's genome comprising:
(a) providing sequencing data for a plurality of clusters of polynucleotides, wherein
(i) each cluster comprised multiple copies of a nucleic acid duplex attached to a support;
(ii) each duplex in a cluster comprised a first molecule comprising sequences A-B-G′-D′-C′ from 5′ to 3′ and a second molecule comprising sequences C-D-G-B′-A′ from 5′ to 3′;
(iii) sequence A′ is complementary to sequence A, sequence B′ is complementary to sequence B, sequence C′ is complementary to sequence C, sequence D′ is complementary to sequence D, and sequence G′ is complementary to sequence G;
(iv) sequence G is a portion of a target polynucleotide sequence from a subject and is different for each of a plurality of clusters;
(v) sequence B′ is located 5′ with respect to sequence G in the corresponding target polynucleotide sequence;
(vi) the sequencing data comprise R1 sequences generated by extension of a first primer comprising sequence D;
(vii) the sequencing data comprise R2 sequences generated by extension of a second primer comprising sequence A,
(viii) each first molecule comprises a barcode sequence,
(ix) wherein the sequencing data further comprises a barcode sequence for each cluster generated by extension of a third primer comprising sequence C;
(x) grouping sequences from the clusters based on the barcode sequences; and
(xi) discarding all but one of a plurality of R1 sequences having the same sequence and alignment within a barcode sequence grouping;
(b) performing a first alignment using a first algorithm to align all R1 sequences to a first reference sequence;
(c) performing a second alignment using a second algorithm to locally align R1 sequences identified in said first alignment as likely to contain an insertion or deletion with respect to the first reference sequence, to produce a single consensus alignment for each insertion or deletion;
(d) performing an R2 alignment by aligning all R2 sequences to a second reference sequence;
(e) calculating a plurality of probabilities based on the R1 sequences for the subject and including the probabilities in a report identifying sequence variation identified by steps (b) to (d), wherein each probability is a probability of the subject or a subject's offspring having or developing a disease or trait; and
(f) transmitting the report to a receiver.
Specification