Compositions and methods for identifying nucleic acid molecules

US 10,011,870 B2
Filed: 12/07/2016
Issued: 07/03/2018
Est. Priority Date: 12/07/2016
Status: Active Grant

Abstract

The present disclosure provides methods and compositions for sequencing nucleic acid molecules and identifying individual sample nucleic acid molecules using Molecular Index Tags (MITs). Furthermore, reaction mixtures, kits, and adapter libraries are provided.

19 Claims

1. A method for sequencing at least a portion of a population of sample nucleic acid molecules, wherein the method comprises:
- forming a reaction mixture comprising the population of sample nucleic acid molecules and a set of Molecular Index Tags (MITs), wherein the MITs are nucleic acid molecules, wherein the number of different MITs in the set of MITs is between 10 and 1,000, and wherein a ratio of the total number of sample nucleic acid molecules in the population of sample nucleic acid molecules to the number of different MITs in the set of MITs is at least 1,000:1;
- attaching at least one MIT from the set of MITs to a sample nucleic acid molecule or segment thereof for at least 50% of the sample nucleic acid molecules to form a population of tagged nucleic acid molecules, wherein the at least one MIT is located 5′ and/or 3′ to the sample nucleic acid molecule or segment thereof on each tagged nucleic acid molecule and wherein the population of tagged nucleic acid molecules comprises at least one copy of each MIT of the set of MITs;
- amplifying the population of tagged nucleic acid molecules to create a library of tagged nucleic acid molecules;
- and determining the sequences of the attached MITs and at least a portion of the sample nucleic acid molecule or segment thereof of the tagged nucleic acid molecules in the library of tagged nucleic acid molecules.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14):
  - 2. The method of claim 1, further comprising identifying the individual sample nucleic acid molecules that gave rise to the tagged nucleic acid molecules using the sequences of the at least one MIT on each tagged nucleic acid molecule.
  - 3. The method of claim 2, wherein the method further comprises, before identifying the individual sample nucleic acid molecules, mapping the determined sequence of the sample nucleic acid molecule or segment thereof for a tagged nucleic acid molecule to a location in the genome of the source from which the sample is derived and using the mapped genome location along with the sequence of the at least one MIT to identify the individual sample nucleic acid molecule that gave rise to the tagged nucleic acid molecule.
  - 4. The method of claim 1, wherein two MITs are attached to each sample nucleic acid molecule or segment thereof, wherein the total number of MIT molecules in the reaction mixture is at least two times greater than the total number of sample nucleic acid molecules.
  - 5. The method of claim 1, wherein the MITs are double-stranded nucleic acid molecules.
  - 6. The method of claim 5, wherein each MIT is comprised within a portion of a Y-adapter nucleic acid molecule of a set of Y-adapter nucleic acid molecules, where each Y-adapter of the set comprises a base-paired, double-stranded polynucleotide segment and at least one non-base-paired single-stranded polynucleotide segment, wherein the sequence of each of the Y-adapter nucleic acid molecules in the set, other than the MIT sequence, is identical, and wherein the MIT is a double-stranded sequence that is part of the base-paired, double-stranded polynucleotide segment.
  - 7. The method of claim 6, wherein the double-stranded polynucleotide segment is between 5 and 25 nucleotides in length, not including the MIT, and the single-stranded polynucleotide segment is between 5 and 25 nucleotides in length.
  - 8. The method of claim 1, wherein the MITs are between 4 and 8 nucleotides in length and wherein the sequence of each of the MITs in the set of MITs differs from all other MIT sequences in the set by at least 2 nucleotides.
  - 9. The method of claim 1, wherein the total number of MIT molecules in the reaction mixture is greater than the total number of sample nucleic acid molecules in the reaction mixture, wherein attaching the at least one MIT is performed by a ligation reaction, wherein the method further comprises, before determining the sequences, enriching tagged nucleic acid molecules using hybrid capture, and wherein the method further comprises, after the hybrid capture and before determining the sequence, clonally amplifying the library of tagged nucleic acid molecules onto a solid support or a plurality of solid supports, wherein determining the sequence is performed using high-throughput sequencing.
  - 10. The method of claim 2, wherein the identifying comprises identifying paired MIT-sample nucleic acid families in the library of tagged nucleic acid molecules using the determined sequences, wherein the at least one MIT on each member of a paired MIT-sample nucleic acid family are identical or complementary, wherein the sample nucleic acid molecule or segment thereof of each member of an MIT-sample nucleic acid family maps to the same coordinates on the genome of the source of the population of sample nucleic acid molecules, and wherein each member of a paired MIT-sample nucleic acid family was generated from the same individual sample nucleic acid molecule, thereby identifying amplified nucleic acid molecules that arose from the same individual sample nucleic molecule.
  - 11. The method of claim 1, wherein the population of sample nucleic acid molecules is derived from a mammalian sample and the diversity of combinations of any 2 MITs in the set of MITs exceeds the total number of sample nucleic acid molecules that span each target locus of a plurality of target loci of a genome of a mammal that is the source of the mammalian sample.
  - 12. The method of claim 2, wherein the population of sample nucleic acid molecules is derived from a sample of human blood or a fraction thereof, wherein at least some of the sample nucleic acid molecules comprise at least one target locus of a plurality of target loci from one or more chromosomes or chromosome segments of interest, and wherein the method further comprises:
    - using the identified sample nucleic acid molecules to measure a quantity of DNA for each target locus by counting the number of sample nucleic acid molecules that comprise each target locus;
    - and determining, on a computer, the number of copies of the one or more chromosomes or chromosome segments of interest using the quantity of DNA at each target locus in the sample nucleic acid molecules.
  - 13. The method of claim 12, wherein the sample comprises 0.5 ml of plasma or less.
  - 14. The method of claim 1, wherein the population of sample nucleic acid molecules is derived from a sample comprising circulating cell-free human DNA, wherein the diversity of combinations of any 2 MITs in the set of MITs exceeds the total number of sample nucleic acid molecules that span each target locus in the human genome, and wherein the total number of MIT molecules in the reaction mixture is at least two times greater than the total number of sample nucleic acid molecules in the reaction mixture.
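
The numeric constraints on the MIT set in claims 1, 8, 11 and 14 can be illustrated with a minimal Python sketch. It is not taken from the patent. The function names, the checksum-based way of building a set, the inclusive reading of "between", and the assumption that all MITs in a set share one length are all hypothetical. "Combinations of any 2 MITs" is read here as ordered pairs with repetition (one tag at each end of a molecule), which gives n × n combinations; a stricter reading would give n(n−1)/2.

```python
from itertools import combinations, product
from typing import Iterable, Iterator

BASES = "ACGT"


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def check_mit_set(mits: Iterable[str], min_distance: int = 2) -> bool:
    """Check a candidate set against the limits recited in claims 1 and 8.

    Assumptions: ranges are inclusive and all MITs have the same length.
    """
    unique = sorted(set(mits))
    if not 10 <= len(unique) <= 1000:          # claim 1: 10 to 1,000 different MITs
        return False
    if any(not 4 <= len(m) <= 8 for m in unique):  # claim 8: 4 to 8 nucleotides
        return False
    # claim 8: every MIT differs from every other MIT by at least 2 nucleotides
    return all(hamming(a, b) >= min_distance for a, b in combinations(unique, 2))


def checksum_mits(data_length: int = 3) -> Iterator[str]:
    """Hypothetical generator: data bases plus one checksum base.

    Changing a single data base always changes the checksum base, so any two
    generated tags differ in at least 2 positions.
    """
    for word in product(range(4), repeat=data_length):
        check = sum(word) % 4
        yield "".join(BASES[i] for i in word) + BASES[check]


def pair_diversity(n_mits: int) -> int:
    """Ordered pairs with repetition (assumed reading of claims 11 and 14)."""
    return n_mits * n_mits


def diversity_exceeds(n_mits: int, molecules_spanning_locus: int) -> bool:
    """True if the pair diversity exceeds the molecules spanning one target locus."""
    return pair_diversity(n_mits) > molecules_spanning_locus
```

For example, `list(checksum_mits(3))` yields 64 four-nucleotide tags that pass `check_mit_set`, and under the ordered-pair assumption 64 tags give 4,096 two-tag combinations.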

15. A method for identifying amplification errors from sample preparation for high-throughput sequencing or identifying base-calling errors in a high-throughput sequencing reaction of a population of tagged nucleic acid molecules derived from a sample, wherein the method comprises:
- forming a reaction mixture comprising the population of sample nucleic acid molecules and a set of Molecular Index Tags (MITs), wherein the MITs are double-stranded nucleic acid molecules, wherein the number of different MITs in the set of MITs is between 10 and 1,000, and wherein a ratio of the total number of sample nucleic acid molecules in the population of sample nucleic acid molecules to the diversity of MITs in the set of MITs is greater than 1,000:1;
- attaching at least one MIT from the set of MITs to a sample nucleic acid molecule or segment thereof for a plurality of sample nucleic acid molecules to form a population of tagged nucleic acid molecules wherein the at least one MIT is located 5′ and/or 3′ to the sample nucleic acid molecule or segment thereof on each tagged nucleic acid molecule and wherein the population of tagged nucleic acid molecules comprises at least one copy of each MIT in the set of MITs;
- amplifying the population of tagged nucleic acid molecules to create a library of tagged nucleic acid molecules;
- determining, using high-throughput sequencing, the sequences of the attached MITs and at least a portion of the sample nucleic acid molecule or segment thereof of the tagged nucleic acid molecules in the library of tagged nucleic acid molecules, wherein the sequence of the at least one MIT on each tagged nucleic acid molecule identifies the individual sample nucleic acid molecule that gave rise the tagged nucleic acid molecule;
- and identifying tagged nucleic acid molecules having amplification errors or base-calling errors by identifying tagged nucleic acid molecules in which the sample nucleic acid molecule or segment thereof has a nucleotide sequence that is found in less than 25% of tagged nucleic acid molecules derived from the same initial sample nucleic acid molecule.
- Dependent claims (16, 17, 18, 19):
  - 16. The method of claim 15, wherein the population of sample nucleic acid molecules comprises fragments of genomic DNA that are greater than 50 nucleotides and not more than 500 nucleotides in length, and wherein the number of combinations of any 2 MITs in the set of MITs exceeds the total number of DNA fragments in the population of sample nucleic acid molecules that span a target locus in the genome.
  - 17. The method of claim 15, wherein two MITs are attached to each sample nucleic acid molecule or segment thereof, wherein the total number of MIT molecules in the reaction mixture is at least two times greater than the total number of sample nucleic acid molecules.
  - 18. The method of claim 15, wherein each MIT is comprised within a portion of a Y-adapter nucleic acid molecule of a set of Y-adapter nucleic acid molecules, where each Y-adapter of the set comprises a base-paired, double-stranded polynucleotide segment and at least one non-base-paired single-stranded polynucleotide segment, wherein the sequence of each of the Y-adapter nucleic acid molecules in the set, other than the MIT sequence, is identical, and wherein the MIT is a double-stranded sequence that is part of the base-paired, double-stranded polynucleotide segment.
  - 19. The method of claim 18, wherein the double-stranded polynucleotide segment is between 5 and 25 nucleotides in length, not including the MIT, and the single-stranded polynucleotide segment is between 5 and 25 nucleotides in length.
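
The final step of claim 15 (flagging sequences found in less than 25% of the tagged molecules derived from the same initial molecule) can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation. The `Read` record, its field names, and both function names are hypothetical. Families are keyed on the MIT pair together with the mapped coordinates, an assumption borrowed from claims 3 and 10, and the sketch ignores the case in which the MITs of two family members are complementary rather than identical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Read(NamedTuple):
    """Hypothetical record for one sequenced tagged molecule."""
    mit_5p: str      # MIT read at the 5' end
    mit_3p: str      # MIT read at the 3' end
    chrom: str       # chromosome the insert maps to
    start: int       # mapped start coordinate
    end: int         # mapped end coordinate
    insert_seq: str  # sequence of the sample segment


FamilyKey = Tuple[str, str, str, int, int]


def group_families(reads: Iterable[Read]) -> Dict[FamilyKey, List[Read]]:
    """Group reads that share both MITs and the same mapped coordinates.

    Each group is assumed to descend from one initial sample molecule.
    """
    families: Dict[FamilyKey, List[Read]] = defaultdict(list)
    for r in reads:
        families[(r.mit_5p, r.mit_3p, r.chrom, r.start, r.end)].append(r)
    return dict(families)


def flag_minority_reads(family: List[Read], threshold: float = 0.25) -> List[Read]:
    """Return reads whose insert sequence occurs in less than `threshold`
    of the family, i.e. candidate amplification or base-calling errors."""
    counts = Counter(r.insert_seq for r in family)
    total = len(family)
    return [r for r in family if counts[r.insert_seq] / total < threshold]
```

Under these assumptions, the number of keys returned by `group_families` for reads covering a locus is an estimate of the number of distinct sample molecules at that locus. In a family of five reads where four agree and one differs, the odd read represents 20% of the family and is flagged.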

Current Assignee
Natera Incorporated (Natera)
Original Assignee
Natera Incorporated (Natera)
Inventors
Zimmermann, Bernhard; Swenerton, Ryan; Rabinowitz, Matthew; Sigurjonsson, Styrmir; Gemelos, George; Ganguly, Apratim; Sethi, Himanshu
Primary Examiner(s)
Woolwine, Samuel C

Application Number

US15/372,279
Publication Number

US 20180155779A1
Time in Patent Office

573 Days
Field of Search

None
CPC Class Codes

C12Q 1/6806   Preparing nucleic acids for...

C12Q 1/6869   Methods for sequencing

C12Q 2525/179   incorporating arbitrary or ...

C12Q 2535/122   Massive parallel sequencing

C12Q 2537/16   Assays for determining copy...

C12Q 2563/179   the label being a nucleic acid
