Methods and systems for detecting genetic variants

  • US 9,920,366 B2
  • Filed: 09/22/2015
  • Issued: 03/20/2018
  • Est. Priority Date: 12/28/2013
  • Status: Active Grant
First Claim

1. A method for detecting double-stranded deoxyribonucleic acid (DNA) molecules in a biological sample from a subject, comprising:

    (a) tagging said double-stranded DNA molecules in said biological sample from said subject with a set of duplex tags, wherein said set of duplex tags comprises a plurality of different molecular barcodes, wherein each duplex tag of said set of duplex tags differently tags complementary strands of a double-stranded DNA molecule of said double-stranded DNA molecules in said biological sample to provide tagged strands, and wherein said tagging is performed with at least a 10X excess of duplex tags as compared to said double-stranded DNA molecules, which excess of duplex tags is sufficient to tag at least 20% of said double-stranded DNA molecules in said biological sample from said subject;

    (b) for each genetic locus in a set of one or more genetic loci in a reference genome, selectively enriching said tagged strands for a subset of said tagged strands that maps to said genetic locus, to provide enriched tagged strands;

    (c) sequencing at least a portion of said enriched tagged strands to generate a plurality of raw sequence reads from said biological sample from said subject;

    (d) grouping said plurality of raw sequence reads into a plurality of families, each family comprising raw sequence reads generated from a same parent polynucleotide, which grouping is based on (i) molecular barcodes associated with said parent polynucleotides and (ii) information from beginning and/or end portions of said raw sequences of said parent polynucleotides;

    (e) collapsing said plurality of raw sequence reads grouped into said plurality of families into a plurality of consensus sequence reads, each consensus sequence read of said plurality of consensus sequence reads (i) comprising a plurality of consensus bases for each genetic locus in said set of one or more genetic loci and (ii) being representative of single strands of said double-stranded DNA molecules;

    (f) for each genetic locus in said set of one or more genetic loci, calculating a first quantitative measure of said enriched tagged strands that map to said genetic locus for which complementary strands are detected in said plurality of consensus sequence reads;

    (g) for each genetic locus in said set of one or more genetic loci, calculating a second quantitative measure of said enriched tagged strands that map to said genetic locus for which only one strand among complementary strands is detected in said plurality of consensus sequence reads; and

    (h) for each genetic locus in said set of one or more genetic loci, calculating a third quantitative measure of said enriched tagged strands that map to said genetic locus for which neither complementary strand is detected in said plurality of consensus sequence reads, wherein said third quantitative measure is calculated based at least in part on said first and second quantitative measures, thereby detecting said double-stranded DNA molecules in said biological sample from said subject.
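
Steps (d) through (h) can be illustrated with a minimal Python sketch for a single genetic locus. It is not the patented implementation, and every name in it is hypothetical: reads are assumed to arrive as `(barcode, start, end, strand, sequence)` tuples with the strand of origin already resolved from the duplex tag, and reads in a family are assumed to be equal-length and pre-aligned. The claim only requires that the third measure be based at least in part on the first two; the formula used here, unseen = singles² / (4 × pairs), is one possible choice that follows from assuming each strand is detected independently with the same probability.

```python
from collections import Counter, defaultdict


def group_into_families(reads):
    """Step (d): group reads by barcode plus start/end coordinates, per strand."""
    families = defaultdict(list)
    for barcode, start, end, strand, seq in reads:
        families[(barcode, start, end, strand)].append(seq)
    return families


def collapse(seqs):
    """Step (e): majority-vote consensus over equal-length, pre-aligned reads."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def strand_counts(families):
    """Steps (f) and (g): count molecules seen on both strands vs. one strand."""
    strands_seen = defaultdict(set)
    for barcode, start, end, strand in families:
        strands_seen[(barcode, start, end)].add(strand)
    paired = sum(1 for s in strands_seen.values() if len(s) == 2)
    single = sum(1 for s in strands_seen.values() if len(s) == 1)
    return paired, single


def estimate_unseen(paired, single):
    """Step (h), under the ASSUMED independent-detection model.

    With N molecules and per-strand detection probability p:
    paired = N*p^2 and single = 2*N*p*(1-p), so unseen = N*(1-p)^2
    simplifies to single^2 / (4 * paired).
    """
    if paired == 0:
        raise ValueError("model needs at least one molecule seen on both strands")
    return single ** 2 / (4 * paired)


# Hypothetical reads for one locus: (barcode, start, end, strand, sequence)
reads = [
    ("AAT", 100, 104, "+", "ACGT"),
    ("AAT", 100, 104, "+", "ACGT"),
    ("AAT", 100, 104, "+", "ACTT"),  # one sequencing error, outvoted on collapse
    ("AAT", 100, 104, "-", "ACGT"),
    ("CCG", 100, 104, "+", "ACGT"),
    ("GGA", 101, 105, "-", "CGTA"),
]

families = group_into_families(reads)
consensus = {key: collapse(seqs) for key, seqs in families.items()}
paired, single = strand_counts(families)
unseen = estimate_unseen(paired, single)
```

In this toy input one molecule is observed on both strands and two on a single strand, so the model estimates one molecule for which neither strand was detected. A real pipeline would map reads to the reference genome first and would need a rule for ties and for families of unequal read length, which this sketch leaves out.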
