
Phasing of heterozygous loci to determine genomic haplotypes

  • US 9,679,103 B2
  • Filed: 08/22/2012
  • Issued: 06/13/2017
  • Est. Priority Date: 08/25/2011
  • Status: Active Grant
First Claim

1. A method of determining at least part of a genome of an organism from one or more samples, the one or more samples including nucleic acid molecules of the organism, the method comprising:

  • providing a plurality of aliquots, each aliquot comprising nucleic acid molecules of the genome that have barcodes to track from which aliquot a nucleic acid molecule originates, wherein each of the plurality of aliquots includes less than a genomic equivalent of the genome of the organism;

    sequencing a plurality of nucleic acid molecules in the plurality of aliquots to obtain sequence reads of the plurality of nucleic acid molecules and the barcodes;

    receiving, at a computer system, sequencing information from the sequencing of the plurality of the nucleic acid molecules in the one or more samples, wherein the sequencing information includes the barcodes for tracking from which aliquot a sequence read originates;

    for each of the plurality of nucleic acid molecules:

    mapping, by the computer system, at least one sequence read of the nucleic acid molecule to a reference genome;

    identifying, by the computer system, a plurality of candidate hets of a first chromosome, a candidate het being a locus identified as having or potentially having two or more different alleles;

    for each of the plurality of candidate hets:

    determining, by the computer system, whether the candidate het connects with each of one or more other candidate hets based on barcodes of sequence reads that map to the candidate het being the same as barcodes of sequence reads that map to the other candidate het, wherein each connection specifies a first heterozygous locus having two alleles that connect respectively with two alleles of a second heterozygous locus;

    for each connection:

    determining, by the computer system, an orientation between the pair of heterozygous loci of the connection, the orientation specifying which allele of the first heterozygous locus is connected as being on a same haplotype with which allele of the second heterozygous locus, wherein the orientation is determined based on barcodes of sequence reads of connected alleles being the same;

    identifying, by the computer system, a first set of at least ten heterozygous loci that are interconnected, the first set of heterozygous loci defining a first region of the first chromosome; and

    calculating two haplotypes of the first region based on the alleles of each heterozygous locus in the first set and the orientations of the connections among the heterozygous loci in the first set.
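
The connection, orientation, and haplotype steps recited above can be illustrated with a minimal sketch. This is not the patented implementation: it assumes reads have already been mapped and reduced to hypothetical `(barcode, locus, allele)` tuples, decides each orientation by a simple count of shared barcodes with no error model, and propagates phase by breadth-first search over the connection graph. The function name `phase_hets` and the parameter `min_loci` are illustrative only.

```python
from collections import defaultdict, deque
from itertools import combinations


def phase_hets(hets, reads, min_loci=10):
    """hets: {locus: (allele0, allele1)}; reads: (barcode, locus, allele) tuples.

    Returns a list of (loci, haplotype_1, haplotype_2), one per interconnected
    set of at least `min_loci` heterozygous loci.
    """
    # Barcodes (aliquot labels) seen on reads supporting each allele of each het.
    barcodes = defaultdict(set)
    for barcode, locus, allele in reads:
        if locus in hets and allele in hets[locus]:
            barcodes[(locus, allele)].add(barcode)

    def shared(x, y):
        return len(barcodes[x] & barcodes[y])

    # A connection exists when two hets share barcodes; its orientation is
    # "cis" (allele0 with allele0) or "trans" (allele0 with allele1).
    graph = defaultdict(dict)
    for a, b in combinations(sorted(hets), 2):
        (a0, a1), (b0, b1) = hets[a], hets[b]
        cis = shared((a, a0), (b, b0)) + shared((a, a1), (b, b1))
        trans = shared((a, a0), (b, b1)) + shared((a, a1), (b, b0))
        if cis != trans:  # a tie (including no shared barcodes) is not usable
            graph[a][b] = graph[b][a] = 0 if cis > trans else 1

    # Walk each interconnected set, propagating orientation from a seed locus.
    regions, seen = [], set()
    for seed in sorted(hets):
        if seed in seen:
            continue
        seen.add(seed)
        phase = {seed: 0}  # index of the allele placed on haplotype 1
        queue = deque([seed])
        while queue:
            cur = queue.popleft()
            for nxt, flip in graph[cur].items():
                if nxt not in seen:
                    seen.add(nxt)
                    phase[nxt] = phase[cur] ^ flip
                    queue.append(nxt)
        if len(phase) >= min_loci:
            loci = sorted(phase)
            hap1 = [hets[locus][phase[locus]] for locus in loci]
            hap2 = [hets[locus][1 - phase[locus]] for locus in loci]
            regions.append((loci, hap1, hap2))
    return regions


# Toy data: three hets, four aliquot barcodes (all values are made up).
hets = {100: ("A", "G"), 250: ("C", "T"), 400: ("G", "T")}
reads = [
    (1, 100, "A"), (1, 250, "T"),
    (2, 100, "G"), (2, 250, "C"),
    (3, 250, "T"), (3, 400, "G"),
    (4, 250, "C"), (4, 400, "T"),
]
print(phase_hets(hets, reads, min_loci=3))
# [([100, 250, 400], ['A', 'T', 'G'], ['G', 'C', 'T'])]
```

The toy call lowers `min_loci` to 3 so that a three-locus region is reported; the default of 10 mirrors the "at least ten heterozygous loci" limitation of the claim. A practical pipeline would also need to handle sequencing errors and aliquots that happen to contain fragments from both parental chromosomes, which this sketch ignores.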
