Systems and methods for haplotyping
Abstract
The invention relates to methods for determining a haplotype for an organism by using a system for transforming SNP alleles found in sequence fragments into vertices in a graph with edges connecting vertices for alleles that appear together in a sequence fragment. A community detection operation can be used to infer the haplotype from the graph. The system may produce a report that includes the haplotype of the SNPs found in the genome of that organism.
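The graph construction described in the abstract can be illustrated with a minimal sketch. It assumes each sequence fragment has already been reduced to a mapping from SNP position to the allele observed there; the function name `build_allele_graph`, the fragment encoding, and the toy data are hypothetical and do not come from the patent. Edges carry a weight equal to the number of fragments in which the two alleles appear together.

```python
from collections import Counter
from itertools import combinations


def build_allele_graph(fragments):
    """Build an allele graph from fragments given as {snp_position: allele}.

    Returns (vertices, edge_weights): one vertex per (snp, allele) seen, and one
    weighted edge per pair of alleles that occur together in a fragment.
    """
    vertices = set()
    edge_weights = Counter()
    for fragment in fragments:
        # Order by SNP position so each edge has a single canonical key.
        alleles = sorted(fragment.items())
        vertices.update(alleles)
        for u, v in combinations(alleles, 2):
            edge_weights[(u, v)] += 1
    return vertices, edge_weights


# Hypothetical fragments covering three SNPs at positions 100, 250 and 400.
fragments = [
    {100: "A", 250: "G"},
    {100: "A", 250: "G", 400: "T"},
    {100: "C", 250: "T"},
    {250: "T", 400: "C"},
]
vertices, edge_weights = build_allele_graph(fragments)
for edge, weight in sorted(edge_weights.items()):
    print(edge, weight)
```

Keeping a fragment count on each edge is a design choice of this sketch; it lets a later step compare how much support each candidate phase has.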
23 Claims
1. A method for identifying haplotypes in a genome, the method comprising:
obtaining a plurality of sequence fragments from a genome of an organism;
transforming, using a processor coupled to a memory subsystem, the sequence fragments into a graph comprising a vertex for each allele of each of a plurality of SNPs found in the plurality of sequence fragments and an edge for each pair of the alleles that are found in one of the fragments;
for each pair of the plurality of SNPs for which alleles are found in one of the fragments, determine a most-supported phase for alleles of that pair of SNPs and remove any edge from the graph representing a less-supported phase for the alleles of that pair of SNPs; and
apply a community detection operation to the largest contiguous component of the graph remaining after the edge removal to assign each vertex of that component to a haplotype, wherein the haplotype covers at least 85% of a chromosome.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 19.
11. A method for identifying haplotypes in a genome, the method comprising:
obtaining a plurality of sequence fragments generated by sequencing nucleic acid from a genome of a patient;
creating, using a computer system comprising a processor coupled to a memory subsystem, a graph comprising a vertex for each allele of each of a plurality of SNPs found in the plurality of sequence fragments and an edge for each pair of the alleles that are found in one of the fragments, wherein the graph uses pointers to identify a physical location in the memory subsystem where each vertex is stored;
determining, for each pair of the plurality of SNPs for which alleles are found in one of the fragments, a best-supported phase for alleles of that pair of SNPs and remove at least one edge from the graph representing a less-supported phase for alleles of that pair of SNPs;
finding a maximum likelihood assignment of vertices to one or more blocks wherein the probability of the graph given the assignment is maximized, thereby assigning each allele to a haplotype; and
producing a report showing the haplotype for the patient, wherein the haplotype covers at least 85% of a chromosome.
Dependent claims: 12, 13, 14, 15, 20, 21.
16. A system for identifying haplotypes in a genome, the system comprising a processor coupled to a memory subsystem, wherein the system is operable to:
obtain a plurality of sequence reads generated by sequencing nucleic acid from a genome of a patient;
create a graph comprising a vertex for each allele of each of a plurality of SNPs found in the plurality of sequence fragments and an edge for each subset of the alleles that are found in one of the fragments;
determine, for each pair of the plurality of SNPs for which alleles are found in one of the fragments, a best-supported phase for alleles of that pair of SNPs and remove at least one edge from the graph representing a less-supported phase for alleles of that pair of SNPs;
find an optimal assignment of vertices to one or more blocks by a community detection operation, thereby assigning each allele to a haplotype; and
produce a report showing the haplotype for the patient, wherein the haplotype covers at least 85% of a chromosome.
Dependent claims: 17, 18, 22, 23.
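The edge-removal and block-assignment steps recited in the independent claims can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: SNPs are taken to be biallelic with alleles encoded as 0 and 1, so the two candidate phases of a SNP pair are "same code" and "different code"; edge weights are made-up fragment counts; and a simple weighted label propagation stands in for whichever community detection operation is actually used. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def prune_less_supported_phases(edge_weights):
    """Keep, per SNP pair, only edges of the phase with the most fragment support."""
    support = defaultdict(Counter)  # (snp_i, snp_j) -> {same_code: total weight}
    for ((si, ai), (sj, aj)), w in edge_weights.items():
        support[(si, sj)][ai == aj] += w
    kept = {}
    for edge, w in edge_weights.items():
        (si, ai), (sj, aj) = edge
        counts = support[(si, sj)]
        if counts[ai == aj] >= counts[ai != aj]:  # a tie keeps both phases
            kept[edge] = w
    return kept


def largest_component(edges):
    """Return the adjacency map of the largest connected component."""
    adjacency = defaultdict(dict)
    for (u, v), w in edges.items():
        adjacency[u][v] = w
        adjacency[v][u] = w
    seen, best = set(), set()
    for start in adjacency:
        if start in seen:
            continue
        component, stack = {start}, [start]
        while stack:
            for neighbour in adjacency[stack.pop()]:
                if neighbour not in component:
                    component.add(neighbour)
                    stack.append(neighbour)
        seen |= component
        if len(component) > len(best):
            best = component
    return {node: adjacency[node] for node in best}


def label_propagation(adjacency, max_rounds=100):
    """Stand-in community detection: each vertex adopts its heaviest neighbour label."""
    labels = {node: node for node in adjacency}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(adjacency):
            votes = Counter()
            for neighbour, w in adjacency[node].items():
                votes[labels[neighbour]] += w
            top = max(votes.values())
            new_label = min(l for l, v in votes.items() if v == top)
            if new_label != labels[node]:
                labels[node], changed = new_label, True
        if not changed:
            break
    blocks = defaultdict(set)
    for node, label in labels.items():
        blocks[label].add(node)
    return sorted(sorted(block) for block in blocks.values())


# Hypothetical weighted edges over SNPs 1..4; keys are ((snp, allele), (snp, allele)).
edge_weights = {
    ((1, 0), (2, 1)): 5, ((2, 1), (3, 0)): 4, ((1, 0), (3, 0)): 3,
    ((3, 0), (4, 1)): 4, ((2, 1), (4, 1)): 3,
    ((1, 1), (2, 0)): 5, ((2, 0), (3, 1)): 4, ((1, 1), (3, 1)): 3,
    ((3, 1), (4, 0)): 4, ((2, 0), (4, 0)): 3,
    ((1, 0), (2, 0)): 1,  # conflicts with the better-supported phase of SNPs 1 and 2
    ((1, 0), (4, 0)): 1,  # only phase observed for SNPs 1 and 4, so it survives
}
kept = prune_less_supported_phases(edge_weights)
blocks = label_propagation(largest_component(kept))
for block in blocks:
    print(block)
```

In this toy input, the surviving low-weight edge between SNPs 1 and 4 joins both haplotypes into a single component, which is why a block-assignment step is still needed after pruning. Label propagation is sensitive to visiting order and tie-breaking, so the fixed sorted order here is only a way to make the sketch deterministic.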
Specification