EXACT HAPLOTYPE RECONSTRUCTION OF F2 POPULATIONS
Abstract
A system for reconstructing haplotypes from genotype data includes a memory, a processor, and a reconstruction module. The reconstruction module is configured to access a set of progeny genotype data including n progenies encoded with m genetic markers. A first set of parent haplotypes associated with a first parent of the n progenies and a second set of parent haplotypes associated with a second parent of the n progenies are identified based on at least the set of progeny genotype data. An agglomerate data structure including a collection of sets of haplotype sequences characterizing the n progenies is constructed based on the set of progeny genotype data and the first and second sets of parent haplotypes. Each set of haplotype sequences includes a number of crossovers equal to a total minimum number of observable crossovers in the n progenies.
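The notion of a crossover count used in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each reconstructed progeny haplotype is written as a string of labels naming which parental haplotype each marker was copied from (for example `a`/`b` for the first parent and `c`/`d` for the second). The function names `count_crossovers` and `total_crossovers` and this label encoding are hypothetical and are not taken from the patent.

```python
from typing import Iterable, Sequence, Tuple


def count_crossovers(origin: Sequence[str]) -> int:
    """Count observable crossovers along one reconstructed haplotype.

    `origin[j]` is the label of the parental haplotype that marker j was
    copied from. A crossover is observed wherever two adjacent markers
    carry different labels.
    """
    return sum(1 for left, right in zip(origin, origin[1:]) if left != right)


def total_crossovers(progenies: Iterable[Tuple[Sequence[str], Sequence[str]]]) -> int:
    """Sum crossovers over all progenies.

    Each progeny is a pair of haplotypes: one inherited from the first
    parent and one inherited from the second parent.
    """
    return sum(count_crossovers(h) for pair in progenies for h in pair)


if __name__ == "__main__":
    # Two progenies, five markers each (illustrative data only).
    population = [("aabba", "ccccc"), ("bbbbb", "cdddd")]
    print(total_crossovers(population))
```

A set of haplotype sequences in the sense of the abstract would then be one whose total, computed this way, equals the minimum achievable for the given genotype data.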
20 Claims
1. A system for reconstructing haplotypes from genotype data, the system comprising:
a memory;
a processor communicatively coupled to the memory; and
a reconstruction module communicatively coupled to the memory and processor, the reconstruction module configured to perform a method comprising:
accessing a set of progeny genotype data comprising n progenies encoded with m genetic markers, wherein each of the m genetic markers comprises two values, and wherein a combination of each of the m genetic markers associated with a progeny represents a genotype sequence of the progeny;
identifying, based on at least the set of progeny genotype data, a first set of parent haplotypes associated with a first parent of the n progenies and a second set of parent haplotypes associated with a second parent of the n progenies;
determining a total minimum number of observable crossovers in the n progenies; and
constructing, based on the set of progeny genotype data and the first and second sets of parent haplotypes, an agglomerate data structure comprising a collection of sets of haplotype sequences characterizing the n progenies in terms of the first and second sets of parent haplotypes, wherein each set of haplotype sequences comprises a number of observable crossovers equal to the total minimum number of observable crossovers in the n progenies.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
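The "determining a total minimum number of observable crossovers" step can be sketched for a single progeny once the parent haplotypes are known. The sketch below is an assumption-laden illustration, not the claimed method: it assumes each parent has exactly two haplotypes over binary alleles (0/1), that each marker of the progeny genotype is an unordered pair of alleles, and it uses a standard shortest-path dynamic program over the four possible (first-parent haplotype, second-parent haplotype) states per marker. The name `min_crossovers` is hypothetical.

```python
from itertools import product
from typing import Optional, Sequence, Tuple

Haplotype = Sequence[int]             # one allele (0 or 1) per marker
Genotype = Sequence[Tuple[int, int]]  # unordered allele pair per marker


def min_crossovers(
    genotype: Genotype,
    parent1: Tuple[Haplotype, Haplotype],
    parent2: Tuple[Haplotype, Haplotype],
) -> Optional[int]:
    """Minimum observable crossovers explaining one progeny genotype.

    Returns None when no mosaic of the parental haplotypes is consistent
    with the genotype.
    """
    m = len(genotype)
    if m == 0:
        return 0
    inf = float("inf")
    # State (i, k): marker copied from parent1[i] and parent2[k].
    states = list(product((0, 1), repeat=2))

    def consistent(j: int, state: Tuple[int, int]) -> bool:
        i, k = state
        return sorted((parent1[i][j], parent2[k][j])) == sorted(genotype[j])

    cost = {s: (0 if consistent(0, s) else inf) for s in states}
    for j in range(1, m):
        new_cost = {}
        for s in states:
            if not consistent(j, s):
                new_cost[s] = inf
                continue
            # Each change of source haplotype within a parent is one crossover.
            new_cost[s] = min(
                cost[t] + (t[0] != s[0]) + (t[1] != s[1]) for t in states
            )
        cost = new_cost
    best = min(cost.values())
    return None if best == inf else int(best)


if __name__ == "__main__":
    p1 = ([0, 0, 0], [1, 1, 1])
    p2 = ([0, 0, 0], [1, 1, 1])
    print(min_crossovers([(0, 0), (0, 1), (1, 1)], p1, p2))
```

Under these assumptions the progenies are independent once the parent haplotypes are fixed, so a total over the n progenies would be the sum of the per-progeny minima.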
11. A computer program storage product for reconstructing haplotypes from genotype data, the computer program storage product comprising instructions configured to perform a method comprising:
accessing a set of progeny genotype data comprising n progenies encoded with m genetic markers, wherein each of the m genetic markers comprises two values, and wherein a combination of each of the m genetic markers associated with a progeny represents a genotype sequence of the progeny;
identifying, based on at least the set of progeny genotype data, a first set of parent haplotypes associated with a first parent of the n progenies and a second set of parent haplotypes associated with a second parent of the n progenies;
determining a total minimum number of observable crossovers in the n progenies; and
constructing, based on the set of progeny genotype data and the first and second sets of parent haplotypes, an agglomerate data structure comprising a collection of sets of haplotype sequences characterizing the n progenies in terms of the first and second sets of parent haplotypes, wherein each set of haplotype sequences comprises a number of observable crossovers equal to the total minimum number of observable crossovers in the n progenies.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
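The "collection of sets of haplotype sequences" in the constructing step suggests that more than one reconstruction can attain the minimum. A brute-force sketch for a single progeny makes this concrete. It is exponential in the number of markers and is meant only as an illustration on tiny inputs. The binary alleles, the two-haplotypes-per-parent layout, and the name `optimal_reconstructions` are assumptions of this sketch rather than details from the claims.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

State = Tuple[int, int]  # (index into parent1, index into parent2)


def optimal_reconstructions(
    genotype: Sequence[Tuple[int, int]],
    parent1: Tuple[Sequence[int], Sequence[int]],
    parent2: Tuple[Sequence[int], Sequence[int]],
) -> Tuple[Optional[int], List[Tuple[State, ...]]]:
    """Enumerate every minimum-crossover reconstruction of one progeny.

    A reconstruction assigns to each marker the pair of parental
    haplotypes it was copied from. Returns (minimum, list of paths), or
    (None, []) when the genotype cannot be explained.
    """
    m = len(genotype)
    best: Optional[int] = None
    solutions: List[Tuple[State, ...]] = []
    states = list(product((0, 1), repeat=2))
    for path in product(states, repeat=m):
        # Discard assignments that do not reproduce the observed genotype.
        if any(
            sorted((parent1[i][j], parent2[k][j])) != sorted(genotype[j])
            for j, (i, k) in enumerate(path)
        ):
            continue
        crossovers = sum(
            (a[0] != b[0]) + (a[1] != b[1]) for a, b in zip(path, path[1:])
        )
        if best is None or crossovers < best:
            best, solutions = crossovers, [path]
        elif crossovers == best:
            solutions.append(path)
    return best, solutions


if __name__ == "__main__":
    p1 = ([0, 0, 0], [1, 1, 1])
    p2 = ([0, 0, 0], [1, 1, 1])
    minimum, paths = optimal_reconstructions([(0, 0), (0, 1), (1, 1)], p1, p2)
    print(minimum, len(paths))
```

In this toy input the crossover between the first and second markers may fall in either parent, so two distinct reconstructions share the same minimum count. A structure holding all such alternatives is one way to read the "agglomerate data structure" wording, though the claims do not specify its layout.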
Specification