Methods, software and apparati for identifying genomic regions harboring a gene associated with a detectable trait
Abstract
The present invention relates to methods, software, and apparati for determining whether a genomic region harbors a gene associated with a detectable trait. In one embodiment, the present invention relates to a method of confirming that a genomic region harbors a gene associated with a detectable trait comprising the steps of identifying a candidate genomic region suspected of harboring the gene associated with the detectable trait, constructing a trait-associated distribution of association values using the biallelic markers in the candidate genomic region, identifying a plurality of biallelic markers in random genomic regions which are not suspected of harboring the gene associated with the detectable trait, constructing a random distribution of association values using the biallelic markers in the random genomic regions, comparing the trait-associated distribution of association values to the random distribution of association values, and determining whether the trait-associated distribution of association values and the random distribution of association values are significantly different from one another. In other embodiments, the present invention comprises software for performing the above method and devices comprising the software in a retrievable form.
-
46 Claims
-
1. A method of confirming that a candidate genomic region harbors a gene associated with a detectable trait comprising the steps of:
-
constructing a candidate region distribution of test values using a plurality of first biallelic markers in a candidate genomic region suspected of harboring said gene associated with said detectable trait, said candidate region distribution of test values being indicative of the difference in the frequencies of said plurality of first biallelic markers in said candidate region in individuals who possess said detectable trait and control individuals who do not possess said detectable trait;
constructing a random region distribution of test values using a plurality of second biallelic markers in random genomic regions which are not suspected of harboring said gene associated with said detectable trait, said random region distribution of test values being indicative of the difference in the frequencies of said plurality of second biallelic markers in said random genomic regions in individuals who possess said detectable trait and control individuals who do not possess said detectable trait; and
determining whether said candidate region distribution of test values and said random region distribution of test values are significantly different from one another, wherein a significant difference indicates that said candidate genomic region harbors a gene associated with said detectable trait.
calculating the frequencies for each combination of biallelic markers in each group in said series of groups of first biallelic markers in said candidate genomic region in individuals expressing said detectable trait;
calculating the frequencies for each combination of biallelic markers in each group in said series of groups of first biallelic markers in said candidate genomic region in individuals who do not express said detectable trait; and
comparing the haplotype frequencies in individuals who express said trait and individuals who do not express said trait by performing a chi-squared analysis to yield said test values.
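As an illustration of the chi-squared comparison recited above, the following is a minimal sketch of a test value for a single haplotype, assuming a 2x2 table of haplotype counts (carrier versus non-carrier chromosomes in trait-positive individuals and controls). The function name and the use of raw chromosome counts are assumptions made for this example; the claims do not fix a particular table layout.

```python
def chi_squared_2x2(case_with, case_total, control_with, control_total):
    """Pearson chi-squared statistic (1 degree of freedom, no continuity
    correction) comparing how often one haplotype occurs among case
    chromosomes and control chromosomes."""
    a = case_with                      # case chromosomes carrying the haplotype
    b = case_total - case_with         # case chromosomes without it
    c = control_with                   # control chromosomes carrying it
    d = control_total - control_with   # control chromosomes without it
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A margin is empty, so the table carries no information.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


if __name__ == "__main__":
    # Hypothetical counts: 30 of 100 case chromosomes vs. 10 of 100 controls.
    print(chi_squared_2x2(30, 100, 10, 100))
```

A larger value indicates a larger difference in haplotype frequency between the two groups; one such value per haplotype would feed the distributions of test values.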
-
-
4. The method of claim 3, wherein said steps of performing a haplotype analysis on each possible combination of biallelic markers in each group in said series of groups of second biallelic markers in said random genomic regions and calculating said test values for each combination comprises the steps of:
-
calculating the frequencies for each combination of biallelic markers in each group in said series of groups of second biallelic markers in said random genomic regions in individuals expressing said detectable trait;
calculating the frequencies for each combination of biallelic markers in each group in said series of groups of second biallelic markers in said random genomic regions in individuals who do not express said detectable trait; and
comparing the haplotype frequencies in individuals who express said trait and individuals who do not express said trait by performing a chi-squared analysis to yield said test values.
-
-
5. The method of claim 4, wherein said step of comparing said candidate region distribution of test values to said random region distribution of test values comprises performing a Wilcoxon rank test.
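A Wilcoxon rank test on two distributions of test values can be sketched as follows. This is a minimal rank-sum version, assuming midranks for ties and a normal approximation without tie correction for the p-value; the function name and these simplifications are choices made for the example, not details taken from the claims.

```python
import math


def wilcoxon_rank_sum(x, y):
    """Return (W, z, p): the rank sum of sample x, its normal-approximation
    z score, and a two-sided p-value."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Find the run of tied values and give each the average rank.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2
    var = n1 * n2 * (n1 + n2 + 1) / 12   # no correction for ties
    z = (w - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the normal
    return w, z, p


if __name__ == "__main__":
    candidate = [12.5, 9.1, 7.8, 15.2]   # hypothetical candidate-region values
    random_regions = [1.2, 0.4, 3.3, 2.1, 0.9]
    print(wilcoxon_rank_sum(candidate, random_regions))
```

For small samples an exact test would be more appropriate than the normal approximation used here.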
-
6. The method of claim 4, wherein said step of comparing said candidate region distribution of test values to said random region distribution of test values comprises performing a Kolmogorov-Smirnov test.
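The Kolmogorov-Smirnov comparison can be sketched in the same spirit. The example below computes the two-sample statistic D (the largest gap between the two empirical distribution functions) and an asymptotic p-value from the Kolmogorov series; the function name, the cutoff for very small D, and the 100-term truncation are assumptions of this sketch.

```python
import bisect
import math


def ks_two_sample(x, y):
    """Return (D, p) for the two-sample Kolmogorov-Smirnov test, with p
    taken from the asymptotic Kolmogorov distribution."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for v in xs + ys:
        # Empirical CDFs evaluated at every observed value.
        fx = bisect.bisect_right(xs, v) / len(xs)
        fy = bisect.bisect_right(ys, v) / len(ys)
        d = max(d, abs(fx - fy))
    n_eff = len(xs) * len(ys) / (len(xs) + len(ys))
    lam = math.sqrt(n_eff) * d
    if lam < 0.3:
        # The series converges slowly here and the p-value is ~1 anyway.
        return d, 1.0
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                 for k in range(1, 101))
    return d, min(1.0, max(0.0, 2.0 * series))


if __name__ == "__main__":
    print(ks_two_sample([12.5, 9.1, 7.8, 15.2], [1.2, 0.4, 3.3, 2.1, 0.9]))
```

Unlike the rank test, this statistic responds to any difference in shape between the two distributions, not only a shift in location.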
-
7. The method of claim 4, wherein said step of comparing said candidate region distribution of test values to said random region distribution of test values comprises performing both a Wilcoxon rank test and a Kolmogorov-Smirnov test.
-
8. The method of claim 4, wherein each of said groups of markers in said series of groups of first biallelic markers in said candidate genomic region and each of said groups of biallelic markers in said series of groups of second biallelic markers in said random genomic regions comprises 3 biallelic markers.
-
9. The method of claim 4, wherein each of said groups of biallelic markers in said series of groups of first biallelic markers in said candidate genomic region and each of said groups of biallelic markers in said series of groups of second biallelic markers in said random genomic regions comprises at least 3 biallelic markers.
-
10. The method of claim 4, wherein said biallelic markers in each of said groups in said series of groups of first biallelic markers in said candidate genomic region have an average intermarker distance selected from the group consisting of one marker every 3 kb, one marker every 5 kb, one marker every 10 kb, one marker every 20 kb, and one marker every 30 kb.
-
11. The method of claim 10, wherein said biallelic markers in each of said groups in said series of groups of second biallelic markers in said random genomic regions have an average intermarker distance selected from the group consisting of one marker every 3 kb, one marker every 5 kb, one marker every 10 kb, one marker every 20 kb, and one marker every 30 kb.
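The average intermarker distance referred to in claims 10 and 11 can be checked with a short calculation. This sketch assumes marker positions are given as base-pair coordinates on a single contiguous region and defines the average as the span divided by the number of intervals; both the input format and the function name are assumptions.

```python
def average_intermarker_distance_kb(positions_bp):
    """Average spacing, in kilobases, between adjacent markers in a region."""
    ordered = sorted(positions_bp)
    if len(ordered) < 2:
        raise ValueError("at least two markers are needed to define a distance")
    span = ordered[-1] - ordered[0]
    return span / (len(ordered) - 1) / 1000.0


if __name__ == "__main__":
    # Hypothetical positions: three markers spaced 5 kb apart.
    print(average_intermarker_distance_kb([0, 5000, 10000]))
```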
-
12. The method of claim 4 further comprising selecting random genomic regions for use in said haplotype analysis which have at least 3 biallelic markers therein.
-
13. The method of claim 12, further comprising selecting random genomic regions for use in said haplotype analysis in which said second biallelic markers have an average intermarker distance sufficient for conducting a haplotype analysis.
-
14. The method of claim 13 further comprising selecting random genomic regions for use in said haplotype analysis wherein said at least 3 biallelic markers are in Hardy-Weinberg equilibrium in individuals expressing said detectable trait and control individuals who do not express said detectable trait.
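One common way to check whether a biallelic marker is in Hardy-Weinberg equilibrium is a chi-squared goodness-of-fit test on genotype counts. The sketch below assumes that approach; the claims do not say which test is used, and the function name and genotype-count inputs are illustrative.

```python
def hardy_weinberg_chi2(n_aa, n_ab, n_bb):
    """Chi-squared statistic (1 degree of freedom) comparing observed
    genotype counts with those expected under Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p                         # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


if __name__ == "__main__":
    # Hypothetical genotype counts in one group of individuals.
    print(hardy_weinberg_chi2(25, 50, 25))   # fits equilibrium exactly
    print(hardy_weinberg_chi2(50, 0, 50))    # no heterozygotes at all
```

A value above about 3.84 corresponds to rejection at the 0.05 level for one degree of freedom. The check would be run separately in trait-positive individuals and in controls.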
-
15. The method of claim 14 further comprising selecting random genomic regions for use in said haplotype analysis in which said at least 3 biallelic markers are not in complete linkage disequilibrium to be useful in conducting a haplotype analysis.
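Whether two markers are in complete linkage disequilibrium can be gauged from their two-marker haplotype frequencies. The sketch below computes the disequilibrium coefficient D and the squared correlation r²; treating r² close to 1 as "complete" is an assumption of this example, as are the function name and the 0/1 allele coding.

```python
def linkage_disequilibrium(p00, p01, p10, p11):
    """Return (D, r2) for two biallelic markers given the frequencies of the
    four haplotypes, where pXY is the haplotype with allele X at the first
    marker and allele Y at the second."""
    p_a = p10 + p11          # frequency of allele 1 at the first marker
    p_b = p01 + p11          # frequency of allele 1 at the second marker
    d = p11 - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, r2


if __name__ == "__main__":
    print(linkage_disequilibrium(0.5, 0.0, 0.0, 0.5))      # markers fully redundant
    print(linkage_disequilibrium(0.25, 0.25, 0.25, 0.25))  # markers independent
```

Markers with r² near 1 add no independent information to a haplotype, which is one reading of why such markers would be excluded.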
-
16. The method of claim 3 further comprising selecting biallelic markers in said candidate genomic region which are in Hardy-Weinberg equilibrium in individuals expressing said detectable trait and control individuals who do not express said detectable trait for use in said haplotype analysis.
-
17. The method of claim 16 further comprising determining the total number of markers in said candidate genomic region.
-
18. The method of claim 4 further comprising the step of verifying that the second biallelic markers in said random genomic regions are appropriate for use in the haplotype analysis by:
-
randomly dividing said second biallelic markers in said random genomic regions into a first verification group and a second verification group, wherein said first verification group and said second verification group contain a substantially identical number of biallelic markers;
constructing a first verification distribution of test values for the biallelic markers in said first verification group by performing a haplotype analysis on each possible combination of biallelic markers in each group in a series of groups of biallelic markers in said first verification group, calculating test values for each possible combination, and including the test value for the haplotype which has the greatest association with said trait in said first verification distribution of test values for each group in said series of groups of biallelic markers in said first verification group;
constructing a second verification distribution of test values for the biallelic markers in said second verification group by performing a haplotype analysis on each possible combination of biallelic markers in each group in a series of groups of biallelic markers in said second verification group, calculating test values for each possible combination, and including the test value for the haplotype which has the greatest association with said trait in said second verification distribution of test values for each group in said series of groups of biallelic markers in said second verification group; and
determining whether said first verification distribution and said second verification distribution are significantly different from one another, wherein said second biallelic markers in said random genomic regions are appropriate for use in the haplotype analysis if said first verification distribution and said second verification distribution are not significantly different from one another.
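The verification steps above can be sketched as a random split followed by construction of one distribution per half. In this sketch, `best_test_value` is a hypothetical caller-supplied function that returns the greatest haplotype test value for a group of markers, and forming the series of groups from consecutive, non-overlapping markers is an assumption; the claim does not specify how groups are formed.

```python
import random


def split_markers(markers, seed=None):
    """Randomly divide markers into two groups whose sizes differ by at most one."""
    rng = random.Random(seed)
    shuffled = list(markers)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def verification_distributions(markers, group_size, best_test_value, seed=None):
    """Build the first and second verification distributions.

    best_test_value(group) is a hypothetical callable returning the test value
    of the most strongly associated haplotype in that group of markers.
    """
    first, second = split_markers(markers, seed)

    def distribution(half):
        # Assumed grouping: consecutive, non-overlapping groups of markers.
        groups = [half[i:i + group_size]
                  for i in range(0, len(half) - group_size + 1, group_size)]
        return [best_test_value(g) for g in groups]

    return distribution(first), distribution(second)


if __name__ == "__main__":
    # Stand-in scoring function for demonstration only.
    d1, d2 = verification_distributions(range(12), 3, lambda g: float(sum(g)), seed=1)
    print(d1, d2)
```

The two returned lists would then be compared with a Wilcoxon rank test or a Kolmogorov-Smirnov test; the absence of a significant difference supports using the random-region markers as a reference.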
-
-
19. The method of claim 18 wherein said steps of performing a haplotype analysis on each possible combination of biallelic markers in each group in said series of groups of biallelic markers in said first and second verification groups and calculating said test values for each combination comprises the steps of:
-
calculating the frequencies for each combination of biallelic markers in said first verification group in each group in said series of groups of biallelic markers in individuals expressing said detectable trait;
calculating the frequencies for each combination of biallelic markers in said first verification group in each group in said series of groups of biallelic markers in individuals who do not express said detectable trait;
comparing the haplotype frequencies of said biallelic markers in said first verification group in individuals who express said trait and individuals who do not express said trait by performing a chi-squared analysis to yield said test values;
calculating the frequencies for each combination of biallelic markers in said second verification group in each group in said series of groups of biallelic markers in individuals expressing said detectable trait;
calculating the frequencies for each combination of biallelic markers in said second verification group in each group in said series of groups of biallelic markers in individuals who do not express said detectable trait;
comparing the haplotype frequencies of said biallelic markers in said second verification group in individuals who express said trait and individuals who do not express said trait by performing a chi-squared analysis to yield said test values.
-
-
20. The method of claim 19, wherein said step of determining whether said first verification distribution and said second verification distribution are significantly different from one another comprises performing a Wilcoxon rank test on said first and second verification distributions.
-
21. The method of claim 19, wherein said step of determining whether said first verification distribution and said second verification distribution are significantly different from one another comprises performing a Kolmogorov-Smirnov test on said first and second verification distributions.
-
22. The method of claim 19, wherein said step of determining whether said first verification distribution and said second verification distribution are significantly different from one another comprises performing both a Kolmogorov-Smirnov test and a Wilcoxon rank test on said first and second verification distributions.
-
23. The method of claim 19, wherein each of said groups of biallelic markers in said series of groups of biallelic markers in said first verification group and each of said groups of biallelic markers in said series of groups of biallelic markers in said second verification group contains 3 biallelic markers.
-
24. The method of claim 19, wherein each of said groups of biallelic markers in said series of groups of biallelic markers in said first verification group and each of said groups of biallelic markers in said series of groups of biallelic markers in said second verification group contains more than 3 biallelic markers.
-
25. The method of claim 1, wherein said method is performed by a computer.
-
26. The method of claim 25, wherein said computer provides an output indicative of whether said candidate region distribution of test values and said random region distribution of test values are significantly different.
-
27. The method of claim 26 further comprising further evaluating said candidate genomic region to identify candidate genes which might be associated with said detectable trait if said output indicates that said candidate region distribution of test values and said random region distribution of test values are significantly different.
-
28. The method of claim 1 further comprising further evaluating said candidate genomic region to identify candidate genes which might be associated with said detectable trait if said candidate region distribution of test values and said random region distribution of test values are significantly different.
-
29. The method of claim 4 wherein the frequencies for each combination of biallelic markers in each group in said series of groups of first biallelic markers in said candidate genomic region and the frequencies for each combination of biallelic markers in each group in said series of groups of second biallelic markers in said random genomic regions in individuals expressing said detectable trait are calculated using the Expectation Maximization algorithm; and
the frequencies for each combination of biallelic markers in each group in said series of groups of first biallelic markers in said candidate genomic region and the frequencies for each combination of second biallelic markers in each group in said series of groups of second biallelic markers in said random genomic regions in individuals who do not express said detectable trait are calculated using the Expectation Maximization algorithm.
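To illustrate how an Expectation Maximization algorithm can estimate haplotype frequencies from unphased genotypes, the following is a minimal sketch restricted to two biallelic markers. The genotype coding (number of copies of allele 1 at each marker), the fixed iteration count, and the function name are assumptions of this example; the claims cover groups of three or more markers, which need a more general enumeration of compatible haplotype pairs.

```python
def em_two_marker_haplotypes(genotypes, iterations=50):
    """Estimate frequencies of haplotypes (0,0), (0,1), (1,0), (1,1).

    genotypes: list of (g1, g2), where g is the number of copies (0, 1 or 2)
    of allele 1 carried at each marker by one individual.
    """
    alleles = {0: (0, 0), 1: (0, 1), 2: (1, 1)}
    freqs = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
    n_chrom = 2 * len(genotypes)
    for _ in range(iterations):
        counts = dict.fromkeys(freqs, 0.0)
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:
                # E-step: a double heterozygote is either 00/11 or 01/10;
                # weight each phase by its probability under current freqs.
                cis = freqs[(0, 0)] * freqs[(1, 1)]
                trans = freqs[(0, 1)] * freqs[(1, 0)]
                total = cis + trans
                w = cis / total if total > 0 else 0.5
                counts[(0, 0)] += w
                counts[(1, 1)] += w
                counts[(0, 1)] += 1 - w
                counts[(1, 0)] += 1 - w
            else:
                # Phase is unambiguous when at most one marker is heterozygous.
                a1, a2 = alleles[g1], alleles[g2]
                counts[(a1[0], a2[0])] += 1
                counts[(a1[1], a2[1])] += 1
        # M-step: new frequencies are the expected counts per chromosome.
        freqs = {h: c / n_chrom for h, c in counts.items()}
    return freqs


if __name__ == "__main__":
    sample = [(0, 0)] * 5 + [(2, 2)] * 5 + [(1, 1)] * 10   # hypothetical data
    print(em_two_marker_haplotypes(sample))
```

The estimation would be run separately for trait-positive individuals and controls, and the resulting frequencies converted to counts for the chi-squared comparison.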
-
30. A method of determining whether a candidate genomic region harbors a gene associated with a detectable trait comprising determining whether the association of a plurality of biallelic markers located in said candidate genomic region with said detectable trait is significantly different than the association of a plurality of biallelic markers located in a plurality of random genomic regions, wherein the determination of whether the association of said plurality of biallelic markers located in said candidate genomic region with said detectable trait is significantly different than the association of said plurality of biallelic markers located in a plurality of random genomic regions comprises:
-
constructing a candidate region distribution of test values using said biallelic markers in said candidate genomic region, said candidate region distribution of test values being indicative of the difference in the haplotype frequencies of said biallelic markers in said candidate region in individuals who possess said detectable trait and control individuals who do not possess said detectable trait;
constructing a random region distribution of test values using said biallelic markers in said random genomic regions, said random region distribution of test values being indicative of the difference in the haplotype frequencies of said biallelic markers in said random genomic regions in individuals who possess said detectable trait and control individuals who do not possess said detectable trait; and
comparing said candidate region distribution of test values with said random region distribution of test values.
-
-
32. A computer system for confirming that a candidate genomic region harbors a gene associated with a detectable trait, wherein the computer system comprises instructions that when executed perform the method of:
-
constructing a candidate region distribution of test values using a plurality of first biallelic markers in a candidate genomic region suspected of harboring said gene associated with said detectable trait, said candidate region distribution of test values being indicative of the difference in the frequencies of said plurality of first biallelic markers in said candidate region in individuals who possess said detectable trait and control individuals who do not possess said detectable trait;
constructing a random region distribution of test values using a plurality of second biallelic markers in random genomic regions, said random region distribution of test values being indicative of the difference in the frequencies of said plurality of second biallelic markers in said random genomic regions in individuals who possess said detectable trait and control individuals who do not possess said detectable trait; and
determining whether said candidate region distribution of test values and said random region distribution of test values are significantly different from one another, wherein a significant difference indicates that said candidate genomic region harbors a gene associated with said detectable trait.
calculating the frequencies for each combination of biallelic markers in each group in said series of groups of first biallelic markers in said candidate genomic region in individuals expressing said detectable trait;
calculating the frequencies for each combination of biallelic markers in each group in said series of groups of first biallelic markers in said candidate genomic region in individuals who do not express said detectable trait; and
comparing the haplotype frequencies in individuals who express said trait and individuals who do not express said trait by performing a chi-squared analysis to yield said test values.
-
-
35. The computer system of claim 34, wherein said instructions for performing a haplotype analysis on each possible combination of biallelic markers in each group in said series of groups of second biallelic markers in said random genomic regions and calculating said test values for each combination comprise instructions for:
-
calculating the frequencies for each combination of biallelic markers in each group in said series of groups of second biallelic markers in said random genomic regions in individuals expressing said detectable trait;
calculating the frequencies for each combination of biallelic markers in each group in said series of groups of second biallelic markers in said random genomic regions in individuals who do not express said detectable trait; and
comparing the haplotype frequencies in individuals who express said trait and individuals who do not express said trait by performing a chi-squared analysis to yield said test values.
-
-
36. The computer system of claim 35, wherein said instructions for comparing said candidate region distribution of test values to said random region distribution of test values comprise instructions for performing a Wilcoxon rank test.
-
37. The computer system of claim 35, wherein said instructions for comparing said candidate region distribution of test values to said random region distribution of test values comprise instructions for performing a Kolmogorov-Smirnov test.
-
38. The computer system of claim 35, wherein said instructions for comparing said candidate region distribution of test values to said random region distribution of test values comprise instructions for performing both a Wilcoxon rank test and a Kolmogorov-Smirnov test.
-
39. A programmed storage device comprising instructions that when executed perform the steps of:
-
constructing a candidate region distribution of test values using a plurality of first biallelic markers in a candidate genomic region suspected of harboring said gene associated with said detectable trait, said candidate region distribution of test values being indicative of the difference in the frequencies of said plurality of first biallelic markers in said candidate region in individuals who possess said detectable trait and control individuals who do not possess said detectable trait;
constructing a random region distribution of test values using a plurality of second biallelic markers in random genomic regions, said random region distribution of test values being indicative of the difference in the frequencies of said plurality of second biallelic markers in said random genomic regions in individuals who possess said detectable trait and control individuals who do not possess said detectable trait; and
determining whether said candidate region distribution of test values and said random region distribution of test values are significantly different from one another, wherein a significant difference indicates that said candidate genomic region harbors a gene associated with said detectable trait.
calculating the frequencies for each combination of biallelic markers in each group in said series of groups of first biallelic markers in said candidate genomic region in individuals expressing said detectable trait;
calculating the frequencies for each combination of biallelic markers in each group in said series of groups of first biallelic markers in said candidate genomic region in individuals who do not express said detectable trait; and
comparing the haplotype frequencies in individuals who express said trait and individuals who do not express said trait by performing a chi-squared analysis to yield said test values.
-
-
42. The programmed storage device of claim 41, wherein said instructions for performing a haplotype analysis on each possible combination of biallelic markers in each group in said series of groups of second biallelic markers in said random genomic regions and calculating said test values for each combination comprise instructions for:
-
calculating the frequencies for each combination of biallelic markers in each group in said series of groups of second biallelic markers in said random genomic regions in individuals expressing said detectable trait;
calculating the frequencies for each combination of biallelic markers in each group in said series of groups of second biallelic markers in said random genomic regions in individuals who do not express said detectable trait; and
comparing the haplotype frequencies in individuals who express said trait and individuals who do not express said trait by performing a chi-squared analysis to yield said test values.
-
-
43. The programmed storage device of claim 42, wherein said instructions for comparing said candidate region distribution of test values to said random region distribution of test values comprise instructions for performing a Wilcoxon rank test.
-
44. The programmed storage device of claim 42, wherein said instructions for comparing said candidate region distribution of test values to said random region distribution of test values comprise instructions for performing a Kolmogorov-Smirnov test.
-
45. The programmed storage device of claim 42, wherein said instructions for comparing said candidate region distribution of test values to said random region distribution of test values comprise instructions for performing both a Wilcoxon rank test and a Kolmogorov-Smirnov test.
-
46. The programmed storage device of claim 39, wherein said programmed storage device is selected from the group consisting of a hard disk, a floppy disk, Random Access Memory, Read Only Memory and Electrically Erasable Programmable Read Only Memory.
Specification