System and method for predicting chromosomal regions that control phenotypic traits
First Claim
1. A method of associating a phenotype with one or more candidate chromosomal regions in a genome of a species using a phenotypic data structure that comprises a difference in said phenotype between different strains of said species, said genome including a plurality of loci, said method comprising:
establishing a genotypic data structure using a suitably programmed computer, said genotypic data structure corresponding to a locus in said plurality of loci, said genotypic data structure comprising a variation of at least one component of said locus between different strains of said species;
determining a correlation value for said genotypic data structure by a comparison of said phenotypic data structure with said genotypic data structure using a suitably programmed computer;
repeating said establishing and determining steps for each locus in said plurality of loci using a suitably programmed computer, thereby establishing a plurality of genotypic data structures and, for each respective genotypic data structure in the plurality of genotypic data structures, a correlation value;
identifying one or more genotypic data structures in said plurality of genotypic data structures that have correlation values that are higher than the correlation values for all other genotypic data structures in said plurality of genotypic data structures using a suitably programmed computer;
wherein the loci that correspond to said one or more genotypic data structures represent said one or more candidate chromosomal regions that associate with said phenotype and wherein an amount of said genome that is included in each locus in said plurality of loci is predetermined at a time prior to said identifying step;
wherein each respective genotypic data structure in said plurality of genotypic data structures comprises a plurality of elements and each element in each respective genotypic data structure in said plurality of genotypic data structures corresponds to a difference of at least one component of said locus corresponding to the respective genotypic data structure;
wherein, for each element in each respective genotypic data structure in said plurality of genotypic data structures, said different strains of said species are selected from a plurality of strains of said species;
wherein an amount that a variation contributes to said at least one component of a locus corresponding to a genotypic data structure in said plurality of genotypic data structures is a function of a distance said variation is away from a center of the locus; and
wherein a genotypic data structure in said plurality of genotypic data structures comprises a plurality of variations that are distributed about the center of a locus corresponding to the genotypic data structure, and said establishing step further comprises;
fitting a distribution of said plurality of variations about the center of said locus with a probability function; and
weighting each variation by a corresponding value derived from said probability function such that variations further from the center of said locus are downweighted so that they contribute less to said genotypic data structure than variations that are closer to said center of said locus; and
communicating said one or more genotypic data structures to a user, a display, a readily accessible computer memory or other computer on a network.
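The fitting and weighting steps at the end of claim 1 can be illustrated with a minimal sketch. It assumes a Gaussian as the probability function (the claim does not name one), variation positions given as base-pair coordinates, and hypothetical helper names (`fit_sigma`, `distance_weights`, `weighted_difference`). It is an illustration under those assumptions, not the claimed implementation.

```python
import math


def fit_sigma(positions, center):
    """Fit a zero-mean Gaussian to the offsets of the variations from the
    locus center (maximum-likelihood standard deviation).
    ASSUMPTION: the claim only says "a probability function"."""
    offsets = [pos - center for pos in positions]
    return math.sqrt(sum(d * d for d in offsets) / len(offsets))


def distance_weights(positions, center):
    """Weight per variation: 1.0 at the center, smaller further away."""
    sigma = fit_sigma(positions, center)
    if sigma == 0.0:
        # Every variation sits exactly on the center: no downweighting.
        return [1.0] * len(positions)
    return [
        math.exp(-((pos - center) ** 2) / (2.0 * sigma ** 2))
        for pos in positions
    ]


def weighted_difference(alleles_a, alleles_b, weights):
    """One element of a genotypic data structure for a pair of strains:
    the weighted count of variations at which the two strains differ."""
    return sum(w for a, b, w in zip(alleles_a, alleles_b, weights) if a != b)
```

With positions `[90, 100, 110, 140]` and center `100`, the variation at 140 receives the smallest weight, so a strain difference there contributes less to the element than a difference at 100.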
4 Assignments
0 Petitions
Abstract
A method of associating a phenotype with one or more candidate chromosomal regions in a genome of an organism includes the step of deriving a phenotypic data structure that represents differences in phenotypes between different strains of the organism. Further, a genotypic data structure is established. The genotypic data structure corresponds to a locus selected from a plurality of loci in the genome of the organism. The genotypic data structure represents variations of at least one component of the locus between different strains of the organism. The phenotypic data structure is compared to the genotypic data structure to form a correlation value. The process of establishing a genotypic data structure and comparing it to the phenotypic data structure is repeated for each locus in the plurality of loci, thereby identifying one or more genotypic data structures that form a high correlation value relative to all other compared genotypic data structures. The loci that correspond to the one or more genotypic data structures having a high correlation value represent the one or more candidate chromosomal regions.
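The procedure summarized in the abstract can be sketched as a loop over loci. The representation below is an assumption made for illustration: phenotypes are one number per strain, each data structure holds one element per pair of strains, a genotypic element is the count of differing positions, and the correlation function is supplied by the caller. All function names are hypothetical.

```python
from itertools import combinations


def phenotypic_structure(phenotype_by_strain):
    """One element per strain pair: absolute phenotype difference."""
    strains = sorted(phenotype_by_strain)
    return [
        abs(phenotype_by_strain[a] - phenotype_by_strain[b])
        for a, b in combinations(strains, 2)
    ]


def genotypic_structure(alleles_by_strain):
    """One element per strain pair: number of positions in the locus at
    which the two strains differ (same pair order as above)."""
    strains = sorted(alleles_by_strain)
    return [
        sum(x != y for x, y in zip(alleles_by_strain[a], alleles_by_strain[b]))
        for a, b in combinations(strains, 2)
    ]


def scan_loci(phenotype_by_strain, loci, correlate, top=1):
    """Score every locus and return the `top` highest-scoring ones as
    (correlation value, locus name) tuples, best first."""
    p = phenotypic_structure(phenotype_by_strain)
    scored = [
        (correlate(p, genotypic_structure(alleles)), name)
        for name, alleles in loci.items()
    ]
    scored.sort(reverse=True)
    return scored[:top]
```

Here `loci` maps a locus name to a per-strain allele string, and `correlate` is any function that takes the two data structures and returns a number. The loci returned by `scan_loci` play the role of the candidate chromosomal regions.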
12 Claims
3. A method of associating a phenotype with one or more candidate chromosomal regions in a genome of a species using a phenotypic data structure that comprises a difference in said phenotype between different strains of said species, said genome including a plurality of loci, said method comprising:
establishing a genotypic data structure using a suitably programmed computer, said genotypic data structure corresponding to a locus in said plurality of loci, said genotypic data structure comprising a variation of at least one component of said locus between different strains of said species; determining a correlation value for said genotypic data structure by a comparison of said phenotypic data structure with said genotypic data structure using a suitably programmed computer; repeating said establishing and determining steps for each locus in said plurality of loci using a suitably programmed computer, thereby establishing a plurality of genotypic data structures and, for each respective genotypic data structure in the plurality of genotypic data structures, determining a correlation value; identifying one or more genotypic data structures in said plurality of genotypic data structures that have correlation values that are higher than the correlation values for all other genotypic data structures in said plurality of genotypic data structures using a suitably programmed computer; wherein the loci that correspond to said one or more genotypic data structures represent said one or more candidate chromosomal regions that associate with said phenotype and wherein an amount of said genome that is included in each locus in said plurality of loci is predetermined at a time prior to said identifying step;
wherein said correlation value is formed in accordance with the expression [expression not reproduced], where
c(P, GL) is said correlation value;
p(i) is a value of the ith element of said phenotypic data structure;
g(i) is a value of the ith element of said genotypic data structure;
⟨P⟩ is a mean value of all elements in said phenotypic data structure; and
⟨GL⟩ is a mean value of all elements in said genotypic data structure; and
communicating said one or more genotypic data structures to a user, a display, a readily accessible computer memory or other computer on a network.
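The expression itself does not appear in this text, so the sketch below is a stand-in rather than the claimed formula. It uses only the quantities that claim 3 defines (p(i), g(i), and the two means) and combines them as a standard Pearson correlation, which is an assumption. The function name `correlation` is hypothetical.

```python
import math


def correlation(p, g):
    """Pearson-style correlation between a phenotypic data structure `p`
    and a genotypic data structure `g` of equal length.
    ASSUMPTION: the claimed expression may differ from this form."""
    if len(p) != len(g) or not p:
        raise ValueError("p and g must be non-empty and of equal length")
    p_mean = sum(p) / len(p)  # mean of all phenotypic elements
    g_mean = sum(g) / len(g)  # mean of all genotypic elements
    numerator = sum((pi - p_mean) * (gi - g_mean) for pi, gi in zip(p, g))
    denominator = math.sqrt(
        sum((pi - p_mean) ** 2 for pi in p)
        * sum((gi - g_mean) ** 2 for gi in g)
    )
    # A constant vector carries no information about association.
    return numerator / denominator if denominator > 0 else 0.0
```

Under this form, a locus whose pairwise genotypic differences rise and fall with the pairwise phenotypic differences scores near 1, and an unrelated locus scores near 0.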
4. A method of associating a phenotype with one or more candidate chromosomal regions in a genome of a species using a phenotypic data structure that comprises a difference in said phenotype between different strains of said species, said genome including a plurality of loci, said method comprising:
establishing a genotypic data structure using a suitably programmed computer, said genotypic data structure corresponding to a locus in said plurality of loci, said genotypic data structure comprising a variation of at least one component of said locus between different strains of said species; determining a correlation value for said genotypic data structure by a comparison of said phenotypic data structure with said genotypic data structure using a suitably programmed computer; repeating said establishing and determining steps for each locus in said plurality of loci using a suitably programmed computer, thereby establishing a plurality of genotypic data structures and, for each respective genotypic data structure in the plurality of genotypic data structures, a correlation value; identifying one or more genotypic data structures in said plurality of genotypic data structures that have correlation values that are higher than the correlation values for all other genotypic data structures in said plurality of genotypic data structures using a suitably programmed computer; wherein the loci that correspond to said one or more genotypic data structures represent said one or more candidate chromosomal regions that associate with said phenotype and wherein an amount of said genome that is included in each locus in said plurality of loci is predetermined at a time prior to said identifying step;
wherein said correlation value is formed in accordance with the expression [expression not reproduced], where
c(P, GL) is said correlation value;
(i) is a value of the ith element of said phenotypic data structure;
(i) is a value of the ith element of said genotypic data structure;
⟨P⟩ is a mean value of all elements in said phenotypic data structure;
⟨GL⟩ is a mean value of all elements in said genotypic data structure; and
Z is a function of a number of components in the locus, corresponding to the genotypic data structure, having a variation between different strains of said species; and
communicating said one or more genotypic data structures to a user, a display, a readily accessible computer memory or other computer on a network.
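Claim 4 makes Z depend on the number of components in a locus that vary between strains, but this text gives neither the form of Z nor where it enters the expression. The sketch below therefore covers only the counting step. It assumes each strain's locus is an equal-length allele string and treats each position as a component; the function name is hypothetical.

```python
def count_variable_components(alleles_by_strain):
    """Number of positions in the locus at which at least two strains
    carry different alleles.
    ASSUMPTION: one string position corresponds to one component."""
    sequences = list(alleles_by_strain.values())
    if not sequences:
        return 0
    # zip(*sequences) walks the locus position by position across strains.
    return sum(1 for column in zip(*sequences) if len(set(column)) > 1)
```

Any normalization built on this count would be a further assumption.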
6. A method of associating a phenotype with one or more candidate chromosomal regions in a genome of a species using a phenotypic data structure that comprises a difference in said phenotype between different strains of said species, said genome including a plurality of loci, said method comprising:
establishing a genotypic data structure using a suitably programmed computer, said genotypic data structure corresponding to a locus in said plurality of loci, said genotypic data structure comprising a variation of at least one component of said locus between different strains of said species; determining a correlation value for said genotypic data structure based on a comparison of said phenotypic data structure with said genotypic data structure using a suitably programmed computer; repeating said establishing and determining steps for each locus in said plurality of loci using a suitably programmed computer, thereby establishing a plurality of genotypic data structures and, for each respective genotypic data structure in the plurality of genotypic data structures, a correlation value; identifying one or more genotypic data structures in said plurality of genotypic data structures that have correlation values that are higher than the correlation values for all other genotypic data structures in said plurality of genotypic data structures using a suitably programmed computer; wherein the loci that correspond to said one or more genotypic data structures represent said one or more candidate chromosomal regions that associate with said phenotype and wherein an amount of said genome that is included in each locus in said plurality of loci is predetermined at a time prior to said identifying step;
wherein said correlation value is a correlative measure that is computed in accordance with the expression [expression not reproduced], where
(P, GL) is said correlative measure;
(i) is a value of the ith element of said phenotypic data structure;
(i) is a value of the ith element of said genotypic data structure;
⟨P⟩ is a mean value of all elements in said phenotypic data structure; and
⟨GL⟩ is a mean value of all elements in said genotypic data structure; and
communicating said one or more genotypic data structures to a user, a display, a readily accessible computer memory or other computer on a network.
7. A computer program product for use in conjunction with a computer system, the computer program product comprising a physical computer readable storage medium and a computer program mechanism embedded therein, the computer program mechanism comprising:
a genotypic database for storing variations in genomic sequences of a plurality of strains of a species; a phenotypic data structure, said phenotypic data structure comprising a difference in a phenotype between different strains of said species; and a program module for associating said phenotype with one or more candidate chromosomal regions in a genome of said species, said genome including a plurality of loci, said program module comprising; instructions for establishing a genotypic data structure, said genotypic data structure corresponding to a locus in said plurality of loci, said genotypic data structure comprising a variation of at least one component of said locus between different strains of said species stored in said genotypic database; instructions for determining a correlation value for said genotypic data structure by a comparison of said phenotypic data structure with said genotypic data structure; instructions for repeating said instructions for establishing and said instructions for determining, for each locus in said plurality of loci, thereby establishing a plurality of genotypic data structures and, for each respective genotypic data structure in the plurality of genotypic data structures, determining a correlation value; instructions for identifying one or more genotypic data structures in said plurality of genotypic data structures that have correlation values that are higher than the correlation values for all other genotypic data structures in said plurality of genotypic data structures;
wherein the loci that correspond to said one or more genotypic data structures represent said one or more candidate chromosomal regions that associate with said phenotype and wherein an amount of said genome that is included in each locus in said plurality of loci is predetermined at a time prior to said identifying step;
wherein each said genotypic data structure in said plurality of genotypic data structures comprises a plurality of variations that are distributed about the center of a locus corresponding to the genotypic data structure, and said instructions for establishing further comprise;
instructions for fitting a distribution of said plurality of variations about the center of said locus with a probability function; and
instructions for weighting each variation by a corresponding value derived from said probability function such that variations further from the center of said locus are downweighted so that they contribute less to said genotypic data structure than loci that are closer to said center of said corresponding locus; and
instructions for communicating said one or more genotypic data structures to a user, a display, a readily accessible computer memory or other computer on a network.
9. A computer program product for use in conjunction with a computer system, the computer program product comprising a physical computer readable storage medium and a computer program mechanism embedded therein, the computer program mechanism comprising:
a genotypic database for storing variations in genomic sequences of a plurality of strains of a species; a phenotypic data structure, said phenotypic data structure comprising a difference in a phenotype between different strains of said species; and a program module for associating said phenotype with one or more candidate chromosomal regions in a genome of said species, said genome including a plurality of loci, said program module comprising; instructions for establishing a genotypic data structure, said genotypic data structure corresponding to a locus in said plurality of loci, said genotypic data structure comprising a variation of at least one component of said locus between different strains of said species stored in said genotypic database; instructions for determining a correlation value for said genotypic data structure by a comparison of said phenotypic data structure with said genotypic data structure; instructions for repeating said instructions for establishing and said instructions for determining, for each locus in said plurality of loci, thereby establishing a plurality of genotypic data structures and, for each respective genotypic data structure in the plurality of genotypic data structures, determining a correlation value; instructions for identifying one or more genotypic data structures in said plurality of genotypic data structures that have correlation values that are higher than the correlation values for all other genotypic data structures in said plurality of genotypic data structures;
wherein the loci that correspond to said one or more genotypic data structures represent said one or more candidate chromosomal regions that associate with said phenotype and wherein an amount of said genome that is included in each locus in said plurality of loci is predetermined at a time prior to said identifying step, and wherein said instructions for determining include instructions for forming said correlation value in accordance with the expression [expression not reproduced], where
c(P, GL) is said correlation value;
p(i) is a value of the ith element of said phenotypic data structure;
g(i) is a value of the ith element of said genotypic data structure;
⟨P⟩ is a mean value of all elements in said phenotypic data structure; and
⟨GL⟩ is a mean value of all elements in said genotypic data structure; and
instructions for communicating said one or more genotypic data structures to a user, a display, a readily accessible computer memory or other computer on a network.
10. A computer program product for use in conjunction with a computer system, the computer program product comprising a physical computer readable storage medium and a computer program mechanism embedded therein, the computer program mechanism comprising:
a genotypic database for storing variations in genomic sequences of a plurality of strains of a species; a phenotypic data structure, said phenotypic data structure comprising a difference in a phenotype between different strains of said species; and a program module for associating said phenotype with one or more candidate chromosomal regions in a genome of said species, said genome including a plurality of loci, said program module comprising; instructions for establishing a genotypic data structure, said genotypic data structure corresponding to a locus in said plurality of loci, said genotypic data structure comprising a variation of at least one component of said locus between different strains of said species stored in said genotypic database; instructions for determining a correlation value for said genotypic data structure by a comparison of said phenotypic data structure with said genotypic data structure; instructions for repeating said instructions for establishing and said instructions for determining, for each locus in said plurality of loci, thereby establishing a plurality of genotypic data structures and, for each respective genotypic data structure in the plurality of genotypic data structures, determining a correlation value; instructions for identifying one or more genotypic data structures in said plurality of genotypic data structures that have correlation values that are higher than the correlation values for all other genotypic data structures in said plurality of genotypic data structures;
wherein the loci that correspond to said one or more genotypic data structures represent said one or more candidate chromosomal regions that associate with said phenotype and wherein an amount of said genome that is included in each locus in said plurality of loci is predetermined at a time prior to said identifying step, and wherein said instructions for determining include instructions for forming said correlation value in accordance with the expression [expression not reproduced], where
c(P, GL) is said correlation value;
(i) is a value of the ith element of said phenotypic data structure;
(i) is a value of the ith element of said genotypic data structure;
⟨P⟩ is a mean value of all elements in said phenotypic data structure;
⟨GL⟩ is a mean value of all elements in said genotypic data structure; and
Z is a function of a number of components in the locus corresponding to the genotypic structure, having a variation between different strains of said species; and
instructions for communicating said one or more genotypic data structures to a user, a display, a readily accessible computer memory or other computer on a network.
11. A computer program product for use in conjunction with a computer system, the computer program product comprising a physical computer readable storage medium and a computer program mechanism embedded therein, the computer program mechanism comprising:
a genotypic database for storing variations in genomic sequences of a plurality of strains of a species; a phenotypic data structure, said phenotypic data structure comprising a difference in a phenotype between different strains of said species; and a program module for associating said phenotype with one or more candidate chromosomal regions in a genome of said species, said genome including a plurality of loci, said program module comprising; instructions for establishing a genotypic data structure, said genotypic data structure corresponding to a locus in said plurality of loci, said genotypic data structure comprising a variation of at least one component of said locus between different strains of said species stored in said genotypic database; instructions for determining a correlation value for said genotypic data structure by a comparison of said phenotypic data structure with said genotypic data structure;
instructions for repeating said instructions for establishing and said instructions for determining, for each locus in said plurality of loci, thereby establishing a plurality of genotypic data structures and, for each respective genotypic data structure in the plurality of genotypic data structures, determining a correlation value;
instructions for identifying one or more genotypic data structures in said plurality of genotypic data structures that have correlation values that are higher than the correlation values for all other genotypic data structures in said plurality of genotypic data structures;
wherein the loci that correspond to said one or more genotypic data structures represent said one or more candidate chromosomal regions that associate with said phenotype and wherein an amount of said genome that is included in each locus in said plurality of loci is predetermined at a time prior to said identifying step, and wherein said instructions for determining include instructions for forming said correlation value in accordance with a correlative measure that is computed in accordance with the expression [expression not reproduced], where
(P, GL) is said correlative measure;
(i) is a value of the ith element of said phenotypic data structure;
(i) is a value of the ith element of said genotypic data structure;
⟨P⟩ is a mean value of all elements in said phenotypic data structure; and
⟨GL⟩ is a mean value of all elements in said genotypic data structure; and
instructions for communicating said one or more genotypic data structures to a user, a display, a computer memory or other computer on a network.
12. A computer system for associating a phenotype with one or more candidate chromosomal regions in a genome of a species, said genome including a plurality of loci, the computer system comprising:
a central processing unit; a memory, coupled to the central processing unit, the memory storing; a genotypic database for storing variations in genomic sequences of a plurality of strains of said species; a phenotypic data structure that comprises a difference in said phenotype between different strains of said species; and a program module, said program module comprising; instructions for establishing a genotypic data structure, said genotypic data structure corresponding to a locus in said plurality of loci, said genotypic data structure comprising a variation of at least one component of said locus between different strains of said species stored in said genotypic database; instructions for determining a correlation value for said genotypic data structure by a comparison of said phenotypic data structure with said genotypic data structure; instructions for repeating said instructions for establishing and said instructions for determining, for each locus in said plurality of loci, thereby establishing a plurality of genotypic data structures and, for each respective genotypic data structure in the plurality of genotypic data structures, determining a correlation value; instructions for identifying one or more genotypic data structures in said plurality of genotypic data structures that have correlation values that are higher than the correlation values for all other genotypic data structures in said plurality of genotypic data structures;
wherein the loci that correspond to said one or more genotypic data structures represent said one or more candidate chromosomal regions that associate with said phenotype and wherein an amount of said genome that is included in each locus in said plurality of loci is predetermined at a time prior to said identifying step, and wherein said instructions for determining include instructions for forming said correlation value in accordance with the expression [expression not reproduced], where
c(P, GL) is said correlation value;
p(i) is a value of the ith element of said phenotypic data structure;
g(i) is a value of the ith element of said genotypic data structure;
⟨P⟩ is a mean value of all elements in said phenotypic data structure; and
⟨GL⟩ is a mean value of all elements in said genotypic data structure; and
instructions for communicating said one or more genotypic data structures to a user, a display, a readily accessible computer memory or other computer on a network.
Specification