System and method for predicting chromosomal regions that control phenotypic traits

US 7,698,117 B2
Filed: 12/11/2001
Issued: 04/13/2010
Est. Priority Date: 12/15/2000
Status: Expired due to Fees

4 Assignments

0 Petitions

Abstract

A method of associating a phenotype with one or more candidate chromosomal regions in a genome of an organism includes the step of deriving a phenotypic data structure that represents differences in phenotypes between different strains of the organism. Further, a genotypic data structure is established. The genotypic data structure corresponds to a locus selected from a plurality of loci in the genome of the organism. The genotypic data structure represents variations of at least one component of the locus between different strains of the organism. The phenotypic data structure is compared to the genotypic data structure to form a correlation value. The process of establishing a genotypic data structure and comparing it to the phenotypic data structure is repeated for each locus in the plurality of loci, thereby identifying one or more genotypic data structures that form a high correlation value relative to all other compared genotypic data structures. The loci that correspond to the one or more genotypic data structures having a high correlation value represent the one or more candidate chromosomal regions.
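
The loop described in the abstract can be sketched in Python as below. This is an illustrative reading under stated assumptions, not the patented implementation. The following are all assumptions made for the example: the input formats (`phenotype` as a strain-to-value dict, `loci` as a locus-to-alleles dict), the absolute trait difference as each phenotypic element, a 0/1 allele-mismatch indicator as each genotypic element, and Pearson correlation as the correlation value. The locus names and toy values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def rank_loci(phenotype, loci):
    """Score every locus and return (top-scoring locus names, all scores).

    phenotype: dict strain -> trait value (hypothetical input format)
    loci: dict locus name -> dict strain -> allele (hypothetical input format)
    """
    pairs = list(combinations(sorted(phenotype), 2))
    # Phenotypic data structure: absolute trait difference for each strain pair.
    p = [abs(phenotype[a] - phenotype[b]) for a, b in pairs]
    scores = {}
    for name, alleles in loci.items():
        # Genotypic data structure: 1.0 where the pair carries different alleles.
        g = [1.0 if alleles[a] != alleles[b] else 0.0 for a, b in pairs]
        scores[name] = pearson(p, g)
    best = max(scores.values())
    return [name for name, c in scores.items() if c == best], scores


# Toy data: strains C and D share a high trait value and the G allele at chr1:10Mb.
phenotype = {"A": 1.0, "B": 1.2, "C": 5.0, "D": 5.3}
loci = {
    "chr1:10Mb": {"A": "T", "B": "T", "C": "G", "D": "G"},
    "chr2:40Mb": {"A": "T", "B": "G", "C": "T", "D": "G"},
}
best, scores = rank_loci(phenotype, loci)
print(best)  # ['chr1:10Mb']
```

Returning every locus that ties for the maximum mirrors the wording "one or more genotypic data structures" with the highest correlation values.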

12 Claims

1. A method of associating a phenotype with one or more candidate chromosomal regions in a genome of a species using a phenotypic data structure that comprises a difference in said phenotype between different strains of said species, said genome including a plurality of loci, said method comprising:
- establishing a genotypic data structure using a suitably programmed computer, said genotypic data structure corresponding to a locus in said plurality of loci, said genotypic data structure comprising a variation of at least one component of said locus between different strains of said species;
  
  determining a correlation value for said genotypic data structure by a comparison of said phenotypic data structure with said genotypic data structure using a suitably programmed computer;
  
  repeating said establishing and determining steps for each locus in said plurality of loci using a suitably programmed computer, thereby establishing a plurality of genotypic data structures and, for each respective genotypic data structure in the plurality of genotypic data structures, a correlation value;
  
  identifying one or more genotypic data structures in said plurality of genotypic data structures that have correlation values that are higher than the correlation values for all other genotypic data structures in said plurality of genotypic data structures using a suitably programmed computer;
  
  wherein the loci that correspond to said one or more genotypic data structures represent said one or more candidate chromosomal regions that associate with said phenotype and wherein an amount of said genome that is included in each locus in said plurality of loci is predetermined at a time prior to said identifying step;
  
  wherein each respective genotypic data structure in said plurality of genotypic data structures comprises a plurality of elements and each element in each respective genotypic data structure in said plurality of genotypic data structures corresponds to a difference of at least one component of said locus corresponding to the respective genotypic data structure;
  
  wherein, for each element in each respective genotypic data structure in said plurality of genotypic data structures, said different strains of said species are selected from a plurality of strains of said species;
  
  wherein an amount that a variation contributes to said at least one component of a locus corresponding to a genotypic data structure in said plurality of genotypic data structures is a function of a distance said variation is away from a center of the locus; and
  
  wherein a genotypic data structure in said plurality of genotypic data structures comprises a plurality of variations that are distributed about the center of a locus corresponding to the genotypic data structure, and said establishing step further comprises:
  
  fitting a distribution of said plurality of variations about the center of said locus with a probability function; and
  
  weighting each variation by a corresponding value derived from said probability function such that variations further from the center of said locus are downweighted so that they contribute less to said genotypic data structure than variations that are closer to said center of said locus; and
  
  communicating said one or more genotypic data structures to a user, a display, a readily accessible computer memory or other computer on a network.

2. The method of claim 1, wherein said probability function is a Gaussian probability distribution, a Poisson distribution, or a Lorentzian distribution.
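
Claims 1 and 2 weight the variations in a locus by their distance from its center using a fitted probability function. Below is a minimal Python sketch of one way this might be done. It rests on three assumptions. First, variant sites are given as base-pair positions. Second, "fitting" means estimating the width of a distribution centered on the locus midpoint from the sites' offsets. Third, weights are scaled so that a site at the center counts fully. The Gaussian width is its maximum-likelihood estimate. The Lorentzian half-width relies on a property of the Lorentzian (Cauchy) distribution: when it is centered at zero, the median absolute offset equals the half-width. The Poisson option is not shown, and the helper names `fit_weights` and `weighted_element` are hypothetical.

```python
import math
from statistics import median


def fit_weights(positions, center, kind="gaussian"):
    """Weight variant sites by their distance from the locus center.

    The width of the probability function is estimated from the offsets of the
    sites, and each weight is scaled so that a site at the center gets 1.0.
    """
    d = [pos - center for pos in positions]
    if not d:
        return []
    if kind == "gaussian":
        # Maximum-likelihood variance of a zero-mean Gaussian fitted to the offsets.
        sigma2 = sum(x * x for x in d) / len(d)
        if sigma2 == 0:
            return [1.0] * len(d)  # every site sits exactly at the center
        return [math.exp(-x * x / (2 * sigma2)) for x in d]
    if kind == "lorentzian":
        # For a Lorentzian (Cauchy) centered at 0, the median |offset| is the half-width.
        gamma = median(abs(x) for x in d)
        if gamma == 0:
            return [1.0 if x == 0 else 0.0 for x in d]
        return [1.0 / (1.0 + (x / gamma) ** 2) for x in d]
    raise ValueError(f"unsupported kind: {kind!r}")


def weighted_element(alleles_a, alleles_b, weights):
    """One genotypic element for a strain pair: weighted share of differing sites."""
    diff = sum(w for a, b, w in zip(alleles_a, alleles_b, weights) if a != b)
    return diff / sum(weights)


positions = [100, 200, 300, 400, 500]  # hypothetical site positions (bp)
wg = fit_weights(positions, 300, "gaussian")
wl = fit_weights(positions, 300, "lorentzian")
share = weighted_element("TTGAC", "TTGCA", wl)  # differences at the two right-hand sites
print(wl)  # [0.2, 0.5, 1.0, 0.5, 0.2]
```

A single difference at the center site (weight 1.0) contributes more to the element than the two outer differences combined (0.5 + 0.2), which is the downweighting the claim describes.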

3. A method of associating a phenotype with one or more candidate chromosomal regions in a genome of a species using a phenotypic data structure that comprises a difference in said phenotype between different strains of said species, said genome including a plurality of loci, said method comprising:
- establishing a genotypic data structure using a suitably programmed computer, said genotypic data structure corresponding to a locus in said plurality of loci, said genotypic data structure comprising a variation of at least one component of said locus between different strains of said species;
 
 determining a correlation value for said genotypic data structure by a comparison of said phenotypic data structure with said genotypic data structure using a suitably programmed computer;
 
 repeating said establishing and determining steps for each locus in said plurality of loci using a suitably programmed computer, thereby establishing a plurality of genotypic data structures and, for each respective genotypic data structure in the plurality of genotypic data structures, determining a correlation value;
 
 identifying one or more genotypic data structures in said plurality of genotypic data structures that have correlation values that are higher than the correlation values for all other genotypic data structures in said plurality of genotypic data structures using a suitably programmed computer;
 
 wherein the loci that correspond to said one or more genotypic data structures represent said one or more candidate chromosomal regions that associate with said phenotype and wherein an amount of said genome that is included in each locus in said plurality of loci is predetermined at a time prior to said identifying step;
 
 wherein said correlation value is formed in accordance with the expression:

 $c(P, G^{L}) = \frac{\sum_{i} (p(i) - \langle P \rangle)(g(i) - \langle G^{L} \rangle)}{\left\{\left[\sum_{i} (p(i) - \langle P \rangle)^{2}\right]\left[\sum_{i} (g(i) - \langle G^{L} \rangle)^{2}\right]\right\}^{1/2}}$

 where c(P, G^L) is said correlation value;

 p(i) is a value of the i-th element of said phenotypic data structure;

 g(i) is a value of the i-th element of said genotypic data structure;

 <P> is a mean value of all elements in said phenotypic data structure; and

 <G^L> is a mean value of all elements in said genotypic data structure; and
 
 communicating said one or more genotypic data structures to a user, a display, a readily accessible computer memory or other computer on a network.
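
The expression in claim 3 is the Pearson correlation coefficient between the phenotypic data structure P and the genotypic data structure G^L. The sketch below assumes NumPy and a hypothetical matrix layout with one row per locus and one column per strain pair. It evaluates the expression for every locus in one vectorized pass and can be checked against `numpy.corrcoef`. Reporting zero-variance loci as 0 is a choice made for the example; the claim does not specify it.

```python
import numpy as np


def correlation_values(p, G):
    """Evaluate the claim-3 expression for every locus at once.

    p: shape (n_pairs,), the phenotypic data structure P
    G: shape (n_loci, n_pairs), row L holds the genotypic data structure G^L
    """
    p = np.asarray(p, dtype=float)
    G = np.asarray(G, dtype=float)
    dp = p - p.mean()                        # p(i) - <P>
    dG = G - G.mean(axis=1, keepdims=True)   # g(i) - <G^L>, row by row
    num = dG @ dp                            # sum_i (p(i) - <P>)(g(i) - <G^L>)
    den = np.sqrt((dp @ dp) * np.einsum("ij,ij->i", dG, dG))
    with np.errstate(divide="ignore", invalid="ignore"):
        c = num / den
    return np.nan_to_num(c)  # zero-variance rows give 0/0; report them as 0.0


p = [0.2, 4.0, 4.3, 3.8, 4.1, 0.3]  # hypothetical pairwise trait differences
G = [
    [0, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 1],
    [1, 1, 1, 1, 1, 1],  # a locus at which every pair differs (constant row)
]
c = correlation_values(p, G)
best_locus = int(np.argmax(c))  # index of the highest correlation value
```

Computing all loci as one matrix product avoids a Python-level loop when the genome is divided into many loci.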

4. A method of associating a phenotype with one or more candidate chromosomal regions in a genome of a species using a phenotypic data structure that comprises a difference in said phenotype between different strains of said species, said genome including a plurality of loci, said method comprising:
- establishing a genotypic data structure using a suitably programmed computer, said genotypic data structure corresponding to a locus in said plurality of loci, said genotypic data structure comprising a variation of at least one component of said locus between different strains of said species;
 
 determining a correlation value for said genotypic data structure by a comparison of said phenotypic data structure with said genotypic data structure using a suitably programmed computer;
 
 repeating said establishing and determining steps for each locus in said plurality of loci using a suitably programmed computer, thereby establishing a plurality of genotypic data structures and, for each respective genotypic data structure in the plurality of genotypic data structures, a correlation value;
 
 identifying one or more genotypic data structures in said plurality of genotypic data structures that have correlation values that are higher than the correlation values for all other genotypic data structures in said plurality of genotypic data structures using a suitably programmed computer;
 
 wherein the loci that correspond to said one or more genotypic data structures represent said one or more candidate chromosomal regions that associate with said phenotype and wherein an amount of said genome that is included in each locus in said plurality of loci is predetermined at a time prior to said identifying step;
 
 wherein said correlation value is formed in accordance with the expression:

 $c(P, G^{L}) = \frac{\left[\sum_{i} (p(i) - \langle P \rangle)(g(i) - \langle G^{L} \rangle)\right] \times Z}{\left\{\left[\sum_{i} (p(i) - \langle P \rangle)^{2}\right]\left[\sum_{i} (g(i) - \langle G^{L} \rangle)^{2}\right]\right\}^{1/2}}$

 where c(P, G^L) is said correlation value;

 p(i) is a value of the i-th element of said phenotypic data structure;

 g(i) is a value of the i-th element of said genotypic data structure;

 <P> is a mean value of all elements in said phenotypic data structure;

 <G^L> is a mean value of all elements in said genotypic data structure; and

 Z is a function of a number of components in the locus, corresponding to the genotypic data structure, having a variation between different strains of said species; and
 
 communicating said one or more genotypic data structures to a user, a display, a readily accessible computer memory or other computer on a network.

5. The method of claim 4, wherein said function is selected from the group consisting of taking the square root of Z, squaring Z, raising Z to the power of a positive integer, taking a logarithm of Z, and taking an exponential of Z.
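
Claims 4 and 5 multiply the correlation value by a factor Z that depends on how many components of the locus vary between strains. Here is a minimal sketch. It assumes Z = f(n), where n is the number of varying components (for example, polymorphic sites) and f is one of the transforms listed in claim 5. The names `Z_FUNCTIONS` and `scaled_correlation` are hypothetical, and so is the choice of the cube as the example power.

```python
import math

# Hypothetical mapping of claim 5's options; n is the number of components of the
# locus (e.g. polymorphic sites) that vary between strains.
Z_FUNCTIONS = {
    "sqrt": math.sqrt,
    "square": lambda n: n ** 2,
    "cube": lambda n: n ** 3,  # one example of a positive-integer power
    "log": math.log,
    "exp": math.exp,
}


def scaled_correlation(c, n_variable_components, z="sqrt"):
    """Return c * Z, with Z = f(n) for the chosen transform f."""
    return c * Z_FUNCTIONS[z](n_variable_components)


a = scaled_correlation(0.8, 4)   # Z = sqrt(4) = 2
b = scaled_correlation(0.8, 16)  # Z = sqrt(16) = 4
```

When two loci have equal correlation values, the square-root factor ranks the locus with more varying components higher. The logarithm option gives Z = 0 when only one component varies, which zeroes the score for that locus.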

6. A method of associating a phenotype with one or more candidate chromosomal regions in a genome of a species using a phenotypic data structure that comprises a difference in said phenotype between different strains of said species, said genome including a plurality of loci, said method comprising:
- establishing a genotypic data structure using a suitably programmed computer, said genotypic data structure corresponding to a locus in said plurality of loci, said genotypic data structure comprising a variation of at least one component of said locus between different strains of said species;
 
 determining a correlation value for said genotypic data structure based on a comparison of said phenotypic data structure with said genotypic data structure using a suitably programmed computer;
 
 repeating said establishing and determining steps for each locus in said plurality of loci using a suitably programmed computer, thereby establishing a plurality of genotypic data structures and, for each respective genotypic data structure in the plurality of genotypic data structures, a correlation value;
 
 identifying one or more genotypic data structures in said plurality of genotypic data structures that have correlation values that are higher than the correlation values for all other genotypic data structures in said plurality of genotypic data structures using a suitably programmed computer;
 
 wherein the loci that correspond to said one or more genotypic data structures represent said one or more candidate chromosomal regions that associate with said phenotype and wherein an amount of said genome that is included in each locus in said plurality of loci is predetermined at a time prior to said identifying step;
 
 wherein said correlation value is a correlative measure that is computed in accordance with the expression:

 $????(P, G^{L}) = \frac{\left[\sum_{i} (p(i) - \langle P \rangle)(g(i) - \langle G^{L} \rangle)\right]}{\left[\sum_{i} (p(i) - \langle P \rangle)^{2}\right]^{1/2}}$

 where ????(P, G^L) is said correlative measure;

 p(i) is a value of the i-th element of said phenotypic data structure;

 g(i) is a value of the i-th element of said genotypic data structure;

 <P> is a mean value of all elements in said phenotypic data structure; and

 <G^L> is a mean value of all elements in said genotypic data structure; and
 
 communicating said one or more genotypic data structures to a user, a display, a readily accessible computer memory or other computer on a network.
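
The claim 6 expression keeps the numerator of the Pearson correlation but normalizes only by the spread of the phenotypic data structure. Algebraically, it therefore equals c(P, G^L) multiplied by [Σ_i (g(i) − <G^L>)^2]^{1/2}, so loci whose genotypic data structures vary more receive larger values. The NumPy sketch below computes the measure and checks that identity numerically. The function name and input values are hypothetical.

```python
import numpy as np


def correlative_measure(p, g):
    """Claim-6 expression: the Pearson numerator divided only by the phenotype spread."""
    p = np.asarray(p, dtype=float)
    g = np.asarray(g, dtype=float)
    dp, dg = p - p.mean(), g - g.mean()
    return float(dp @ dg / np.sqrt(dp @ dp))


p = np.array([0.2, 4.0, 4.3, 3.8, 4.1, 0.3])  # hypothetical pairwise trait differences
g = np.array([0, 1, 1, 1, 1, 0], dtype=float)
m = correlative_measure(p, g)
c = np.corrcoef(p, g)[0, 1]
# Identity: measure = c * sqrt(sum_i (g(i) - <G^L>)^2)
print(np.isclose(m, c * np.linalg.norm(g - g.mean())))  # True
```

Because the measure is not normalized by the genotypic spread, doubling every element of G^L doubles the measure while leaving the Pearson value unchanged.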

7. A computer program product for use in conjunction with a computer system, the computer program product comprising a physical computer readable storage medium and a computer program mechanism embedded therein, the computer program mechanism comprising:
- a genotypic database for storing variations in genomic sequences of a plurality of strains of a species;
  
  a phenotypic data structure, said phenotypic data structure comprising a difference in a phenotype between different strains of said species; and
  
  a program module for associating said phenotype with one or more candidate chromosomal regions in a genome of said species, said genome including a plurality of loci, said program module comprising:
  
  instructions for establishing a genotypic data structure, said genotypic data structure corresponding to a locus in said plurality of loci, said genotypic data structure comprising a variation of at least one component of said locus between different strains of said species stored in said genotypic database;
  
  instructions for determining a correlation value for said genotypic data structure by a comparison of said phenotypic data structure with said genotypic data structure;
  
  instructions for repeating said instructions for establishing and said instructions for determining, for each locus in said plurality of loci, thereby establishing a plurality of genotypic data structures and, for each respective genotypic data structure in the plurality of genotypic data structures, determining a correlation value;
  
  instructions for identifying one or more genotypic data structures in said plurality of genotypic data structures that have correlation values that are higher than the correlation values for all other genotypic data structures in said plurality of genotypic data structures;
  
  wherein the loci that correspond to said one or more genotypic data structures represent said one or more candidate chromosomal regions that associate with said phenotype and wherein an amount of said genome that is included in each locus in said plurality of loci is predetermined at a time prior to said identifying step;
  
  wherein each said genotypic data structure in said plurality of genotypic data structures comprises a plurality of variations that are distributed about the center of a locus corresponding to the genotypic data structure, and said instructions for establishing further comprise:
  
  instructions for fitting a distribution of said plurality of variations about the center of said locus with a probability function; and
  
  instructions for weighting each variation by a corresponding value derived from said probability function such that variations further from the center of said locus are downweighted so that they contribute less to said genotypic data structure than variations that are closer to said center of said corresponding locus; and
  
  instructions for communicating said one or more genotypic data structures to a user, a display, a readily accessible computer memory or other computer on a network.

8. The computer program product of claim 7, wherein said probability function is a Gaussian probability distribution, a Poisson distribution, or a Lorentzian distribution.
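
Claims 7 through 12 recite a genotypic database and loci whose size is fixed before the identifying step. The sketch below shows one way to model this. It assumes the database is a list of `(chromosome, position, {strain: allele})` records and that loci are non-overlapping windows of a chosen size. Each genotypic element is taken as the fraction of sites in the window that differ between a strain pair. The record layout, the window size, and the helper names are hypothetical, and the distance weighting recited in claim 7 is omitted for brevity.

```python
from collections import defaultdict
from itertools import combinations


def bin_into_loci(variants, window_bp):
    """Group (chromosome, position, {strain: allele}) records into fixed-size loci."""
    loci = defaultdict(list)
    for chrom, pos, alleles in variants:
        start = (pos // window_bp) * window_bp
        loci[(chrom, start, start + window_bp)].append(alleles)
    return dict(loci)


def genotypic_structure(sites, strains):
    """One element per strain pair: fraction of the locus's sites that differ."""
    return [
        sum(site[a] != site[b] for site in sites) / len(sites)
        for a, b in combinations(strains, 2)
    ]


# Hypothetical genotypic database records.
variants = [
    ("chr1", 150_000, {"A": "T", "B": "T", "C": "G"}),
    ("chr1", 900_000, {"A": "C", "B": "A", "C": "A"}),
    ("chr1", 1_200_000, {"A": "G", "B": "G", "C": "G"}),
]
loci = bin_into_loci(variants, window_bp=1_000_000)  # predetermined locus size
first = genotypic_structure(loci[("chr1", 0, 1_000_000)], ["A", "B", "C"])
print(first)  # [0.5, 1.0, 0.5]
```

The window size is fixed before any correlation is computed, which is one reading of the requirement that the amount of genome in each locus be predetermined prior to the identifying step.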

9. A computer program product for use in conjunction with a computer system, the computer program product comprising a physical computer readable storage medium and a computer program mechanism embedded therein, the computer program mechanism comprising:
- a genotypic database for storing variations in genomic sequences of a plurality of strains of a species;
 
 a phenotypic data structure, said phenotypic data structure comprising a difference in a phenotype between different strains of said species; and
 
 a program module for associating said phenotype with one or more candidate chromosomal regions in a genome of said species, said genome including a plurality of loci, said program module comprising:
 
 instructions for establishing a genotypic data structure, said genotypic data structure corresponding to a locus in said plurality of loci, said genotypic data structure comprising a variation of at least one component of said locus between different strains of said species stored in said genotypic database;
 
 instructions for determining a correlation value for said genotypic data structure by a comparison of said phenotypic data structure with said genotypic data structure;
 
 instructions for repeating said instructions for establishing and said instructions for determining, for each locus in said plurality of loci, thereby establishing a plurality of genotypic data structures and, for each respective genotypic data structure in the plurality of genotypic data structures, determining a correlation value;
 
 instructions for identifying one or more genotypic data structures in said plurality of genotypic data structures that have correlation values that are higher than the correlation values for all other genotypic data structures in said plurality of genotypic data structures;
 
 wherein the loci that correspond to said one or more genotypic data structures represent said one or more candidate chromosomal regions that associate with said phenotype and wherein an amount of said genome that is included in each locus in said plurality of loci is predetermined at a time prior to said identifying step, and wherein said instructions for determining include instructions for forming said correlation value in accordance with the expression:

 $c(P, G^{L}) = \frac{\sum_{i} (p(i) - \langle P \rangle)(g(i) - \langle G^{L} \rangle)}{\left\{\left[\sum_{i} (p(i) - \langle P \rangle)^{2}\right]\left[\sum_{i} (g(i) - \langle G^{L} \rangle)^{2}\right]\right\}^{1/2}}$

 where c(P, G^L) is said correlation value;

 p(i) is a value of the i-th element of said phenotypic data structure;

 g(i) is a value of the i-th element of said genotypic data structure;

 <P> is a mean value of all elements in said phenotypic data structure; and

 <G^L> is a mean value of all elements in said genotypic data structure; and
 
 instructions for communicating said one or more genotypic data structures to a user, a display, a readily accessible computer memory or other computer on a network.

10. A computer program product for use in conjunction with a computer system, the computer program product comprising a physical computer readable storage medium and a computer program mechanism embedded therein, the computer program mechanism comprising:
- a genotypic database for storing variations in genomic sequences of a plurality of strains of a species;
 
 a phenotypic data structure, said phenotypic data structure comprising a difference in a phenotype between different strains of said species; and
 
 a program module for associating said phenotype with one or more candidate chromosomal regions in a genome of said species, said genome including a plurality of loci, said program module comprising:
 
 instructions for establishing a genotypic data structure, said genotypic data structure corresponding to a locus in said plurality of loci, said genotypic data structure comprising a variation of at least one component of said locus between different strains of said species stored in said genotypic database;
 
 instructions for determining a correlation value for said genotypic data structure by a comparison of said phenotypic data structure with said genotypic data structure;
 
 instructions for repeating said instructions for establishing and said instructions for determining, for each locus in said plurality of loci, thereby establishing a plurality of genotypic data structures and, for each respective genotypic data structure in the plurality of genotypic data structures, determining a correlation value;
 
 instructions for identifying one or more genotypic data structures in said plurality of genotypic data structures that have correlation values that are higher than the correlation values for all other genotypic data structures in said plurality of genotypic data structures;
 
 wherein the loci that correspond to said one or more genotypic data structures represent said one or more candidate chromosomal regions that associate with said phenotype and wherein an amount of said genome that is included in each locus in said plurality of loci is predetermined at a time prior to said identifying step, and wherein said instructions for determining include instructions for forming said correlation value in accordance with the expression:

 $c(P, G^{L}) = \frac{\left[\sum_{i} (p(i) - \langle P \rangle)(g(i) - \langle G^{L} \rangle)\right] \times Z}{\left\{\left[\sum_{i} (p(i) - \langle P \rangle)^{2}\right]\left[\sum_{i} (g(i) - \langle G^{L} \rangle)^{2}\right]\right\}^{1/2}}$

 where c(P, G^L) is said correlation value;

 p(i) is a value of the i-th element of said phenotypic data structure;

 g(i) is a value of the i-th element of said genotypic data structure;

 <P> is a mean value of all elements in said phenotypic data structure;

 <G^L> is a mean value of all elements in said genotypic data structure; and

 Z is a function of a number of components in the locus, corresponding to the genotypic data structure, having a variation between different strains of said species; and
 
 instructions for communicating said one or more genotypic data structures to a user, a display, a readily accessible computer memory or other computer on a network.

11. A computer program product for use in conjunction with a computer system, the computer program product comprising a physical computer readable storage medium and a computer program mechanism embedded therein, the computer program mechanism comprising:
- a genotypic database for storing variations in genomic sequences of a plurality of strains of a species;
 
 a phenotypic data structure, said phenotypic data structure comprising a difference in a phenotype between different strains of said species; and
 
 a program module for associating said phenotype with one or more candidate chromosomal regions in a genome of said species, said genome including a plurality of loci, said program module comprising:
 
 instructions for establishing a genotypic data structure, said genotypic data structure corresponding to a locus in said plurality of loci, said genotypic data structure comprising a variation of at least one component of said locus between different strains of said species stored in said genotypic database;
 
 instructions for determining a correlation value for said genotypic data structure by a comparison of said phenotypic data structure with said genotypic data structure;
 
 instructions for repeating said instructions for establishing and said instructions for determining, for each locus in said plurality of loci, thereby establishing a plurality of genotypic data structures and, for each respective genotypic data structure in the plurality of genotypic data structures, determining a correlation value;
 
 instructions for identifying one or more genotypic data structures in said plurality of genotypic data structures that have correlation values that are higher than the correlation values for all other genotypic data structures in said plurality of genotypic data structures;
 
 wherein the loci that correspond to said one or more genotypic data structures represent said one or more candidate chromosomal regions that associate with said phenotype and wherein an amount of said genome that is included in each locus in said plurality of loci is predetermined at a time prior to said identifying step, and wherein said instructions for determining include instructions for forming said correlation value in accordance with a correlative measure that is computed in accordance with the expression:

 $????(P, G^{L}) = \frac{\left[\sum_{i} (p(i) - \langle P \rangle)(g(i) - \langle G^{L} \rangle)\right]}{\left[\sum_{i} (p(i) - \langle P \rangle)^{2}\right]^{1/2}}$

 where ????(P, G^L) is said correlative measure;

 p(i) is a value of the i-th element of said phenotypic data structure;

 g(i) is a value of the i-th element of said genotypic data structure;

 <P> is a mean value of all elements in said phenotypic data structure; and

 <G^L> is a mean value of all elements in said genotypic data structure; and
 
 instructions for communicating said one or more genotypic data structures to a user, a display, a computer memory or other computer on a network.

12. A computer system for associating a phenotype with one or more candidate chromosomal regions in a genome of a species, said genome including a plurality of loci, the computer system comprising:
- a central processing unit;
 
 a memory, coupled to the central processing unit, the memory storing:
 
 a genotypic database for storing variations in genomic sequences of a plurality of strains of said species;
 
 a phenotypic data structure that comprises a difference in said phenotype between different strains of said species; and
 
 a program module, said program module comprising:
 
 instructions for establishing a genotypic data structure, said genotypic data structure corresponding to a locus in said plurality of loci, said genotypic data structure comprising a variation of at least one component of said locus between different strains of said species stored in said genotypic database;
 
 instructions for determining a correlation value for said genotypic data structure by a comparison of said phenotypic data structure with said genotypic data structure;
 
 instructions for repeating said instructions for establishing and said instructions for determining, for each locus in said plurality of loci, thereby establishing a plurality of genotypic data structures and, for each respective genotypic data structure in the plurality of genotypic data structures, determining a correlation value;
 
 instructions for identifying one or more genotypic data structures in said plurality of genotypic data structures that have correlation values that are higher than the correlation values for all other genotypic data structures in said plurality of genotypic data structures;
 
 wherein the loci that correspond to said one or more genotypic data structures represent said one or more candidate chromosomal regions that associate with said phenotype and wherein an amount of said genome that is included in each locus in said plurality of loci is predetermined at a time prior to said identifying step, and wherein said instructions for determining include instructions for forming said correlation value in accordance with the expression:

 $c(P, G^{L}) = \frac{\sum_{i} (p(i) - \langle P \rangle)(g(i) - \langle G^{L} \rangle)}{\left\{\left[\sum_{i} (p(i) - \langle P \rangle)^{2}\right]\left[\sum_{i} (g(i) - \langle G^{L} \rangle)^{2}\right]\right\}^{1/2}}$

 where c(P, G^L) is said correlation value;

 p(i) is a value of the i-th element of said phenotypic data structure;

 g(i) is a value of the i-th element of said genotypic data structure;

 <P> is a mean value of all elements in said phenotypic data structure; and

 <G^L> is a mean value of all elements in said genotypic data structure; and
 
 instructions for communicating said one or more genotypic data structures to a user, a display, a readily accessible computer memory or other computer on a network.

Current Assignee
Board of Trustees of the Leland Stanford Junior University (Stanford University)
Original Assignee
Roche Palo Alto LLC (Roche Holding AG)
Inventors
Usuka, Jonathan A., Peltz, Gary Allen, Grupe, Andrew
Primary Examiner(s)
DeJong; Eric S

Application Number

US10/015,167
Publication Number

US 20020137080A1
Time in Patent Office

3,045 Days
Field of Search

702/20, 702/27, 707/1
US Class Current

703/11
CPC Class Codes

C12Q 1/6876   Nucleic acid products used ...

G16B 20/00   ICT specially adapted for f...

G16B 20/20   Allele or variant detection...
