System and method for SNP genotype clustering
Abstract
A system and methods are described for evaluating genetic information and biological data using a clustering approach that may be used for allele calling and genotyping. Statistical analysis of sample data is performed at various levels to develop a model that associates individual data points with selected genotyping clusters and provides a relative indication of call confidence. The methods provide a unified framework for allele calling in many different contexts and may be applied to data acquired from various identification methodologies.
83 Claims
1. A method for allelic classification, the method comprising:
acquiring intensity information for a plurality of samples wherein the intensity information comprises a first intensity component associated with a first allele and a second intensity component associated with a second allele;
evaluating the intensity information for each of the plurality of samples to identify one or more data clusters, each cluster associated with a discrete allelic combination and determined, in part, by comparing the first intensity component relative to the second intensity component;
generating a likelihood model that predicts the probability that a selected sample will reside within a particular data cluster based upon its intensity information; and
applying the likelihood model to each of the plurality of samples to determine its associated allelic composition.
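As an illustration only, the steps of claim 1 could be sketched in Python as shown below. The claim does not prescribe any formula, so every specific here is an assumption. The relative-intensity comparison is taken as the angle `2/π·atan2(y, x)`. Clusters are seeded with hypothetical cut-offs at 1/3 and 2/3. The likelihood model is one Gaussian per cluster, weighted by cluster size. The helper names (`allele_angle`, `seed_clusters`, `fit_model`, `call`) and the sample intensities are made up for the example.

```python
import math
from statistics import mean, pstdev

GENOTYPES = ("AA", "AB", "BB")

def allele_angle(x, y):
    """Compare allele-B intensity (y) relative to allele-A intensity (x).

    Returns a value in [0, 1]: 0 = signal only for allele A, 1 = signal only for allele B.
    """
    return 2.0 / math.pi * math.atan2(y, x)

def seed_clusters(thetas, lo=1 / 3, hi=2 / 3):
    """Assign each angle to a provisional cluster using fixed cut-offs (assumed values)."""
    groups = {g: [] for g in GENOTYPES}
    for t in thetas:
        g = "AA" if t < lo else ("BB" if t > hi else "AB")
        groups[g].append(t)
    return groups

def fit_model(groups, min_sd=0.02):
    """Likelihood model: a mixing weight plus one Gaussian (mean, sd) per non-empty cluster."""
    n = sum(len(v) for v in groups.values())
    return {g: (len(v) / n, mean(v), max(pstdev(v), min_sd))
            for g, v in groups.items() if v}

def call(model, x, y):
    """Return (genotype, posterior probability) for one sample."""
    t = allele_angle(x, y)
    # Log of weight * Gaussian density; the shared -0.5*log(2*pi) term cancels below.
    logj = {g: math.log(w) - math.log(sd) - 0.5 * ((t - mu) / sd) ** 2
            for g, (w, mu, sd) in model.items()}
    best = max(logj, key=logj.get)
    # Normalize in log space: the best cluster contributes exp(0) = 1 to the numerator.
    posterior = 1.0 / sum(math.exp(v - logj[best]) for v in logj.values())
    return best, posterior

# Usage with made-up intensities (x = allele A, y = allele B):
samples = [(1000, 50), (950, 80), (1100, 60),   # mostly allele A
           (600, 580), (500, 530), (700, 650),  # both alleles
           (40, 900), (70, 1000), (30, 850)]    # mostly allele B
model = fit_model(seed_clusters([allele_angle(x, y) for x, y in samples]))
calls = [call(model, x, y) for x, y in samples]
```

The posterior returned by `call` plays the role of the relative call-confidence indication described in the abstract. A production system would also need to handle empty or overlapping clusters, which this sketch does not attempt.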
22. A method for clustering analysis, the method comprising:
identifying a sample set comprising a plurality of data points, each data point having an angular value representative of an association between a first and a second intensity component;
generating a likelihood model and associated parameter set wherein the angular values of the data points are used in determining the appropriate parameters to be used in the likelihood model and wherein the efficacy of the likelihood model is assessed by evaluating the probability that the likelihood model properly identifies selected data points in the sample set;
applying the likelihood model to the plurality of data points within the sample set and grouping the data points into discrete clusters; and
associating a selected classification with each discrete cluster and its component data points.
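Claim 22 leaves open two things: how the parameters are estimated from the angular values, and how efficacy is measured. The sketch below shows one possible reading, under stated assumptions. It fits a three-component Gaussian mixture to the angles with expectation-maximization (EM) and groups each point into its most probable component. Components are labeled AA/AB/BB by increasing mean angle. Efficacy is scored as the average probability the model assigns to the known labels of a few selected reference points. None of these choices comes from the claim: EM, the initial means, the `min_sd` floor, the reference points, and all function names are assumptions.

```python
import math

LABELS = ("AA", "AB", "BB")

def log_pdf(t, mu, sd):
    """Log density of a normal distribution N(mu, sd^2) at t."""
    return -0.5 * ((t - mu) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))

def posteriors(theta, params):
    """Probability that angular value theta belongs to each cluster (log-sum-exp)."""
    logs = [math.log(w) + log_pdf(theta, mu, sd) for w, mu, sd in params]
    top = max(logs)
    ex = [math.exp(v - top) for v in logs]
    total = sum(ex)
    return [e / total for e in ex]

def fit_em(thetas, init_means=(0.0, 0.5, 1.0), iters=50, min_sd=0.02):
    """Estimate (weight, mean, sd) for each cluster from the angular values by EM."""
    n = len(thetas)
    params = [(1.0 / len(init_means), mu, 0.1) for mu in init_means]
    for _ in range(iters):
        resp = [posteriors(t, params) for t in thetas]          # E-step
        new = []
        for k, old in enumerate(params):                         # M-step
            rk = [r[k] for r in resp]
            nk = sum(rk)
            if nk < 1e-9:                     # empty cluster: keep its old parameters
                new.append(old)
                continue
            mu = sum(r * t for r, t in zip(rk, thetas)) / nk
            var = sum(r * (t - mu) ** 2 for r, t in zip(rk, thetas)) / nk
            new.append((nk / n, mu, max(math.sqrt(var), min_sd)))
        total = sum(w for w, _, _ in new)     # renormalize mixing weights
        params = [(w / total, mu, sd) for w, mu, sd in new]
    return params

def names(params):
    """Label clusters by mean angle: lowest -> AA, middle -> AB, highest -> BB."""
    order = sorted(range(len(params)), key=lambda k: params[k][1])
    return {k: LABELS[i] for i, k in enumerate(order)}

def group(thetas, params):
    """Apply the model and place each data point in its most probable cluster."""
    name_of = names(params)
    clusters = {lab: [] for lab in LABELS}
    for t in thetas:
        post = posteriors(t, params)
        clusters[name_of[post.index(max(post))]].append(t)
    return clusters

def efficacy(params, reference):
    """Mean probability the model gives to the known label of selected reference points."""
    name_of = names(params)
    scores = []
    for theta, truth in reference:
        post = posteriors(theta, params)
        scores.append(sum(p for k, p in enumerate(post) if name_of[k] == truth))
    return sum(scores) / len(scores)

# Usage with made-up angular values and three hypothetical reference points:
thetas = [0.03, 0.05, 0.04, 0.06, 0.48, 0.50, 0.52, 0.49, 0.95, 0.97, 0.96]
params = fit_em(thetas)
clusters = group(thetas, params)
score = efficacy(params, [(0.05, "AA"), (0.50, "AB"), (0.96, "BB")])
```

An efficacy score near 1 means that the fitted parameters reproduce the known labels confidently. A low score would signal that the parameter set should be re-estimated.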
34. A method for allelic classification, the method comprising:
identifying a sample set comprising a plurality of data points each having at least two component intensity values;
evaluating the component intensity values for the plurality of data points to group the data points into one or more data clusters representative of discrete allelic classifications;
generating a likelihood function that describes the grouping of a selected data point using its component intensity value; and
associating an allelic classification with each data point using the likelihood function.
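Claim 34 requires only that each data point have at least two component intensity values. The sketch below is a hypothetical two-dimensional variant. Each sample is mapped to an allelic contrast `(y - x)/(x + y)` and a log total signal. The points are grouped with k-means from hand-picked starting centroids, and each group is then described by a diagonal Gaussian likelihood function. The feature choice, k-means, the starting centroids, the `min_sd` floor, and the labeling by mean contrast are all assumptions made for illustration.

```python
import math

def features(x, y):
    """Two components per sample (assumed): allelic contrast in [-1, 1] and log10 total signal."""
    return ((y - x) / (x + y), math.log10(x + y))

def kmeans(points, centroids, iters=20):
    """Lloyd's k-means: group points around the nearest centroid, then re-center."""
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            k = min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            groups[k].append(p)
        centroids = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[j]
                     for j, g in enumerate(groups)]
    return groups

def fit_likelihood(groups, min_sd=0.05):
    """One diagonal Gaussian (weight, means, sds) per non-empty group."""
    n = sum(len(g) for g in groups)
    model = []
    for g in groups:
        if not g:
            continue
        cols = list(zip(*g))
        mus = [sum(c) / len(c) for c in cols]
        sds = [max(math.sqrt(sum((v - m) ** 2 for v in c) / len(c)), min_sd)
               for c, m in zip(cols, mus)]
        model.append((len(g) / n, mus, sds))
    return sorted(model, key=lambda comp: comp[1][0])  # order clusters by mean contrast

def log_likelihood(p, comp):
    """Log of the weight times the diagonal Gaussian density of point p."""
    w, mus, sds = comp
    return math.log(w) + sum(-0.5 * ((v - m) / s) ** 2 - math.log(s * math.sqrt(2 * math.pi))
                             for v, m, s in zip(p, mus, sds))

def classify(model, x, y, labels=("AA", "AB", "BB")):
    """Assign the label of the cluster with the highest likelihood (assumes 3 clusters)."""
    p = features(x, y)
    scores = [log_likelihood(p, comp) for comp in model]
    return labels[scores.index(max(scores))]

# Usage with made-up intensities (x = allele A, y = allele B):
samples = [(1000, 50), (950, 80), (1100, 60),
           (600, 580), (500, 530), (700, 650),
           (40, 900), (70, 1000), (30, 850)]
pts = [features(x, y) for x, y in samples]
model = fit_likelihood(kmeans(pts, [(-0.9, 3.0), (0.0, 3.0), (0.9, 3.0)]))
calls = [classify(model, x, y) for x, y in samples]
```

Using the log total signal as a second component lets the likelihood function penalize samples whose overall signal is unusually weak or strong. In this sketch, however, the contrast dimension dominates the calls.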
42. A computer readable medium having stored thereon instructions which cause a general purpose computer to perform the steps of:
acquiring experimental information for a plurality of samples wherein the experimental information comprises a first data component associated with a first allele and a second data component associated with a second allele;
evaluating the experimental information for each of the plurality of samples to identify one or more data clusters, each cluster associated with a discrete allelic combination and determined, in part, by comparing the first data component relative to the second data component;
generating a likelihood model that predicts the probability that a selected sample will reside within a particular data cluster based upon its experimental information; and
applying the likelihood model to each of the plurality of samples to determine its associated allelic composition.
59. A computer readable medium having stored thereon instructions which cause a general purpose computer to perform the steps of:
identifying a sample set comprising a plurality of data points, each data point having an angular value representative of an association between a first and a second intensity component;
generating a likelihood model and associated parameter set wherein the angular values of the data points are used in determining the appropriate parameters to be used in the likelihood model and wherein the efficacy of the likelihood model is assessed by evaluating the probability that the likelihood model properly identifies selected data points in the sample set;
applying the likelihood model to the plurality of data points within the sample set and grouping the data points into discrete clusters; and
associating a selected classification with each discrete cluster and its component data points.
66. A computer readable medium having stored thereon instructions which cause a general purpose computer to perform the steps of:
identifying a sample set comprising a plurality of data points each having at least two component experimental values;
evaluating the component experimental values for the plurality of data points to group the data points into one or more data clusters representative of discrete allelic classifications;
generating a likelihood function that describes the grouping of a selected data point using its component experimental value; and
associating an allelic classification with each data point using the likelihood function.
69. A computer-based system for performing allelic classification, the system comprising:
a database for storing experimental information for a plurality of samples, the experimental information reflecting the allelic composition of each sample;
a program which performs the operations of:
retrieving experimental information for the plurality of samples from the database wherein the experimental information comprises a first data component associated with a first allele and a second data component associated with a second allele;
evaluating the experimental information for each of the plurality of samples to identify one or more data clusters, each cluster associated with a discrete allelic combination and determined, in part, by comparing the first data component relative to the second data component;
generating a likelihood model comprising a model-fit probability assessment that estimates confidence in the likelihood model itself and assesses how well a selected sample and its respective experimental information fit the model, the model further used to predict the probability that a selected sample is associated with a particular data cluster based upon its experimental information; and
applying the likelihood model to each of the plurality of samples to determine its associated allelic composition.
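Claim 69 adds two elements: a database, and a model-fit probability that reflects how well a sample fits the model, separately from which cluster the sample most likely belongs to. The sketch below illustrates that distinction with Python's built-in `sqlite3` module. Several details are assumptions rather than anything stated in the claim:
- the table layout (`samples` with `sample_id`, `intensity_a`, `intensity_b`);
- the robust median/MAD cluster fit;
- the use of the two-sided Gaussian tail probability `erfc(|z|/√2)` as the per-sample fit measure;
- the 0.01 threshold for the model-level confidence.

```python
import math
import sqlite3
from statistics import median

def load_samples(conn):
    """Retrieve (sample_id, intensity_a, intensity_b) rows; the table layout is hypothetical."""
    return conn.execute(
        "SELECT sample_id, intensity_a, intensity_b FROM samples ORDER BY sample_id"
    ).fetchall()

def angle(a, b):
    """Relative allele-B signal mapped to [0, 1] (assumed transform)."""
    return 2.0 / math.pi * math.atan2(b, a)

def robust_fit(thetas, cuts=(1 / 3, 2 / 3), min_sd=0.02):
    """Seed clusters by cut-offs, then fit each with median and MAD-based sd (assumed)."""
    groups = {"AA": [], "AB": [], "BB": []}
    for t in thetas:
        groups["AA" if t < cuts[0] else "BB" if t > cuts[1] else "AB"].append(t)
    model = {}
    for g, vals in groups.items():
        if vals:
            med = median(vals)
            mad = median(abs(v - med) for v in vals)
            model[g] = (len(vals) / len(thetas), med, max(1.4826 * mad, min_sd))
    return model

def call_with_fit(model, theta):
    """Return (genotype, posterior, fit); fit is the two-sided tail probability."""
    # The constant -0.5*log(2*pi) is dropped because it cancels in the posterior.
    logs = {g: math.log(w) - math.log(sd) - 0.5 * ((theta - mu) / sd) ** 2
            for g, (w, mu, sd) in model.items()}
    best = max(logs, key=logs.get)
    posterior = 1.0 / sum(math.exp(v - logs[best]) for v in logs.values())
    _, mu, sd = model[best]
    fit = math.erfc(abs(theta - mu) / (sd * math.sqrt(2.0)))
    return best, posterior, fit

# Usage with an in-memory database and made-up intensities:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (sample_id TEXT PRIMARY KEY, "
             "intensity_a REAL, intensity_b REAL)")
conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", [
    ("s01", 1000, 50), ("s02", 950, 80), ("s03", 1100, 60),
    ("s04", 600, 580), ("s05", 500, 530), ("s06", 700, 650),
    ("s07", 40, 900), ("s08", 70, 1000), ("s09", 30, 850),
    ("s10", 800, 350),   # intermediate ratio: poorly explained by any cluster
])
rows = load_samples(conn)
model = robust_fit([angle(a, b) for _, a, b in rows])
results = {sid: call_with_fit(model, angle(a, b)) for sid, a, b in rows}
# Model-level confidence: share of samples that fit their assigned cluster acceptably.
model_confidence = sum(fit >= 0.01 for _, _, fit in results.values()) / len(results)
```

In this made-up data set, sample `s10` lies between the AA and AB clusters. It still receives a high posterior for AA, because AB is even less likely, yet its fit probability is essentially zero. Looking at the posterior alone would not flag such a sample.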
80. A computer-based system for performing allelic classification, the system comprising:
a database for storing experimental information for a plurality of samples, the experimental information reflecting the allelic composition of each sample; and
a program which performs the operations of:
identifying a sample set comprising a plurality of data points, each data point having an angular value representative of an association between a first and a second intensity component;
generating a likelihood model and associated parameter set wherein the angular values of the data points are used in determining the appropriate parameters to be used in the likelihood model and wherein the efficacy of the likelihood model is assessed by evaluating the probability that the likelihood model properly identifies selected data points in the sample set;
applying the likelihood model to the plurality of data points within the sample set and grouping the data points into discrete clusters; and
associating a selected classification with each discrete cluster and its component data points.
Specification