Statistical deconvoluting of mixtures
Abstract
Statistical classification of activities of molecules is a computer-implemented QSAR (quantitative structure-activity relationship) methodology that employs visualization of molecular features and statistical techniques for correlating features of molecules with their observed biological properties. Each molecule is described by noting the presence (1) or absence (0) of each feature of interest. The specific features coded by 1's and 0's are identified by recursive partitioning. The data sets may be planned or unplanned. The method is also applicable to classifying individuals in biological populations on the basis of their genetic makeup.
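The 1/0 encoding described in the abstract, and spelled out in steps (a) through (d) of claim 1, can be sketched in Python as follows. This is an illustrative sketch, not the patentee's implementation: the descriptor names, the toy molecules, and the helpers `encode`, `build_matrix` and `to_bit_string` are hypothetical, and each molecule is assumed to arrive with its features already identified as a set of labels.

```python
from typing import Dict, List, Set

# Hypothetical descriptor set (step (a)): an ordered list of features of interest.
DESCRIPTORS: List[str] = ["amine", "carboxyl", "aromatic_ring", "halogen"]

def encode(features: Set[str], descriptors: List[str]) -> List[int]:
    """Steps (b)-(c): 1 if the object has the descriptor's feature, else 0."""
    return [1 if d in features else 0 for d in descriptors]

def build_matrix(objects: Dict[str, Set[str]], descriptors: List[str]) -> List[List[int]]:
    """Step (d): one row per data object, one column per descriptor."""
    return [encode(feats, descriptors) for feats in objects.values()]

def to_bit_string(vector: List[int]) -> str:
    return "".join(str(bit) for bit in vector)

# Toy data objects, each listed with the (hypothetical) features it contains.
molecules: Dict[str, Set[str]] = {
    "mol_A": {"amine", "aromatic_ring"},
    "mol_B": {"carboxyl"},
    "mol_C": {"amine", "halogen", "aromatic_ring"},
}

matrix = build_matrix(molecules, DESCRIPTORS)
for name, row in zip(molecules, matrix):
    print(name, to_bit_string(row))  # mol_A 1010, mol_B 0100, mol_C 1011
```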
73 Claims
1. A computer-based method of encoding features of data objects, and of identifying and correlating individual said features to a response characteristic that is a trait of interest of the data object, applicable to data objects in a data set that is characterized in being a mixture of data object classes, each data object class containing one or more of said data objects, and wherein multiple data objects present a same or similar value of the trait of interest, but classes of data objects produce the response characteristic that is a trait of interest through different underlying mechanisms,
comprising the steps of:
(a) assembling a set of descriptors and converting said set of descriptors into the form of a bit string such that each descriptor reflects the presence or absence of any given potentially useful feature of interest in a data object of interest;
(b) examining each data object for presence or absence of each of said descriptors;
(c) assembling the results of step (b) into a vector for each data object, noting the presence or absence of each feature of interest in said data object;
(d) assembling all vectors generated in step (c) into a matrix with each row of the matrix corresponding to a data object and each column corresponding to a feature of interest;
(e) dividing the data in said matrix into two daughter sets on the basis of presence or absence of a given feature of interest from said set of descriptors; and
(f) repeating step (e) until each member of said matrix has been identified in terms of presence or absence of any given feature of interest from said set of descriptors and each of said members has been assigned to a terminal node.
Dependent claims: 11-36.
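Steps (e) and (f) of claim 1 amount to recursive partitioning of the binary matrix. The claim does not say how the splitting feature is chosen or when a node becomes terminal, so the sketch below assumes a regression-tree style rule: split on the feature that most reduces the sum of squared deviations of the response, and stop when a node has fewer than `min_size` members or no split gives a positive reduction. `Node`, `partition` and the toy data are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    rows: List[int]                    # indices of the data objects in this node
    feature: Optional[int] = None      # splitting column; None marks a terminal node
    absent: Optional["Node"] = None    # daughter set where the feature is 0
    present: Optional["Node"] = None   # daughter set where the feature is 1

def sse(values: List[float]) -> float:
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def partition(X: List[List[int]], y: List[float], rows: List[int], min_size: int = 2) -> Node:
    """Steps (e)-(f): split on the best feature, then recurse into both daughter sets."""
    node = Node(rows=rows)
    if len(rows) < min_size:
        return node
    parent = sse([y[r] for r in rows])
    best = None  # (gain, column, absent_rows, present_rows)
    for j in range(len(X[0])):
        absent = [r for r in rows if X[r][j] == 0]
        present = [r for r in rows if X[r][j] == 1]
        if not absent or not present:
            continue  # this feature does not divide the node
        gain = parent - sse([y[r] for r in absent]) - sse([y[r] for r in present])
        if best is None or gain > best[0]:
            best = (gain, j, absent, present)
    if best is None or best[0] <= 0:
        return node  # no split reduces the within-node variation: terminal node
    _, node.feature, absent, present = best
    node.absent = partition(X, y, absent, min_size)
    node.present = partition(X, y, present, min_size)
    return node

def terminal_nodes(node: Node) -> List[List[int]]:
    if node.feature is None:
        return [node.rows]
    return terminal_nodes(node.absent) + terminal_nodes(node.present)

# Toy data: columns 0 and 1 each confer activity (two "mechanisms"); column 2 is noise.
X = [[1, 0, 0], [1, 0, 1], [0, 1, 0], [0, 1, 1], [0, 0, 0], [0, 0, 1]]
y = [9.0, 9.0, 8.0, 8.0, 1.0, 1.0]
tree = partition(X, y, list(range(len(y))))
print(terminal_nodes(tree))  # [[4, 5], [2, 3], [0, 1]]
```

In the toy data, objects 0 and 1 (active through feature 0) and objects 2 and 3 (active through feature 1) reach a similar high response by different routes and land in separate terminal nodes, which is the kind of mixture deconvolution the claim describes.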
2. A computer-based apparatus system for allowing a user thereof to encode features of data objects, and to identify and correlate individual said features to a response characteristic that is a trait of interest of the data object, applicable to data objects in a data set that is characterized in being a mixture of data object classes, each data object class containing one or more of said data objects, and wherein multiple data objects present a same or similar trait of interest, but classes of data objects produce the response characteristic that is a trait of interest through different underlying mechanisms, comprising:
(a) input means responsive to operator commands enabling an operator to specify a set of descriptors that are subsequently converted into a bit-string, such that each descriptor reflects the presence or absence of a potentially useful feature of interest in a data object of interest;
(b) storage means for storing the assembled set of (a);
(c) memory means for executing programmed steps that examine each data object for presence or absence of each of said descriptors;
(d) means for assembling the results of (c) into a virtual matrix with each row of the matrix corresponding to an object and each column corresponding to a feature of interest;
(e) means for assigning each data object in said matrix recursively into one of two defined categories on the basis of presence or absence of a given feature of interest from said set of descriptors and repeating such analysis until each member of said mixture has been identified in terms of presence or absence of features of interest from said set of descriptors and assigned to a terminal node; and
(f) output means for visually displaying, using computer graphics, a relationship of said descriptors with said data objects and classes.
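Claim 2 ends with an output means that displays the relationship between descriptors, data objects and classes using computer graphics. As a plain-text stand-in for a graphical tree view, the following sketch prints a partitioning tree as an indented outline. The nested-tuple tree format and the `render` function are assumptions made for illustration.

```python
from typing import List, Union

# Hypothetical tree format: a terminal node is a list of object names;
# an internal node is (descriptor_name, subtree_if_absent, subtree_if_present).
Tree = Union[List[str], tuple]

def render(tree: Tree, depth: int = 0) -> List[str]:
    """Indented text outline: each split shows the descriptor and its 0/1 branches."""
    pad = "  " * depth
    if isinstance(tree, list):
        return [f"{pad}terminal node: {', '.join(tree)}"]
    name, absent, present = tree
    lines = [f"{pad}{name} = 0:"]
    lines += render(absent, depth + 1)
    lines.append(f"{pad}{name} = 1:")
    lines += render(present, depth + 1)
    return lines

example = ("feature_0",
           ("feature_1", ["mol_E", "mol_F"], ["mol_C", "mol_D"]),
           ["mol_A", "mol_B"])
print("\n".join(render(example)))
```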
3. A computer software system having a set of instructions for controlling a general purpose digital computer in performing a desired function comprising:
a set of instructions formed into each of a plurality of modules, each module comprising:
(a) an input process responsive to operator commands enabling an operator to specify a set of descriptors and convert said descriptors into a bit string such that each descriptor reflects the presence or absence of a potentially useful feature of interest of a data object of interest, wherein each data object is a member of a data set that is characterized in being a mixture of data object classes, each data object class containing one or more of said data objects, and wherein multiple data objects present a same or similar trait of interest, but classes of data objects produce the response characteristic that is a trait of interest through different underlying mechanisms;
(b) a data storage process for storing the assembled set of (a);
(c) a computational process for executing programmed steps that examine each member of said mixture for presence or absence of each of said descriptors;
(d) a computational process for assembling the results of (c) into a vector for each data object and a matrix for all vectors;
(e) a computational process for assigning each data object in said matrix into one of two defined categories on the basis of presence or absence of a given feature of interest from said set of descriptors and repeating such analysis until each member of said mixture has been identified in terms of presence or absence of each feature of interest from said set of descriptors and assigned to a terminal node;
(f) a data storage process; and
(g) an output process for visually displaying, using computer graphics, a relationship of said descriptors with said data objects and classes.
4. A computer-based method of encoding mixture features of planned mixtures or of inadvertent mixtures, or of a combination of planned or inadvertent mixtures, and of identifying and correlating individual said features to a response characteristic of the mixture object, wherein said mixture object is in a data set wherein multiple mixture objects comprising the data set present the same trait of interest through a common underlying mechanism,
comprising the steps of:
(a) assembling a set of descriptors and converting said set of descriptors into the form of a bit string such that each descriptor reflects the presence or absence of a potentially useful feature of interest in a mixture object;
(b) examining each mixture object for presence or absence of each of said descriptors;
(c) assembling the results of step (b) into a vector for each mixture object, noting the presence or absence of each feature of interest in said mixture object;
(d) assembling all vectors generated in step (c) into a matrix with each row corresponding to a mixture object and each column corresponding to a feature of interest;
(e) dividing the mixture objects in said matrix into two defined daughter nodes on the basis of presence or absence of a given feature of interest from said set of descriptors; and
(f) repeating step (e) until each mixture object of said matrix has been identified in terms of presence or absence of given features of interest from said set of descriptors and assigned to a terminal node.
Dependent claim: 37.
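Claims 4 to 9 apply the same encoding to mixture objects, such as pooled samples from planned or inadvertent mixtures. The claims do not define how a mixture's bit string relates to its components; one plausible assumption, used below, is that a feature counts as present in a mixture when any component carries it, which is a bitwise OR of the component bit strings. The descriptor names and helper functions are hypothetical.

```python
from functools import reduce
from typing import List, Set

# Hypothetical descriptor set; bit i of a fingerprint corresponds to DESCRIPTORS[i].
DESCRIPTORS: List[str] = ["amine", "carboxyl", "aromatic_ring", "halogen"]

def fingerprint(features: Set[str]) -> int:
    """Pack a component's presence/absence pattern into an integer bit string."""
    return sum(1 << i for i, d in enumerate(DESCRIPTORS) if d in features)

def mixture_fingerprint(components: List[Set[str]]) -> int:
    """Assumed rule: a feature is present in the mixture if any component has it (bitwise OR)."""
    return reduce(lambda acc, comp: acc | fingerprint(comp), components, 0)

def as_bits(fp: int) -> str:
    """Render the fingerprint with DESCRIPTORS[0] as the leftmost character."""
    return "".join("1" if (fp >> i) & 1 else "0" for i in range(len(DESCRIPTORS)))

pool_1 = [{"amine"}, {"halogen"}]
pool_2 = [{"carboxyl", "aromatic_ring"}, {"carboxyl"}]
print(as_bits(mixture_fingerprint(pool_1)))  # 1001
print(as_bits(mixture_fingerprint(pool_2)))  # 0110
```

Rows built this way can then be partitioned exactly like single-object rows.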
5. A computer-based apparatus system for allowing a user thereof to encode features of planned mixtures or of inadvertent mixtures, or of a combination of planned or inadvertent mixtures, and to identify and correlate individual said features to a response characteristic of the mixture object, wherein said mixture object is in a data set wherein multiple mixture objects comprising the data set present the same trait of interest through a common underlying mechanism, comprising:
(a) input means responsive to operator commands enabling an operator to specify a set of descriptors that are subsequently converted into a bit string, such that each descriptor reflects the presence or absence of a potentially useful feature of interest in a mixture object of interest;
(b) storage means for storing the assembled set of (a);
(c) memory means for executing programmed steps that examine each mixture object for presence or absence of each of said descriptors;
(d) means for assembling the results of (c) into a virtual matrix with each row corresponding to a mixture object and each column corresponding to a feature;
(e) means for assigning each mixture object in said matrix recursively into one of two defined categories on the basis of presence or absence of a given feature of interest from said set of descriptors and repeating such analysis until each mixture object of said matrix population has been classified in terms of presence or absence of given features of interest from said set of descriptors and assigned to a terminal node; and
(f) output means for visually displaying, using computer graphics, the relationships of said descriptors with said mixture classes and mixture objects.
6. A computer software system having a set of instructions for controlling a general purpose digital computer in performing a desired function comprising:
a set of instructions formed into each of a plurality of modules, each module comprising:
(a) an input process responsive to operator commands enabling an operator to specify a set of descriptors and convert said descriptors into a bit string such that each descriptor reflects the presence or absence of a potentially useful feature of interest in a mixture object of interest, wherein each mixture object is a member of a data set where each mixture object presents a same trait of interest through a common underlying mechanism;
(b) a data storage process for storing the assembled set of (a);
(c) a computational process for executing programmed steps that examine each member object of said data set for presence or absence of each of said descriptors;
(d) a computational process for assembling the results of (c) into a vector for each mixture object and a virtual matrix with each row corresponding to a mixture object and each column corresponding to a feature;
(e) a computational process for analyzing the data in said matrix into one of two defined categories on the basis of presence or absence of a given feature of interest from said set of descriptors and repeating such analysis until each member of said mixture has been identified in terms of presence or absence of each feature of interest from said set of descriptors and assigned to a terminal node;
(f) a data storage process; and
(g) an output process for visually displaying, using computer graphics, a relationship of said descriptors with said mixture objects and classes.
7. A computer-based method of encoding mixture features of planned mixtures or of inadvertent mixtures, or of a combination of planned or inadvertent mixtures, and of identifying and correlating individual said features to a response characteristic that is a trait of interest of the mixture object, wherein said mixture object is in a data set that is characterized in being a mixture of mixture object classes, each class containing one or more of said mixture objects, and wherein multiple mixture objects present a same trait of interest, but classes of mixture objects produce the response characteristic which is a trait of interest through different underlying mechanisms,
comprising the steps of:
(a) assembling a set of descriptors and converting said set of descriptors into the form of a bit string such that each descriptor reflects the presence or absence of a potentially useful feature of interest in a mixture object of interest;
(b) examining each mixture object for presence or absence of each of said descriptors;
(c) assembling the results of step (b) into a vector for each mixture object, noting the presence or absence of each feature in said mixture object;
(d) assembling all vectors generated in step (c) into a matrix with each row corresponding to a mixture object and each column corresponding to a feature;
(e) dividing the mixture objects in said matrix into two defined daughter nodes on the basis of presence or absence of a given feature of interest from said set of descriptors; and
(f) repeating step (e) until each mixture object of said matrix has been identified in terms of presence or absence of given features of interest from said set of descriptors and assigned to a terminal node.
8. A computer-based apparatus system for allowing a user thereof to encode features of planned mixtures or of inadvertent mixtures, or of a combination of planned or inadvertent mixtures, and to identify and correlate individual said features to a response characteristic that is a trait of interest of the mixture object, applicable to mixture objects in a data set that is characterized in being a mixture of mixture object classes, each class containing one or more of said mixture objects, and wherein multiple mixture objects present a same trait of interest, but classes of mixture objects produce the response characteristic that is a trait of interest through different underlying mechanisms, comprising:
(a) input means responsive to operator commands enabling an operator to specify a set of descriptors that are subsequently converted into a bit string, such that each descriptor reflects the presence or absence of a potentially useful feature of interest in a mixture object of interest;
(b) storage means for storing the assembled set of (a);
(c) memory means for executing programmed steps that examine each mixture object for presence or absence of each of said descriptors;
(d) means for assembling the results of (c) into a virtual matrix with each row corresponding to a mixture object and each column corresponding to a feature;
(e) means for assigning each mixture object in said matrix recursively into one of two defined categories on the basis of presence or absence of a given feature of interest from said set of descriptors and repeating such analysis until each mixture object of said matrix has been classified in terms of presence or absence of given features of interest from said set of descriptors and assigned to a terminal node; and
(f) output means for visually displaying, using computer graphics, the relationships of said descriptors with said mixture objects and classes.
9. A computer software system having a set of instructions for controlling a general purpose digital computer in performing a desired function comprising:
a set of instructions formed into each of a plurality of modules, each module comprising:
(a) an input process responsive to operator commands enabling an operator to specify a set of descriptors and convert said descriptors into a bit string such that each descriptor reflects the presence or absence of a potentially useful feature of interest in a mixture object of interest, wherein each mixture object is a member of a data set that is characterized in being a mixture of classes, each class containing one or more of said mixture objects, and wherein multiple mixture objects present the same trait of interest, but classes of mixture objects produce the response characteristic that is a trait of interest through different underlying mechanisms;
(b) a data storage process for storing the assembled set of (a);
(c) a computational process for executing programmed steps that examine each mixture object of said matrix for presence or absence of each of said descriptors;
(d) a computational process for assembling the results of (c) into a vector for each mixture object and a virtual matrix with each row corresponding to a mixture object and each column corresponding to a feature;
(e) a computational process for assigning each mixture object in said matrix into one of two defined categories on the basis of presence or absence of a given feature of interest from said set of descriptors and repeating such analysis until each member of said matrix has been classified in terms of presence or absence of given features of interest from said set of descriptors and assigned to a terminal node;
(f) a data storage process; and
(g) an output process for visually displaying, using computer graphics, a relationship of said descriptors with said mixture objects and classes.
10. A computer-based method of analyzing biological potency of individual chemical structure features out of a plural mixture of chemical compounds wherein a created data set is characterized in being a mixture of data objects, each data object itself being a mixture of active and/or inactive chemical compounds, which active chemical compounds exhibit a trait of interest, and wherein the activity may arise through a single mechanism or through multiple mechanisms, comprising the steps of:
(a) assembling a set of descriptors such that each descriptor captures a chemically useful feature of one or more members of a mixture of chemical compounds such that one member is captured if individual chemical compounds are being decoded, two members are captured if pairs of chemical compounds are being decoded, three members are captured if triples of chemical compounds are being decoded and so on;
(b) examining each member, pair or triple, or so forth, of said mixture of chemical compounds for presence or absence of each of said features of interest;
(c) assembling the results of step (b) into a descriptor vector;
(d) comparing the features of the individual compound, pair, triple and so forth, to the features of a terminal node of choice and determining a resident terminal node;
(e) repeating step (d) until each compound, pair, triple and so forth of said set of mixtures of chemical compounds has been identified and characterized in relation to the terminal node it would reside within.
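Claim 10 decodes a mixture by forming descriptor vectors for single compounds, pairs, triples and so on, and locating the terminal node each combination would reside in. The sketch below assumes that terminal nodes from an earlier partitioning are available as root-to-leaf paths of (descriptor, present?) conditions, and that a combination has a feature whenever any member has it; both the node definitions and that union rule are illustrative assumptions, not taken from the claim.

```python
from itertools import combinations
from typing import Dict, List, Optional, Set, Tuple

DESCRIPTORS: List[str] = ["amine", "carboxyl", "aromatic_ring", "halogen"]  # hypothetical

# Hypothetical terminal nodes from an earlier partitioning, each given by its
# root-to-leaf path of (descriptor, required presence) conditions.
TERMINAL_NODES: Dict[str, List[Tuple[str, bool]]] = {
    "node_1": [("amine", True), ("halogen", True)],
    "node_2": [("amine", True), ("halogen", False)],
    "node_3": [("amine", False)],
}

def descriptor_vector(members: Tuple[str, ...], mixture: Dict[str, Set[str]]) -> Dict[str, int]:
    """Step (c), assumed rule: a combination has a feature if any of its members has it."""
    present = set().union(*(mixture[m] for m in members))
    return {d: int(d in present) for d in DESCRIPTORS}

def resident_node(vec: Dict[str, int]) -> Optional[str]:
    """Step (d): the terminal node whose every path condition the vector satisfies."""
    for name, path in TERMINAL_NODES.items():
        if all(vec[d] == int(flag) for d, flag in path):
            return name
    return None

def assign(mixture: Dict[str, Set[str]], k: int) -> Dict[Tuple[str, ...], Optional[str]]:
    """Step (e): k=1 decodes single compounds, k=2 pairs, k=3 triples, and so on."""
    return {combo: resident_node(descriptor_vector(combo, mixture))
            for combo in combinations(sorted(mixture), k)}

mixture = {"c1": {"amine"}, "c2": {"halogen"}, "c3": {"carboxyl"}}
print(assign(mixture, 1))  # singles
print(assign(mixture, 2))  # pairs
```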
38. A computer-based method of encoding, decoding and identifying individual chemical compounds out of a chemical mixture, comprising the steps of:
(a) assembling the results of previously conducted screening of the chemical mixture for a biological activity of interest;
(b) assembling a set of descriptors such that each descriptor captures a chemically useful feature of one or more members of a chemical mixture;
(c) examining each combination of members of said chemical mixture for presence or absence of each of said descriptors;
(d) correlating presence or absence of said chemical descriptors with an assigned terminal node, thereby identifying predicted activity; and
(e) analyzing subsequent chemical mixtures for chemical structure, comparing their chemical structure against said predicted activity and extrapolating biological reactivity of such subsequent chemical mixtures therefrom.
Dependent claims: 39-53.
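For claim 38, one simple reading of steps (d) and (e) is to summarize the earlier screening results as the mean activity of each terminal node, then predict the activity of compounds in a new mixture from the node they fall into. The sketch assumes a hypothetical tree that splits on `amine` and `halogen` along every path, so a terminal node is just a 0/1 pattern over those two descriptors; the data and function names are invented for illustration.

```python
from statistics import mean
from typing import Dict, List, Set, Tuple

# Hypothetical tree that splits on both descriptors below along every path,
# so each terminal node corresponds to one 0/1 pattern over them.
SPLIT_DESCRIPTORS: Tuple[str, ...] = ("amine", "halogen")

def node_key(features: Set[str]) -> Tuple[int, ...]:
    return tuple(int(d in features) for d in SPLIT_DESCRIPTORS)

def node_activity(screened: List[Tuple[Set[str], float]]) -> Dict[Tuple[int, ...], float]:
    """Steps (a) and (d): mean measured activity of earlier screening results per terminal node."""
    groups: Dict[Tuple[int, ...], List[float]] = {}
    for features, activity in screened:
        groups.setdefault(node_key(features), []).append(activity)
    return {key: mean(values) for key, values in groups.items()}

def predict(components: Dict[str, Set[str]], activity: Dict[Tuple[int, ...], float]) -> Dict[str, float]:
    """Step (e): predicted activity per component; NaN if its node holds no screened compound."""
    return {name: activity.get(node_key(f), float("nan")) for name, f in components.items()}

screened = [({"amine", "halogen"}, 8.0), ({"amine", "halogen", "carboxyl"}, 6.0),
            ({"amine"}, 2.0), ({"carboxyl"}, 1.0)]
table = node_activity(screened)
new_mixture = {"x1": {"amine", "halogen"}, "x2": {"carboxyl"}, "x3": {"halogen"}}
print(predict(new_mixture, table))  # {'x1': 7.0, 'x2': 1.0, 'x3': nan}
```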
54. A computer-based method of encoding, identifying and correlating individual genetic features of a genetic polymorphism out of a plural populational mixture of individual subjects so as to identify useful diagnoses and therapies for individuals and to identify genes and gene products useful in defining biological targets of interest, comprising the steps of:
(a) assembling a set of descriptors such that each descriptor captures a genetically useful feature, allele, alleles, or marker, of one or more members of a mixture population of individuals having a phenotype of interest;
(b) examining each member of said population of individuals for presence or absence of each of said genetic features;
(c) assembling the results of step (b) into a matrix;
(d) dividing the data in said matrix into one of two defined categories on the basis of presence or absence of a given genetic feature from said set of genetic features;
(e) repeating step (d) until each member of said population of individuals has been identified and characterized in terms of presence or absence of each genetic feature; and
(f) correlating presence or absence of said genetic features with known phenotypes of each individual in said mixture population, thereby deriving a relationship between genotype and phenotype, said relationship useful in diagnosis and therapy of individuals and also useful for identification of gene products, said gene products useful for selecting drug targets or said gene products useful for determining the genetic origin of a disease.
Dependent claims: 55-73.
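Claim 54 carries the same procedure over to genotypes: rows are individuals, columns are presence or absence of alleles or markers, and the response is a phenotype. The sketch below shows one partitioning step under an assumed criterion (largest decrease in Gini impurity of a 0/1 phenotype) and then reports phenotype frequency in carriers versus non-carriers of the chosen marker. The genotype matrix, phenotype labels and function names are hypothetical.

```python
from typing import List, Tuple

def gini(labels: List[int]) -> float:
    """Gini impurity of 0/1 phenotype labels (0.0 for an empty group)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_marker(G: List[List[int]], phenotype: List[int]) -> Tuple[int, float]:
    """Step (d): marker whose presence/absence split most reduces weighted Gini impurity."""
    n = len(phenotype)
    parent = gini(phenotype)
    best = (-1, 0.0)  # (marker index, impurity decrease); -1 means no useful split
    for j in range(len(G[0])):
        carriers = [phenotype[i] for i in range(n) if G[i][j] == 1]
        others = [phenotype[i] for i in range(n) if G[i][j] == 0]
        gain = parent - (len(carriers) * gini(carriers) + len(others) * gini(others)) / n
        if gain > best[1]:
            best = (j, gain)
    return best

def phenotype_rates(G: List[List[int]], phenotype: List[int], j: int) -> Tuple[float, float]:
    """Step (f): phenotype frequency among carriers vs non-carriers of marker j."""
    def rate(group: List[int]) -> float:
        return sum(group) / len(group) if group else float("nan")
    carriers = [p for row, p in zip(G, phenotype) if row[j] == 1]
    others = [p for row, p in zip(G, phenotype) if row[j] == 0]
    return rate(carriers), rate(others)

# Rows are individuals, columns are markers (1 = allele present).
G = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1], [0, 1, 0]]
phenotype = [1, 1, 1, 0, 0, 0]
print(best_marker(G, phenotype))         # (0, 0.5)
print(phenotype_rates(G, phenotype, 0))  # (1.0, 0.0)
```

Applying `best_marker` again within the carrier and non-carrier groups would give the repeated division of step (e).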
Specification