Process for discriminating between biological states based on hidden patterns from biological data
Abstract
The invention describes a process for determining a biological state through the discovery and analysis of hidden or non-obvious discriminatory biological data patterns. The biological data can come from health data, clinical data, or a biological sample (e.g., a biological sample from a human, such as serum, blood, saliva, plasma, nipple aspirates, synovial fluid, cerebrospinal fluid, sweat, urine, fecal matter, tears, bronchial lavage, swabbings, needle aspirates, semen, vaginal fluids, or pre-ejaculate), which is analyzed to determine the biological state of the donor. The biological state can be a pathologic diagnosis, toxicity state, efficacy of a drug, prognosis of a disease, etc. Specifically, the invention concerns processes that discover hidden discriminatory biological data patterns (e.g., patterns of protein expression in a serum sample that classify the biological state of an organ) that describe biological states.
31 Citations
65 Claims
- 1. A method of classifying a biological state from biological data by the detection of discriminatory patterns where the discriminatory pattern describes the biological state.
- 2. A method of classifying a biological state from biological data by the steps of:
a. detecting a discriminatory pattern that is a subset of a larger set of data in a data stream, said discrimination defined by success in a learning set of data;
b. applying said discriminatory pattern to classify known or test data samples; and
c. using said discriminatory pattern to classify unknown data samples, wherein the discriminatory pattern is indicative of the biological state and is discriminatory even when individual data points are not. - View Dependent Claims (39, 40, 41, 42)
- 3. A method of classifying a biological state in biological data by the detection of discriminatory patterns using a vector space having multiple predetermined diagnostic clusters defining a known biological state comprising the steps of:
a. forming a normalized data stream that describes the biological data;
b. abstracting the data stream to calculate a sample vector that characterizes the data stream;
c. identifying the diagnostic cluster, if any, within which the sample vector rests;
d. assigning to the biological data the diagnosis of the identified diagnostic cluster or, if no cluster is identified, assigning to the biological data the diagnosis of atypical sample, NOS; and
e. using said discriminatory pattern to classify unknown data samples, wherein the discriminatory pattern describes the biological state and is discriminatory even when individual data points are not.
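Steps (a) through (d) of claim 3 can be illustrated with a minimal Python sketch. The claim does not fix the details, so several choices here are assumptions made for illustration only: min-max scaling as the normalization, abstraction by reading the stream at fixed index locations, spherical clusters of a single radius measured with Euclidean distance, and the fallback label `"NOS"`. All function and variable names are hypothetical.

```python
import math

def normalize(stream):
    """Scale a data stream linearly to [0, 1] (assumed normalization)."""
    lo, hi = min(stream), max(stream)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in stream]
    return [(x - lo) / span for x in stream]

def abstract(stream, locations):
    """Build the sample vector from the stream values at the given indices."""
    return [stream[i] for i in locations]

def classify(vector, clusters, radius, fallback="NOS"):
    """Return the diagnosis of the first cluster containing the vector.

    clusters is a list of (centroid, diagnosis) pairs; a vector rests in a
    cluster when its Euclidean distance to the centroid is at most radius.
    """
    for centroid, diagnosis in clusters:
        if radius >= math.dist(vector, centroid):
            return diagnosis
    return fallback

# Toy example with made-up numbers.
stream = normalize([2.0, 4.0, 6.0, 10.0, 2.0, 8.0])
vector = abstract(stream, [1, 3, 5])            # [0.25, 1.0, 0.75]
clusters = [((0.25, 1.0, 0.75), "disease"), ((0.9, 0.1, 0.1), "normal")]
print(classify(vector, clusters, radius=0.1))           # disease
print(classify([0.5, 0.5, 0.5], clusters, radius=0.1))  # NOS
```

The sketch returns the fallback label only when the vector lies outside every cluster, which mirrors the "if no cluster is identified" branch of step (d).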
- 37. The method of claim 37, wherein the carcinoma is a prostatic carcinoma.
- 48. A method of diagnosing the disease of an organ of an individual which comprises:
a. analyzing a biological sample from the subject and calculating from the analysis a normalized vector, having at least 4 scalars and not more than 20 scalars, that is characteristic of the sample;
b. providing a vector space of between 4 and 20 dimensions occupied by a data cluster map comprising at least 6 equal-sized, non-overlapping data clusters, a multiplicity of which data clusters are associated with a disease diagnosis and a multiplicity of which data clusters are associated with normal samples and no data cluster of said map is associated with more than one diagnosis;
c. calculating in which, if any, of the data clusters of the data cluster map the characteristic vector rests; and
d. assigning to the sample the disease diagnosis associated with the data cluster in which the characteristic vector rests or, if the vector rests in no cluster assigning a classification of non-normal. - View Dependent Claims (50, 53, 54, 55)
- 49. A method of diagnosing the stage of a disease of an organ of an individual which comprises:
a. analyzing a biological sample from the subject and calculating from the analysis a normalized vector, having at least 4 scalars and not more than 20 scalars, that is characteristic of the sample;
b. providing a vector space of between 4 and 20 dimensions occupied by a data cluster map comprising at least 6 equal-sized, non-overlapping data clusters, a multiplicity of which data clusters are associated with a disease diagnosis and a multiplicity of which data clusters are associated with normal samples and no data cluster of said map is associated with more than one diagnosis;
c. calculating in which, if any, of the data clusters of the data cluster map the characteristic vector rests; and
d. assigning to the sample the disease diagnosis associated with the data cluster in which the characteristic vector rests or, if the vector rests in no cluster assigning a classification of non-normal. - View Dependent Claims (51, 52)
- 56. A method of diagnosing a primary malignancy of an organ of a subject which comprises:
a. analyzing a biological sample from the subject and calculating from the analysis a normalized vector, having at least 4 scalars, that is characteristic of the sample;
b. providing a vector space occupied by a data cluster map comprising at least 6 equal-sized, non-overlapping data clusters, a multiplicity of which data clusters are associated with a malignant diagnosis and a multiplicity of which data clusters are associated with a benign diagnosis and no data cluster of said map is associated with more than one diagnosis, wherein at least one scalar measures a product that is a contextual diagnostic product and wherein the size of the data cluster is defined by a Euclidean metric;
c. calculating in which, if any, of the data clusters of the data cluster map the characteristic vector rests; and
d. assigning to the sample the diagnosis associated with the data cluster in which the characteristic vector rests or if the vector rests in no data cluster assigning a diagnosis of non-normal, non-malignant. - View Dependent Claims (57, 58)
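The constraints that step (b) of claim 56 places on the data cluster map can be checked mechanically. The following Python sketch is one possible reading, not the patented method: it assumes spherical clusters that share one Euclidean radius (so "equal-sized" holds by construction), treats two clusters as overlapping when their centroids are closer than twice that radius, and interprets "a multiplicity" as at least two. The function name and the label strings are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def check_cluster_map(clusters, radius, min_clusters=6, min_dims=4):
    """Return a list of violated constraints (empty if none are found).

    clusters is a list of (centroid, diagnosis) pairs; because each cluster
    carries exactly one label, no cluster has more than one diagnosis.
    """
    problems = []
    if min_clusters > len(clusters):
        problems.append("too few clusters")
    dims = {len(centroid) for centroid, _ in clusters}
    if len(dims) > 1:
        problems.append("centroids differ in dimension")
        return problems  # distances are undefined across dimensions
    if dims and min_dims > min(dims):
        problems.append("too few dimensions")
    # Equal-radius spheres overlap when centroids are closer than 2 * radius.
    for (a, _), (b, _) in combinations(clusters, 2):
        if 2 * radius > math.dist(a, b):
            problems.append("clusters overlap")
            break
    counts = Counter(diagnosis for _, diagnosis in clusters)
    for label in ("malignant", "benign"):
        if 2 > counts[label]:
            problems.append("fewer than two " + label + " clusters")
    return problems

# Toy map: six clusters in four dimensions, nearest centroids 1.0 apart.
cluster_map = [
    ((0, 0, 0, 0), "malignant"), ((1, 0, 0, 0), "malignant"),
    ((0, 1, 0, 0), "malignant"), ((0, 0, 1, 0), "benign"),
    ((0, 0, 0, 1), "benign"), ((1, 1, 0, 0), "benign"),
]
print(check_cluster_map(cluster_map, radius=0.4))  # []
print(check_cluster_map(cluster_map, radius=0.6))  # ['clusters overlap']
```

Claims 48 and 49 add an upper bound of 20 dimensions; a `max_dims` parameter could be added in the same way.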
- 59. A computer software product that specifies computer executable code to execute a program comprising the following steps:
a. inputting a normalized data stream that describes a biological sample with a sample identifier;
b. inputting a set of diagnostic clusters, each cluster associated with a diagnosis of a known biological state;
c. abstracting the data stream to calculate a sample vector that characterizes the data stream;
d. identifying the diagnostic cluster, if any, within which the sample vector falls;
e. assigning to the sample the diagnosis of the identified diagnostic cluster or, if no cluster is identified assigning to the sample the diagnosis of non-normal, non-malignant; and
f. outputting the assigned diagnosis and the sample identifier. - View Dependent Claims (60)
- 61. A computer software product that specifies computer executable code to execute a program comprising the following steps:
a. inputting a set of instructional data streams, each data stream describing a biological sample with a known biological state;
b. inputting an operator specified number of points and an operator specified cluster size;
c. selecting an initial set of random logical chromosomes that specify the location of the pre-specified number of points of the data stream;
d. calculating a vector for each chromosome and for each data stream by abstracting the data stream at locations specified by the chromosome;
e. determining a fitness of each chromosome by finding the locations in the vector space of a multiplicity of non-overlapping data clusters of the pre-specified size that maximize the number of vectors that rest in clusters having a uniform status, wherein the larger the number of such vectors the higher the fitness;
f. optimizing the set of logical chromosomes by an iterative process comprising reiteration of steps (d) and (e), terminating logical chromosomes with low fitness, replicating logical chromosomes of high fitness, recombination and random modification of the chromosomes;
g. terminating the iterative process; and
h. outputting an optimized logical chromosome, and the locations of the data clusters that maximize the fitness of the optimized chromosome, so that a diagnostic algorithm that embodies the outputted logical chromosome and data clusters can be implemented. - View Dependent Claims (62)
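The iterative search in steps (c) through (h) of claim 61 has the shape of a genetic algorithm, and a compact Python sketch may make the loop easier to follow. This is an illustration under stated assumptions, not the claimed implementation. In particular, the fitness function below swaps in a simple greedy "leader" clustering (each vector joins the first cluster whose centroid lies within the radius, otherwise it starts a new cluster) in place of the claim's search for fitness-maximizing cluster locations, and it does not enforce that clusters are non-overlapping. Selection keeps the fitter half of the population, and recombination is a single-point crossover. All names and default parameters are hypothetical.

```python
import math
import random

def abstract(stream, chromosome):
    """Read the stream at the locations named by the chromosome."""
    return [stream[i] for i in chromosome]

def fitness(chromosome, streams, labels, radius):
    """Count vectors that fall in clusters whose members share one label."""
    clusters = []  # each entry: [centroid, labels seen, member count]
    for stream, label in zip(streams, labels):
        v = abstract(stream, chromosome)
        for cluster in clusters:
            if radius >= math.dist(v, cluster[0]):
                cluster[1].add(label)
                cluster[2] += 1
                break
        else:
            clusters.append([v, {label}, 1])
    return sum(n for _, seen, n in clusters if len(seen) == 1)

def evolve(streams, labels, n_points, radius,
           pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Return the fittest chromosome found (a list of stream locations)."""
    rng = random.Random(seed)
    length = len(streams[0])
    population = [rng.sample(range(length), n_points) for _ in range(pop_size)]

    def score(chromosome):
        return fitness(chromosome, streams, labels, radius)

    for _ in range(generations):
        population.sort(key=score, reverse=True)
        survivors = population[: pop_size // 2]   # terminate low fitness
        children = []
        while pop_size > len(survivors) + len(children):
            a, b = rng.sample(survivors, 2)       # replicate high fitness
            cut = rng.randrange(1, n_points) if n_points > 1 else 0
            child = a[:cut] + b[cut:]             # recombination
            if mutation_rate > rng.random():      # random modification
                # May duplicate a location; tolerated in this sketch.
                child[rng.randrange(n_points)] = rng.randrange(length)
            children.append(child)
        population = survivors + children
    return max(population, key=score)

# Toy learning set: only location 2 separates the two known states.
streams = [[0, 5, 1], [0, 3, 1], [0, 5, 0], [0, 3, 0]]
labels = ["disease", "disease", "normal", "normal"]
best = evolve(streams, labels, n_points=1, radius=0.1)
print(best, fitness(best, streams, labels, radius=0.1))
```

Because the fitter half of each generation is carried over unchanged, the best fitness in the population never decreases from one generation to the next. A fuller version would also return the cluster locations found for the winning chromosome, as step (h) requires.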
- 63. A diagnostic model to determine a biological state of interest, wherein the diagnostic algorithm is characterized by having multiple diagnostic clusters of predetermined equal size in a vector space of a fixed number of dimensions.
Specification