Prediction by collective likelihood from emerging patterns
Abstract
A system, method and computer program product for determining whether a test sample is in a first or a second class of data (for example: cancerous or normal), comprising: extracting a plurality of emerging patterns from a training data set, creating a first and second list containing respectively, a frequency of occurrence of each emerging pattern that has a non-zero occurrence in the first and in the second class of data; using a fixed number of emerging patterns, calculating a first and second score derived respectively from the frequencies of emerging patterns in the first list that also occur in the test data, and from the frequencies of emerging patterns in the second list that also occur in the test data; and deducing whether the test sample is categorized in the first or the second class of data by selecting the higher of the first and the second score.
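As an illustration of the list-creation step described in the abstract, the following Python sketch computes the frequency of each candidate pattern in one class of data and keeps only the patterns with non-zero frequency. It assumes that each training instance is a set of discretized items, that the emerging patterns have already been extracted, and that "frequency" means the fraction of instances containing the pattern. The names `frequency` and `build_list`, the item strings, and the descending sort are hypothetical choices made for this example and are not taken from the patent.

```python
from typing import FrozenSet, List, Sequence, Tuple

Pattern = FrozenSet[str]


def frequency(pattern: Pattern, instances: Sequence[FrozenSet[str]]) -> float:
    """Fraction of instances that contain every item of the pattern (assumed definition)."""
    if not instances:
        return 0.0
    hits = sum(1 for inst in instances if pattern <= inst)
    return hits / len(instances)


def build_list(patterns: Sequence[Pattern],
               class_instances: Sequence[FrozenSet[str]]) -> List[Tuple[Pattern, float]]:
    """Return (pattern, frequency) pairs with non-zero frequency in the class.

    Sorting by descending frequency is an assumption made for convenience.
    """
    pairs = [(p, frequency(p, class_instances)) for p in patterns]
    nonzero = [(p, f) for p, f in pairs if f > 0]
    return sorted(nonzero, key=lambda pf: pf[1], reverse=True)


# Hypothetical toy data: items are discretized feature values.
first_class = [frozenset({"g1=high", "g2=low"}),
               frozenset({"g1=high", "g3=high"}),
               frozenset({"g1=high", "g2=low", "g3=high"})]
second_class = [frozenset({"g1=low", "g2=high"}),
                frozenset({"g1=low", "g3=high"})]
patterns = [frozenset({"g1=high"}), frozenset({"g2=low"}),
            frozenset({"g1=low"}), frozenset({"g2=high"})]

first_list = build_list(patterns, first_class)
second_list = build_list(patterns, second_class)
```

In this toy data each list ends up with two patterns, because the other two never occur in that class and are dropped by the non-zero filter.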
75 Claims
1. A method of determining whether a test sample, having test data T, is categorized in one of a number n of classes wherein n is 2 or more, comprising:
extracting a plurality of emerging patterns from a training data set D that has at least one instance of each of said n classes of data;
creating n lists, wherein:
an ith list of said n lists contains a frequency of occurrence, ƒ_i(m), of each emerging pattern EP_i(m) from said plurality of emerging patterns that has a non-zero occurrence in an ith class of data;
using a fixed number, k, of emerging patterns, wherein k is substantially less than a total number of emerging patterns in the plurality of emerging patterns, calculating n scores;
wherein:
an ith score of said n scores is derived from the frequencies of k emerging patterns in said ith list that also occur in said test data; and
deducing which of said n classes of data the test data is categorized in, by selecting the highest of said n scores.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 69, 70, 72, 74
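The scoring and selection steps of claim 1 can be sketched in Python as follows. The claim states only that each score is "derived from" the frequencies of k patterns that also occur in the test data, so the plain sum used here, the choice of the k highest-frequency matches, and the names `class_score` and `classify` are all assumptions made for this example.

```python
from typing import Dict, FrozenSet, List, Tuple

Pattern = FrozenSet[str]
FreqList = List[Tuple[Pattern, float]]


def class_score(freq_list: FreqList, test_items: FrozenSet[str], k: int) -> float:
    """Sum the frequencies of the k highest-frequency patterns contained in the test data.

    The sum is an assumed scoring rule; fewer than k matches simply yield a smaller sum.
    """
    ranked = sorted(freq_list, key=lambda pf: pf[1], reverse=True)
    matched = [f for p, f in ranked if p <= test_items]
    return sum(matched[:k])


def classify(lists: Dict[str, FreqList], test_items: FrozenSet[str],
             k: int) -> Tuple[str, Dict[str, float]]:
    """Compute one score per class and pick the class with the highest score."""
    scores = {label: class_score(fl, test_items, k) for label, fl in lists.items()}
    best = max(scores, key=scores.get)  # ties resolve to the first label encountered
    return best, scores


# Hypothetical per-class lists for n = 3 classes.
lists = {
    "A": [(frozenset({"x"}), 0.75), (frozenset({"x", "y"}), 0.5), (frozenset({"z"}), 0.25)],
    "B": [(frozenset({"w"}), 0.5), (frozenset({"y"}), 0.25)],
    "C": [(frozenset({"q"}), 1.0)],
}
label, scores = classify(lists, frozenset({"x", "y", "z"}), k=2)
```

With k = 2, only the two strongest matching patterns of each class contribute, so a class with many weak matches does not automatically outscore a class with a few strong ones. How ties are broken is not addressed by the claim; the sketch leaves that to `max`.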
11. A method of determining whether a test sample, having test data T, is categorized in a first class or a second class, comprising:
extracting a plurality of emerging patterns from a training data set D that has at least one instance of a first class of data and at least one instance of a second class of data;
creating a first list and a second list wherein:
said first list contains a frequency of occurrence, ƒ_1(m), of each emerging pattern EP_1(m) from said plurality of emerging patterns that has a non-zero occurrence in said first class of data; and
said second list contains a frequency of occurrence, ƒ_2(m), of each emerging pattern EP_2(m) from said plurality of emerging patterns that has a non-zero occurrence in said second class of data;
using a fixed number, k, of emerging patterns, wherein k is substantially less than a total number of emerging patterns in the plurality of emerging patterns, calculating:
a first score derived from the frequencies of k emerging patterns in said first list that also occur in said test data, and a second score derived from the frequencies of k emerging patterns in said second list that also occur in said test data; and
deducing whether the test data is categorized in the first class of data or in the second class of data by selecting the higher of said first score and said second score.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
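The two scores of claim 11 can be written compactly. The claim requires only that each score be derived from the frequencies of k matching patterns; the plain sum below, and the choice of the k highest-frequency matches, are assumptions made for illustration. Here i_1, ..., i_k index the k patterns of highest frequency in the first list that are contained in the test data T, and j_1, ..., j_k do the same for the second list. The block assumes the `amsmath` package.

```latex
\[
\mathrm{score}_1(T) = \sum_{m=1}^{k} f_1(i_m), \qquad
\mathrm{score}_2(T) = \sum_{m=1}^{k} f_2(j_m),
\qquad EP_1(i_m) \subseteq T,\; EP_2(j_m) \subseteq T,
\]
\[
\mathrm{class}(T) =
\begin{cases}
\text{first class}  & \text{if } \mathrm{score}_1(T) > \mathrm{score}_2(T),\\
\text{second class} & \text{if } \mathrm{score}_2(T) > \mathrm{score}_1(T).
\end{cases}
\]
```

The case of equal scores is not covered by the claim wording and is left open here.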
47. A computer program product for determining whether a test sample, for which there exists test data, is categorized in a first class or a second class, wherein the computer program product is for use in conjunction with a computer system, the computer program product comprising:
a computer readable storage medium and a computer program mechanism embedded therein, the computer program mechanism comprising:
at least one statistical analysis tool;
at least one sorting tool; and
control instructions for:
accessing a data set that has at least one instance of a first class of data and at least one instance of a second class of data;
extracting a plurality of emerging patterns from said data set;
creating a first list and a second list wherein, for each of said plurality of emerging patterns:
said first list contains a frequency of occurrence, ƒ_i(1), of each emerging pattern i from said plurality of emerging patterns that has a non-zero occurrence in said first class of data, and
said second list contains a frequency of occurrence, ƒ_i(2), of each emerging pattern i from said plurality of emerging patterns that has a non-zero occurrence in said second class of data;
using a fixed number, k, of emerging patterns, wherein k is substantially less than a total number of emerging patterns in the plurality of emerging patterns, calculating:
a first score derived from the frequencies of k emerging patterns in said first list that also occur in said test data, and a second score derived from the frequencies of k emerging patterns in said second list that also occur in said test data; and
deducing whether the test sample is categorized in the first class of data or in the second class of data by selecting the higher of the first score and the second score.
Dependent claims: 48, 49, 50, 51, 52, 53, 54, 55, 56, 75
57. A system for determining whether a test sample, for which there exists test data, is categorized in a first class or a second class, the system comprising:
at least one memory, at least one processor and at least one user interface, all of which are connected to one another by at least one bus;
wherein said at least one processor is configured to:
access a data set that has at least one instance of a first class of data and at least one instance of a second class of data;
extract a plurality of emerging patterns from said data set;
create a first list and a second list wherein, for each of said plurality of emerging patterns:
said first list contains a frequency of occurrence, ƒ_i(1), of each emerging pattern i from said plurality of emerging patterns that has a non-zero occurrence in said first class of data, and
said second list contains a frequency of occurrence, ƒ_i(2), of each emerging pattern i from said plurality of emerging patterns that has a non-zero occurrence in said second class of data;
use a fixed number, k, of emerging patterns, wherein k is substantially less than a total number of emerging patterns in the plurality of emerging patterns, to calculate:
a first score derived from the frequencies of k emerging patterns in said first list that also occur in said test data, and a second score derived from the frequencies of k emerging patterns in said second list that also occur in said test data; and
deduce whether the test sample is categorized in the first class of data or in the second class of data by selecting the higher of the first score and the second score.
Dependent claims: 58, 59, 60, 61, 62, 63, 64, 65, 66
67. A method of determining whether a sample cell is cancerous, comprising:
extracting a plurality of emerging patterns from a data set that comprises gene expression data for a plurality of cancerous cells and a gene expression data for a plurality of normal cells;
creating a first list and a second list wherein:
said first list contains a frequency of occurrence, ƒ_i(1), of each emerging pattern i from said plurality of emerging patterns that has a non-zero occurrence in said cancerous cells, and
said second list contains a frequency of occurrence, ƒ_i(2), of each emerging pattern i from said plurality of emerging patterns that has a non-zero occurrence in said normal cells;
using a fixed number, k, of emerging patterns, wherein k is substantially less than a total number of emerging patterns in the plurality of emerging patterns, calculating:
a first score derived from the frequencies of k emerging patterns in said first list that also occur in said test data, and a second score derived from the frequencies of k emerging patterns in said second list that also occur in said test data; and
deducing whether the sample cell is cancerous if said first score is higher than said second score.
68. A method of determining whether a test sample, having test data T, is categorized in one of a number of classes, substantially as hereinbefore described with reference to and as illustrated in the accompanying drawings.
71. A computer program product for determining whether a test sample, for which there exists test data, is categorized in one of a number of classes, constructed and arranged to operate substantially as hereinbefore described with reference to and as illustrated in the accompanying drawings.
73. A system for determining whether a test sample, for which there exists test data, is categorized in one of a number of classes, constructed and arranged to operate substantially as hereinbefore described with reference to and as illustrated in the accompanying drawings.
Specification