Method and apparatus for data mining to discover associations and covariances associated with data
Abstract
Data mining techniques are provided which are effective and efficient for discovering useful information from an amorphous collection or data set of records. For example, the present invention provides for the mining of data, e.g., of several or many records, to discover interesting associations between entries of qualitative text, and covariances between data of quantitative numerical types, in records. Although not limited thereto, the invention has particular application and advantage when the data is of a type such as clinical, pharmacogenomic, forensic, police and financial records, which are characterized by many varied entries, since the problem is then said to be one of “high dimensionality,” which has posed mathematical and technical difficulties for researchers. This is especially true when considering strong negative associations and negative covariance, i.e., between items of data which may so rarely come together that their concurrence is never seen in any record, yet the fact that this is not expected is of potentially great interest.
23 Claims
1. An automated method of discovering information relating to a collection of input data, the method comprising the steps of:
obtaining the collection of input data, wherein the collection of input data comprises data items;
discovering information relating to the collection of input data based on a computation of a mutual information measure in accordance with at least a portion of the data items, wherein expected values of the mutual information measure are expressed as linear combinations of an incomplete Riemann zeta function; and
outputting at least a portion of results associated with the computation of the mutual information measure, wherein at least a portion of the results represent the discovered information relating to the collection of input data.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. Apparatus for discovering information relating to a collection of input data, the apparatus comprising:
at least one processor operative to:
(i) obtain the collection of input data, wherein the collection of input data comprises data items;
(ii) discover information relating to the collection of input data based on a computation of a mutual information measure in accordance with at least a portion of the data items, wherein expected values of the mutual information measure are expressed as linear combinations of an incomplete Riemann zeta function; and
(iii) output at least a portion of results associated with the computation of the mutual information measure, wherein at least a portion of the results represent the discovered information relating to the collection of input data; and
memory, coupled to the at least one processor, for storing at least a portion of results associated with one or more of the obtaining, discovering and outputting operations.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
23. An article of manufacture for discovering information relating to a collection of input data, comprising a machine readable medium containing one or more programs which when executed implement the steps of:
obtaining the collection of input data, wherein the collection of input data comprises data items;
discovering information relating to the collection of input data based on a computation of a mutual information measure in accordance with at least a portion of the data items, wherein expected values of the mutual information measure are expressed as linear combinations of an incomplete Riemann zeta function; and
outputting at least a portion of results associated with the computation of the mutual information measure, wherein at least a portion of the results represent the discovered information relating to the collection of input data.
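The computation recited in claims 1, 12 and 23 can be illustrated with a minimal sketch. The claims do not state the formula, so everything below is an assumption made for illustration: the "incomplete Riemann zeta function" is read as the partial sum of the zeta series, the association measure is taken as the difference between that sum evaluated at the observed and at the expected co-occurrence counts, and the expected count is rounded to an integer for simplicity. The names `incomplete_zeta` and `pair_association` are hypothetical.

```python
from typing import Iterable, Set


def incomplete_zeta(s: float, n: int) -> float:
    """Partial sum of the zeta series: sum over k = 1..n of k**(-s).

    For n = 0 the sum is empty and the result is 0.0.
    """
    return sum(k ** (-s) for k in range(1, n + 1))


def pair_association(records: Iterable[Set[str]], a: str, b: str,
                     s: float = 1.0) -> float:
    """Association between items a and b across a collection of records.

    Assumed form (not taken from the claims):
        zeta(s, observed) - zeta(s, expected)
    where `observed` is the number of records containing both items and
    `expected` is the count expected under independence, rounded to the
    nearest integer as a simplification.
    """
    records = list(records)
    total = len(records)
    n_a = sum(1 for r in records if a in r)
    n_b = sum(1 for r in records if b in r)
    observed = sum(1 for r in records if a in r and b in r)
    expected = round(n_a * n_b / total) if total else 0
    return incomplete_zeta(s, observed) - incomplete_zeta(s, expected)


if __name__ == "__main__":
    # Items that never occur together although co-occurrence is expected.
    never_together = [{"x"}] * 4 + [{"y"}] * 4
    # Items that occur together more often than independence predicts.
    often_together = [{"x", "y"}] * 4 + [set()] * 4

    print(pair_association(never_together, "x", "y"))
    print(pair_association(often_together, "x", "y"))
```

Under these assumptions, a pair that is never observed together still receives a finite negative score, because the partial sum at an observed count of zero is simply zero. A plain logarithm of observed over expected would be undefined in that case, which is the situation the abstract singles out as being of interest.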
Specification