SYSTEM AND METHOD FOR MEASURING THE QUALITY OF DOCUMENT SETS
Abstract
Systems and methods are described that calculate the interestingness of a set of one or more records in a database, either absolutely (i.e., compared to an overall collection of records) or relative to some other set of records. In one embodiment, the measure is a relative entropy value that has been normalized. Various applications of the measure are described in the context of an information retrieval system. These applications include, for example, guiding query interpretation, guiding view selection and summarization, intelligent ranges, event detection, concept triggers and interpreting user actions, hierarchy discovery, and adaptive data mining.
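As an illustration of the measure described in the abstract, the following minimal sketch computes the relative entropy (Kullback-Leibler divergence) of a record set's term distribution against the overall collection, then maps it to a bounded score. The abstract does not say how the value is normalized, so the `1 - exp(-D)` mapping here is only an assumption chosen for simplicity, and all function names are hypothetical. The sketch also assumes that the record set is drawn from the collection, so every term in the set has non-zero probability in the collection.

```python
import math
from collections import Counter


def term_distribution(records):
    """Relative frequency of each term over a list of records (each a list of terms)."""
    counts = Counter(term for record in records for term in record)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def relative_entropy(p, q):
    """D(p || q) in nats. Assumes q[t] > 0 wherever p[t] > 0."""
    return sum(pt * math.log(pt / q[t]) for t, pt in p.items() if pt > 0)


def normalized_interestingness(record_set, collection):
    """Relative entropy of a record set against the collection, squashed into [0, 1).

    The 1 - exp(-D) normalization is an illustrative assumption, not the
    normalization used in the patent.
    """
    d = relative_entropy(term_distribution(record_set), term_distribution(collection))
    return 1.0 - math.exp(-d)


# Hypothetical toy data: each record is a list of terms.
collection = [["red", "wine"], ["white", "wine"], ["red", "car"], ["blue", "car"]]
wine_records = collection[:2]
score = normalized_interestingness(wine_records, collection)
```

A set whose term distribution matches the collection scores 0, and the score approaches 1 as the set's distribution diverges from the collection's. Comparing one set against another set, rather than against the whole collection, would only change the second argument.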
48 Claims
1. A computer implemented method for identifying interesting characteristics within a collection of information, the method comprising the acts of:
- analyzing a collection of information for at least one identifying characteristic;
- measuring distinctiveness based on a statistical distribution of the at least one identifying characteristic;
- identifying a variation in the measurement of distinctiveness with respect to at least one additional dimension;
- grouping at least one element of the collection of information based on the identified variation of the measurement of distinctiveness.
Dependent claims: 2-24.
25. A computer-readable medium having computer-readable instructions stored thereon that define instructions that, as a result of being executed by a computer, instruct the computer to perform a method for identifying interesting characteristics within a collection of information, the method comprising the acts of:
- analyzing a collection of information for at least one identifying characteristic;
- measuring distinctiveness based on a statistical distribution of the at least one identifying characteristic;
- identifying a variation in the measurement of distinctiveness with respect to at least one additional dimension;
- grouping at least one element of the collection of information based on the identified variation of the measurement of distinctiveness.
26. A system for identifying interesting characteristics within a collection of information, the system comprising:
- an analysis engine adapted to determine at least one identifying characteristic within a collection of information;
- a measurement engine adapted to determine a measurement of distinctiveness based on a statistical distribution of the at least one identifying characteristic;
- a tracking engine adapted to evaluate the measurement of distinctiveness with respect to an additional dimension;
- an organization engine adapted to organize at least one element of the collection of information based on a variation of the measurement of distinctiveness over the additional dimension.
Dependent claims: 27-48.
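The four acts recited in the independent claims (analyze, measure, identify variation, group) can be sketched roughly as follows. This is one possible reading under stated assumptions, not the claimed implementation: the identifying characteristic is a single term per record, the additional dimension is a sortable value such as a year, distinctiveness is unnormalized relative entropy against the whole collection, and a variation is flagged when the score changes by more than a hypothetical `threshold` between adjacent dimension values. All names are illustrative, and the input is assumed to be non-empty.

```python
import math
from collections import Counter, defaultdict


def distribution(terms):
    """Relative frequency of each term in a list of terms."""
    counts = Counter(terms)
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()}


def distinctiveness(subset_terms, all_terms):
    """Relative entropy of the subset's term distribution against the collection's."""
    p, q = distribution(subset_terms), distribution(all_terms)
    return sum(pt * math.log(pt / q[t]) for t, pt in p.items())


def group_by_variation(records, threshold):
    """records: non-empty list of (dimension_value, term) pairs.

    Returns (scores, groups): the distinctiveness score per dimension value,
    and runs of adjacent dimension values whose scores differ by at most
    `threshold` (an assumed, illustrative criterion for "variation").
    """
    all_terms = [term for _, term in records]
    buckets = defaultdict(list)
    for value, term in records:              # analyze: collect terms per dimension value
        buckets[value].append(term)
    ordered = sorted(buckets)
    scores = {v: distinctiveness(buckets[v], all_terms) for v in ordered}  # measure
    groups = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if abs(scores[cur] - scores[prev]) > threshold:  # variation along the dimension
            groups.append([cur])             # start a new group at the change point
        else:
            groups[-1].append(cur)
    return scores, groups


# Hypothetical toy data: (year, term) pairs.
records = [(2001, "a"), (2001, "b"), (2002, "a"), (2002, "b"), (2003, "c"), (2003, "c")]
scores, groups = group_by_variation(records, threshold=0.5)
```

In this toy data the years 2001 and 2002 share the same term mix and fall into one group, while 2003 has a sharply different distribution and starts a new one. Grouping the dimension values implicitly groups the records that carry them.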
Specification