SYSTEM AND METHOD FOR MEASURING THE QUALITY OF DOCUMENT SETS
Abstract
Systems and methods are described that calculate the interestingness of a set of one or more records in a database, either absolutely (i.e., compared to an overall collection of records) or relative to some other set of records. In one embodiment, the measure is a relative entropy value that has been normalized. Various applications of the measure are described in the context of an information retrieval system. These applications include, for example, guiding query interpretation, guiding view selection and summarization, intelligent ranges, event detection, concept triggers and interpreting user actions, hierarchy discovery, and adaptive data mining.
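The relative entropy measure named in the abstract can be illustrated with a minimal sketch. It assumes a single identifying characteristic per record and compares a record set against the overall collection. The function names and the sample data are hypothetical, and the normalization step the abstract mentions is omitted here.

```python
import math
from collections import Counter


def distribution(values):
    """Empirical probability of each value of one identifying characteristic."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def relative_entropy(p, q):
    """Relative entropy (Kullback-Leibler divergence) D(p || q), in bits.

    Assumes every value with nonzero probability in p also appears in q,
    which holds when the set is drawn from the collection behind q.
    """
    return sum(pv * math.log2(pv / q[value]) for value, pv in p.items() if pv > 0)


# Hypothetical data: one characteristic (a color) per record.
collection = ["red"] * 50 + ["white"] * 50
all_red = ["red"] * 10                    # concentrated on a single value
mixed = ["red"] * 5 + ["white"] * 5       # mirrors the collection

baseline = distribution(collection)
print(relative_entropy(distribution(all_red), baseline))  # 1.0
print(relative_entropy(distribution(mixed), baseline))    # 0.0
```

A set whose distribution mirrors the collection scores zero, while a set concentrated on one value scores higher, which is the sense in which the measure captures distinctiveness.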
84 Claims
1. A method for measuring the distinctiveness of a set, the method comprising acts of:
analyzing the set to obtain a statistical distribution of at least one identifying characteristic within the set;
generating a measurement of distinctiveness for the set based on the statistical distribution of the at least one identifying characteristic; and
normalizing the measurement of the distinctiveness of the set.
Dependent claims: 2-27.

28. In an information retrieval system, a computer-implemented method for information processing, comprising:
analyzing a set of documents to obtain a statistical distribution based on values associated with the set of documents, the set of documents having a given size;
computing a value of a function that measures distinctiveness of the obtained statistical distribution relative to a baseline statistical distribution;
normalizing the value relative to a distribution of values of the function over a space of document sets, wherein each document set in the space has a size that is comparable to the given size; and
outputting a response derived from the normalized value.
Dependent claims: 29, 30.

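The normalization recited in claim 28 can be read in more than one way. The sketch below takes one possible reading: the raw relative entropy of a document set is compared with the values obtained for randomly sampled sets of the same size, and the result is expressed as a z-score. The z-score, the sampling scheme, the function names, and the data are all assumptions made for illustration and are not taken from the claim.

```python
import math
import random
import statistics
from collections import Counter


def distribution(values):
    """Empirical probability of each value associated with a document set."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def relative_entropy(p, q):
    """D(p || q) in bits; assumes q covers every value that occurs in p."""
    return sum(pv * math.log2(pv / q[value]) for value, pv in p.items() if pv > 0)


def normalized_distinctiveness(doc_set, collection, trials=200, seed=0):
    """Z-score of the set's relative entropy against same-size random sets.

    Assumption: the "space of document sets" is approximated by `trials`
    random samples (without replacement) from the collection, each the
    same size as doc_set.
    """
    rng = random.Random(seed)
    baseline = distribution(collection)
    raw = relative_entropy(distribution(doc_set), baseline)
    sampled = [
        relative_entropy(distribution(rng.sample(collection, len(doc_set))), baseline)
        for _ in range(trials)
    ]
    spread = statistics.stdev(sampled)
    if spread == 0:
        return 0.0  # every sampled set scored the same; nothing to scale by
    return (raw - statistics.mean(sampled)) / spread


# Hypothetical collection: one topic label per document.
collection = ["sports"] * 500 + ["politics"] * 500
topical = ["sports"] * 20
mixed = ["sports"] * 10 + ["politics"] * 10

print(normalized_distinctiveness(topical, collection))
print(normalized_distinctiveness(mixed, collection))
```

Comparing against sets of the same size matters because small random sets tend to show a nonzero relative entropy purely by chance, so a raw value is hard to interpret without a size-matched reference.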
31. A computer-readable medium having computer-readable instructions stored thereon that define instructions that, as a result of being executed by a computer, instruct the computer to perform a method for measuring the distinctiveness of a set, the method comprising the acts of:
analyzing the set to obtain a statistical distribution of at least one identifying characteristic within the set;
generating a measurement of distinctiveness for the set based on the statistical distribution of the at least one identifying characteristic; and
normalizing the measurement of the distinctiveness of the set.

32. A system for measuring the distinctiveness of a set, the system comprising:
an analysis component adapted to obtain a statistical distribution of at least one identifying characteristic within a set;
a measurement component adapted to generate a measurement of distinctiveness for the set based on the statistical distribution of the at least one identifying characteristic; and
a normalization component adapted to normalize the statistical distribution of the at least one identifying characteristic of the measured set.
Dependent claims: 33-53.

54. A method for comparing the distinctiveness of a plurality of sets within a collection of information, the method comprising the acts of:
sampling, randomly, at least one set;
determining a statistical distribution of at least one identifying characteristic associated with elements of the at least one set;
generating a relative measurement of distinctiveness based on the statistical distributions of the at least one identifying characteristic associated with the elements of the at least one set and another set.
Dependent claims: 55-68.

69. A computer-readable medium having computer-readable instructions stored thereon that define instructions that, as a result of being executed by a computer, instruct the computer to perform a method for comparing the distinctiveness of a plurality of sets generated through interaction with a collection of information, the method comprising the acts of:
sampling, randomly, at least one set;
determining a statistical distribution of at least one identifying characteristic associated with elements of the at least one set;
generating a relative measurement of distinctiveness based on the statistical distributions of the at least one identifying characteristic associated with the elements of the at least one set and another set.
Dependent claims: 71-84.

70. A system for comparing the distinctiveness of a plurality of sets generated through interaction with a collection of information, the system comprising:
a sampling component adapted to randomly sample at least one set;
an analysis component adapted to determine a statistical distribution of at least one identifying characteristic associated with elements of the at least one set;
a measurement component adapted to determine a relative measurement of distinctiveness based on the statistical distributions of the at least one identifying characteristic associated with the elements of the at least one set and another set.

Specification