System and method for measuring the quality of document sets
Abstract
Systems and methods are described that calculate the interestingness of a set of one or more records in a database, either absolutely (i.e., compared to an overall collection of records) or relative to some other set of records. In one embodiment, the measure is a relative entropy value that has been normalized. Various applications of the measure are described in the context of an information retrieval system. These applications include, for example, guiding query interpretation, guiding view selection and summarization, intelligent ranges, event detection, concept triggers and interpreting user actions, hierarchy discovery, and adaptive data mining.
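As an illustration of the normalized relative entropy mentioned in the abstract, the following is a minimal sketch of the unnormalized quantity only: the relative entropy (Kullback-Leibler divergence) of one characteristic's distribution in a set of records against its distribution in the overall collection. The record layout, the field name "type", and the helper names `distribution` and `relative_entropy` are assumptions made for this example and are not taken from the patent.

```python
import math
from collections import Counter

def distribution(records, key):
    """Empirical distribution of one characteristic (a field value) over a set."""
    counts = Counter(r[key] for r in records)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.

    Assumes every value with p > 0 also has q > 0, which holds when the
    measured set is a subset of the baseline collection.
    """
    return sum(pv * math.log2(pv / q[v]) for v, pv in p.items() if pv > 0)

# Hypothetical four-record collection; the measured set holds only "red" records.
collection = [
    {"type": "red"}, {"type": "red"}, {"type": "white"}, {"type": "white"},
]
subset = collection[:2]

p = distribution(subset, "type")      # {"red": 1.0}
q = distribution(collection, "type")  # {"red": 0.5, "white": 0.5}
print(relative_entropy(p, q))         # prints 1.0
```

A set whose distribution matches the collection scores 0 bits; the more the set concentrates on values that are rare in the collection, the higher the score.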
30 Claims
1. A computer implemented method for comparing the distinctiveness of a plurality of sets within a collection of information, the method comprising:
sampling, by a computer system, from the collection of information to generate at least one set;
establishing, automatically, at least one identifying characteristic within the at least one set;
determining a statistical distribution of the at least one identifying characteristic associated with the at least one set; and
generating, by the computer system, a relative measurement of distinctiveness based on the statistical distribution of the at least one identifying characteristic associated with the at least one set and at least one other set, wherein the generating the relative measure of distinctiveness comprises accounting for a set size of a measured set based on a measurement of distinctiveness for a comparison set and a size for the comparison set, and normalizing the relative measurement of distinctiveness based on the set size of the measured set and the size for the comparison set.
Dependent claims: 2-15.
16. A non-transitory computer-readable medium having computer-readable instructions stored thereon that define instructions that, as a result of being executed by a computer, instruct the computer to compare the distinctiveness of a plurality of sets generated through interaction with a collection of information, the comparing comprising:
sampling from the collection of information to generate at least one set;
establishing, automatically, at least one identifying characteristic within the at least one set;
determining a statistical distribution of the at least one identifying characteristic associated with elements of the at least one set;
generating a relative measurement of distinctiveness based on the statistical distributions of the at least one identifying characteristic associated with the at least one set and at least one other set, wherein the generating the relative measure of distinctiveness comprises accounting for a set size of a measured set based on a measurement of distinctiveness for a comparison set and a set size for the comparison set, and normalizing the relative measurement of distinctiveness based on the set size of the measured set and the size for the comparison set.
17. A system for comparing the distinctiveness of a plurality of sets generated through interaction with a collection of information, the system comprising:
at least one processor operatively connected to a memory adapted to execute system components;
a sampling component configured to sample from the collection of information to generate at least one set, wherein the sampling component is further configured to establish, automatically, at least one identifying characteristic within the at least one set;
an analysis component configured to determine a statistical distribution of at least one identifying characteristic associated with the at least one set;
a measurement component configured to determine a relative measurement of distinctiveness based on the statistical distributions of the at least one identifying characteristic associated with the at least one set and at least one other set, wherein the measurement component is further configured to account for a set size of a measured set based on a measurement of distinctiveness for a comparison set and a set size for the comparison set, and normalize the relative measurement of distinctiveness based on the set size of the measured set and the size for the comparison set.
Dependent claims: 18-30.
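The independent claims above recite accounting for set size when measuring distinctiveness. One way to read this, offered only as a sketch under stated assumptions, is that small sets look distinctive by chance, so the raw score of a measured set can be compared with the score of comparison sets of the same size sampled at random from the collection. The subtraction used here, the number of trials, and the names `kl_bits` and `size_adjusted_distinctiveness` are assumptions of this example; the claims do not specify this particular formula.

```python
import math
import random
from collections import Counter

def kl_bits(sample, baseline):
    """Relative entropy (bits) of value frequencies in sample vs. baseline.

    Assumes every value in sample also occurs in baseline.
    """
    p, q = Counter(sample), Counter(baseline)
    n, m = len(sample), len(baseline)
    return sum((c / n) * math.log2((c / n) / (q[v] / m)) for v, c in p.items())

def size_adjusted_distinctiveness(measured, collection, trials=200, seed=0):
    """Raw relative entropy minus the mean relative entropy of random
    comparison sets of the same size (an assumed form of normalization)."""
    rng = random.Random(seed)
    raw = kl_bits(measured, collection)
    chance = sum(
        kl_bits(rng.sample(collection, len(measured)), collection)
        for _ in range(trials)
    ) / trials
    return raw - chance

# Hypothetical collection: one characteristic with two equally common values.
collection = ["a"] * 50 + ["b"] * 50
focused = ["a"] * 10               # concentrated on one value
balanced = ["a"] * 5 + ["b"] * 5   # mirrors the collection

focused_score = size_adjusted_distinctiveness(focused, collection)
balanced_score = size_adjusted_distinctiveness(balanced, collection)
print(focused_score > balanced_score)  # prints True
```

Under this reading, a set that merely mirrors the collection scores at or below zero, while a concentrated set keeps a positive score after the chance-level baseline for its size is removed.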
Specification