System and method for measuring the quality of document sets
Abstract
Systems and methods are described that calculate the interestingness of a set of one or more records in a database, either absolutely (i.e., compared to an overall collection of records) or relative to some other set of records. In one embodiment, the measure is a relative entropy value that has been normalized. Various applications of the measure are described in the context of an information retrieval system. These applications include, for example, guiding query interpretation, guiding view selection and summarization, intelligent ranges, event detection, concept triggers and interpreting user actions, hierarchy discovery, and adaptive data mining.
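As an illustration of the measure named in the abstract, the following minimal sketch computes a relative entropy (Kullback-Leibler divergence) between the value distribution of a record set and that of an overall collection. The function names, the additive smoothing, and the toy records are assumptions made for this example; the abstract does not specify them, and normalization is not shown here.

```python
import math
from collections import Counter


def value_distribution(documents, vocabulary, smoothing=1.0):
    """Return a smoothed probability for each value in `vocabulary`.

    `documents` is an iterable of value lists (for example, the terms or
    attribute values attached to each record). Additive smoothing is an
    assumption of this sketch so that no probability is exactly zero.
    """
    counts = Counter(v for doc in documents for v in doc)
    total = sum(counts[v] for v in vocabulary) + smoothing * len(vocabulary)
    return {v: (counts[v] + smoothing) / total for v in vocabulary}


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q), in bits."""
    return sum(p[v] * math.log2(p[v] / q[v]) for v in p if p[v] > 0)


# Hypothetical toy data: the overall collection and a subset of it.
baseline_docs = [["red", "wine"], ["white", "wine"], ["red", "car"], ["blue", "car"]]
subset_docs = [["red", "wine"], ["white", "wine"]]
vocabulary = sorted({v for doc in baseline_docs for v in doc})

q = value_distribution(baseline_docs, vocabulary)  # baseline distribution
p = value_distribution(subset_docs, vocabulary)    # distribution of the set
score = relative_entropy(p, q)                     # larger means more distinctive
```

A score of zero means the set's distribution matches the baseline; the score grows as the set concentrates on values that are rare in the overall collection.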
52 Claims
1. In an information retrieval system, a computer-implemented method for information processing, comprising:
accessing, by a computer system, a set of documents obtained from the information retrieval system;
establishing, automatically by the computer system, at least one identifying characteristic within the set of documents;
analyzing, by the computer system, the set of documents to obtain a statistical distribution based on values associated with the set of documents, the set of documents having a given size;
computing a value of a function that measures distinctiveness of the obtained statistical distribution relative to a baseline statistical distribution of values associated with a baseline set of documents;
normalizing the value relative to a distribution of values of the function that measures distinctiveness over a space of document sets, wherein a respective value of the function that measures distinctiveness corresponds to a respective document set within the space of document sets, wherein each document set in the space has a size that is comparable to the given size, and the act of normalizing the value includes an act of performing a computation on the value that accounts for the given size of the set of documents; and
outputting a response derived from the normalized value.
Dependent claims: 2-29.
30. A system for information processing, the system comprising:
at least one processor operatively connected to a memory adapted to execute system components, and wherein the system further comprises:
an access component adapted to access a set of documents obtained from an information retrieval system, wherein the access component is further configured to establish, automatically, at least one identifying characteristic within the set of documents;
an analysis component adapted to obtain a statistical distribution based on values associated with the set of documents, the set of documents having a given size;
a measurement component adapted to compute a value of a function that measures distinctiveness of the obtained statistical distribution relative to a baseline statistical distribution of values associated with a baseline set of documents;
a normalization component adapted to normalize the value relative to a distribution of values of the function that measures distinctiveness over a space of document sets, wherein a respective value of the function that measures distinctiveness corresponds to a respective document set within the space of document sets, wherein each document set in the space has a size that is comparable to the given size, wherein the normalization component is further adapted to perform a computation on the value that accounts for the given size of the set of documents; and
an output component adapted to generate a response derived from the normalized value.
Dependent claims: 31-51.
52. A non-transitory computer-readable medium having computer-readable instructions stored thereon that define instructions that, as a result of being executed by a computer, instruct the computer to perform a method for information processing, the method comprising:
accessing a set of documents obtained from the information retrieval system;
establishing, automatically, at least one identifying characteristic within the set of documents;
analyzing the set of documents to obtain a statistical distribution based on values associated with the set of documents, the set of documents having a given size;
computing a value of a function that measures distinctiveness of the obtained statistical distribution relative to a baseline statistical distribution of values associated with a baseline set of documents;
normalizing the value relative to a distribution of values of the function that measures distinctiveness over a space of document sets, wherein a respective value of the function that measures distinctiveness corresponds to a respective document set within the space of document sets, wherein each document set in the space has a size that is comparable to the given size, and the act of normalizing the value includes an act of performing a computation on the value that accounts for the given size of the set of documents; and
outputting a response derived from the normalized value.
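The normalizing step recited in the independent claims could be approximated as follows. This is only a sketch under stated assumptions: the "space of document sets" is approximated by random samples of the same size drawn from the corpus, distinctiveness is taken to be a smoothed relative entropy, and the normalization is a z-score against those samples. None of these choices, nor the names `distribution`, `distinctiveness`, and `normalized_distinctiveness`, come from the claims themselves.

```python
import math
import random
import statistics
from collections import Counter


def distribution(docs, vocabulary, smoothing=1.0):
    """Smoothed probability of each value in `vocabulary` (smoothing is assumed)."""
    counts = Counter(v for doc in docs for v in doc)
    total = sum(counts[v] for v in vocabulary) + smoothing * len(vocabulary)
    return {v: (counts[v] + smoothing) / total for v in vocabulary}


def distinctiveness(docs, baseline_dist, vocabulary):
    """Relative entropy, in bits, of the set's distribution against the baseline."""
    p = distribution(docs, vocabulary)
    return sum(p[v] * math.log2(p[v] / baseline_dist[v]) for v in vocabulary)


def normalized_distinctiveness(docs, corpus, samples=200, seed=0):
    """Z-score of the raw value against random document sets of the same size."""
    rng = random.Random(seed)
    vocabulary = sorted({v for doc in corpus for v in doc})
    baseline_dist = distribution(corpus, vocabulary)
    raw = distinctiveness(docs, baseline_dist, vocabulary)
    # Reference values: distinctiveness of random sets with len(docs) documents,
    # so the comparison accounts for the size of the set being scored.
    reference = [
        distinctiveness(rng.sample(corpus, len(docs)), baseline_dist, vocabulary)
        for _ in range(samples)
    ]
    mean = statistics.fmean(reference)
    spread = statistics.pstdev(reference)
    return (raw - mean) / spread if spread > 0 else 0.0


# Hypothetical corpus: 40 single-value documents across four topics.
corpus = [["wine"]] * 10 + [["car"]] * 10 + [["boat"]] * 10 + [["bike"]] * 10
focused = corpus[:5]                                            # five "wine" documents
mixed = [corpus[0], corpus[10], corpus[20], corpus[30], corpus[1]]

focused_score = normalized_distinctiveness(focused, corpus)
mixed_score = normalized_distinctiveness(mixed, corpus)
```

Comparing against sets of the same size matters because small sets tend to show high raw relative entropy by chance alone; scoring against same-size samples keeps a five-document set from looking distinctive merely because it is small.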
Specification