System and method for measuring the quality of document sets
First Claim
1. A computer implemented method for measuring the distinctiveness of a result generated from a collection of electronically stored information, wherein the result is comprised of elements associated with the electronically stored collection of information, the method comprising:
- accessing, by a computer system, a result generated from a collection of information;
- establishing, automatically, by the computer system at least one identifying characteristic within the result;
- analyzing, by the computer system, the result to automatically obtain a statistical distribution of the at least one identifying characteristic within the result;
- generating, by the computer system, a measurement of distinctiveness for the result based on the statistical distribution of the at least one identifying characteristic within the result; and
- comparing, by the computer system, the statistical distribution of the at least one identifying characteristic within the result against a baseline statistical distribution of the at least one identifying characteristic.
Abstract
Systems and methods are described that calculate the interestingness of a set of one or more records in a database, either absolutely (i.e., compared to an overall collection of records) or relative to some other set of records. In one embodiment, the measure is a relative entropy value that has been normalized. Various applications of the measure are described in the context of an information retrieval system. These applications include, for example, guiding query interpretation, guiding view selection and summarization, intelligent ranges, event detection, concept triggers and interpreting user actions, hierarchy discovery, and adaptive data mining.
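The normalized relative entropy measure mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the value is normalized, so the `1 - exp(-D)` squashing used here is purely an assumption for illustration, as are the function names and the dictionary-based record format.

```python
import math
from collections import Counter

def distribution(records, field):
    """Empirical distribution of one field's values over a set of records."""
    counts = Counter(r[field] for r in records if field in r)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in nats.

    Assumes q[v] > 0 wherever p[v] > 0, which holds when the set behind p
    is drawn from the collection behind q.
    """
    return sum(pv * math.log(pv / q[v]) for v, pv in p.items() if pv > 0)

def normalized_interestingness(subset, collection, field):
    """Relative entropy of the subset against the overall collection,
    mapped into [0, 1). The mapping is an assumed normalization, not the
    one the patent uses."""
    p = distribution(subset, field)
    q = distribution(collection, field)
    return 1.0 - math.exp(-relative_entropy(p, q))
```

A subset whose value distribution matches the overall collection scores 0, and the score grows toward 1 as the subset concentrates on values that are rare in the collection.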
91 Citations
57 Claims
1. A computer implemented method for measuring the distinctiveness of a result generated from a collection of electronically stored information, wherein the result is comprised of elements associated with the electronically stored collection of information, the method comprising:
- accessing, by a computer system, a result generated from a collection of information;
- establishing, automatically, by the computer system at least one identifying characteristic within the result;
- analyzing, by the computer system, the result to automatically obtain a statistical distribution of the at least one identifying characteristic within the result;
- generating, by the computer system, a measurement of distinctiveness for the result based on the statistical distribution of the at least one identifying characteristic within the result; and
- comparing, by the computer system, the statistical distribution of the at least one identifying characteristic within the result against a baseline statistical distribution of the at least one identifying characteristic.
Dependent claims: 2-28.
29. A computer implemented method, operative in an information retrieval system, for improving a user's interaction with data stored and accessible from the system, comprising:
- relating, by the computer system, a first and second sets of items in the information retrieval system to generate a salience measure, wherein the act of relating includes acts of;
  - identifying, automatically by the computer system, at least one identifying characteristic within the first and second sets of items, and
  - calculating a statistical distribution of the at least one identifying characteristic to generate the salience measure; and
- using, by the computer system, the salience measure to guide a subsequent user interaction with the information retrieval system, wherein the act of using the salience measure includes acts of;
  - generating, by the computer system, at least one refinement to present to the user during the user's interaction with the data stored in the system, and
  - presenting the at least one refinement in a user interface on a host computer system accessed by the user.
Dependent claims: 30-34.
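One way to picture the salience-guided refinement of claim 29 is the sketch below. It treats each candidate refinement as the first set and the current result as the second set, scores the pair by relative entropy over one characteristic, and returns the highest-scoring refinements for presentation. The choice of relative entropy as the salience measure, the function names, and the item format are all assumptions made for illustration; the claim itself does not fix any of them.

```python
import math
from collections import Counter

def value_distribution(items, attribute):
    """Empirical distribution of an attribute's values over a set of items."""
    counts = Counter(item[attribute] for item in items)
    total = sum(counts.values())
    return {v: n / total for v, n in counts.items()}

def salience(first_set, second_set, characteristic):
    """Relative entropy (nats) of the first set against the second.

    Assumes first_set is a subset of second_set, so every value seen in
    the first set has nonzero probability in the second.
    """
    p = value_distribution(first_set, characteristic)
    q = value_distribution(second_set, characteristic)
    return sum(pv * math.log(pv / q[v]) for v, pv in p.items())

def suggest_refinements(current, refine_by, characteristic, top_k=3):
    """Rank the values of `refine_by` by how salient the refined set is."""
    scored = []
    for value in {item[refine_by] for item in current}:
        refined = [item for item in current if item[refine_by] == value]
        scored.append((value, salience(refined, current, characteristic)))
    # Highest salience first; ties broken by value for a stable order.
    scored.sort(key=lambda pair: (-pair[1], str(pair[0])))
    return scored[:top_k]

# Hypothetical data: four items described by two attributes.
items = [
    {"category": "red", "region": "Bordeaux"},
    {"category": "red", "region": "Bordeaux"},
    {"category": "white", "region": "Bordeaux"},
    {"category": "white", "region": "Napa"},
]
suggestions = suggest_refinements(items, "category", "region")
```

The returned list of `(value, score)` pairs is what a user interface would render as suggested refinements; how they are displayed is outside the scope of this sketch.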
35. A non transitory computer-readable medium having computer-readable instructions stored thereon that define instructions that, as a result of being executed by a computer, instruct the computer to perform a method measuring the distinctiveness of a result generated from a collection of information, wherein the result is comprised of elements associated with the collection of information, the method comprising the acts of:
- accessing a result generated from a collection of information;
- establishing, automatically, at least one identifying characteristic within the result;
- analyzing the result to automatically obtain a statistical distribution of at least one identifying characteristic within the result;
- generating a measurement of distinctiveness for the result based on the statistical distribution of the at least one identifying characteristic; and
- comparing, by the computer system, the statistical distribution of the at least one identifying characteristic within the result against a baseline statistical distribution of the at least one identifying characteristic.
36. A system for measuring the distinctiveness of a result generated from a collection of information, wherein the result is comprised of elements associated with the collection of information, the system comprising:
- an analysis component adapted to automatically obtain a statistical distribution of at least one identifying characteristic within a result;
- a measurement component adapted to generate a measurement of distinctiveness for the result based on the statistical distribution of the at least one identifying characteristic; and
- a comparison component adapted to compare the measured statistical distribution against a baseline statistical distribution, wherein the comparison component is further adapted to determine that the compared statistical distribution exceeds a threshold for distinctiveness.
Dependent claims: 37-57.
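The three components of claim 36 could be wired together roughly as follows. This is a sketch under stated assumptions: the class name, the mapping of components to methods, the use of relative entropy as the measurement, and the threshold value are all hypothetical and not taken from the claim.

```python
import math
from collections import Counter
from dataclasses import dataclass

@dataclass
class DistinctivenessChecker:
    baseline: dict    # baseline distribution: value -> probability
    threshold: float  # hypothetical cutoff, in nats

    def analyze(self, result, field):
        """Analysis step: empirical distribution of one field in the result."""
        counts = Counter(record[field] for record in result)
        total = sum(counts.values())
        return {v: n / total for v, n in counts.items()}

    def measure(self, dist):
        """Measurement step: relative entropy of dist against the baseline.

        Assumes every value in dist also appears in the baseline with
        nonzero probability; otherwise a KeyError is raised.
        """
        return sum(p * math.log(p / self.baseline[v]) for v, p in dist.items())

    def is_distinctive(self, result, field):
        """Comparison step: does the measurement exceed the threshold?"""
        return self.measure(self.analyze(result, field)) > self.threshold

checker = DistinctivenessChecker(baseline={"a": 0.5, "b": 0.5}, threshold=0.5)
skewed = [{"tag": "a"}, {"tag": "a"}, {"tag": "a"}]
balanced = [{"tag": "a"}, {"tag": "b"}]
```

With these inputs, `checker.is_distinctive(skewed, "tag")` is true because the skewed result diverges from the baseline by ln 2 nats, while the balanced result matches the baseline and falls below the cutoff.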
Specification