System and method for measuring the quality of document sets
Abstract
Systems and methods are described that calculate the interestingness of a set of one or more records in a database, either absolutely (i.e., compared to an overall collection of records) or relative to some other set of records. In one embodiment, the measure is a relative entropy value that has been normalized. Various applications of the measure are described in the context of an information retrieval system. These applications include, for example, guiding query interpretation, guiding view selection and summarization, intelligent ranges, event detection, concept triggers and interpreting user actions, hierarchy discovery, and adaptive data mining.
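As a rough illustration of the relative entropy idea in the abstract, the following Python sketch compares the distribution of characteristic values in a result set with the distribution in the whole collection. It is only a sketch under stated assumptions: documents are modeled as lists of string values, additive smoothing is used to avoid zero probabilities, and the function names (`value_distribution`, `relative_entropy`) are hypothetical. The normalization step mentioned in the abstract is not spelled out in this section and is deliberately left out.

```python
import math
from collections import Counter


def value_distribution(docs, vocabulary, smoothing=0.5):
    """Smoothed probability of each characteristic value across docs.

    Assumption: each document is an iterable of string values. Values that
    are not in `vocabulary` are ignored.
    """
    counts = Counter(value for doc in docs for value in doc)
    total = sum(counts[v] for v in vocabulary) + smoothing * len(vocabulary)
    return {v: (counts[v] + smoothing) / total for v in vocabulary}


def relative_entropy(result_docs, collection_docs, smoothing=0.5):
    """Kullback-Leibler divergence D(P_set || P_collection), in bits.

    A larger value means the result set's value distribution departs more
    from the collection's, i.e. the set is more distinctive.
    """
    vocabulary = sorted({v for doc in collection_docs for v in doc})
    p = value_distribution(result_docs, vocabulary, smoothing)
    q = value_distribution(collection_docs, vocabulary, smoothing)
    return sum(p[v] * math.log2(p[v] / q[v]) for v in vocabulary)
```

Under these assumptions, a result set that mirrors the collection scores zero, and a set concentrated on a few values scores higher.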
51 Claims
1. A computer implemented method for optimizing results returned from interaction with a collection of information, the method comprising:
establishing criteria associated with at least one operation on a collection of information, wherein the criteria is based, at least in part, on a comparison of a measurement of the distinctiveness of a set of results and a distinctiveness score threshold;
establishing a rule that comprises the criteria and the at least one operation;
determining the set of results from interaction with a collection of information, wherein the set of results comprises a plurality of documents retrieved from the collection of information, and wherein each of the plurality of documents further comprise a unit of storage of digital data;
determining, by a computer system, a measurement of distinctiveness for the set of results based on a statistical distribution of at least one identifying characteristic, wherein the distinctiveness of the set of results is measured in relation to the collection of information, and wherein the determining the measurement of distinctiveness comprises;
identifying the at least one identifying characteristic within an evaluated set, and determining a measure of distinctiveness of the evaluated set within the collection of information;
modifying, by the computer system, the set of results according to the at least one operation by applying the rule to the set of results in response to determining that the set of results matches the criteria based, at least in part, on the comparison of the measurement of distinctiveness for the set of results and the distinctiveness score threshold; and
outputting a modified result.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 49)
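The rule-and-threshold flow recited in claim 1 can be pictured with a small Python sketch. Everything named here is hypothetical and only illustrative: `Rule`, `apply_rule`, and the stand-in score `toy_distinctiveness` are not taken from the patent, and the direction of the threshold comparison is an assumption, since the claim only requires that the criteria be based on a comparison.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Sequence

Document = Sequence[str]  # assumption: a document is a list of characteristic values


def toy_distinctiveness(results: List[Document]) -> float:
    """Stand-in score: share of documents containing the most common value.

    This is NOT the patent's measure; it only gives the rule something to test.
    """
    counts = Counter(value for doc in results for value in set(doc))
    if not counts:
        return 0.0
    return max(counts.values()) / len(results)


@dataclass
class Rule:
    """Pairs criteria (a threshold comparison) with an operation on results."""
    threshold: float
    operation: Callable[[List[Document]], List[Document]]
    fire_when_below: bool = True  # assumed direction of the comparison

    def matches(self, score: float) -> bool:
        if self.fire_when_below:
            return score < self.threshold
        return score >= self.threshold


def apply_rule(rule: Rule, results: List[Document],
               measure: Callable[[List[Document]], float] = toy_distinctiveness
               ) -> List[Document]:
    """Measure the result set, then modify it only if the criteria match."""
    score = measure(results)
    if rule.matches(score):
        return rule.operation(list(results))
    return list(results)
```

For example, a rule built as `Rule(threshold=0.8, operation=lambda docs: docs[:1])` would trim a low-scoring result set to its first document and return a high-scoring set unchanged.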
23. A method for information processing, comprising:
generating a salience measure that measures a quality of a set of documents, wherein the set of documents comprises a plurality of documents retrieved from an information retrieval system, based on their ambiguity relative to similarly sized sets of documents retrieved from the information retrieval system; and
using a distinctiveness value generated from the salience measure to take a given action in the information retrieval system, wherein the distinctiveness value is measured in relation to the collection of information, and wherein the using the distinctiveness value comprises
establishing criteria associated with the given action on a collection of information, wherein the criteria is based, at least in part, on a comparison of the distinctiveness value and a distinctiveness score threshold,
establishing a rule that comprises the criteria and the at least one operation, and
applying the rule to the set of documents to take the given action.
- View Dependent Claims (24, 25)
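One possible reading of "relative to similarly sized sets of documents" in claim 23 is to compare a raw score for the set against the scores of random samples of the same size drawn from the collection. The Python sketch below follows that reading only as an assumption: the z-score form, the stand-in raw score `concentration`, and the name `size_normalized_score` are all hypothetical and are not stated in the claim.

```python
import random
import statistics
from collections import Counter


def concentration(docs):
    """Stand-in raw score: frequency share of the single most common value."""
    counts = Counter(value for doc in docs for value in doc)
    total = sum(counts.values())
    return max(counts.values()) / total if total else 0.0


def size_normalized_score(result_docs, collection_docs, measure=concentration,
                          trials=200, seed=0):
    """Z-score of measure(result_docs) against random same-size samples.

    Assumptions: collection_docs is a list at least as long as result_docs,
    and a z-score is an acceptable way to express "relative to similarly
    sized sets".
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    k = len(result_docs)
    baseline = [measure(rng.sample(collection_docs, k)) for _ in range(trials)]
    mean = statistics.fmean(baseline)
    spread = statistics.pstdev(baseline)
    if spread == 0:
        return 0.0  # every random sample scored the same; nothing to compare
    return (measure(result_docs) - mean) / spread
```

A positive value under this sketch means the set scores higher than typical random sets of the same size, which removes some of the bias that small sets would otherwise have.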
26. A non-transitory computer-readable medium having computer-readable instructions stored thereon that define instructions that, as a result of being executed by a computer, instruct the computer to optimize results returned from interaction with a collection of information, the optimizing comprising:
establishing criteria associated with at least one operation on a collection of information, wherein the criteria is based, at least in part, on a comparison of a measurement of the distinctiveness of a set of results and a distinctiveness score threshold;
establishing a rule that comprises the criteria and the at least one operation;
determining the set of results from interaction with a collection of information, wherein the set of results comprises a plurality of documents retrieved from the collection of information, and wherein each of the plurality of documents further comprise a unit of storage of digital data;
determining a measurement of distinctiveness for the set of results based on a statistical distribution of at least one identifying characteristic, wherein the distinctiveness of the set of results is measured in relation to the collection of information, and wherein the determining the measurement of distinctiveness comprises;
identifying the at least one identifying characteristic within an evaluated set, and determining a measure of distinctiveness of the evaluated set within the collection of information;
modifying the set of results according to the at least one operation by applying the rule to the set of results in response to determining that the set of results matches the criteria based, at least in part, on the comparison of the measurement of distinctiveness for the set of results and the distinctiveness score threshold; and
outputting a modified result.
- View Dependent Claims (50, 51)
27. A system for optimizing results returned from interaction with a collection of information, the system comprising:
at least one processor operatively connected to a memory, wherein the at least one processor when executing provides;
a rules engine adapted to establish criteria associated with at least one operation on a collection of information, wherein execution of the operation is based on a comparison of a measurement of a distinctiveness of the set of results and a distinctiveness score threshold, and wherein the rules engine is further configured to establish a rule that comprises the criteria and the at least one operation;
a measurement engine adapted to measure the distinctiveness of a set of results, wherein the distinctiveness of the set of results is measured in relation to the collection of information, and wherein the measurement engine is further adapted to;
determine a measurement of distinctiveness for the set of results based on a statistical distribution of at least one identifying characteristic
identify the at least one identifying characteristic within an evaluated set, and
determine a measure of distinctiveness of the evaluated set within the collection of information;
a retrieval engine adapted to return a set of results from a collection of information in response to interaction with the collection of information, wherein the set of results comprises a plurality of documents retrieved from the collection of information, wherein each of the plurality of documents further comprise a unit of storage of digital data;
a modification engine adapted to modify the set of results according to the at least one operation by applying the rule to the set of results in response to a determination that the set of results matches the established criteria, wherein the determination is based, at least in part, on the comparison of the measurement of distinctiveness for the set of results and the distinctiveness score threshold; and
an output engine adapted to output the modified result.
- View Dependent Claims (28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48)
Specification