System and method for measuring the quality of document sets
Abstract
Systems and methods are described that calculate the interestingness of a set of one or more records in a database, either absolutely (i.e., compared to an overall collection of records) or relative to some other set of records. In one embodiment, the measure is a relative entropy value that has been normalized. Various applications of the measure are described in the context of an information retrieval system. These applications include, for example, guiding query interpretation, guiding view selection and summarization, intelligent ranges, event detection, concept triggers and interpreting user actions, hierarchy discovery, and adaptive data mining.
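The abstract names a normalized relative entropy as one embodiment of the measure but does not state a formula at this point. The following is a minimal sketch in Python of how such a value could be computed for a single-valued characteristic. The function names are hypothetical, and the normalization 1 - exp(-D), which maps the unbounded relative entropy D onto the range 0 to 1, is an assumption chosen for illustration rather than the normalization used by the patent. The sketch also assumes the result set is drawn from the collection, so every value in the result set has nonzero baseline probability.

```python
import math
from collections import Counter


def distribution(values):
    """Empirical distribution of a characteristic over a set of records."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def relative_entropy(p, q):
    """D(P || Q) = sum over x of p(x) * log(p(x) / q(x)).

    Assumes q(x) > 0 wherever p(x) > 0 (result set is a subset of the collection).
    """
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)


def normalized_distinctiveness(result_values, collection_values):
    """Relative entropy of the result set against the collection, squashed into [0, 1).

    The squashing function is an illustrative assumption, not the patent's formula.
    """
    p = distribution(result_values)
    q = distribution(collection_values)
    return 1.0 - math.exp(-relative_entropy(p, q))


if __name__ == "__main__":
    collection = ["red"] * 5 + ["blue"] * 5
    print(normalized_distinctiveness(["red"] * 4, collection))
    print(normalized_distinctiveness(collection, collection))
```

Under these assumptions, a result set whose distribution matches the collection scores 0, and the score grows toward 1 as the result set concentrates on values that are rare in the collection.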
47 Claims
1. A computer implemented method for presenting a view of a result obtained from interaction with a collection of information, the method comprising:
accessing, by a computer system, at least one result set in response to interaction with a collection of information;
determining, by the computer system, at least one identifying characteristic within the at least one result set returned from the interaction with a collection of information;
determining, by the computer system, a statistical distribution of the at least one identifying characteristic within the at least one result set;
generating, by the computer system, a measurement of distinctiveness for the at least one result set based, at least in part, on the statistical distribution of the at least one identifying characteristic within the at least one result set, wherein the distinctiveness of the at least one result set is measured in relation to the collection of information, and wherein the generating comprises determining the measurement of distinctiveness from a statistical distribution of at least one identifying characteristic in the at least one result set against a baseline statistical distribution, and wherein the baseline statistical distribution is determined at a time of the interaction or thereafter;
modifying, by the computer system, the at least one result set based at least in part on the measurement of distinctiveness for the at least one result set; and
returning the modified result set;
wherein the modifying the at least one result set comprises determining a contribution of the at least one identifying characteristic to the measurement of distinctiveness and highlighting the at least one identifying characteristic within the at least one result set based on the determined contribution.
Dependent claims: 2-21, 44, 45
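The steps recited in claim 1 (distribution within the result set, comparison against a baseline, per-characteristic contribution, highlighting) can be illustrated with a short Python sketch. This is one possible reading under stated assumptions, not the claimed implementation: records are assumed to be dictionaries with a hypothetical "terms" field holding their identifying characteristics, the baseline is assumed to be the distribution over the whole collection computed when the query is handled, the measure is assumed to be an unnormalized relative entropy, and "highlighting" is modeled as a hypothetical "highlighted" field added to each returned record.

```python
import math
from collections import Counter


def term_distribution(records):
    """Statistical distribution of identifying characteristics over a set of records."""
    counts = Counter(term for record in records for term in record["terms"])
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def contributions(result_set, collection):
    """Per-characteristic terms p * log(p / q) of the relative entropy.

    Assumes every characteristic in the result set also occurs in the collection.
    """
    p = term_distribution(result_set)
    q = term_distribution(collection)  # baseline, computed at interaction time
    return {term: pt * math.log(pt / q[term]) for term, pt in p.items()}


def modify_result_set(result_set, collection, top_k=1):
    """Return (distinctiveness, modified result set with top contributors highlighted)."""
    contrib = contributions(result_set, collection)
    distinctiveness = sum(contrib.values())
    top = set(sorted(contrib, key=contrib.get, reverse=True)[:top_k])
    modified = [
        {**record, "highlighted": [t for t in record["terms"] if t in top]}
        for record in result_set
    ]
    return distinctiveness, modified


if __name__ == "__main__":
    collection = [
        {"id": 1, "terms": ["wine", "red"]},
        {"id": 2, "terms": ["wine", "white"]},
        {"id": 3, "terms": ["beer", "red"]},
        {"id": 4, "terms": ["beer", "white"]},
    ]
    score, modified = modify_result_set(collection[:2], collection)
    print(score, modified)
```

In this toy collection the result set contains only wine records, so "wine" is over-represented relative to the baseline and is the characteristic that gets highlighted, while "red" and "white" occur at their baseline rates and contribute nothing.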
22. A non-transitory computer-readable medium having computer-readable instructions stored thereon that define instructions that, as a result of being executed by a computer, present a view of a result returned from a collection of information:
optimizing a view of at least one result set, wherein the optimizing comprises;
determining at least one identifying characteristic within the at least one result set returned in response to interaction with a collection of information;
determining a statistical distribution of the at least one identifying characteristic within the at least one result set;
measuring the distinctiveness of the at least one result set based, at least in part, on the statistical distribution of the at least one identifying characteristic within the at least one result set, wherein the distinctiveness of the at least one result set is measured in relation to the collection of information, and wherein the generating comprises determining the measurement of distinctiveness from a statistical distribution of at least one identifying characteristic in the at least one result set against a baseline statistical distribution, and wherein the baseline statistical distribution is determined at a time of the interaction or thereafter;
modifying the at least one result set based at least in part on the measure of distinctiveness; and
outputting the modified result;
wherein the modifying the at least one result set comprises determining a contribution of the at least one identifying characteristic to the measure of distinctiveness and highlighting the at least one identifying characteristic within the at least one result set based on the determined contribution.
Dependent claims: 46, 47
23. A system for presenting an improved view of a result returned from a collection of information, the system comprising:
at least one processor operatively connected to a memory, the processor when executing provides;
an analysis engine adapted to determine at least one identifying characteristic within at least one result set returned in response to interaction with a collection of information;
a distinctiveness engine adapted to determine a measurement of the distinctiveness of at least one result set based, at least in part, on a statistical distribution of the at least one identifying characteristic within the at least one result set, wherein the distinctiveness of the at least one result set is measured in relation to the collection of information, and wherein the generating comprises determining the measurement of distinctiveness from a statistical distribution of at least one identifying characteristic in the at least one result set against a baseline statistical distribution, and wherein the baseline statistical distribution is determined at a time of the interaction or thereafter; and
a summarization engine adapted to modify the at least one result set based at least in part on the determined measurement of the distinctiveness of the result set;
wherein the summarization engine is further configured to determine a contribution of the at least one identifying characteristic to the measurement of distinctiveness and highlight the at least one identifying characteristic within the at least one result set based on the determined contribution.
Dependent claims: 24-43
Specification