System and method for measuring the quality of document sets
Abstract
Systems and methods are described that calculate the interestingness of a set of one or more records in a database, either absolutely (i.e., compared to an overall collection of records) or relative to some other set of records. In one embodiment, the measure is a relative entropy value that has been normalized. Various applications of the measure are described in the context of an information retrieval system. These applications include, for example, guiding query interpretation, guiding view selection and summarization, intelligent ranges, event detection, concept triggers and interpreting user actions, hierarchy discovery, and adaptive data mining.
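As an illustration of the measure described in the abstract, the following Python sketch computes the relative entropy of a set's value distribution against the overall collection and scales it into the range [0, 1]. The abstract does not say how the value is normalized; dividing by the largest relative entropy that any subset of the collection could attain is an assumption made here, and all function names are hypothetical.

```python
import math
from collections import Counter


def distribution(values):
    """Empirical probability of each distinct value in a list."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def relative_entropy(p, q):
    """D(p || q) in bits; assumes every value in p has non-zero mass in q."""
    return sum(pv * math.log2(pv / q[v]) for v, pv in p.items() if pv > 0)


def normalized_distinctiveness(subset_values, collection_values):
    """Relative entropy of a subset against its collection, scaled to [0, 1].

    Assumption (not from the source): normalize by log2(1 / min q), which is
    the relative entropy of a subset holding only the collection's rarest value.
    """
    if not collection_values:
        raise ValueError("collection must not be empty")
    p = distribution(subset_values)
    q = distribution(collection_values)
    upper_bound = math.log2(1.0 / min(q.values()))
    if upper_bound == 0.0:
        # The collection has a single value, so no subset can stand out.
        return 0.0
    return relative_entropy(p, q) / upper_bound
```

Under these assumptions, a subset whose values mirror the collection scores 0, and a subset concentrated on the collection's rarest value scores 1. The sketch covers only the absolute case (a set compared with the overall collection); comparing against another set would need a smoothing scheme for values missing from the baseline.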
48 Claims
1. A computer implemented method for identifying interesting characteristics within a collection of information, the method comprising the acts of:
analyzing, by a computer system, a collection of information for at least one identifying characteristic;
measuring, by the computer system, distinctiveness of a set based on a statistical distribution of the at least one identifying characteristic, wherein the set comprises a plurality of elements within the collection of information, and wherein the plurality of elements are associated with the at least one identifying characteristic;
identifying, by the computer system, a variation in the measurement of distinctiveness of the set with respect to at least one additional dimension, wherein the act of identifying the variation in the measurement of distinctiveness of the set includes determining a change in the measurement of distinctiveness for the set relative to a range of values for the at least one additional dimension; and
grouping, by the computer system, at least one element of the collection of information based on the identified variation of the measurement of distinctiveness of the set.
Dependent claims: 2-24.
25. A non-transitory computer-readable medium having computer-readable instructions stored thereon that define instructions that, as a result of being executed by a computer, instruct the computer to perform a method for identifying interesting characteristics within a collection of information, the method comprising the acts of:
analyzing a collection of information for at least one identifying characteristic;
measuring distinctiveness of a set based on a statistical distribution of the at least one identifying characteristic, wherein the set comprises a plurality of elements within the collection of information, and wherein the plurality of elements are associated with the at least one identifying characteristic;
identifying a variation in the measurement of distinctiveness of the set with respect to at least one additional dimension, wherein the act of identifying the variation in the measurement of distinctiveness of the set includes determining a change in the measurement of distinctiveness for the set relative to a range of values for the at least one additional dimension; and
grouping at least one element of the collection of information based on the identified variation of the measurement of distinctiveness.
26. A system for identifying interesting characteristics within a collection of information, the system comprising:
at least one processor operatively connected to a memory, the processor configured to execute system engines from the memory;
an analysis engine adapted to determine at least one identifying characteristic within a collection of information;
a measurement engine adapted to determine a measurement of distinctiveness of a set based on a statistical distribution of the at least one identifying characteristic, wherein the set comprises a plurality of elements within the collection of information, and wherein the plurality of elements are associated with the at least one identifying characteristic;
a tracking engine adapted to evaluate the measurement of distinctiveness of the set with respect to an additional dimension, wherein the tracking engine is further adapted to identify a variation in the measurement of distinctiveness of the set relative to a range of values for the additional dimension;
an organization engine adapted to organize at least one element of the collection of information based on the variation of the measurement of distinctiveness of the set over the range of values for the additional dimension.
Dependent claims: 27-48.
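The acts recited in the independent claims (measure distinctiveness, track how it varies over a range of an additional dimension, then group elements by that variation) can be sketched in Python as follows. This is one possible reading, not the claimed implementation: records are assumed to be dictionaries, the set is assumed to be selected by a predicate, distinctiveness is taken to be unnormalized relative entropy, and the threshold-based segmentation and all function and parameter names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def relative_entropy(subset_values, baseline_values):
    """D(subset || baseline) in bits; the subset must be drawn from the baseline."""
    p, q = Counter(subset_values), Counter(baseline_values)
    n_p, n_q = sum(p.values()), sum(q.values())
    return sum((c / n_p) * math.log2((c / n_p) / (q[v] / n_q)) for v, c in p.items())


def distinctiveness_by_dimension(collection, in_set, characteristic, dimension):
    """Measure the set's distinctiveness separately at each value of the dimension."""
    by_value = defaultdict(list)
    for record in collection:
        by_value[record[dimension]].append(record)
    scores = {}
    for value in sorted(by_value):
        bucket = by_value[value]
        members = [r[characteristic] for r in bucket if in_set(r)]
        if members:  # skip dimension values where the set has no elements
            scores[value] = relative_entropy(members, [r[characteristic] for r in bucket])
    return scores


def group_by_variation(collection, in_set, characteristic, dimension, threshold):
    """Start a new group wherever distinctiveness changes by more than threshold."""
    scores = distinctiveness_by_dimension(collection, in_set, characteristic, dimension)
    groups, current, previous = [], [], None
    for value, score in scores.items():
        if previous is not None and abs(score - previous) > threshold:
            groups.append(current)
            current = []
        current.extend(r for r in collection if in_set(r) and r[dimension] == value)
        previous = score
    if current:
        groups.append(current)
    return groups
```

With a time-like dimension such as a year field, each returned group holds the set's elements from a run of consecutive years over which distinctiveness stayed roughly stable, which is one way to read the "intelligent ranges" application named in the abstract.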
Specification