System and method for concept visualization
Abstract
Systems and methods are described that calculate the interestingness of a set of one or more records in a database, either absolutely (i.e., compared to an overall collection of records) or relative to some other set of records. In one embodiment, the measure is a relative entropy value that has been normalized. Various applications of the measure are described in the context of an information retrieval system. These applications include, for example, guiding query interpretation, guiding view selection and summarization, intelligent ranges, event detection, concept triggers and interpreting user actions, hierarchy discovery, and adaptive data mining.
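The normalized relative entropy measure mentioned above can be illustrated with a minimal sketch. It assumes each record is a tuple of characteristic values, estimates the "mean score for groups of the same size" by random sampling from the collection, and then subtracts that mean from the raw score. The function names (`distribution`, `relative_entropy`, `normalized_distinctiveness`, `rank_groups`), the smoothing constant, and the sampling-based baseline are illustrative assumptions, not details taken from the patent.

```python
import math
import random
from collections import Counter


def distribution(records, vocabulary, smoothing=1e-9):
    """Smoothed relative frequency of each characteristic value in `records`."""
    counts = Counter(value for record in records for value in record)
    total = sum(counts.values()) + smoothing * len(vocabulary)
    return {v: (counts[v] + smoothing) / total for v in vocabulary}


def relative_entropy(group, collection):
    """Relative entropy (KL divergence, in bits) of the group's distribution
    of characteristic values against the whole collection's distribution."""
    vocabulary = {value for record in collection for value in record}
    p = distribution(group, vocabulary)
    q = distribution(collection, vocabulary)
    return sum(p[v] * math.log2(p[v] / q[v]) for v in vocabulary)


def normalized_distinctiveness(group, collection, trials=200, seed=0):
    """Raw score minus the mean score of random groups of the same size.

    Assumption: the size-matched mean is estimated by sampling; the source
    does not say how the mean score is obtained.
    """
    rng = random.Random(seed)
    raw = relative_entropy(group, collection)
    baseline = [
        relative_entropy(rng.sample(collection, len(group)), collection)
        for _ in range(trials)
    ]
    return raw - sum(baseline) / trials


def rank_groups(groups, collection):
    """Organize named groups from most to least distinctive."""
    scores = {
        name: normalized_distinctiveness(group, collection)
        for name, group in groups.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy collection: 50 "red" records and 50 "blue" records.
    collection = [("red",)] * 50 + [("blue",)] * 50
    groups = {
        "all_red": [("red",)] * 10,
        "mixed": [("red",)] * 5 + [("blue",)] * 5,
    }
    for name, score in rank_groups(groups, collection):
        print(name, round(score, 3))
```

Subtracting a size-matched baseline matters because small groups tend to show a nonzero relative entropy by chance alone, so raw scores are not comparable across groups of different sizes.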
54 Citations
30 Claims
1. A computer implemented method for identifying interesting characteristics within a collection of information, the method comprising:
- analyzing one or more groups from the collection of information;
- determining, automatically, at least one identifying characteristic within the one or more groups;
- measuring, by a computer system, distinctiveness of the one or more groups based on a statistical distribution of the at least one identifying characteristic;
- normalizing the measurement of distinctiveness to account for a size of a group of the one or more groups and a size of at least one other group by determining an amount by which the measurement of distinctiveness exceeds a mean score for one or more groups of a similar or identical size; and
- organizing, by the computer system, the one or more groups based on the measurement of distinctiveness.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A computer implemented method for presenting interesting characteristics associated with a collection of information, the method comprising:
- accepting one or more queries against the collection of information, wherein any execution of the one or more queries generates at least one result, the result of each query including a result set;
- measuring, by a computer system, distinctiveness of one or more result sets based on a statistical distribution of at least one identifying characteristic associated with the contents of a result set;
- normalizing the measured distinctiveness of the one or more result sets to account for a size of a result set of the one or more result sets and a size of at least one other result set by determining an amount by which the measurement of distinctiveness exceeds a mean score for one or more result sets of a similar or identical size; and
- communicating to a user interface a display of at least one characteristic of the one or more result sets and the measured distinctiveness of the one or more result sets.
Dependent claims: 12, 13, 14
15. A system for identifying interesting characteristics within a collection of information, the system comprising:
- at least one processor operatively connected to a memory, the processor configured to execute system engines from the memory;
- an analysis engine configured to analyze one or more groups from the collection of information, wherein the analysis engine is further configured to determine automatically at least one identifying characteristic within a collection of information;
- a measurement engine configured to determine a measurement of distinctiveness of one or more groups from the collection of information based on a statistical distribution of the at least one identifying characteristic;
- a normalization engine configured to normalize the measurement of distinctiveness to account for a size of a group of the one or more groups and a size of at least one other group by determining an amount by which the measurement of distinctiveness exceeds a mean score for one or more groups of a similar or identical size; and
- an organization engine configured to organize the one or more groups based on the measurement of distinctiveness.
Dependent claims: 16, 17, 18, 19, 20, 21, 22, 23, 24
25. A system for presenting interesting characteristics associated with a collection of information, the system comprising:
- at least one processor operatively connected to a memory, wherein the processor is configured to execute system engines from the memory;
- a query engine configured to accept one or more queries against the collection of information and generate a result, wherein the result of each query includes a result set;
- a measurement engine configured to measure distinctiveness of one or more result sets based on a statistical distribution of at least one identifying characteristic associated with the contents of a result set;
- a normalization engine configured to normalize the measured distinctiveness of the one or more result sets to account for a size of a result set of the one or more result sets and a size of at least one other result set by determining an amount by which the measurement of distinctiveness exceeds a mean score for one or more result sets of a similar or identical size; and
- a user interface engine configured to display at least one characteristic of the one or more result sets and the measured distinctiveness of the one or more result sets.
Dependent claims: 26, 27, 28
29. A non-transitory computer-readable medium having instructions stored thereon that, when executed by a processor, cause the processor to identify interesting characteristics within a collection of information, the identifying comprising:
- analyzing one or more groups from the collection of information;
- determining, automatically, at least one identifying characteristic within the one or more groups;
- measuring, by a computer system, distinctiveness of the one or more groups based on a statistical distribution of the at least one identifying characteristic;
- normalizing the measurement of distinctiveness to account for a size of a group of the one or more groups and a size of at least one other group by determining an amount by which the measurement of distinctiveness exceeds a mean score for one or more groups of a similar or identical size; and
- organizing, by the computer system, the one or more groups based on the measurement of distinctiveness.
30. A non-transitory computer-readable medium having instructions stored thereon that, when executed by a processor, cause the processor to present interesting characteristics associated with a collection of information, the presenting comprising:
- accepting one or more queries against the collection of information, wherein any execution of the one or more queries generates at least one result, the result of each query including a result set;
- measuring, by a computer system, distinctiveness of one or more result sets based on a statistical distribution of at least one identifying characteristic associated with the contents of a result set;
- normalizing the measured distinctiveness of the one or more result sets to account for a size of a result set of the one or more result sets and a size of at least one other result set by determining an amount by which the measurement of distinctiveness exceeds a mean score for one or more result sets of a similar or identical size; and
- communicating to a user interface a display of at least one characteristic of the one or more result sets and the measured distinctiveness of the one or more result sets.
Specification