System and method for measuring the quality of document sets
Abstract
Systems and methods are described that calculate the interestingness of a set of one or more records in a database, either absolutely (i.e., compared to an overall collection of records) or relative to some other set of records. In one embodiment, the measure is a relative entropy value that has been normalized. Various applications of the measure are described in the context of an information retrieval system. These applications include, for example, guiding query interpretation, guiding view selection and summarization, intelligent ranges, event detection, concept triggers and interpreting user actions, hierarchy discovery, and adaptive data mining.
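As a rough illustration of the measure described in the abstract, the following Python sketch computes the relative entropy (Kullback-Leibler divergence) between the distribution of a facet value within a set of records and the distribution of that value across the overall collection, then maps it into the range [0, 1). The function names and the normalization `1 - 2**(-D)` are assumptions made for this example; the text states only that the relative entropy value "has been normalized" and does not specify how.

```python
import math
from collections import Counter


def distribution(values):
    """Return the empirical probability of each distinct value."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}


def relative_entropy(p, q):
    """Relative entropy D(p || q) in bits.

    Assumes every value with nonzero probability in p also appears in q,
    which holds when the set is drawn from the collection behind q.
    """
    return sum(pv * math.log2(pv / q[v]) for v, pv in p.items())


def normalized_distinctiveness(set_values, collection_values):
    """Map relative entropy into [0, 1).

    The mapping 1 - 2**(-D) is a hypothetical normalization chosen for
    illustration; it is not taken from the patent text.
    """
    p = distribution(set_values)
    q = distribution(collection_values)
    return 1.0 - 2.0 ** (-relative_entropy(p, q))


# Hypothetical data: one facet value per record.
collection = ["red", "red", "white", "white"]
subset = ["red", "red"]

score_subset = normalized_distinctiveness(subset, collection)
score_whole = normalized_distinctiveness(collection, collection)
```

Under these assumptions, a set whose distribution matches the collection scores 0, and a set concentrated on a few values scores closer to 1. Comparing one set against another set, rather than against the whole collection, would only change which values are passed as the second argument.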
47 Claims
1. A method for organizing a database, the method comprising:
accessing, by a computer system, a set comprising a plurality of database records stored in a database on a computer system, wherein the plurality of database records comprise at least one facet, and wherein the database is organized based on an existing structure;
establishing, by the computer system, at least one identifying characteristic for the set comprising the plurality of database records and a respective at least one facet;
analyzing, by the computer system, the database to determine a statistical distribution of the at least one identifying characteristic;
generating, by the computer system, a measurement of distinctiveness for the set, based on the statistical distribution of the at least one identifying characteristic;
identifying, by the computer system, at least one distinctive group of database records within the database based on the measurement of distinctiveness;
generating, by the computer system, a descriptor associated with the identified at least one distinctive group of database records; and
organizing, by the computer system, the database based on the descriptor, wherein the act of organizing the database based on the descriptor includes modifying the existing structure of the database to include the organization of the database based on the descriptor.
Dependent claims: 2-25
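The sequence of acts recited in claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: records are modeled as dictionaries, candidate groups are formed by a shared facet value, un-normalized relative entropy stands in for the measurement of distinctiveness, and the "organization" is a new descriptor-to-record index. The function name `organize_by_distinctive_groups`, its parameters, the descriptor format, and the sample data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def distribution(values):
    """Return the empirical probability of each distinct value."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}


def relative_entropy(p, q):
    """Relative entropy D(p || q) in bits; p's support must lie within q's."""
    return sum(pv * math.log2(pv / q[v]) for v, pv in p.items())


def organize_by_distinctive_groups(records, group_facet, characteristic, threshold):
    """Return a descriptor -> record-id index for distinctive groups."""
    # Statistical distribution of the characteristic over the whole database.
    baseline = distribution(r[characteristic] for r in records)

    # Candidate sets: records that share a value of the grouping facet.
    groups = defaultdict(list)
    for r in records:
        groups[r[group_facet]].append(r)

    index = {}
    for value, members in groups.items():
        p = distribution(m[characteristic] for m in members)
        score = relative_entropy(p, baseline)  # measurement of distinctiveness
        if score >= threshold:
            # Descriptor: the characteristic value most over-represented
            # in the group relative to the whole database.
            top = max(p, key=lambda v: p[v] / baseline[v])
            descriptor = f"{group_facet}={value} ({characteristic}={top})"
            index[descriptor] = [m["id"] for m in members]
    return index


# Hypothetical records with two facets each.
records = [
    {"id": 1, "region": "Bordeaux", "color": "red"},
    {"id": 2, "region": "Bordeaux", "color": "red"},
    {"id": 3, "region": "Napa", "color": "red"},
    {"id": 4, "region": "Napa", "color": "white"},
    {"id": 5, "region": "Mosel", "color": "white"},
    {"id": 6, "region": "Mosel", "color": "white"},
]

index = organize_by_distinctive_groups(records, "region", "color", threshold=0.5)
```

In this sketch the returned index would be merged into whatever structure the database already uses (for example, as an additional navigation category), which loosely corresponds to modifying the existing structure to include the descriptor-based organization.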
26. A non-transitory computer-readable medium having computer-readable instructions stored thereon that define instructions that, as a result of being executed by a computer, instruct the computer to perform a method for organizing a database, the method comprising the acts of:
accessing a set comprising a plurality of database records stored in a database on a computer system, wherein the plurality of database records comprise at least one facet, and wherein the database is organized based on an existing structure;
establishing at least one identifying characteristic for the set comprising the plurality of database records and a respective at least one facet;
analyzing the database to determine a statistical distribution of the at least one identifying characteristic;
generating a measurement of distinctiveness for the set, based on the statistical distribution of the at least one identifying characteristic;
identifying at least one distinctive group of database records within the database based on the measurement of distinctiveness;
generating a descriptor associated with the identified at least one distinctive group of database records; and
organizing the database based on the descriptor, wherein the act of organizing the database based on the descriptor includes modifying the existing structure of the database to include the organization of the database based on the descriptor.
27. A system for organizing a database, the system comprising:
at least one processor operatively connected to a memory for executing system components;
an access component adapted to access a set comprising a plurality of database records in a database stored in the memory, wherein the plurality of database records comprise at least one facet, and wherein the database is organized based on an existing structure;
an analysis component adapted to:
  establish at least one identifying characteristic for the set comprising the plurality of database records and a respective at least one facet,
  determine a statistical distribution of the at least one identifying characteristic, and
  determine a measurement of distinctiveness for the set based on the statistical distribution of at least one identifying characteristic;
a generation component adapted to generate a descriptor for at least one distinctive group of database records of the database based on the measurement of distinctiveness; and
an organization component adapted to organize the database based on the descriptor, wherein the organization component is further adapted to modify the existing structure of the database based on the descriptor.
Dependent claims: 28-47
Specification