System and method for querying multiple, distributed databases by selective sharing of local relative significance information for terms related to the query
Abstract
A system, method, and various software products provide improved information retrieval performance from multiple document databases by retrieving from the multiple document databases in response to a user query, a set of documents that globally satisfy the query, even though each database maintains independent document indices, term frequency information, and scoring functions. The global search result approximates, to any desired degree of error, the search results that would have been obtained had the multiple document databases been globally indexed. This is done by sharing at the time the query is executed, a small subset of information about the local relative significance of terms related to the user's query, and from this information, determining a global relative significance of such terms. From the global relative significance, the individual document databases determine their query results, which are then merged into a global set of documents satisfying the query. The shared local relative significance information may be the inverse document frequency of each of a number of terms related to the query, or it may be the total frequency of each of such terms. The global relative significance may correspondingly be a global inverse document frequency, or a global term frequency from which the global inverse document frequency is calculated.
38 Claims
1. In an information retrieval apparatus including a database of documents, each document having a plurality of terms and a unique document indicia, the information retrieval apparatus further including a programmed processor adapted to receive a query containing at least one term and to compute in response to the query a document score for each of a selected plurality of documents, the document score being a function of the terms of the query, a computer memory readable by the processor and comprising:
a first table that specifies for each document in the database a set of terms, each term in the set associated with a scalar measure of a contribution the term makes to the document score of the document, the scalar measure of contribution for a term being a function of an inverse document frequency of the term, and a frequency of the term within the document, the terms in each set ordered by the scalar measure of contribution, the sets ordered with respect to indicia of the documents, such that the information retrieval apparatus can determine for any document a selected number of terms that most significantly contribute to the document score. - View Dependent Claims (2)
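The table recited in claim 1 can be illustrated with a minimal sketch. It assumes the contribution is `tf * log(N / df)`, which is only one possible "function of" the inverse document frequency and the within-document frequency; the claim does not fix a formula. The names `build_term_table` and `top_terms` are hypothetical.

```python
import math
from collections import Counter


def build_term_table(docs):
    """Build a per-document table of (term, contribution) pairs.

    docs maps a document identifier to its list of terms.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for terms in docs.values():
        df.update(set(terms))

    table = {}
    for doc_id in sorted(docs):  # sets ordered by document identifier
        tf = Counter(docs[doc_id])
        # Assumed contribution measure: tf * log(N / df).
        scored = [(t, f * math.log(n_docs / df[t])) for t, f in tf.items()]
        # Terms ordered by contribution, largest first (ties broken by term).
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        table[doc_id] = scored
    return table


def top_terms(table, doc_id, k):
    """Return the k terms that contribute most to the document's score."""
    return [term for term, _ in table[doc_id][:k]]
```

Because each set is already sorted, reading the top `k` terms for a document is a prefix read and needs no recomputation.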
3. In an information retrieval apparatus including a database of documents, each document having a plurality of terms and a unique document indicia, the information retrieval apparatus further including a programmed processor adapted to receive a query containing at least one term and to compute in response to the query a document score for each of a selected plurality of documents, the document score being a function of the terms of the query, a computer memory readable by the processor and comprising:
a first table that specifies for each document in the database a set of terms, each term in the set associated with a frequency of the term in the document, the terms in each set ordered by a scalar measure of a contribution that each term makes to the document score of the document, the scalar measure of contribution for a term being a function of an inverse document frequency of the term, and a frequency of the term within the document, the sets ordered with respect to indicia of the documents, such that the information retrieval apparatus can determine for any document a selected number of terms that most significantly contribute to the document score by retrieving frequency information from the set of terms for the document and computing the contribution from the frequency information. - View Dependent Claims (4)
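Claim 3 differs from claim 1 in that the table stores raw term frequencies, while the ordering still follows the contribution. A hedged sketch of that variant follows. The `tf * log(N / df)` ordering and the names `build_frequency_table` and `contributions` are assumptions for illustration only.

```python
import math
from collections import Counter


def build_frequency_table(docs):
    """Store (term, frequency) pairs per document, ordered by contribution."""
    n_docs = len(docs)
    df = Counter()
    for terms in docs.values():
        df.update(set(terms))

    table = {}
    for doc_id in sorted(docs):  # sets ordered by document identifier
        tf = Counter(docs[doc_id])
        # Order by the assumed contribution tf * log(N / df), largest first.
        ordered = sorted(tf, key=lambda t: (-tf[t] * math.log(n_docs / df[t]), t))
        # Only the raw frequency is stored with each term.
        table[doc_id] = [(t, tf[t]) for t in ordered]
    return table


def contributions(table, doc_id, idf, k):
    """Compute contributions of the first k stored terms from their
    frequencies and a caller-supplied IDF mapping."""
    return [(t, f * idf[t]) for t, f in table[doc_id][:k]]
```

Storing frequencies rather than finished contributions means the IDF can be supplied at query time, for example a value computed across several databases.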
5. In a computer that is communicatively coupled to a plurality of databases, each database maintaining a set of documents arranged and indexed independently of the other databases, each term having a local relative significance within each database that is independent of the local relative significance of the term in the other databases, a computer memory readable by a processing device of the computer and configuring and controlling the processing device to perform the steps of:
requesting from each of the databases for each of a first list of terms, including terms of the query, the local relative significance of the term in the database; combining the first lists of terms from the databases, and determining for each of the terms a global relative significance of the term from the local relative significance of the term in each of the databases; receiving from each of the databases a local search result of documents locally satisfying the query executed on the database using the determined global relative significance of selected terms in place of the local relative significance of the selected terms in that database; and combining the local search results into a global search to produce the list of documents satisfying the query with respect to the multiple databases.
6. A computer implemented method of querying multiple databases with a query having at least one term to produce a list of documents from all of the databases that satisfy the query, each term having a total frequency in each database, each database having a total number of documents and an independently determined inverse document frequency value (IDF) for each unique term in the database, the IDF being a function of a number of documents in the database including the term, the method comprising:
requesting from each of the databases the total frequency of each of a first list of terms, and determining therefrom a global frequency for each of the first list of terms; determining a global number of documents in all of the databases from the total number of documents in each database; in each database computing a global IDF for each term of the query from the global frequency of each term, and the global number of documents, and executing the query to produce a list of documents in the database that satisfy the query using the global IDFs of the terms of the query; and
merging the lists of documents from the database to produce a list of documents globally satisfying the query with respect to the multiple databases.
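The frequency-based computation in claim 6 can be sketched as follows. The sketch treats each database's reported frequency as the number of its documents containing the term and uses `log(N / frequency)` as the IDF. Both are assumptions, since the claim only says the IDF is a function of these quantities. The name `global_idfs` is hypothetical.

```python
import math


def global_idfs(local_freqs, local_doc_counts):
    """Combine per-database term frequencies into global IDF values.

    local_freqs: one dict per database, mapping term -> frequency.
    local_doc_counts: one document count per database.
    """
    # Global number of documents across all databases.
    n_global = sum(local_doc_counts)

    # Global frequency of each term is the sum of its local frequencies.
    freq_global = {}
    for freqs in local_freqs:
        for term, f in freqs.items():
            freq_global[term] = freq_global.get(term, 0) + f

    # Assumed IDF form: log(N_global / frequency_global).
    return {t: math.log(n_global / f) for t, f in freq_global.items() if f > 0}
```

Each database would then score its own documents with these values in place of its local IDFs before the lists are merged.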
7. In a computer that is communicatively coupled to a plurality of databases, each database maintaining a set of documents arranged and indexed independently of the other databases, each term having a local relative significance within each database that is independent of the local relative significance of the term in the other databases, a computer memory readable by a processing device of the computer and configuring and controlling the processing device to perform the steps of:
requesting from each of the databases the local IDF of each of a first list of terms, and determining therefrom a global IDF for each of the first list of terms, the global IDF of a term being a function of a total number of documents in all of the databases, and total number of documents containing the term; receiving from each database a list of documents in the database that satisfy the query executed on the database while substituting the global IDFs of at least the terms of the query for the local IDFs of the terms in the database; and
merging the lists of documents from the database to produce a consolidated list of documents globally satisfying the query with respect to the multiple databases.
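When databases share local IDFs rather than frequencies, as in claim 7, the global IDF can be recovered if each database's document count is also known. The sketch below assumes each local IDF has the form `log(N_i / df_i)`, so that `df_i = N_i * exp(-idf_i)`. That form and the name `global_idf_from_local` are assumptions, not taken from the claims.

```python
import math


def global_idf_from_local(local_idfs, doc_counts):
    """Derive a global IDF for one term from per-database local IDFs.

    local_idfs: local IDF per database, or None where the term is absent.
    doc_counts: number of documents in each database.
    """
    n_global = sum(doc_counts)
    df_global = 0.0
    for idf, n in zip(local_idfs, doc_counts):
        if idf is not None:
            # Invert the assumed local form idf = log(n / df).
            df_global += n * math.exp(-idf)
    if df_global == 0:
        return None  # term appears in no database
    return math.log(n_global / df_global)
```

For example, a term found in 10 of 100 documents in one database and 10 of 300 in another has local IDFs of `log(10)` and `log(30)`, and the function returns approximately `log(400 / 20) = log(20)`, which is not the average of the two local values.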
8. A computer implemented method of querying multiple databases with a query having at least one term to produce a list of documents from all of the databases that satisfy the query, each database maintaining a set of documents arranged and indexed independently of the other databases, each term having a local relative significance within each database that is independent of the local relative significance of the term in the other databases, the method comprising,
requesting from each of the databases for each of a first list of terms, including terms of the query, the local relative significance of the term in the database;
combining the first lists of terms from the databases, and determining for each of the terms, a global relative significance of the term from the local relative significance of the term in each of the databases; in each database, after determining the global relative significance of each of the terms, executing the query on the database using the global relative significance of selected terms in place of the local relative significance of the selected terms in that database, to produce a local search result of documents locally satisfying the query; combining the local search results into a global search result to produce the list of documents globally satisfying the query with respect to the databases. - View Dependent Claims (9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25)
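The steps of claim 8 can be sketched end to end under simplifying assumptions: the first list of terms is just the query terms, local relative significance is represented by a document count plus per-term document frequencies, and documents are scored as the sum of `tf * idf`. The class `LocalDatabase` and the function `global_search` are hypothetical names.

```python
import heapq
import math
from collections import Counter


class LocalDatabase:
    """A toy independently indexed database (illustrative only)."""

    def __init__(self, docs):
        # docs maps a document identifier to its list of terms.
        self.docs = {d: Counter(terms) for d, terms in docs.items()}

    def local_significance(self, terms):
        """Return (document count, {term: local document frequency})."""
        df = {t: sum(1 for tf in self.docs.values() if t in tf) for t in terms}
        return len(self.docs), df

    def search(self, query, idf, k):
        """Score local documents with the supplied IDFs, best first."""
        scored = []
        for doc_id, tf in self.docs.items():
            score = sum(tf[t] * idf.get(t, 0.0) for t in query)
            if score > 0:
                scored.append((score, doc_id))
        return heapq.nlargest(k, scored)


def global_search(databases, query, k):
    # Step 1: request local significance from every database.
    n_global, df_global = 0, Counter()
    for db in databases:
        n, df = db.local_significance(query)
        n_global += n
        df_global.update(df)
    # Step 2: determine the global significance (assumed log(N / df)).
    idf = {t: math.log(n_global / df_global[t]) for t in query if df_global[t]}
    # Step 3: each database executes the query with the global values.
    results = []
    for db in databases:
        results.extend(db.search(query, idf, k))
    # Step 4: merge the local results into one global ranking.
    return heapq.nlargest(k, results)
```

In this sketch, splitting a collection across two `LocalDatabase` instances gives the same ranking as indexing it in one, because every database scores with the same global IDF values.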
26. A computer implemented method of querying multiple databases with a query having at least one term to produce a list of documents from all of the databases that satisfy the query, each database having an independently determined inverse document frequency value (IDF) for each unique term in the database, the IDF being a function of a number of documents in the database including the term, the method comprising:
requesting from each of the databases the local IDF of each of a first list of terms, and determining therefrom a global IDF for each of the first list of terms, the global IDF of a term being a function of a total number of documents in all of the databases, and total number of documents containing the term; in each database, executing the query to produce a list of documents in the database that satisfy the query accounting for the global IDFs of the terms of the query; and
merging the lists of documents from the database to produce a list of documents globally satisfying the query with respect to the multiple databases. - View Dependent Claims (27, 28, 29, 30)
31. A computer readable memory including an application programming interface for a database management system communicatively coupled to a client application and a database of documents, the memory storing:
a first method invocable by the client application that receives from the client application a query containing at least one term, and returns to the client application a list of terms, each of the list of terms contributing to a document score of at least one document satisfying the query; a second method invocable by the client application that receives from the client application a second list of terms, and returns to the client application for each of the second terms a local relative significance of the term; a third method invocable by the client application that receives from the client application a query containing at least one term, and that returns to the client application a set of document identifiers of documents satisfying the query executed using a global relative significance of selected terms in place of the local relative significance of the selected terms, the global relative significance of each selected term computed from the local relative significance of the term in each of a plurality of document databases. - View Dependent Claims (32, 33, 34, 35, 36, 37, 38)
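The three methods of claim 31 might look like the following sketch. All names (`DocumentDatabaseAPI`, `contributing_terms`, `local_significance`, `search`) are hypothetical, and the choice of `log(N / df)` as the local relative significance and `tf * significance` as the score is an assumption made for illustration.

```python
import math
from collections import Counter


class DocumentDatabaseAPI:
    """Illustrative interface between a client and one document database."""

    def __init__(self, docs):
        self.docs = {d: Counter(terms) for d, terms in docs.items()}

    def _df(self, term):
        # Number of local documents containing the term.
        return sum(1 for tf in self.docs.values() if term in tf)

    def contributing_terms(self, query, per_doc=2):
        """First method: terms contributing most to documents matching the query."""
        n = len(self.docs)
        found = set()
        for tf in self.docs.values():
            if any(t in tf for t in query):
                ranked = sorted(
                    tf, key=lambda t: (-tf[t] * math.log(n / self._df(t)), t)
                )
                found.update(ranked[:per_doc])
        return sorted(found)

    def local_significance(self, terms):
        """Second method: local IDF per term, or None if the term is absent."""
        n = len(self.docs)
        return {
            t: (math.log(n / self._df(t)) if self._df(t) else None) for t in terms
        }

    def search(self, query, global_significance):
        """Third method: identifiers of matching documents, scored with the
        client-supplied global significance instead of local values."""
        scores = {}
        for doc_id, tf in self.docs.items():
            s = sum(tf[t] * global_significance.get(t, 0.0) for t in query)
            if s > 0:
                scores[doc_id] = s
        return sorted(scores, key=lambda d: (-scores[d], d))
```

A client would call the first two methods on every database, combine the returned values into a global significance per term, and pass that mapping to the third method.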
Specification