Efficient retrieval algorithm by query term discrimination
Abstract
A method and system for use in information retrieval includes, for each of a plurality of terms, selecting a predetermined number of top scoring documents for the term to form a corresponding document set for the term. When a plurality of terms are received, optionally as a query, the system ranks, using an inverse document frequency algorithm, the plurality of terms for importance based on the document sets for the plurality of terms. Then a number of ranked terms are selected based on importance and a union set is formed based on the document sets associated with the selected number of ranked terms.
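The pipeline summarized above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names (`build_top_sets`, `rank_terms_by_idf`, `union_of_selected`), the shape of the input (`postings`, mapping each term to per-document scores), and the use of log(N/df) as the inverse document frequency are all hypothetical. The abstract does not say which IDF variant is used or how document frequency is obtained.

```python
import math
from typing import Dict, List, Set

# Hypothetical input shape: term -> {doc_id: score} over the whole collection.
Postings = Dict[str, Dict[int, float]]


def build_top_sets(postings: Postings, n: int) -> Dict[str, List[int]]:
    """Keep only the n top-scoring document IDs for each term."""
    top_sets: Dict[str, List[int]] = {}
    for term, scored in postings.items():
        # Highest score first; ties broken by lower document ID.
        best = sorted(scored, key=lambda d: (-scored[d], d))[:n]
        top_sets[term] = sorted(best)  # store IDs in ascending order
    return top_sets


def rank_terms_by_idf(query: List[str], postings: Postings, num_docs: int) -> List[str]:
    """Order query terms from most to least discriminating.

    Assumes idf = log(N / df); terms absent from the collection rank last.
    """
    def idf(term: str) -> float:
        df = len(postings.get(term, ()))
        return math.log(num_docs / df) if df else 0.0

    return sorted(query, key=lambda t: (-idf(t), t))


def union_of_selected(ranked: List[str], top_sets: Dict[str, List[int]], k: int) -> Set[int]:
    """Union the document sets of the k highest-ranked terms."""
    union: Set[int] = set()
    for term in ranked[:k]:
        union.update(top_sets.get(term, []))
    return union
```

With a six-document toy collection in which "the" occurs in every document, "retrieval" in three and "inverted" in two, this ranking puts "inverted" first and "the" last, so only the two rarer terms contribute candidate documents to the union.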
19 Claims
1. A method for use in information retrieval, the method comprising:
for each of a plurality of terms, selecting a predetermined number of top scoring documents for the term to form a corresponding document set for the term;
receiving a query comprising a plurality of query terms;
ranking the plurality of query terms received in the query based at least in part on the corresponding document sets for each of the plurality of query terms, wherein the ranking comprises using an inverse document frequency algorithm;
selecting a number of ranked query terms from the plurality of query terms, wherein each selected ranked query term comprises its corresponding document set and each document in a respective document set comprises a document identification number;
forming a union set based on the document sets associated with the selected number of ranked query terms; and
for a document identification number in the union set, scanning a document set corresponding to an unselected query term for a matching document identification number, wherein the unselected query term is included in the query comprising the plurality of query terms.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
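The final step of claim 1, scanning an unselected term's document set for each document identification number in the union set, might look like the sketch below. It assumes each document set is stored as an ascending list of integer IDs so that a binary search can serve as the scan; the claim does not specify the storage layout or the search strategy, and the names `contains` and `scan_unselected` are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, Iterable, List


def contains(sorted_ids: List[int], doc_id: int) -> bool:
    """Binary-search an ascending document set for doc_id."""
    i = bisect_left(sorted_ids, doc_id)
    return i < len(sorted_ids) and sorted_ids[i] == doc_id


def scan_unselected(
    union_ids: Iterable[int],
    unselected_sets: Dict[str, List[int]],
) -> Dict[int, List[str]]:
    """Map each candidate doc ID to the unselected terms whose sets contain it.

    Only IDs already in the union are probed, so the unselected terms
    never add new candidate documents (one reading of the claim).
    """
    matches: Dict[int, List[str]] = {}
    for doc_id in sorted(union_ids):
        matches[doc_id] = [
            term for term, ids in unselected_sets.items() if contains(ids, doc_id)
        ]
    return matches
```

Under this reading the low-ranked terms are consulted only to refine documents that the high-ranked terms already proposed. What is done with a match afterwards, for example adding the term's score contribution, is not stated in the claim.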
15. An offline method for use in online information retrieval, the method comprising:
for each of a plurality of terms, selecting a predetermined number of top scoring documents for the term to form a corresponding document set for the term; and
storing the document sets for subsequent access responsive to an online query;
receiving a query comprising a plurality of query terms;
ranking the plurality of query terms using an inverse document frequency algorithm;
selecting at least two ranked query terms from the plurality of query terms, wherein each selected, ranked query term comprises a corresponding document set of top scoring documents, wherein the selecting the at least two ranked query terms leaves at least one unselected query term from the plurality of query terms;
forming a union set based on the document sets associated with the at least two ranked query terms;
merging the union set with a document set corresponding to the at least one unselected query term; and
outputting results based on the merging.
Dependent claims: 16, 17
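The offline/online split in claim 15 can be illustrated as follows. This is a sketch under several assumptions that the claim does not confirm: the document sets are stored as a JSON string, an IDF value (log(N/df)) is precomputed and stored per term, and "merging" is interpreted as adding an unselected term's scores to documents already in the union. The names `build_offline` and `answer_online` are hypothetical.

```python
import json
import math
from typing import Dict, List, Tuple


def build_offline(postings: Dict[str, Dict[int, float]], num_docs: int, n: int) -> str:
    """Offline step: keep the n best documents per term and serialize them."""
    index = {}
    for term, scored in postings.items():
        if not scored:
            continue  # skip terms with no documents
        top = sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
        index[term] = {
            "idf": math.log(num_docs / len(scored)),  # assumed IDF variant
            "docs": {str(doc_id): score for doc_id, score in top},
        }
    return json.dumps(index)


def answer_online(stored: str, query: List[str], k: int = 2) -> List[Tuple[int, float]]:
    """Online step: rank terms, union the top-k sets, merge in the rest."""
    index = json.loads(stored)
    terms = [t for t in query if t in index]
    ranked = sorted(terms, key=lambda t: (-index[t]["idf"], t))
    # Claim 15 selects at least two terms and leaves at least one unselected,
    # so k >= 2 and the query should contain more than k known terms.
    selected, unselected = ranked[:k], ranked[k:]

    scores: Dict[int, float] = {}
    for term in selected:  # form the union set, accumulating scores
        for doc_id, score in index[term]["docs"].items():
            scores[int(doc_id)] = scores.get(int(doc_id), 0.0) + score
    for term in unselected:  # merge: only documents already in the union gain score
        for doc_id, score in index[term]["docs"].items():
            if int(doc_id) in scores:
                scores[int(doc_id)] += score
    # Output: best score first, ties broken by lower document ID.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Storing the truncated sets ahead of time keeps the online step bounded: each query term contributes at most `n` documents, regardless of how common the term is in the collection.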
18. An online information retrieval method comprising:
receiving a query that comprises a plurality of terms;
accessing documents or information about documents;
based on the accessing, ranking the plurality of terms using an inverse document frequency algorithm;
selecting a number of ranked terms, wherein each selected ranked term comprises a corresponding document set and each document in a respective document set comprises a document identification number;
forming a union set based on the document sets associated with the selected number of ranked terms; and
for a document identification number in the union set, scanning a document set corresponding to an unselected term for a matching document identification number, wherein the unselected term is included in the query comprising the plurality of terms.
Dependent claims: 19
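Claim 18 bases the term ranking on "accessing documents or information about documents". One possible reading, sketched below, is that document frequencies are counted directly from the documents and then fed into an IDF computation. Everything here is an assumption made for illustration: the claim does not say what is accessed or how, the whitespace tokenization is a simplification, log(N/df) is only one common IDF variant, and the names `document_frequencies` and `rank_by_idf` are hypothetical.

```python
import math
from typing import Dict, List


def document_frequencies(docs: Dict[int, str]) -> Dict[str, int]:
    """Count, for each term, how many documents contain it."""
    df: Dict[str, int] = {}
    for text in docs.values():
        # Simplified tokenization: lowercase and split on whitespace.
        for term in set(text.lower().split()):
            df[term] = df.get(term, 0) + 1
    return df


def rank_by_idf(query: List[str], df: Dict[str, int], num_docs: int) -> List[str]:
    """Order terms by assumed idf = log(N / df); unseen terms rank last."""
    def idf(term: str) -> float:
        count = df.get(term, 0)
        return math.log(num_docs / count) if count else 0.0

    return sorted(query, key=lambda t: (-idf(t), t))
```

A term that appears in every document gets an IDF of zero under this variant, which is why it ends up among the unselected terms.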
Specification