Efficient Retrieval Algorithm by Query Term Discrimination
Abstract
An exemplary method for use in information retrieval includes, for each of a plurality of terms, selecting a predetermined number of top scoring documents for the term to form a corresponding document set for the term; receiving a plurality of terms, optionally as a query; ranking the plurality of terms for importance based at least in part on the document sets for the plurality of terms where the ranking comprises using an inverse document frequency algorithm; selecting a number of ranked terms based on importance where each selected, ranked term comprises its corresponding document set wherein each document in a respective document set comprises a document identification number; forming a union set based on the document sets associated with the selected number of ranked terms; and, for a document identification number in the union set, scanning a document set corresponding to an unselected term for a matching document identification number. Various other exemplary systems, methods, devices, etc. are also disclosed.
20 Claims
1. A method for use in information retrieval, the method comprising:
for each of a plurality of terms, selecting a predetermined number of top scoring documents for the term to form a corresponding document set for the term;
receiving a plurality of terms, optionally as a query;
ranking the plurality of terms for importance based at least in part on the document sets for the plurality of terms wherein the ranking comprises using an inverse document frequency algorithm;
selecting a number of ranked terms based on importance wherein each selected, ranked term comprises its corresponding document set wherein each document in a respective document set comprises a document identification number;
forming a union set based on the document sets associated with the selected number of ranked terms; and
for a document identification number in the union set, scanning a document set corresponding to an unselected term for a matching document identification number.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
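The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the per-term score tables, the `doc_freq` and `num_docs` inputs, the logarithmic IDF formula, and the choice to sum per-term scores into a final ordering are all assumptions made for illustration. The claim itself only requires an inverse document frequency algorithm and does not say what is done with a matching document identification number.

```python
import math

def build_document_sets(scores, top_n):
    """Keep the top_n highest-scoring documents for each term.

    scores: {term: {doc_id: score}} (hypothetical precomputed scores).
    """
    doc_sets = {}
    for term, per_doc in scores.items():
        best = sorted(per_doc.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
        doc_sets[term] = dict(best)
    return doc_sets

def idf(term, doc_freq, num_docs):
    # One common IDF form; the claim does not fix a particular formula.
    return math.log(num_docs / doc_freq[term])

def retrieve(query_terms, doc_sets, doc_freq, num_docs, num_selected):
    # Rank terms by importance (higher IDF = more discriminating term).
    ranked = sorted(query_terms, key=lambda t: idf(t, doc_freq, num_docs), reverse=True)
    selected, unselected = ranked[:num_selected], ranked[num_selected:]

    # Union of the document sets of the selected terms.
    union = set()
    for term in selected:
        union.update(doc_sets[term])

    totals = {}
    for doc_id in union:
        total = sum(doc_sets[t].get(doc_id, 0.0) for t in selected)
        # Scan each unselected term's document set for a matching ID.
        for term in unselected:
            if doc_id in doc_sets[term]:
                total += doc_sets[term][doc_id]  # assumption: scores are summed
        totals[doc_id] = total
    return sorted(totals, key=lambda d: totals[d], reverse=True)

# Hypothetical data: three terms, six documents.
scores = {
    "efficient": {1: 0.9, 2: 0.4, 3: 0.1},
    "retrieval": {2: 0.8, 4: 0.7, 5: 0.2},
    "the": {1: 0.3, 2: 0.3, 4: 0.2, 6: 0.1},
}
doc_freq = {"efficient": 10, "retrieval": 50, "the": 900}
doc_sets = build_document_sets(scores, top_n=2)
result = retrieve(["the", "efficient", "retrieval"], doc_sets, doc_freq,
                  num_docs=1000, num_selected=2)
print(result)
```

In this sketch the low-IDF term "the" never contributes candidates of its own; its document set is consulted only for documents already in the union, which is the point of discriminating among query terms.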
15. An offline method for use in online information retrieval, the method comprising:
for each of a plurality of terms, selecting a predetermined number of top scoring documents for the term to form a corresponding document set for the term; and
storing the document sets for subsequent access responsive to an online query.
Dependent claims: 16, 17
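The offline stage of claim 15 might be sketched as follows. The JSON file format, the function names, and the decision to sort each document set by document identification number are assumptions for illustration; the claim only requires that the document sets be formed and stored for later access.

```python
import heapq
import json
import os
import tempfile

def select_top_documents(term_scores, top_n):
    """Return the top_n (doc_id, score) pairs with the highest scores."""
    return heapq.nlargest(top_n, term_scores.items(), key=lambda kv: kv[1])

def build_index(scores, top_n):
    """Form one document set per term, stored as [doc_id, score] pairs."""
    index = {}
    for term, term_scores in scores.items():
        top = select_top_documents(term_scores, top_n)
        # Assumption: sort by document ID so an online stage can scan quickly.
        index[term] = sorted([doc_id, score] for doc_id, score in top)
    return index

def store_index(index, path):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(index, f)

def load_index(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# Hypothetical scores for a single term.
scores = {"retrieval": {7: 0.2, 3: 0.9, 5: 0.6}}
index = build_index(scores, top_n=2)

# Store the document sets, then read them back as an online stage would.
path = os.path.join(tempfile.mkdtemp(), "document_sets.json")
store_index(index, path)
restored = load_index(path)
```

Because the selection happens once, ahead of time, the cost of scoring every document for every term is kept out of the query path.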
18. An online information retrieval method comprising:
receiving a query that comprises a plurality of terms;
accessing documents or information about documents;
based on the accessing, ranking the plurality of terms for importance;
selecting a number of ranked terms based on importance wherein each selected, ranked term comprises a corresponding document set wherein each document in a respective document set comprises a document identification number;
forming a union set based on the document sets associated with the selected number of ranked terms; and
for a document identification number in the union set, scanning a document set corresponding to an unselected term for a matching document identification number.
Dependent claims: 19, 20
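The scanning step of claim 18 could look roughly like the sketch below, assuming each document set is held as a list of `(doc_id, score)` tuples sorted by document identification number so that a binary search can stand in for the scan. The `importance` table, the function names, and the accumulation of scores are hypothetical; the claim does not prescribe a data layout, a search strategy, or a scoring rule.

```python
from bisect import bisect_left

def find_score(postings, doc_id):
    """Look up doc_id in a list of (doc_id, score) tuples sorted by doc_id."""
    # (doc_id,) sorts just before any (doc_id, score), so this finds the match.
    i = bisect_left(postings, (doc_id,))
    if i < len(postings) and postings[i][0] == doc_id:
        return postings[i][1]
    return None

def online_retrieve(query, postings, importance, num_selected):
    terms = query.split()
    # Rank terms by a precomputed importance value (assumed to be supplied).
    ranked = sorted(terms, key=lambda t: importance[t], reverse=True)
    selected, unselected = ranked[:num_selected], ranked[num_selected:]

    # Union of the selected terms' document sets, accumulating scores.
    candidates = {}
    for term in selected:
        for doc_id, score in postings[term]:
            candidates[doc_id] = candidates.get(doc_id, 0.0) + score

    # For each ID in the union, scan the unselected terms' document sets.
    for doc_id in candidates:
        for term in unselected:
            score = find_score(postings[term], doc_id)
            if score is not None:
                candidates[doc_id] += score
    return sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical document sets and importance values.
postings = {
    "query": [(1, 0.5), (4, 0.9)],
    "term": [(2, 0.6), (4, 0.3)],
    "of": [(1, 0.1), (2, 0.1), (9, 0.4)],
}
importance = {"query": 2.0, "term": 1.5, "of": 0.1}
results = online_retrieve("query of term", postings, importance, num_selected=2)
ranked_ids = [doc_id for doc_id, _ in results]
print(ranked_ids)
```

Document 9 appears only in the document set of the unselected term "of", so it never enters the union and is never scored.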
Specification