Efficient retrieval algorithm by query term discrimination
First Claim
1. In a computing environment, a method comprising:
- ranking search terms of a query based on one or more importance criterion;
choosing a subset of the search terms based on the ranking; and
locating documents in an index that is indexed by terms including terms that correspond to the search terms, including using the subset to determine a reduced number of rows in the index to search for relevant documents.
3 Assignments
0 Petitions
Accused Products
Abstract
Described is an efficient retrieval mechanism that quickly locates documents (e.g., corresponding to online advertisements) based on query term discrimination. A topmost subset (e.g., two) of search terms is selected according to their ranked importance, e.g., as ranked by inverted document frequency. The topmost terms are then used to narrow the number of rows of an inverted query index that are searched to find document identifiers and associated scores, such as computed offline by a BM25 algorithm. For example, for each document identifier of each important term, a fast search within each of the narrowed subset of rows (that also contain that document identifier) may be performed by comparing document identifiers to jump a pointer within each other row, followed by a binary search to locate a particular document. The scores of the set of particular documents may then be used to rank their relative importance for returning as results.
-
Citations
20 Claims
-
1. In a computing environment, a method comprising:
-
ranking search terms of a query based on one or more importance criterion; choosing a subset of the search terms based on the ranking; and locating documents in an index that is indexed by terms including terms that correspond to the search terms, including using the subset to determine a reduced number of rows in the index to search for relevant documents. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
-
-
9. In a computing environment, a system comprising:
-
a ranking mechanism that ranks terms of an incoming query based on one or more importance criteria into a set of ranked terms; and a merge mechanism that searches an index of terms, each term having a set of one or more document identifiers identifying a data structure that contains that term, each document identifier associated with a score, the merge mechanism configured to search the index by choosing a topmost subset of the ranked terms based on their ranked importance, and to search for documents in a subset of rows of the index in which the subset of the rows is selected by having a relationship with at least one document identifier corresponding to a term of the topmost subset. - View Dependent Claims (10, 11, 12, 13, 14, 15)
-
-
16. A computer-readable medium having computer-executable instructions, which when executed perform steps, comprising:
-
(a) parsing a query into terms; (b) ranking the terms based on one or more importance criteria into a set of ranked terms; (c) selecting a subset of the ranked terms as a most important term set; (d) selecting as a selected row of an inverted query index a row that corresponds to the most important term set; (e) for each document identifier in each selected row, finding other rows in the index that have a matching document identifier; (f) moving a pointer in each of the other rows based upon a comparison of the document identifier in the selected row with the document identifier in each of the other rows, followed by a binary search, to locate a particular document in each other row; and (g) using a score associated with each particular document identifier to rank the corresponding documents for returning as a result set. - View Dependent Claims (17, 18, 19, 20)
-
Specification