Efficient retrieval algorithm by query term discrimination
First Claim
1. A system comprising:
- a ranking mechanism that ranks terms of a query based on one or more importance criteria into a set of ranked terms; and
a merge mechanism that searches an index of terms, at least one term having a set of one or more document identifiers associated with a score and identifying a data structure that contains the at least one term, the merge mechanism configured to;
search the index by choosing a topmost subset of the ranked terms based on a ranked importance of individual ranked terms; and
search for documents in a subset of rows of the index, the subset of rows selected by having a relationship with at least one document identifier corresponding to a term of the topmost subset of the ranked terms, wherein the merge mechanism is configured to search for the documents by locating one or more particular documents within at least one row of the subset of rows by jumping within the at least one row to one or more jump points corresponding to associated document identifiers in the at least one row, wherein the jumping is based on a comparison of one or more document identifiers within the at least one row and a reference document identifier within another row corresponding to the topmost subset of the ranked terms; and
at least one computing device configured to implement one or both of the ranking mechanism or the merge mechanism.
3 Assignments
0 Petitions
Accused Products
Abstract
Described is an efficient retrieval mechanism that quickly locates documents (e.g., corresponding to online advertisements) based on query term discrimination. A topmost subset (e.g., two) of search terms is selected according to their ranked importance, e.g., as ranked by inverted document frequency. The topmost terms are then used to narrow the number of rows of an inverted query index that are searched to find document identifiers and associated scores, such as computed offline by a BM25 algorithm. For example, for each document identifier of each important term, a fast search within each of the narrowed subset of rows (that also contain that document identifier) may be performed by comparing document identifiers to jump a pointer within each other row, followed by a binary search to locate a particular document. The scores of the set of particular documents may then be used to rank their relative importance for returning as results.
-
Citations
16 Claims
-
1. A system comprising:
-
a ranking mechanism that ranks terms of a query based on one or more importance criteria into a set of ranked terms; and a merge mechanism that searches an index of terms, at least one term having a set of one or more document identifiers associated with a score and identifying a data structure that contains the at least one term, the merge mechanism configured to; search the index by choosing a topmost subset of the ranked terms based on a ranked importance of individual ranked terms; and search for documents in a subset of rows of the index, the subset of rows selected by having a relationship with at least one document identifier corresponding to a term of the topmost subset of the ranked terms, wherein the merge mechanism is configured to search for the documents by locating one or more particular documents within at least one row of the subset of rows by jumping within the at least one row to one or more jump points corresponding to associated document identifiers in the at least one row, wherein the jumping is based on a comparison of one or more document identifiers within the at least one row and a reference document identifier within another row corresponding to the topmost subset of the ranked terms; and at least one computing device configured to implement one or both of the ranking mechanism or the merge mechanism. - View Dependent Claims (2, 3, 4, 5, 6)
-
-
7. A computer-readable storage medium having computer executable-instructions, which when executed perform steps comprising:
-
(a) parsing a query into terms; (b) ranking the terms based on one or more importance criteria into a set of ranked terms; (c) selecting a subset of the set of ranked terms as a most important term set; (d) selecting as a selected row of an inverted query index a row that corresponds to the most important term set; (e) for a document identifier in the selected row, finding other rows in the inverted query index that have a matching document identifier; (f) moving a pointer in individual other rows based upon a comparison of the document identifier in the selected row with the individual other rows'"'"' matching document identifiers, followed by one or more binary searches, to locate particular documents in the other rows; and (g) using a score associated with the other rows'"'"' matching document identifiers to rank the particular documents for returning as a result set. - View Dependent Claims (8, 9, 10, 11)
-
-
12. A method comprising:
-
parsing a query into terms; ranking the terms based on one or more importance criteria into a set of ranked terms; selecting a subset of the set of ranked terms as a term set; selecting, as a selected row, a row of an inverted query index that corresponds to the term set; for a document identifier in the selected row, finding other rows in the inverted query index that have a matching document identifier; moving a pointer in individual other rows based upon a comparison of the document identifier in the selected row with the individual other rows'"'"' matching document identifiers, followed by one or more binary searches, to locate particular documents in the other rows; and using a score associated with the other rows'"'"' matching document identifiers to rank the particular documents for returning as a result set, wherein at least one of parsing the query, ranking the terms, selecting the subset of the ranked terms, selecting the row, finding the other rows, moving the pointer, or using the score is implemented by at least one computing device. - View Dependent Claims (13, 14, 15, 16)
-
Specification