Efficient forward ranking in a search engine
Abstract
Methods and computer storage media are provided for generating entries for documents in a forward index. A document and its document identification are received, along with static features that are query-independent. The document is parsed into tokens to form a token stream corresponding to the document. Relevant data used to calculate rankings of documents is identified, and a position of that data is determined. The entry is then generated from the document identification, the token stream of the document, the static features, and the positional information of the relevant data. The entry is stored in the forward index.
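The entry-generation steps in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `ForwardIndexEntry`, `tokenize`, `build_entry`, and `add_document` are hypothetical, the tokenizer is a simple lowercase word splitter, and the "relevant data" is assumed here to be the token offsets at which the title and body sections begin.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ForwardIndexEntry:
    """One forward-index entry (hypothetical layout, for illustration only)."""
    doc_id: str
    token_stream: List[str]
    static_features: Dict[str, float]   # query-independent features
    section_offsets: Dict[str, int]     # assumed "positional information"


def tokenize(text: str) -> List[str]:
    # Assumption: a token is a run of lowercase letters or digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_entry(doc_id: str, title: str, body: str,
                static_features: Dict[str, float]) -> ForwardIndexEntry:
    title_tokens = tokenize(title)
    body_tokens = tokenize(body)
    # The token stream is the title tokens followed by the body tokens.
    token_stream = title_tokens + body_tokens
    # Record where each section starts within the token stream.
    section_offsets = {"title": 0, "body": len(title_tokens)}
    return ForwardIndexEntry(doc_id, token_stream,
                             dict(static_features), section_offsets)


def add_document(index: Dict[str, ForwardIndexEntry], doc_id: str, title: str,
                 body: str, static_features: Dict[str, float]) -> ForwardIndexEntry:
    # Generate the entry and store it in the forward index, keyed by document id.
    entry = build_entry(doc_id, title, body, static_features)
    index[doc_id] = entry
    return entry
```

Keying the index by document identification means that, at query time, the whole token stream for a candidate document can be fetched in one lookup.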
20 Claims
1. A computer-implemented method for using a forward index to extract information for ranking documents based on a search query, the method comprising:
receiving a search query;
parsing the search query to identify one or more atoms;
creating a token map of query tokens using the one or more atoms parsed from the search query;
for a first document, identifying, in a first entry of a forward index, document tokens in a token stream corresponding to the first document that match the query tokens in the token map;
for the document tokens that match the query tokens based on the one or more atoms, updating a token position data structure, wherein the token position data structure includes token positions in the token stream corresponding to the first document of each of the document tokens that match the query tokens, wherein the token position data structure stores the token positions in the token stream in association with the one or more atoms in the search query;
accessing the updated token position data structure to extract ranking information from the first entry of the forward index, wherein the ranking information is extracted from the first entry of the forward index via the updated token position data structure based on the token positions in the token stream; and
executing ranking calculations for documents associated with the search query based on the ranking information extracted from the forward index via the updated token position data structure.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9)
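The steps recited in claim 1 can be illustrated with a short Python sketch. It is a simplified reading of the claim language, not the patented implementation: every function name is hypothetical, an atom is assumed to be a single lowercase word, a forward-index entry is reduced to a plain list of tokens, and the ranking features (term frequency and first position) and the score formula are toy choices made only for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, List


def parse_atoms(query: str) -> List[str]:
    # Assumption: each lowercase alphanumeric word in the query is one atom.
    return re.findall(r"[a-z0-9]+", query.lower())


def build_token_map(atoms: List[str]) -> Dict[str, List[int]]:
    # Map each distinct query token to the indices of the atoms it came from.
    token_map = defaultdict(list)
    for atom_index, atom in enumerate(atoms):
        token_map[atom].append(atom_index)
    return dict(token_map)


def collect_positions(token_stream: List[str],
                      token_map: Dict[str, List[int]]) -> Dict[str, List[int]]:
    # Token position data structure: query token -> positions in the stream.
    positions = {token: [] for token in token_map}
    for pos, token in enumerate(token_stream):   # single pass over the stream
        if token in token_map:
            positions[token].append(pos)
    return positions


def extract_ranking_info(positions: Dict[str, List[int]]) -> Dict[str, dict]:
    # Hypothetical features: term frequency and first occurrence per token.
    return {token: {"tf": len(p), "first": p[0] if p else None}
            for token, p in positions.items()}


def score(info: Dict[str, dict]) -> float:
    # Toy score: reward frequency and early first occurrence.
    total = 0.0
    for feats in info.values():
        if feats["tf"]:
            total += feats["tf"] + 1.0 / (1 + feats["first"])
    return total


def rank(query: str, forward_index: Dict[str, List[str]]) -> List[str]:
    token_map = build_token_map(parse_atoms(query))
    scored = []
    for doc_id, token_stream in forward_index.items():
        positions = collect_positions(token_stream, token_map)
        scored.append((score(extract_ranking_info(positions)), doc_id))
    # Highest score first; ties broken by document id for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [doc_id for _, doc_id in scored]
```

In this sketch the token stream of each document is scanned once, and every later feature is computed from the position structure rather than by rescanning the stream, which is the role the claim assigns to the token position data structure.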
10. One or more hardware computer-storage media storing computer-useable instructions that, when used by one or more computing devices, cause the one or more computing devices to perform a method for using a forward index to extract information for ranking documents based on a search query, the method comprising:
receiving a search query;
parsing the search query to identify one or more atoms;
creating a token map of query tokens using the one or more atoms parsed from the search query;
for a first document, identifying, in a first entry of a forward index, document tokens in a token stream corresponding to the first document that match the query tokens in the token map;
for the document tokens that match the query tokens based on the one or more atoms, updating a token position data structure, wherein the token position data structure includes token positions in the token stream corresponding to the first document of each of the document tokens that match the query tokens, wherein the token position data structure stores the token positions in the token stream in association with the one or more atoms in the search query;
accessing the updated token position data structure to extract ranking information from the first entry of the forward index, wherein the ranking information is extracted from the first entry of the forward index via the updated token position data structure based on the token positions in the token stream; and
executing ranking calculations for documents associated with the search query based on the ranking information extracted from the forward index via the updated token position data structure.
(Dependent claims: 11, 12, 13, 14, 15, 16, 17)
18. A system for using a forward index to extract information for ranking documents based on a search query, the system comprising:
an index generator having one or more hardware processors and one or more hardware computer-storage media; and
a forward index coupled with the index generator, wherein the index generator is configured for:
receiving a search query;
parsing the search query to identify one or more atoms;
creating a token map of query tokens using the one or more atoms parsed from the search query;
for a first document, identifying, in a first entry of a forward index, document tokens in a token stream corresponding to the first document that match the query tokens in the token map;
for the document tokens that match the query tokens based on the one or more atoms, updating a token position data structure, wherein the token position data structure includes token positions in the token stream corresponding to the first document of each of the document tokens that match the query tokens, wherein the token position data structure stores the token positions in the token stream in association with the one or more atoms in the search query;
accessing the updated token position data structure to extract ranking information from the first entry of the forward index, wherein the ranking information is extracted from the first entry of the forward index via the updated token position data structure based on the token positions in the token stream; and
executing ranking calculations for documents associated with the search query based on the ranking information extracted from the forward index via the updated token position data structure.
(Dependent claims: 19, 20)
Specification