Efficient forward ranking in a search engine
First Claim
1. A computer-implemented method for generating an entry in a forward index, the method being performed by one or more computing devices including at least one processor and one or more computer storage media, the method comprising:
receiving a document and a corresponding document identification;
receiving one or more static features associated with the document, wherein the one or more static features are unrelated to a search query;
parsing the document into tokens to form a token stream of the document;
determining positional information from a position in the document of one or more relevant data, wherein the positional information is a relative location of an atom in the document;
identifying one or more context streams corresponding to the document, wherein the one or more context streams represent individual sections of the document;
calculating stream offsets for the one or more context streams parsed from the document, wherein each stream offset provides a specific location of a context stream of the document;
generating the entry from the document identification, the stream offsets, the token stream of the document, the static features, and the positional information, wherein the document identification is a pointer to the starting point of the document that corresponds to the token stream of the document, the stream offsets, the static features, and the positional information for the document; and
storing the entry in the forward index.
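As an illustration only, the steps recited in claim 1 can be sketched in Python. This is a minimal sketch under stated assumptions, not the patented implementation: the `Entry` class, the `build_entry` function, whitespace tokenization, treating an atom as a single lowercase token, and using a plain dictionary as the forward index are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical forward-index entry holding the items named in claim 1."""
    doc_id: int
    stream_offsets: dict   # context stream name -> start position in token_stream
    token_stream: list     # tokens of the whole document
    static_features: dict  # query-independent features
    positions: dict        # atom -> token positions where it occurs


def build_entry(doc_id, sections, static_features, relevant_atoms):
    """Build one entry from a document given as {stream name: text}."""
    token_stream = []
    stream_offsets = {}
    for name, text in sections.items():
        # The offset of a context stream is where its tokens begin.
        stream_offsets[name] = len(token_stream)
        # Assumption: lowercase whitespace tokenization.
        token_stream.extend(text.lower().split())
    # Positional information: where each relevant atom sits in the stream.
    positions = {
        atom: [i for i, tok in enumerate(token_stream) if tok == atom]
        for atom in relevant_atoms
    }
    return Entry(doc_id, stream_offsets, token_stream, static_features, positions)


# Assumption: the forward index is modeled as a dict keyed by document id.
forward_index = {}
entry = build_entry(
    42,
    {"title": "Forward Index Basics",
     "body": "A forward index maps a document to its tokens"},
    {"page_rank": 0.37},
    ["forward", "index"],
)
forward_index[entry.doc_id] = entry
```

In this sketch the document identification is a dictionary key; the claim describes it as a pointer to the starting point of the document's data, which a real index might realize as a byte offset into a file.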
Abstract
Methods and computer storage media are provided for generating entries for documents in a forward index. A document and its document identification are received, in addition to static features that are query-independent. The document is parsed into tokens to form a token stream corresponding to the document. Relevant data used to calculate rankings of the document is identified, and the position of that data is determined. The entry is then generated from the document identification, the token stream of the document, the static features, and the positional information of the relevant data. The entry is stored in the forward index.
14 Claims
7. One or more computer storage media having stored thereon a data structure for storing data representing a forward index that is used to rank search results based on a search query, the data structure comprising:
a first data field containing document identification information that identifies a particular document;
a second data field containing a compressed token stream of the document, wherein the compressed token stream is a second token stream based on a first token stream, the compressed token stream is a compressed version of the token stream comprising context streams of the document selected from the first token stream, wherein the one or more context streams represent individual portions of the document;
a third data field containing document-specific data representing static features of the document that are used to rank the document when a query is received; and
a fourth data field containing positional information that indicates the position of one or more relevant data associated with the document that is frequently used to calculate a ranking of the document.
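The four data fields of claim 7 can be pictured as a simple record that a ranker reads at query time. The sketch below is illustrative only: `ForwardIndexRecord`, `toy_score`, the `static_rank` feature name, and the scoring rule itself are assumptions made for this example and are not taken from the patent.

```python
from typing import NamedTuple


class ForwardIndexRecord(NamedTuple):
    """Hypothetical record mirroring the four data fields of claim 7."""
    doc_id: int               # first field: document identification
    compressed_tokens: tuple  # second field: compressed token stream
    static_features: dict     # third field: query-independent features
    positions: dict           # fourth field: positions of relevant data


def toy_score(record, query_terms):
    """Toy ranking: static rank plus the number of query terms found.

    Assumption: this formula is invented purely for illustration.
    """
    matched = sum(1 for term in query_terms if record.positions.get(term))
    return record.static_features.get("static_rank", 0.0) + matched


record = ForwardIndexRecord(
    doc_id=7,
    compressed_tokens=("forward", "index", "ranking"),
    static_features={"static_rank": 0.5},
    positions={"forward": [0], "ranking": [2]},
)
score = toy_score(record, ["forward", "ranking", "missing"])
```

The point of the sketch is that everything the scoring function needs for one document sits in a single record, so ranking a candidate requires one lookup by document identification.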
12. A computer-implemented method for generating an entry in a forward index, the method being performed by one or more computing devices including at least one processor and one or more computer storage media, the method comprising:
receiving a document and a corresponding document identification, wherein the document identification is configured to point to a starting location of the document in the forward index;
receiving one or more static features associated with the document, wherein the one or more static features are supplementary features associated with document and are unrelated to a potential search query;
parsing the document into tokens to form a token stream of the document;
creating a compressed token stream based on the formed token stream, wherein the compressed token stream is a second token stream based on the token stream, the compressed token stream is a compressed version of the token stream comprising context streams of the document selected from the first token stream, wherein the one or more context streams represent individual sections of the document;
upon creating the compressed token stream, calculating stream offsets for the context streams in the compressed token, wherein each stream offset provides a specific location of a context stream of the document;
determining positional information from a position in the document of one or more relevant data, wherein the positional information is the relative location of an atom in the document;
generating the entry from the document identification, the token stream of the document, the compressed token stream, the stream offsets, the static features, and the positional information, wherein the document identification is a pointer to the starting location of the document that corresponds to at least the compressed token stream; and
storing the entry in the forward index.
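Claim 12 differs from claim 1 mainly in that a compressed token stream is created first and the stream offsets are then calculated against it. A minimal sketch of that ordering follows, assuming that "compressed" means keeping only selected context streams, as the claim wording suggests. The function name `compress_token_stream`, the `(start, end)` span representation, and the stream names are hypothetical.

```python
def compress_token_stream(token_stream, stream_spans, keep):
    """Keep only the selected context streams and recompute their offsets.

    token_stream: tokens of the full document (the first token stream)
    stream_spans: {stream name: (start, end)} half-open spans in token_stream
    keep: names of the context streams to retain
    """
    compressed = []
    offsets = {}
    for name, (start, end) in stream_spans.items():
        if name in keep:
            # Offsets are calculated against the compressed stream,
            # after selection, not against the original stream.
            offsets[name] = len(compressed)
            compressed.extend(token_stream[start:end])
    return compressed, offsets


tokens = ["forward", "index", "basics",   # title
          "a", "long", "body", "here",    # body
          "home", "page"]                 # anchor text
spans = {"title": (0, 3), "body": (3, 7), "anchor": (7, 9)}
compressed, offsets = compress_token_stream(tokens, spans, keep={"title", "anchor"})
```

Here the anchor stream starts at position 7 in the original token stream but at position 3 in the compressed one, which is why the offsets have to be computed only after the compressed stream exists.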
Specification