Architecture for an indexer with fixed width sort and variable width sort
First Claim
1. A method for indexing data, comprising:
- receiving different sections of a document at different times, wherein the different sections include a context section and an anchor text section;
generating sort keys for each token of multiple tokens in the different sections, wherein the sort keys are used to create posting lists that simultaneously are ordered by token and by document identifier for each token, wherein a sort key includes a token type, a token, a document identifier, a document section, and an offset in a document; and
for each of the multiple tokens:
determining if a data field associated with the token is a fixed width or a variable width, wherein the data field is fixed width for storing document content and variable width for storing document metadata;
when the data field is a fixed width, designating the token as one for which fixed width sort is to be performed; and
when the data field is a variable length, designating the token as one for which a variable width sort is to be performed.
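The sort-key structure recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the `SortKey` class, the numeric section codes, the whitespace tokenizer, and the helper names `generate_sort_keys` and `build_posting_lists` are all assumptions made for illustration. The sketch only shows how sorting on the tuple (token type, token, document identifier, document section, offset) yields posting lists that are ordered by token and, within each token, by document identifier.

```python
from dataclasses import dataclass

# Hypothetical section codes; the claim names only the two sections.
CONTEXT, ANCHOR_TEXT = 0, 1


@dataclass(frozen=True, order=True)
class SortKey:
    # Field order defines sort order: token type, token, document
    # identifier, document section, offset within the document.
    token_type: int
    token: str
    doc_id: int
    section: int
    offset: int


def generate_sort_keys(doc_id, section, text, token_type=0):
    """Emit one sort key per whitespace-separated token (assumed tokenizer)."""
    return [
        SortKey(token_type, tok.lower(), doc_id, section, off)
        for off, tok in enumerate(text.split())
    ]


def build_posting_lists(keys):
    """Sort the keys once, then group them into per-token posting lists."""
    postings = {}
    for key in sorted(keys):
        postings.setdefault(key.token, []).append(
            (key.doc_id, key.section, key.offset)
        )
    return postings


# Sections of the same document may arrive at different times; each call
# below stands in for one arrival.
keys = []
keys += generate_sort_keys(2, CONTEXT, "fast index")
keys += generate_sort_keys(1, CONTEXT, "index sort index")
keys += generate_sort_keys(1, ANCHOR_TEXT, "index")

postings = build_posting_lists(keys)
for token, plist in postings.items():
    print(token, plist)
```

Because a single sort over the composite key orders every field at once, the anchor text section of document 1 lands next to that document's context section in the posting list for `index`, even though it was received later.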
Abstract
Disclosed is a technique for indexing data. A token is received. It is determined whether a data field associated with the token is a fixed width. When the data field is a fixed width, the token is designated as one for which fixed width sort is to be performed. When the data field is a variable length, the token is designated as one for which a variable width sort is to be performed.
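The designation step described in the abstract can be sketched as a simple dispatch. This is a hedged illustration only: the `SortKind` enum, the `FIELD_IS_FIXED_WIDTH` table, and the field names `content` and `metadata` are hypothetical, chosen to mirror the claim language that content fields are fixed width and metadata fields are variable width.

```python
from enum import Enum


class SortKind(Enum):
    FIXED_WIDTH = "fixed"
    VARIABLE_WIDTH = "variable"


# Hypothetical field table: per the claims, document content is stored in
# fixed-width data fields and document metadata in variable-width ones.
FIELD_IS_FIXED_WIDTH = {"content": True, "metadata": False}


def designate_sort(field_name):
    """Return the kind of sort to perform for a token's data field."""
    if FIELD_IS_FIXED_WIDTH[field_name]:
        return SortKind.FIXED_WIDTH
    return SortKind.VARIABLE_WIDTH


def partition_tokens(tokens):
    """Split (token, field_name) pairs into one bucket per sort kind."""
    buckets = {SortKind.FIXED_WIDTH: [], SortKind.VARIABLE_WIDTH: []}
    for token, field_name in tokens:
        buckets[designate_sort(field_name)].append(token)
    return buckets


buckets = partition_tokens(
    [("indexer", "content"), ("author:smith", "metadata"), ("sort", "content")]
)
print(buckets[SortKind.FIXED_WIDTH])
print(buckets[SortKind.VARIABLE_WIDTH])
```

Routing tokens into separate buckets lets each bucket be handed to a sort routine suited to its key width; the routines themselves are outside the scope of this sketch.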
198 Citations
18 Claims
1. A method for indexing data, comprising:
receiving different sections of a document at different times, wherein the different sections include a context section and an anchor text section;
generating sort keys for each token of multiple tokens in the different sections, wherein the sort keys are used to create posting lists that simultaneously are ordered by token and by document identifier for each token, wherein a sort key includes a token type, a token, a document identifier, a document section, and an offset in a document; and
for each of the multiple tokens:
determining if a data field associated with the token is a fixed width or a variable width, wherein the data field is fixed width for storing document content and variable width for storing document metadata;
when the data field is a fixed width, designating the token as one for which fixed width sort is to be performed; and
when the data field is a variable length, designating the token as one for which a variable width sort is to be performed.
Dependent claims: 2, 3, 4, 5, 6
7. A computer system, including logic for indexing data, comprising:
a processor; and
receiving a token;
receiving different sections of a document at different times, wherein the different sections include a context section and an anchor text section;
generating sort keys for each token of multiple tokens in the different sections, wherein the sort keys are used to create posting lists that simultaneously are ordered by token and by document identifier for each token, wherein a sort key includes a token type, a token, a document identifier, a document section, and an offset in a document; and
for each of the multiple tokens:
determining if a data field associated with the token is a fixed width or a variable width, wherein the data field is fixed width for storing document content and variable width for storing document metadata;
when the data field is a fixed width, designating the token as one for which fixed width sort is to be performed; and
when the data field is a variable length, designating the token as one for which a variable width sort is to be performed.
Dependent claims: 8, 9, 10, 11, 12
13. An article of manufacture comprising one of hardware logic and a computer readable storage medium including a program for indexing data, wherein the hardware logic or program causes operations to be performed, the operations comprising:
receiving different sections of a document at different times, wherein the different sections include a context section and an anchor text section;
generating sort keys for each token of multiple tokens in the different sections, wherein the sort keys are used to create posting lists that simultaneously are ordered by token and by document identifier for each token, wherein a sort key includes a token type, a token, a document identifier, a document section, and an offset in a document; and
for each of the multiple tokens:
determining if a data field associated with the token is a fixed width or a variable width, wherein the data field is fixed width for storing document content and variable width for storing document metadata;
when the data field is a fixed width, designating the token as one for which fixed width sort is to be performed; and
when the data field is a variable length, designating the token as one for which a variable width sort is to be performed.
Dependent claims: 14, 15, 16, 17, 18
Specification