Architecture for an indexer
First Claim
1. A method for indexing data, comprising:
- for each token in a set of documents that each have an anchor text section and a context text section, using a computer including a processor, generating a sort key that includes a document identifier that indicates whether a section of a document associated with the sort key is the anchor text section or the context section, wherein the anchor text section and the context text section have a same document identifier;
- determining whether a data field associated with the token is a fixed width;
- when the data field is a fixed width, designating the token as one for which fixed width sort is to be performed; and
- when the data field is a variable length, designating the token as one for which a variable width sort is to be performed;
- performing the fixed width sort on one of dual code paths and the variable width sort on the other of dual code paths; and
- in response to performing the fixed width sort and the variable width sort, for each document, using the sort keys to bring together the anchor text section and the context section of that document based on the same document identifier associated with the anchor text section and the context section.
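The sort-key step of the claim can be illustrated with a minimal sketch. The claim does not say how the document identifier encodes the section, so the layout below is an assumption: the low bit of a packed integer carries a hypothetical section flag (`ANCHOR` or `CONTEXT`) and the remaining bits carry the shared document identifier. The helper names `make_sort_key` and `bring_together` are likewise illustrative only.

```python
ANCHOR, CONTEXT = 0, 1  # hypothetical section flag values (assumption)


def make_sort_key(token: str, doc_id: int, section: int) -> tuple:
    """Build a sort key whose packed identifier carries the section flag.

    Assumed layout: low bit = section flag, remaining bits = the document
    identifier shared by the anchor text and context sections.
    """
    return (token, (doc_id << 1) | section)


def bring_together(keys: list) -> dict:
    """Group sorted keys by the shared document identifier.

    Returns {doc_id: {"anchor": [...], "context": [...]}}.
    """
    merged: dict = {}
    for token, packed in sorted(keys):
        doc_id = packed >> 1       # strip the section flag
        section = packed & 1       # recover the section flag
        entry = merged.setdefault(doc_id, {"anchor": [], "context": []})
        entry["anchor" if section == ANCHOR else "context"].append(token)
    return merged


keys = [
    make_sort_key("fox", 7, CONTEXT),
    make_sort_key("link", 7, ANCHOR),
    make_sort_key("dog", 3, CONTEXT),
]
result = bring_together(keys)
```

Because both sections of a document share the same upper bits, dropping the flag bit is enough to reunite them after sorting; any other encoding that preserves a common identifier would serve equally well.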
Abstract
Disclosed is a technique for indexing data. For each token in a set of documents, a sort key is generated that includes a document identifier that indicates whether a section of a document associated with the sort key is an anchor text section or a context section, wherein the anchor text section and the context text section have a same document identifier; it is determined whether a data field associated with the token is a fixed width; when the data field is a fixed width, the token is designated as one for which fixed width sort is to be performed; and, when the data field is a variable length, the token is designated as one for which a variable width sort is to be performed. The fixed width sort and the variable width sort are performed. For each document, the sort keys are used to bring together the anchor text section and the context section of that document.
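The dual code paths can be sketched as follows. The abstract does not name a sorting algorithm for either path, so the choices here are assumptions: a byte-wise LSD radix sort stands in for the fixed width sort, and Python's built-in comparison sort stands in for the variable width sort. The 8-byte width and the function names are hypothetical.

```python
def is_fixed_width(field: bytes, width: int = 8) -> bool:
    # Hypothetical rule: a field is "fixed width" when it is exactly `width` bytes.
    return len(field) == width


def fixed_width_sort(keys: list, width: int = 8) -> list:
    # LSD radix sort: one stable bucket pass per byte, last byte first.
    for pos in range(width - 1, -1, -1):
        buckets = [[] for _ in range(256)]
        for key in keys:
            buckets[key[pos]].append(key)
        keys = [key for bucket in buckets for key in bucket]
    return keys


def variable_width_sort(keys: list) -> list:
    # Comparison sort; bytes objects compare lexicographically.
    return sorted(keys)


def sort_tokens(fields: list, width: int = 8) -> tuple:
    # Designate each field for one of the two paths, then run both paths.
    fixed = [f for f in fields if is_fixed_width(f, width)]
    variable = [f for f in fields if not is_fixed_width(f, width)]
    return fixed_width_sort(fixed, width), variable_width_sort(variable)


fields = [
    (5).to_bytes(8, "big"),
    b"zebra",
    (2).to_bytes(8, "big"),
    b"ant",
    (300).to_bytes(8, "big"),
]
fixed_sorted, variable_sorted = sort_tokens(fields)
```

Splitting the input this way lets the fixed width path avoid per-comparison length handling, which is one plausible motivation for keeping the two paths separate.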
21 Claims
1. A method for indexing data, comprising:
- for each token in a set of documents that each have an anchor text section and a context text section, using a computer including a processor, generating a sort key that includes a document identifier that indicates whether a section of a document associated with the sort key is the anchor text section or the context section, wherein the anchor text section and the context text section have a same document identifier;
- determining whether a data field associated with the token is a fixed width;
- when the data field is a fixed width, designating the token as one for which fixed width sort is to be performed; and
- when the data field is a variable length, designating the token as one for which a variable width sort is to be performed;
- performing the fixed width sort on one of dual code paths and the variable width sort on the other of dual code paths; and
- in response to performing the fixed width sort and the variable width sort, for each document, using the sort keys to bring together the anchor text section and the context section of that document based on the same document identifier associated with the anchor text section and the context section.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A computer system for indexing data, comprising hardware logic performing:
- for each token in a set of documents that each have an anchor text section and a context text section, generating a sort key that includes a document identifier that indicates whether a section of a document associated with the sort key is the anchor text section or the context section, wherein the anchor text section and the context text section have a same document identifier;
- determining whether a data field associated with the token is a fixed width;
- when the data field is a fixed width, designating the token as one for which fixed width sort is to be performed; and
- when the data field is a variable length, designating the token as one for which a variable width sort is to be performed;
- performing the fixed width sort on one of dual code paths and the variable width sort on the other of dual code paths; and
- in response to performing the fixed width sort and the variable width sort, for each document, using the sort keys to bring together the anchor text section and the context section of that document based on the same document identifier associated with the anchor text section and the context section.
Dependent claims: 9, 10, 11, 12, 13, 14
15. An article of manufacture comprising one of hardware logic and a computer readable medium including a program for indexing data, wherein the hardware logic or program causes operations to be performed, the operations comprising:
- for each token in a set of documents that each have an anchor text section and a context text section, generating a sort key that includes a document identifier that indicates whether a section of a document associated with the sort key is the anchor text section or the context section, wherein the anchor text section and the context text section have a same document identifier;
- determining whether a data field associated with the token is a fixed width;
- when the data field is a fixed width, designating the token as one for which fixed width sort is to be performed; and
- when the data field is a variable length, designating the token as one for which a variable width sort is to be performed;
- performing the fixed width sort on one of dual code paths and the variable width sort on the other of dual code paths; and
- in response to performing the fixed width sort and the variable width sort, for each document, using the sort keys to bring together the anchor text section and the context section of that document based on the same document identifier associated with the anchor text section and the context section.
Dependent claims: 16, 17, 18, 19, 20, 21
Specification