Annotating token sequences within documents
Abstract
Token sequences within a number of documents are annotated. First, a base inverse index for unique tokens within the documents is received. The base inverse index includes a set of the unique tokens within the documents and a set of location lists for each unique token. Second, indices are created for a set of the token sequences within the documents from the base inverse index, to annotate the token sequences.
20 Claims
1. A method for annotating token sequences within a plurality of documents comprising:
- receiving a base inverse index for unique tokens within the plurality of documents, where the base inverse index comprises a set of the unique tokens within the plurality of documents and a set of location lists for each unique token; and
- creating indices for a set of the token sequences within the plurality of documents from the base inverse index, to annotate the token sequences.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
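For orientation, the base inverse index that claim 1 receives can be pictured as a mapping from each unique token to its location list. The following is only a minimal sketch under stated assumptions, not the patented implementation: the function name `build_base_inverse_index`, the whitespace tokenizer, the lowercase normalization, and the `(doc_id, position)` pointer format are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def build_base_inverse_index(documents):
    """Map each unique token to an ordered location list of (doc_id, position).

    Assumption: tokens are lowercase whitespace-separated words, and a
    location is a (document number, token offset) pair.
    """
    index = defaultdict(list)
    for doc_id, text in enumerate(documents):
        for position, token in enumerate(text.lower().split()):
            # Documents and positions are visited in increasing order,
            # so every location list stays sorted.
            index[token].append((doc_id, position))
    # Return the unique tokens in sorted order as well.
    return dict(sorted(index.items()))


documents = ["new york is big", "york is old", "i love new york"]
base_index = build_base_inverse_index(documents)
```

Here `base_index["new"]` holds the two locations of "new", and `base_index["york"]` holds the three locations of "york". Indices for token sequences such as "new york" would then be derived from these lists rather than by rescanning the documents.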
10. A method for annotating each of a plurality of tokens within a plurality of documents comprising:
- receiving a base inverse index for the plurality of documents, the base inverse index having an ordered list of unique tokens and a set of location lists for each unique token, each location list being an ordered list of pointers to the plurality of documents;
- for each of a plurality of derived entities, each derived entity being a sequence of tokens, determining an index as a consecutive intersection of all of a plurality of location lists of pointers within the derived entity, such that the index contains location lists of pointers to all occurrences of the sequence of tokens of the derived entity within the plurality of documents; and
- merging the location lists of pointers for all the derived entities to result in a final location list, such that the documents are annotated with the tokens of the derived entities.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
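The two steps of claim 10, consecutive intersection followed by merging, might be sketched as below. This is an illustrative reading only, not the method as specified: the sample `base_index`, the function names `consecutive_intersection` and `annotate`, and the `(doc_id, position)` pointer format are hypothetical, and the set-based membership test stands in for whatever list-intersection procedure the specification actually describes. Each derived entity is assumed to be a non-empty tuple of tokens.

```python
import heapq

# Hypothetical base inverse index: token -> ordered list of (doc_id, position).
base_index = {
    "is": [(0, 2), (1, 1)],
    "new": [(0, 0), (2, 2)],
    "york": [(0, 1), (1, 0), (2, 3)],
}


def consecutive_intersection(index, sequence):
    """Return start locations where the tokens of `sequence` occur back to back."""
    starts = index.get(sequence[0], [])
    for offset, token in enumerate(sequence[1:], start=1):
        following = set(index.get(token, []))
        # Keep a start only if the next token sits exactly `offset` positions later.
        starts = [(d, p) for (d, p) in starts if (d, p + offset) in following]
    return starts


def annotate(index, derived_entities):
    """Merge per-entity location lists into one ordered final location list."""
    per_entity = [
        [(d, p, " ".join(seq)) for (d, p) in consecutive_intersection(index, seq)]
        for seq in derived_entities
    ]
    # Each per-entity list is already sorted, so a k-way merge keeps the order.
    return list(heapq.merge(*per_entity))


annotations = annotate(base_index, [("new", "york"), ("york", "is")])
```

Each entry of `annotations` is a `(doc_id, position, entity)` triple, so a single pass over the final list is enough to attach every derived entity to the documents it occurs in.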
19. An article of manufacture comprising:
- a tangible computer-readable medium; and
- means in the medium for annotating each of a plurality of tokens within a plurality of documents based on a base inverse index for the plurality of documents.
20. A computerized system comprising:
- a computer-readable medium storing:
- a plurality of documents having a plurality of tokens;
- a base inverse index previously generated for the documents;
- a mechanism to annotate each token within the documents based on the base inverse index, such that annotation of the plurality of documents occurs at a same time.
Specification