SCALABLE LOOKUP-DRIVEN ENTITY EXTRACTION FROM INDEXED DOCUMENT COLLECTIONS
Abstract
A set of documents is filtered for entity extraction. A list of entity strings is received. A set of token sets that covers the entity strings in the list is determined. An inverted index generated on a first set of documents is queried using the set of token sets to determine a set of document identifiers for a subset of the documents in the first set. A second set of documents identified by the set of document identifiers is retrieved from the first set of documents. The second set of documents is filtered to include one or more documents of the second set that each include a match with at least one entity string of the list of entity strings. Entity recognition may be performed on the filtered second set of documents.
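The abstract's pipeline (token sets, inverted-index query, retrieval by identifier, verification of an actual match) can be illustrated with a minimal Python sketch. It is not the patented implementation and rests on several assumptions the abstract leaves open: a lowercase alphanumeric tokenizer, an in-memory dictionary as the inverted index, each entity string's own token set used as its covering token set (queried conjunctively), and "match" taken to mean the entity's tokens occurring contiguously. All function names (`tokenize`, `build_inverted_index`, `covering_token_sets`, `query_index`, `contains_entity`, `filter_documents`) are hypothetical.

```python
import re
from collections import defaultdict

# Assumed tokenizer: lowercase alphanumeric runs (not specified in the patent).
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each token to the ids of the documents that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def covering_token_sets(entities: list[str]) -> set[frozenset[str]]:
    """Assumed cover: one token set per entity string; duplicates collapse."""
    return {frozenset(toks) for toks in map(tokenize, entities) if toks}


def query_index(index: dict[str, set[int]],
                token_sets: set[frozenset[str]]) -> set[int]:
    """Union, over token sets, of the ids of documents containing every token."""
    hits: set[int] = set()
    for token_set in token_sets:
        postings = [index.get(t, set()) for t in token_set]
        hits |= set.intersection(*postings)
    return hits


def contains_entity(text: str, entities: list[str]) -> bool:
    """Assumed 'match': an entity's tokens occur contiguously in the text."""
    tokens = tokenize(text)
    for entity in entities:
        ent = tokenize(entity)
        n = len(ent)
        if n and any(tokens[i:i + n] == ent
                     for i in range(len(tokens) - n + 1)):
            return True
    return False


def filter_documents(docs: dict[int, str], entities: list[str]) -> dict[int, str]:
    index = build_inverted_index(docs)  # in practice built once, ahead of time
    doc_ids = query_index(index, covering_token_sets(entities))
    candidates = {i: docs[i] for i in sorted(doc_ids)}  # retrieve by id
    # Verification pass: keep only documents with an actual entity match.
    return {i: t for i, t in candidates.items() if contains_entity(t, entities)}


docs = {
    1: "Acme Corp hired Jane Doe.",
    2: "Corp earnings fell; Acme was unaffected.",
    3: "The weather was mild.",
}
entities = ["Acme Corp", "Jane Doe"]
print(filter_documents(docs, entities))  # {1: 'Acme Corp hired Jane Doe.'}
```

Document 2 shows why the verification pass is needed: it contains both tokens of "Acme Corp", so the index query returns it, but it does not contain the phrase itself. Only the verified documents would then be handed to entity recognition.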
20 Claims
1. A method for ad-hoc entity extraction, comprising:
filtering a first set of documents to generate a second set of documents that includes documents of the first set having a match with at least one entity string in a list of entity strings; and
performing entity recognition on the second set of documents.
Dependent claims: 2, 3, 4, 5, 6, 7, 8.
9. A system for ad-hoc entity extraction, comprising:
a document filter configured to filter a first set of documents to generate a second set of documents that includes documents of the first set having a match with at least one entity string in a list of entity strings; and
an entity recognition module configured to perform entity recognition on the second set of documents.
Dependent claims: 10, 11, 12, 13, 14.
15. A computer program product comprising a computer-readable medium having computer program logic recorded thereon for enabling a processor-based system to perform ad-hoc entity extraction according to a method that comprises:
filtering a first set of documents to generate a second set of documents that includes documents of the first set having a match with at least one entity string in a list of entity strings; and
performing entity recognition on the second set of documents.
Dependent claims: 16, 17, 18, 19, 20.
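The independent claims share a two-stage structure: a document filter that keeps only documents matching at least one listed entity string, followed by entity recognition on the survivors. The Python sketch below mirrors the system of claim 9 under stated assumptions: "having a match" is taken to be a case-insensitive substring hit, and the entity recognition module is a hypothetical dictionary-based stand-in that returns (surface form, start, end) spans. The claims prescribe neither choice, and the class and function names (`DocumentFilter`, `EntityRecognitionModule`, `extract_entities`) are illustrative only.

```python
import re
from dataclasses import dataclass


@dataclass
class DocumentFilter:
    """First stage: keep documents matching at least one entity string.

    'Match' is assumed to be a case-insensitive substring hit.
    """
    entity_strings: list[str]

    def __call__(self, first_set: list[str]) -> list[str]:
        needles = [e.lower() for e in self.entity_strings if e]
        return [doc for doc in first_set
                if any(n in doc.lower() for n in needles)]


@dataclass
class EntityRecognitionModule:
    """Second stage: a stand-in dictionary recognizer returning
    (surface form, start, end) spans. A real system could use an NER model."""
    entity_strings: list[str]

    def __post_init__(self) -> None:
        # Try longer strings first so "Acme Corp" is preferred over "Acme".
        alternatives = sorted((e for e in self.entity_strings if e),
                              key=len, reverse=True)
        self._pattern = re.compile(
            "|".join(re.escape(e) for e in alternatives), re.IGNORECASE)

    def __call__(self, doc: str) -> list[tuple[str, int, int]]:
        return [(m.group(0), m.start(), m.end())
                for m in self._pattern.finditer(doc)]


def extract_entities(first_set: list[str], entity_strings: list[str]):
    second_set = DocumentFilter(entity_strings)(first_set)
    recognize = EntityRecognitionModule(entity_strings)
    return [(doc, recognize(doc)) for doc in second_set]


first_set = ["Acme Corp hired Jane Doe.", "The weather was mild.", "JANE DOE spoke."]
result = extract_entities(first_set, ["Acme", "Acme Corp", "Jane Doe"])
for doc, spans in result:
    print(doc, spans)
```

Keeping the filter separate from the recognizer means that a costlier recognizer, such as a statistical NER model, would only see documents that already contain a candidate entity string, which appears to be the point of the claimed ordering.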
Specification