Scalable lookup-driven entity extraction from indexed document collections

  • US 8,782,061 B2
  • Filed: 06/24/2008
  • Issued: 07/15/2014
  • Est. Priority Date: 06/24/2008
  • Status: Active Grant
First Claim

1. A method for filtering a set of documents, comprising:

  • receiving a list of entity strings;

  • determining a set of token sets that covers the entity strings in the list, the number of tokens in the set of token sets being less than the number of words of the entity strings in the list of entity strings;

  • querying an inverted index generated on a first set of documents using the set of token sets to determine a set of document identifiers for a subset of the documents in the first set;

  • retrieving from the first set of documents a second set of documents, which is a subset of the first set of documents, identified by the set of document identifiers; and

  • filtering the second set of documents to include one or more documents of the second set that each include a match with at least one entity string of the list of entity strings.
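
The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: all function names, the whitespace tokenizer, and the sample data are hypothetical. In particular, the cover here is built by picking the single rarest token of each entity string, which is only one simple way to obtain a set of token sets with fewer tokens than the entity list has words; the claim does not prescribe this heuristic.

```python
from collections import defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for a real tokenizer.
    return text.lower().split()


def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def choose_cover(entities, index):
    """Pick one single-token set per entity: its rarest token in the index.

    Every entity contains all tokens of at least one chosen set, so the
    chosen sets cover the entity list. Assumes each entity is non-empty.
    """
    cover = set()
    for entity in entities:
        tokens = tokenize(entity)
        # Ties are broken alphabetically so the result is deterministic.
        rarest = min(tokens, key=lambda t: (len(index.get(t, ())), t))
        cover.add(frozenset([rarest]))
    return cover


def candidate_doc_ids(cover, index):
    """AND the postings within each token set, then OR across token sets."""
    ids = set()
    for token_set in cover:
        postings = [index.get(t, set()) for t in token_set]
        ids |= set.intersection(*postings)
    return ids


def contains_phrase(doc_tokens, phrase_tokens):
    """True if phrase_tokens occurs as a contiguous run in doc_tokens."""
    n = len(phrase_tokens)
    return any(doc_tokens[i:i + n] == phrase_tokens
               for i in range(len(doc_tokens) - n + 1))


def filter_documents(entities, docs, index):
    cover = choose_cover(entities, index)              # determine token sets
    ids = candidate_doc_ids(cover, index)              # query inverted index
    second_set = {i: docs[i] for i in ids}             # retrieve candidates
    entity_tokens = [tokenize(e) for e in entities]
    return sorted(                                     # keep true matches only
        doc_id for doc_id, text in second_set.items()
        if any(contains_phrase(tokenize(text), et) for et in entity_tokens)
    )


# Hypothetical sample data.
docs = {
    1: "the new york times reported",
    2: "york is a city in england",
    3: "seattle weather today",
    4: "visit new york city",
}
entities = ["new york times", "new york city"]
index = build_inverted_index(docs)
print(filter_documents(entities, docs, index))  # [1, 4]
```

In this example the cover is {"times"} and {"city"}: two tokens against six words in the entity list. The index query returns documents 1, 2, and 4, and the final filtering step drops document 2, which contains "city" but neither entity string.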
