Identifying common co-occurring elements in lists

  • US 8,463,782 B1
  • Filed: 04/11/2011
  • Issued: 06/11/2013
  • Est. Priority Date: 07/10/2007
  • Status: Active Grant
First Claim

1. A computer-implemented method comprising:

  • traversing a corpus of documents to identify a plurality of lists within the documents, wherein each list comprises structured data delimited from other data in a document, and wherein each list specifies an enumeration of elements;

    selecting a pair of terms based on determining that both terms of the pair are contained in a first quantity of lists that are included in the documents in the corpus, wherein the first quantity is more than a first predetermined quantity, and wherein each list in the first quantity of lists includes more than a second predetermined quantity of terms;

    determining a first value that represents a quantity of documents in the corpus that include a list that contains both terms of the pair;

    determining a second value that represents a quantity of the documents in the corpus that include a list that contains at least one of the terms of the pair;

    when both terms of the pair are contained in the first quantity of lists that are included in the documents in the corpus, determining a correlation value from the first value and the second value;

    determining that the correlation value satisfies a threshold; and

    designating, by one or more computers, the pair of terms as potentially non-synonymous terms by adding the pair of terms to a blacklist, based on determining that the correlation value satisfies the threshold, wherein the blacklist is accessed for synonym determination.
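The steps of the claim can be sketched in code. The following is a minimal illustration and not the patented implementation. The claim does not state how the correlation value is computed from the first and second values, so this sketch assumes a simple ratio (first value divided by second value) and assumes that "satisfies a threshold" means greater than or equal to it. The function name `build_blacklist`, the parameters `min_lists`, `min_list_len`, and `threshold`, and the representation of the corpus as documents holding already-extracted lists of terms are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_blacklist(corpus, min_lists=2, min_list_len=2, threshold=0.5):
    """Return the set of term pairs designated as potentially non-synonymous.

    corpus: a list of documents, each document being a list of lists of terms
    (list extraction from raw documents is assumed to have happened already).
    """
    pair_list_count = defaultdict(int)  # pair -> number of sufficiently long lists containing both terms
    docs_with_pair = defaultdict(set)   # pair -> ids of documents with a list containing both terms
    docs_with_term = defaultdict(set)   # term -> ids of documents with a list containing the term

    for doc_id, lists in enumerate(corpus):
        for items in lists:
            terms = sorted(set(items))  # sorted so each pair has one canonical order
            for term in terms:
                docs_with_term[term].add(doc_id)
            for pair in combinations(terms, 2):
                docs_with_pair[pair].add(doc_id)
                # Only lists with more than the second predetermined quantity
                # of terms count toward the pair's list total.
                if len(terms) > min_list_len:
                    pair_list_count[pair] += 1

    blacklist = set()
    for pair, n_lists in pair_list_count.items():
        # Select the pair only if it appears in more than the first
        # predetermined quantity of lists.
        if n_lists <= min_lists:
            continue
        a, b = pair
        first_value = len(docs_with_pair[pair])                     # documents with both terms in a list
        second_value = len(docs_with_term[a] | docs_with_term[b])   # documents with at least one term in a list
        correlation = first_value / second_value                    # ASSUMPTION: ratio; the claim gives no formula
        if correlation >= threshold:                                # ASSUMPTION: "satisfies" means >=
            blacklist.add(pair)
    return blacklist


# Hypothetical toy corpus: four documents, each holding one or more lists.
corpus = [
    [["red", "green", "blue"]],
    [["red", "green", "yellow"]],
    [["red", "green", "black"], ["cat", "dog"]],
    [["red", "purple", "white"]],
]
blacklist = build_blacklist(corpus)
```

With this toy corpus, "green" and "red" co-occur in three lists across three documents, and four documents contain at least one of the two terms, so the assumed ratio is 3/4 and the pair is added to the blacklist. A synonym-detection step could then consult the blacklist and decline to treat the two terms as synonyms.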

  • 2 Assignments