
Document-based synonym generation

  • US 8,161,041 B1
  • Filed: 02/10/2011
  • Issued: 04/17/2012
  • Est. Priority Date: 02/07/2007
  • Status: Active Grant
First Claim

1. A computer-implemented method comprising:

    receiving a pair of words comprising a first word and a second word, where each word appears in a collection of documents;

    generating a word-form score for the pair of words based on a consistency of the pair of words with word-form rules, wherein a word-form rule indicates how words with a common portion can vary;

    computing a probability that the first word occurs within a first number of words of the second word in the one or more documents in the collection;

    computing a probability that the first word occurs within a second number of words of the second word in the one or more documents in the collection, wherein the second number is greater than the first number;

    generating a closeness score for the pair of words by dividing the first number by the second number;

    computing a relative frequency of occurrence for the first word and the second word in the collection of documents;

    generating a correlation between occurrences of a first word in the title or the anchor of the documents and occurrences of a second word in a same document; and

    determining that the first word and the second word are synonyms based at least on the correlation, the relative frequency of the first word and the second word, the closeness score, and the word-form score.
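
The closeness and relative-frequency steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes documents are already tokenized into lists of words, and it assumes the closeness score is the ratio of the two window probabilities; the claim text itself says "dividing the first number by the second number", so that reading is an interpretation. All function names, parameter names, and the default window sizes (2 and 10) are hypothetical, and the relative frequency is taken here as a simple count ratio.

```python
from collections import Counter


def cooccurrence_probability(docs, first, second, window):
    """Fraction of occurrences of `first` that have `second` within `window` tokens.

    `docs` is a list of token lists (an assumption of this sketch).
    """
    hits = 0
    total = 0
    for tokens in docs:
        second_positions = [i for i, t in enumerate(tokens) if t == second]
        for i, t in enumerate(tokens):
            if t != first:
                continue
            total += 1
            if any(i != j and abs(i - j) <= window for j in second_positions):
                hits += 1
    return hits / total if total else 0.0


def closeness_score(docs, first, second, near=2, far=10):
    """Ratio of near-window to far-window co-occurrence probability.

    Assumed interpretation of the claim's closeness score; `near` must be
    smaller than `far`, mirroring the claim's first and second numbers.
    """
    p_near = cooccurrence_probability(docs, first, second, near)
    p_far = cooccurrence_probability(docs, first, second, far)
    return p_near / p_far if p_far else 0.0


def relative_frequency(docs, first, second):
    """Count of `first` divided by count of `second` across the collection."""
    counts = Counter(t for tokens in docs for t in tokens)
    return counts[first] / counts[second] if counts[second] else 0.0


docs = [
    ["car", "and", "auto", "are", "fast"],
    ["car", "x", "x", "x", "x", "auto"],
]
score = closeness_score(docs, "car", "auto")
freq = relative_frequency(docs, "car", "auto")
```

In this toy collection "auto" falls within two tokens of "car" in only one of the two documents, while it falls within ten tokens in both. How such a ratio would be weighed against the word-form score and the title or anchor correlation is not specified in this claim, so the sketch leaves those steps out.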
