Paraphrase acquisition

  • US 7,937,265 B1
  • Filed: 09/27/2005
  • Issued: 05/03/2011
  • Est. Priority Date: 09/27/2005
  • Status: Active Grant
First Claim

1. A method comprising:

  • receiving textual input in data processing apparatus;

  • identifying, by operation of the data processing apparatus, a plurality of ngrams, each ngram being a sequence of words within the textual input;

  • dividing, by operation of the data processing apparatus, each identified ngram into three portions: a beginning constant portion containing a first number of words at the beginning of the ngram, an ending constant portion containing a second number of words at the end of the ngram, and a middle portion containing the words of the ngram between the beginning constant portion and the ending constant portion;

  • determining, by operation of the data processing apparatus, an anchor for each ngram, the anchor comprising the beginning constant portion and the ending constant portion of the ngram; and

  • identifying, by operation of the data processing apparatus, a plurality of potential paraphrase pairs, wherein if the anchor of a first ngram is the same as the anchor of a second ngram in the plurality of ngrams, the middle portion of the first ngram and the middle portion of the second ngram is identified as being a potential paraphrase pair, wherein the middle portion of the first ngram is textually different from the middle portion of the second ngram.
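
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, only one plausible reading of the claim language. The function names (`ngrams`, `split_ngram`, `potential_paraphrase_pairs`), the lowercase whitespace tokenization, and the parameters `n_values`, `n_begin`, and `n_end` are all assumptions introduced for illustration; the claim itself does not fix any ngram length or portion size.

```python
from collections import defaultdict
from itertools import combinations


def ngrams(words, n):
    """Return every contiguous sequence of n words as a tuple."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def split_ngram(ngram, n_begin, n_end):
    """Divide an ngram into beginning, middle, and ending portions."""
    cut = len(ngram) - n_end
    return ngram[:n_begin], ngram[n_begin:cut], ngram[cut:]


def potential_paraphrase_pairs(text, n_values, n_begin=1, n_end=1):
    """Pair up textually different middles whose ngrams share an anchor.

    Assumption: tokens are lowercase whitespace-separated words, and
    sentence boundaries are ignored.
    """
    words = text.lower().split()
    middles_by_anchor = defaultdict(set)
    for n in n_values:
        if n <= n_begin + n_end:
            continue  # no room for a non-empty middle portion
        for ngram in ngrams(words, n):
            begin, middle, end = split_ngram(ngram, n_begin, n_end)
            anchor = (begin, end)  # both constant portions together
            middles_by_anchor[anchor].add(" ".join(middle))

    pairs = set()
    for middles in middles_by_anchor.values():
        # A set holds each middle once, so every pair drawn from it is
        # textually different, as the claim requires.
        for first, second in combinations(sorted(middles), 2):
            pairs.add((first, second))
    return pairs


print(potential_paraphrase_pairs("he quickly left and he rapidly left",
                                 n_values=(3,)))
# {('quickly', 'rapidly')}
```

In this example the 3-grams "he quickly left" and "he rapidly left" share the anchor ("he", "left"), so their middle portions "quickly" and "rapidly" are reported as a potential paraphrase pair. Grouping middles by anchor in a dictionary avoids comparing every ngram against every other one; that is a design choice of this sketch, not something the claim specifies.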
