
Phrase extraction using subphrase scoring

  • US 8,166,045 B1
  • Filed: 03/30/2007
  • Issued: 04/24/2012
  • Est. Priority Date: 03/30/2007
  • Status: Active Grant
First Claim

1. A computer implemented method of extracting a set of valid phrases from a plurality of documents, the method comprising:

  • for each document:

    identifying a plurality of candidate phrases contained in the document, wherein a candidate phrase includes multiple consecutive words that appear in the document, wherein identifying a candidate phrase includes scanning through words of the document to identify the multiple consecutive words of the candidate phrase contained in the document;

    scoring each candidate phrase in the document to produce a document phrase score for the candidate phrase for the document, the document phrase score being based on each instance of the candidate phrase that appears in the document, wherein scoring the candidate phrases in the document to produce the document phrase score comprises:

    scoring a plurality of instances of the candidate phrase in the document to produce a plurality of instance phrase scores for the candidate phrase for the document, the instance phrase scores being based on a location of the instance of the candidate phrase within the document and being based on a position of the instance of the candidate phrase relative to a sequence of words containing the instance of the candidate phrase; and

    combining the plurality of instance phrase scores of the candidate phrase in the document into the document phrase score;

    for each candidate phrase:

    creating, via a processor, a combined score for the candidate phrase based on a plurality of different document phrase scores for the candidate phrase for respective different documents; and

    determining whether the candidate phrase is a valid phrase based on the combined score for the candidate phrase and based on the document phrase scores for the candidate phrase.
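The claim leaves every scoring function unspecified, so the following is only a minimal Python sketch of the claimed pipeline, not the patentee's implementation. It assumes: sentences serve as the "sequence of words" containing each instance; the location weight decays linearly with sentence index; the position weight depends on whether the instance touches the sentence boundaries; instance scores and document scores are both combined by summation; and validity requires a combined-score threshold plus a minimum number of documents whose own score clears a per-document threshold. All names and values (`MIN_LEN`, `COMBINED_THRESHOLD`, `MIN_DOC_SCORE`, `MIN_SUPPORTING_DOCS`, `instance_score`, etc.) are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Phrase = Tuple[str, ...]

# Hypothetical parameters: the claim does not specify any of these values.
MIN_LEN, MAX_LEN = 2, 4      # candidate phrase length in words
COMBINED_THRESHOLD = 1.0     # minimum combined score across documents
MIN_DOC_SCORE = 0.5          # a document "supports" a phrase at or above this score
MIN_SUPPORTING_DOCS = 2      # minimum number of supporting documents


def word_sequences(text: str) -> List[List[str]]:
    """Split a document into word sequences (assumption: naive sentences)."""
    seqs = [re.findall(r"[a-z0-9']+", s.lower()) for s in re.split(r"[.!?]+", text)]
    return [s for s in seqs if s]


def instance_score(seq_idx: int, n_seqs: int, start: int, length: int, seq_len: int) -> float:
    """Hypothetical instance score = location weight * position weight."""
    # Location within the document: earlier sequences weigh more (1.0 down to 0.5).
    location = 1.0 - 0.5 * (seq_idx / max(n_seqs - 1, 1))
    # Position relative to the sequence that contains the instance.
    at_start, at_end = start == 0, start + length == seq_len
    if at_start and at_end:
        position = 1.0        # instance is the whole sequence
    elif at_start or at_end:
        position = 0.75       # instance touches a sequence boundary
    else:
        position = 0.5        # instance is interior
    return location * position


def document_phrase_scores(text: str) -> Dict[Phrase, float]:
    """Scan consecutive words for candidates and sum their instance scores."""
    seqs = word_sequences(text)
    scores: Dict[Phrase, float] = defaultdict(float)
    for i, seq in enumerate(seqs):
        for n in range(MIN_LEN, MAX_LEN + 1):
            for start in range(len(seq) - n + 1):
                phrase = tuple(seq[start:start + n])
                scores[phrase] += instance_score(i, len(seqs), start, n, len(seq))
    return dict(scores)


def extract_valid_phrases(docs: List[str]) -> Set[Phrase]:
    """Combine per-document scores and keep phrases passing both tests."""
    doc_scores: Dict[Phrase, List[float]] = defaultdict(list)
    for text in docs:
        for phrase, score in document_phrase_scores(text).items():
            doc_scores[phrase].append(score)
    valid: Set[Phrase] = set()
    for phrase, scores in doc_scores.items():
        combined = sum(scores)  # combined score across documents
        supporting = sum(1 for s in scores if s >= MIN_DOC_SCORE)
        # Validity uses both the combined score and the per-document scores.
        if combined >= COMBINED_THRESHOLD and supporting >= MIN_SUPPORTING_DOCS:
            valid.add(phrase)
    return valid


if __name__ == "__main__":
    docs = ["Machine learning is fun. We like machine learning.",
            "Machine learning works. Deep nets too."]
    print(extract_valid_phrases(docs))  # {('machine', 'learning')}
```

Here "machine learning" scores 1.125 in the first document (0.75 + 0.375) and 0.75 in the second, so it passes both tests, while phrases such as "machine learning is fun" score well in one document but lack support from a second. Checking the per-document scores in addition to the combined score mirrors the claim's final step, which bases validity on both.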

  • 2 Assignments