
Phrase extraction using subphrase scoring

  • US 9,355,169 B1
  • Filed: 09/13/2012
  • Issued: 05/31/2016
  • Est. Priority Date: 03/30/2007
  • Status: Active Grant
First Claim

1. A computer implemented method of extracting a set of phrases from a plurality of documents, the method comprising:

  • for each document:

    identifying a plurality of candidate phrases occurring within the document, wherein a candidate phrase includes two or more consecutive words that are determined to occur in the document, and scoring candidate phrases in the document to produce respective document phrase scores for the candidate phrases for the document, the document phrase score for a candidate phrase being based on attributes of individual occurrences of the candidate phrase in the document, with at least some candidate phrases appearing repeatedly having a higher document phrase score than candidate phrases appearing once;

    for a candidate phrase of the plurality of the candidate phrases:

    creating, via a processor, a combined score for the candidate phrase based on a plurality of different document phrase scores for the candidate phrase for respective different documents; and

    selecting the candidate phrase for inclusion in the extracted set based on the combined score for the candidate phrase and based on the document phrase scores for the candidate phrase.
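The following is a rough illustration of the steps recited in claim 1. It is a minimal sketch under stated assumptions, not the patented implementation. The claim does not specify tokenization, which occurrence attributes feed the document phrase score, how per-document scores are combined, or the selection thresholds. Every one of those choices here is hypothetical: the names `candidate_phrases`, `document_phrase_scores`, and `extract_phrases`, along with the parameters `min_docs`, `min_combined`, and `min_doc_score`, are invented for illustration. The sketch scores each occurrence as 1 (so repeated phrases outscore single ones) and combines documents by summing. It also does not attempt the subphrase scoring named in the title.

```python
from collections import Counter, defaultdict
import re


def candidate_phrases(text, min_len=2, max_len=4):
    """Yield every run of min_len..max_len consecutive words in the text.

    Hypothetical tokenization: lowercase, keep runs of letters, digits and
    apostrophes, and ignore punctuation (so phrases may span sentences).
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    for n in range(min_len, max_len + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def document_phrase_scores(text):
    """Score candidates within one document from their individual occurrences.

    Assumption: every occurrence contributes 1, so a phrase that repeats
    scores higher than one that appears once.
    """
    return Counter(candidate_phrases(text))


def extract_phrases(documents, min_docs=2, min_combined=2, min_doc_score=2):
    """Select phrases using both a combined score and per-document scores.

    Hypothetical rule: the combined score is the sum of the per-document
    scores. A phrase is kept if it occurs in at least min_docs documents,
    its combined score reaches min_combined, and it repeats (score of at
    least min_doc_score) in at least one document.
    """
    scores_by_phrase = defaultdict(list)
    for doc in documents:
        for phrase, score in document_phrase_scores(doc).items():
            scores_by_phrase[phrase].append(score)

    selected = {}
    for phrase, doc_scores in scores_by_phrase.items():
        combined = sum(doc_scores)  # combined score across documents
        if (len(doc_scores) >= min_docs
                and combined >= min_combined
                and max(doc_scores) >= min_doc_score):
            selected[phrase] = combined
    return selected


docs = [
    "Machine learning is fun. Machine learning is hard.",
    "We study machine learning today.",
    "Nothing relevant here at all.",
]
print(extract_phrases(docs))  # {'machine learning': 3}
```

Because selection checks both the combined score and the individual document scores, a phrase that appears exactly once in each of two documents (for example "red apple" in "red apple pie" and "red apple tart") is rejected, even though it spans several documents.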

  • 2 Assignments