
Document key phrase extraction method

  • US 8,935,260 B2
  • Filed: 05/12/2009
  • Issued: 01/13/2015
  • Est. Priority Date: 05/12/2009
  • Status: Expired due to Fees
First Claim

1. A computer-implemented method of extracting key phrases from a document comprising:

    accessing a repository comprising hyperlinked subjects, the repository comprising first and second data structures representing the relationship between said hyperlinked subjects using different representation criteria;

    pruning the first data structure by removing hyperlinks between subjects based on a further relationship between said subjects in the second data structure;

    matching phrases in said document to said subjects in the pruned first data structure;

    further pruning the pruned first data structure by removing unmatched subjects that are not hyperlinked to matched subjects;

    determining a ranking for each matched subject; and

    selecting key phrases using the determined subject rankings,

    wherein the first data structure is a directional graph comprising the subjects as nodes and the hyperlinks between subjects as edges between nodes;

    the second data structure is a directional graph comprising organized subject categories; and

    the further relationship comprises the shortest distance between respective categories to which respective subjects belong in the second data structure, the hyperlink between said subjects being removed if the shortest distance exceeds a threshold value.
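
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, only one reading of the claim under stated assumptions: each subject belongs to exactly one category, category distance is a hop count that ignores edge direction, unreachable categories count as exceeding the threshold, phrase matching is a case-insensitive substring test, and the ranking is a simple in-link count (the claim does not say how rankings are determined). All function and parameter names are hypothetical.

```python
from collections import deque


def undirected_neighbors(category_graph):
    """Build an undirected adjacency map from directed category edges (assumption)."""
    adj = {}
    for src, dsts in category_graph.items():
        adj.setdefault(src, set())
        for dst in dsts:
            adj[src].add(dst)
            adj.setdefault(dst, set()).add(src)
    return adj


def category_distance(adj, start, goal):
    """Breadth-first hop count between two categories; None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def prune_by_category(links, category_of, category_graph, threshold):
    """Drop a hyperlink when its endpoints' categories are farther apart than threshold."""
    adj = undirected_neighbors(category_graph)
    pruned = {}
    for src, dsts in links.items():
        kept = set()
        for dst in dsts:
            d = category_distance(adj, category_of[src], category_of[dst])
            if d is not None and d <= threshold:
                kept.add(dst)
        pruned[src] = kept
    return pruned


def match_subjects(document, subjects):
    """Naive matching: a subject matches if its name occurs in the text (assumption)."""
    text = document.lower()
    return {s for s in subjects if s.lower() in text}


def prune_unmatched(links, matched):
    """Keep matched subjects plus unmatched ones hyperlinked to or from a matched one."""
    keep = set(matched)
    for src, dsts in links.items():
        for dst in dsts:
            if src in matched:
                keep.add(dst)
            if dst in matched:
                keep.add(src)
    return {s: {d for d in dsts if d in keep} for s, dsts in links.items() if s in keep}


def rank_subjects(links, matched):
    """Stand-in ranking: number of remaining hyperlinks pointing at each matched subject."""
    scores = {s: 0 for s in matched}
    for dsts in links.values():
        for dst in dsts:
            if dst in scores:
                scores[dst] += 1
    return scores


def extract_key_phrases(document, links, category_of, category_graph,
                        threshold=2, top_k=3):
    """Run the claimed steps in order; every subject must be a key of `links`."""
    pruned = prune_by_category(links, category_of, category_graph, threshold)
    matched = match_subjects(document, pruned)
    pruned = prune_unmatched(pruned, matched)
    scores = rank_subjects(pruned, matched)
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda s: (-scores[s], s))[:top_k]
```

Here `links` maps each subject to the set of subjects it hyperlinks to, `category_of` maps each subject to its category, and `category_graph` maps each category to its child categories. A production system would likely swap in a real phrase matcher and a graph-based ranking, but the order of operations mirrors the claim: prune by category distance, match, prune unmatched subjects, rank, select.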

  • 2 Assignments