Extracting topically related keywords from related documents

  • US 8,463,786 B2
  • Filed: 06/10/2010
  • Issued: 06/11/2013
  • Est. Priority Date: 06/10/2010
  • Status: Active Grant
First Claim

1. A computer-implemented process for extracting topically related keywords from topically related documents, comprising:

  • using a computer to perform the following process actions:

    accessing a set of topically related documents;

    identifying a number of candidate keywords from the set of related documents, wherein a candidate keyword can be an individual term or a multiple word phrase;

    forming a weighted keyword candidate-document matrix using the candidate keywords;

    partitioning the keyword candidate-document matrix into multiple groups of keyword candidates;

    identifying dense clusters of keyword candidates in each of the groups of keyword candidates whose density exceeds a prescribed density threshold, said keyword candidate cluster density for a group of keyword candidates being based on co-occurrences of keyword candidates belonging to the group in documents from the set of related documents; and

    for each of the identified dense clusters,

      designating the keyword candidates associated with that cluster as topically related keywords, and

      extracting the topically related keywords from the set of topically related documents.
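
The process actions of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not name a weighting scheme, a partitioning method, or a density formula, so the sketch assumes TF-IDF weights, a partition that assigns each candidate to the document where its weight is highest, and a density equal to the mean Jaccard overlap of the document sets of each candidate pair in a group. Each group is treated as a single cluster. All function names (terms, weighted_matrix, partition, density, extract_keywords) and the parameters min_df and threshold are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict
from itertools import combinations


def terms(doc):
    """Candidate keywords: individual words plus two-word phrases."""
    words = re.findall(r"[a-z]+", doc.lower())
    return words + [" ".join(pair) for pair in zip(words, words[1:])]


def weighted_matrix(docs, min_df=2):
    """Candidate-document matrix with TF-IDF weights (assumed weighting)."""
    tf = [Counter(terms(doc)) for doc in docs]
    df = Counter(term for counts in tf for term in counts)
    n = len(docs)
    # Keep only candidates that occur in at least min_df documents.
    return {term: [counts[term] * math.log(1 + n / df[term]) for counts in tf]
            for term in df if df[term] >= min_df}


def partition(matrix):
    """Assumed partition: group candidates by their highest-weight document."""
    groups = defaultdict(list)
    for term, row in matrix.items():
        groups[row.index(max(row))].append(term)
    return list(groups.values())


def density(group, matrix):
    """Assumed density: mean Jaccard overlap of document sets over all pairs."""
    def doc_set(term):
        return {i for i, weight in enumerate(matrix[term]) if weight > 0}

    pairs = list(combinations(group, 2))
    if not pairs:
        return 0.0  # a single candidate has no co-occurrences to measure
    overlaps = [len(doc_set(a) & doc_set(b)) / (len(doc_set(a) | doc_set(b)) or 1)
                for a, b in pairs]
    return sum(overlaps) / len(pairs)


def extract_keywords(docs, min_df=2, threshold=0.5):
    """Return the groups whose density exceeds the prescribed threshold."""
    matrix = weighted_matrix(docs, min_df)
    return [group for group in partition(matrix)
            if density(group, matrix) > threshold]


if __name__ == "__main__":
    sample = ["solar panel energy", "solar panel power", "wind turbine"]
    print(extract_keywords(sample))
```

In this sketch, candidates that always appear together in the same documents score a density near 1, while candidates that never share a document score 0, so the threshold separates tightly co-occurring keyword groups from loose ones.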

  • 2 Assignments