Information exploration systems and method

  • US 7,676,463 B2
  • Filed: 11/15/2005
  • Issued: 03/09/2010
  • Est. Priority Date: 11/15/2005
  • Status: Active Grant
First Claim

1. A computer-implemented information exploration method that comprises:

  • processing a set of documents with a computer to identify a hierarchy of clusters of documents, wherein the processing comprises:

    calculating a pseudo-document vector for each document in the set of documents; and

    computing the hierarchy of clusters from the pseudo-document vectors; and

  • selecting with a computer one or more phrases from the set of documents as representative phrases for each cluster from a root cluster to leaf clusters in the hierarchy of clusters, wherein the selecting comprises:

    constructing a phrase-to-leaf node index for the hierarchy of clusters, wherein the phrase-to-leaf node index includes a list of phrases that occur in at least a predetermined number of documents of at least one of the leaf clusters, and for each phrase in the list of phrases, the phrase-to-leaf node index identifies each of the leaf clusters containing the phrase; and

    wherein constructing the phrase-to-leaf node index further comprises:

    constructing a suffix tree for each leaf cluster in the hierarchy of clusters;

    constructing a phrase index for each leaf cluster that includes a list of phrases shared by at least the predetermined number of documents in the leaf cluster; and

    combining the phrase indices of the leaf clusters in the hierarchy of clusters to construct the phrase-to-leaf node index.
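
The index-building steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim recites a suffix tree per leaf cluster, whereas this sketch swaps in a brute-force enumeration of word n-grams to find shared phrases. The function names, the `min_docs` and `max_len` parameters, and the sample leaf clusters are all hypothetical. The sketch also assumes one reading of the claim, in which a leaf is listed for a phrase only when the phrase meets the document threshold in that leaf.

```python
from collections import defaultdict


def phrases_in(doc, max_len=3):
    """Return the set of word n-grams (1..max_len words) in one document.

    Stand-in for the suffix tree recited in the claim (an assumption).
    """
    words = doc.lower().split()
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(words) - n + 1)
    }


def leaf_phrase_index(docs, min_docs=2, max_len=3):
    """Phrases shared by at least min_docs documents of one leaf cluster."""
    doc_counts = defaultdict(int)
    for doc in docs:
        # Using a set per document means each phrase counts once per document.
        for phrase in phrases_in(doc, max_len):
            doc_counts[phrase] += 1
    return {p for p, count in doc_counts.items() if count >= min_docs}


def phrase_to_leaf_index(leaf_clusters, min_docs=2, max_len=3):
    """Combine the per-leaf phrase indices into phrase -> set of leaf ids."""
    index = defaultdict(set)
    for leaf_id, docs in leaf_clusters.items():
        for phrase in leaf_phrase_index(docs, min_docs, max_len):
            index[phrase].add(leaf_id)
    return dict(index)


# Hypothetical leaf clusters, each a list of short documents.
leaves = {
    "leaf_a": ["suffix tree construction", "fast suffix tree search"],
    "leaf_b": ["suffix tree of a cluster", "cluster suffix tree"],
}
index = phrase_to_leaf_index(leaves)
```

With these sample clusters, `index["suffix tree"]` lists both leaves, `index["cluster"]` lists only `leaf_b`, and a phrase such as "fast suffix" is absent because it occurs in a single document. A suffix tree would find the same shared phrases without enumerating every n-gram, which matters for large clusters.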
