System and method for automatically discovering a hierarchy of concepts from a corpus of documents

  • US 7,085,771 B2
  • Filed: 05/17/2002
  • Issued: 08/01/2006
  • Est. Priority Date: 05/17/2002
  • Status: Expired due to Term
First Claim

1. A computer-implemented method for automatically discovering a hierarchy of concepts from a corpus of documents, the concept hierarchy arranges concepts into multiple levels of abstraction, the method comprising:

    a. extracting signatures from the corpus of documents, wherein a signature comprises a noun or a noun phrase;

    b. identifying similarity between the signatures using a refined distribution, wherein the refined distribution is obtained by computing and iteratively refining similarity measures between the signatures;

    c. hierarchically clustering related signatures to generate concepts, wherein a concept is a cluster of related nouns and noun phrases;

    d. hierarchically arranging the concepts to obtain a concept hierarchy;

    e. labeling the concepts arranged in the concept hierarchy; and

    f. creating an interface for the concept hierarchy.
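
Steps (a) through (e) of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation, and every name in it (extract_signatures, similarity_matrix, build_hierarchy, label) is hypothetical. It rests on several simplifying assumptions: signatures come from a hand-supplied noun list rather than a part-of-speech tagger, similarity is cosine similarity over document-occurrence vectors, "iterative refinement" is approximated by re-comparing rows of the similarity matrix for a fixed number of rounds, clustering is plain average-link agglomeration, and a concept is labeled with its most frequent member. Step (f), the interface, is omitted.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def extract_signatures(docs, vocabulary):
    """Step (a), simplified: keep candidate nouns that occur in the corpus.
    Assumption: crude substring matching against a hand-supplied noun list."""
    return sorted(t for t in vocabulary if any(t in d.lower() for d in docs))


def similarity_matrix(docs, sigs, rounds=2):
    """Step (b), simplified: the first pass computes cosine similarity over
    document-occurrence vectors; each later pass re-compares the rows of the
    previous similarity matrix (a stand-in for the claimed refinement)."""
    vectors = [[1.0 if s in d.lower() else 0.0 for d in docs] for s in sigs]
    for _ in range(rounds + 1):
        vectors = [[cosine(u, v) for v in vectors] for u in vectors]
    return vectors


def build_hierarchy(sigs, sim):
    """Steps (c)-(d), simplified: average-link agglomerative merging.
    Returns a nested 2-tuple tree whose leaves are signature strings.
    Assumes at least one signature."""
    clusters = [([i], sigs[i]) for i in range(len(sigs))]  # (indices, subtree)
    while len(clusters) > 1:
        def link(pair):
            a, b = clusters[pair[0]][0], clusters[pair[1]][0]
            return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))
        x, y = max(combinations(range(len(clusters)), 2), key=link)
        merged = (clusters[x][0] + clusters[y][0],
                  (clusters[x][1], clusters[y][1]))
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return clusters[0][1]


def leaves(tree):
    """All signatures under a node of the hierarchy."""
    return [tree] if isinstance(tree, str) else leaves(tree[0]) + leaves(tree[1])


def label(tree, docs):
    """Step (e), simplified: label a concept with its most frequent member."""
    return max(leaves(tree), key=lambda s: sum(d.lower().count(s) for d in docs))


# Toy corpus (invented for illustration only).
docs = [
    "The engine and the gearbox drive the car.",
    "A car engine needs a gearbox.",
    "The guitar and the piano played a melody.",
    "A piano melody with guitar.",
]
nouns = {"engine", "gearbox", "car", "guitar", "piano", "melody", "violin"}

sigs = extract_signatures(docs, nouns)
tree = build_hierarchy(sigs, similarity_matrix(docs, sigs))
for concept in tree:
    print(label(concept, docs), "->", sorted(leaves(concept)))
```

On this toy corpus the two top-level branches separate the vehicle terms from the music terms, because signatures that occur in the same documents have identical occurrence vectors. A realistic system would need a real noun-phrase extractor and a similarity measure that tolerates sparse co-occurrence, which is presumably what the claimed refinement step addresses.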
