Taxonomy generation for document collections

  • US 6,446,061 B1
  • Filed: 06/30/1999
  • Issued: 09/03/2002
  • Est. Priority Date: 07/31/1998
  • Status: Expired due to Term
First Claim

1. A computer-executable method of generating a content taxonomy of a multitude of documents (210) stored on a computer system, said method comprising:

  • a subset-selection-step (201), for selecting a subset of said multitude of documents;

  • a taxonomy-generation-step (202 to 205), for generating a taxonomy for said subset, wherein said taxonomy is a tree-structured taxonomy-hierarchy, and wherein said subset is divided into a set of clusters with largest intra-similarity, and wherein each of said clusters of largest intra-similarity is assigned to a leaf-node of said taxonomy-hierarchy as outer-clusters, and wherein inner-nodes of said taxonomy-hierarchy order said subset, starting with said outer-clusters, into inner-clusters with increasing cluster size and decreasing similarity, and wherein said taxonomy-generation-step further comprises a first-feature-extraction-step (202) for extracting for each document of said subset its features, and for computing its feature statistics in a feature-vector (212) as a representation of said document; and

  • a routing-selection-step (206), for computing, for each unprocessed document of said multitude of documents not belonging to said subset, similarities with said outer-clusters, and for assigning said document to the leaf-node of said taxonomy-hierarchy being the outer-cluster with largest similarity.
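
The subset-selection-step and taxonomy-generation-step of the claim can be illustrated with a small bottom-up (agglomerative) clustering sketch. The claim does not say how the subset is chosen, which features or similarity measure are used, or where outer-clusters stop, so everything below is an assumption: random sampling for the subset, lowercase word counts as the feature-vector, cosine similarity, average linkage between clusters, and a hypothetical `leaf_threshold` that ends flat merging and freezes the outer-clusters (leaf-nodes). Average linkage is used because its merge similarities never increase, which matches the claim's inner-clusters of increasing size and decreasing similarity.

```python
import math
import random
import re
from collections import Counter
from itertools import combinations


def select_subset(docs, size, seed=0):
    """Subset-selection sketch (assumed random sampling); the rest is routed later."""
    chosen = set(random.Random(seed).sample(range(len(docs)), size))
    subset = [d for i, d in enumerate(docs) if i in chosen]
    rest = [d for i, d in enumerate(docs) if i not in chosen]
    return subset, rest


def extract_features(text):
    """Feature-extraction sketch (assumed): lowercase word counts as the feature-vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two sparse term-count vectors."""
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def avg_link(c1, c2, vecs):
    """Average pairwise cosine similarity between two clusters of document indices."""
    return sum(cosine(vecs[i], vecs[j]) for i in c1 for j in c2) / (len(c1) * len(c2))


def best_pair(clusters, vecs):
    """Return ((i, j), similarity) for the most similar pair of clusters."""
    return max(
        (((i, j), avg_link(clusters[i], clusters[j], vecs))
         for i, j in combinations(range(len(clusters)), 2)),
        key=lambda item: item[1],
    )


def build_taxonomy(texts, leaf_threshold=0.5):
    """Return (root, leaves, vecs) for a non-empty list of documents."""
    vecs = [extract_features(t) for t in texts]
    clusters = [(i,) for i in range(len(texts))]
    # Phase 1: merge flatly while similarity stays at or above the (hypothetical)
    # threshold; the surviving clusters become the outer-clusters (leaf-nodes).
    while len(clusters) > 1:
        (i, j), sim = best_pair(clusters, vecs)
        if sim < leaf_threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    leaves = [{"docs": c, "sim": None, "children": []} for c in clusters]
    # Phase 2: build inner-nodes bottom-up; each parent is larger than its
    # children, and with average linkage the merge similarity never increases.
    nodes = list(leaves)
    while len(nodes) > 1:
        (i, j), sim = best_pair([n["docs"] for n in nodes], vecs)
        parent = {"docs": nodes[i]["docs"] + nodes[j]["docs"], "sim": sim,
                  "children": [nodes[i], nodes[j]]}
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [parent]
    return nodes[0], leaves, vecs


texts = ["apple banana fruit", "banana apple fruit salad",
         "python code compiler", "compiler code python bug"]
root, leaves, _ = build_taxonomy(texts)
print([sorted(leaf["docs"]) for leaf in leaves])  # [[0, 1], [2, 3]]
```

The pairwise search is quadratic per merge, which is one plausible reason to build the taxonomy on a subset only and route the remaining documents afterwards.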

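The routing-selection-step can be sketched on its own as nearest-leaf assignment. The claim only requires computing similarities with the outer-clusters; representing each outer-cluster by the centroid of its members' feature-vectors, using word counts as features, and using cosine similarity are all assumptions made for this illustration, and the helper names are hypothetical.

```python
import math
import re
from collections import Counter


def features(text):
    """Hypothetical feature extraction: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Average the feature-vectors of one outer-cluster (assumed cluster representative)."""
    total = Counter()
    for v in vectors:
        total.update(v)  # Counter.update adds counts term by term
    return {t: c / len(vectors) for t, c in total.items()}


def route(unprocessed, leaf_centroids):
    """Return, for each unprocessed document, the index of the most similar leaf."""
    result = []
    for text in unprocessed:
        vec = features(text)
        sims = [cosine(vec, c) for c in leaf_centroids]
        result.append(max(range(len(sims)), key=sims.__getitem__))
    return result


leaf_docs = [["apple banana fruit", "banana fruit salad"],
             ["python code compiler", "compiler bug code"]]
centroids = [centroid([features(t) for t in docs]) for docs in leaf_docs]
print(route(["fresh fruit and banana", "a compiler for python"], centroids))  # [0, 1]
```

Because each remaining document is compared only with the leaf-nodes rather than with every other document, routing cost grows linearly with the number of unprocessed documents for a fixed taxonomy.
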
  • 1 Assignment