Document categorisation system

  • US 7,971,150 B2
  • Filed: 09/25/2001
  • Issued: 06/28/2011
  • Est. Priority Date: 09/25/2000
  • Status: Active Grant
First Claim

1. A data categorization computer system comprising:

  • data processing logic having:

    a clusterer module configured to apply unsupervised learning to first items of electronic data stored on a computer-readable storage medium to generate a set of clusters of related ones of said first items of electronic data based on features extracted from said first items of electronic data, said features including at least one of n-grams, words and phrases, and the clusters representing respective item categories;

    an interactive cluster editor configured to display the set of clusters to a user and to modify the set of clusters based on input from the user to provide a set of training clusters;

    a filter module configured to use the set of training clusters as training data for supervised learning to generate categorization data representing models that distinguish respective ones of said clusters from the other clusters;

    a classifier configured to use the categorization data to categorize second items of electronic data stored on the computer-readable storage medium into the training clusters; and

    a trend analyzer for determining trends of item categories over time.
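The data flow recited in claim 1 (cluster, edit, train, classify, analyze trends) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the function names, the single-pass "leader" clustering, the centroid-difference models, the similarity threshold, and the sample texts are all hypothetical stand-ins for modules the claim describes only functionally.

```python
from collections import Counter, defaultdict
import math

def extract_features(text):
    """Words plus adjacent word pairs (a crude stand-in for phrases)."""
    words = text.lower().split()
    feats = Counter(words)
    feats.update(" ".join(pair) for pair in zip(words, words[1:]))
    return feats

def cosine(a, b):
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster(texts, threshold=0.1):
    """Clusterer: unsupervised single-pass clustering (assumed algorithm and threshold)."""
    clusters, centroids = [], []
    for text in texts:
        feats = extract_features(text)
        scores = [cosine(feats, c) for c in centroids]
        if scores and max(scores) >= threshold:
            i = scores.index(max(scores))
            clusters[i].append(text)
            centroids[i].update(feats)   # accumulate feature counts for the cluster
        else:
            clusters.append([text])      # no similar cluster: start a new one
            centroids.append(Counter(feats))
    return clusters

def merge_clusters(clusters, i, j):
    """Cluster editor: stand-in for one user action, merging clusters i and j."""
    kept = [c for k, c in enumerate(clusters) if k not in (i, j)]
    return kept + [clusters[i] + clusters[j]]

def mean_vector(texts):
    total = Counter()
    for t in texts:
        total.update(extract_features(t))
    return {f: v / len(texts) for f, v in total.items()}

def train(training_clusters):
    """Filter: one model per cluster, its mean features minus the mean of all other clusters."""
    models = []
    for k, members in enumerate(training_clusters):
        others = [t for j, c in enumerate(training_clusters) if j != k for t in c]
        inside = mean_vector(members)
        outside = mean_vector(others) if others else {}
        models.append({f: inside.get(f, 0) - outside.get(f, 0)
                       for f in set(inside) | set(outside)})
    return models

def classify(models, text):
    """Classifier: index of the training cluster whose model scores the item highest."""
    feats = extract_features(text)
    scores = [sum(w.get(f, 0) * v for f, v in feats.items()) for w in models]
    return scores.index(max(scores))

def trend(models, dated_texts):
    """Trend analyzer: item counts per category per time period."""
    counts = defaultdict(Counter)
    for period, text in dated_texts:
        counts[classify(models, text)][period] += 1
    return {category: dict(c) for category, c in counts.items()}

# Hypothetical first items (sample data invented for this sketch).
texts = ["printer will not print", "printer paper jam", "cannot reset password",
         "password reset email missing", "toner cartridge empty"]
clusters = cluster(texts)                   # three clusters; toner lands alone
training = merge_clusters(clusters, 0, 2)   # user merges toner into the printer cluster
models = train(training)
# Hypothetical second items, each tagged with a period.
trends = trend(models, [("2001-01", "forgot my password"),
                        ("2001-02", "printer toner low"),
                        ("2001-02", "printer out of paper")])
```

In this sketch the merge step is the only point where user input changes the categories, which mirrors the claim's separation between the unsupervised clusters and the edited training clusters used for supervised learning. A real system would likely replace each stand-in with a more robust method.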