
Method and apparatus for populating a predefined concept hierarchy or other hierarchical set of classified data items by minimizing system entropy

  • US 7,320,000 B2
  • Filed: 12/04/2002
  • Issued: 01/15/2008
  • Est. Priority Date: 12/04/2002
  • Status: Expired due to Fees
First Claim

1. A computer-implemented method for automating classification of a new data item when adding the new data item to an hierarchically organized hierarchy set of classified data items, wherein nodes of the hierarchy set correspond to classes of data items, said method comprising:

  • inputting into a computer system classes of documents comprising (i) a collection of classified documents within a hierarchy of classes and (ii) class labels associated with said collection of classified documents;

    training said computer system using said collection of classified documents, wherein the training process comprises;

    selecting, from an input set of said collection of classified documents, a set of tokens for use as classification attributes; and

    building a dictionary comprising said classification attributes;

    modeling a distribution of said classification attributes across said classes of documents by using a set of random variables and values associated with said set of random variables to represent said classification attributes in said dictionary;

    calculating an initial entropy value at every node of each level of said hierarchy of classes using said set of random variables;

    inputting into said computer system a new data item to be classified into one of said classes in said hierarchy of classes;

    calculating entropy values for each of a plurality of possible classes into which said new data item could be classified;

    comparing the calculated entropy values with said initial entropy value at every node of each level of said hierarchy of classes in order to create a plurality of conditional entropy values;

    selecting a class having a lowest conditional entropy value; and

    classifying said new data item in the selected class.
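The steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation and rests on several assumptions: the hierarchy is flattened to a single level of classes, whereas the claim computes entropy at every node of each level; tokens are whitespace-free strings supplied by the caller; and the "conditional entropy value" is approximated as the change in a class's Shannon entropy when the new item's tokens are added to it. The function names `entropy`, `train`, and `classify` are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a token-count distribution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def train(labeled_docs):
    """Build per-class token counts, the dictionary, and initial entropies.

    labeled_docs is an iterable of (class_label, tokens) pairs.
    Assumption: every token seen in training becomes a classification attribute.
    """
    class_counts = {}
    for label, tokens in labeled_docs:
        class_counts.setdefault(label, Counter()).update(tokens)
    dictionary = {tok for counts in class_counts.values() for tok in counts}
    initial = {label: entropy(counts) for label, counts in class_counts.items()}
    return class_counts, dictionary, initial


def classify(tokens, class_counts, dictionary, initial):
    """Return the class whose entropy rises least when the item is added.

    Assumption: the entropy change per class stands in for the claim's
    conditional entropy value; tokens outside the dictionary are ignored.
    """
    item = Counter(tok for tok in tokens if tok in dictionary)
    deltas = {
        label: entropy(counts + item) - initial[label]
        for label, counts in class_counts.items()
    }
    return min(deltas, key=deltas.get)


# Illustrative training data (made up for this sketch).
docs = [
    ("sports", ["goal", "match", "team"]),
    ("sports", ["team", "score", "goal"]),
    ("finance", ["stock", "bond", "market"]),
    ("finance", ["market", "stock", "yield"]),
]
model = train(docs)
predicted = classify(["goal", "team", "team"], *model)
```

An item whose tokens already dominate a class sharpens that class's distribution and lowers its entropy, while the same tokens spread out the distribution of an unrelated class and raise it. Picking the smallest change is what this sketch uses as the selection rule.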

  • 1 Assignment