Distributed hierarchical text classification framework

  • US 7,809,723 B2
  • Filed: 08/15/2006
  • Issued: 10/05/2010
  • Est. Priority Date: 06/26/2006
  • Status: Active Grant
First Claim

1. A method in a computing device with a processor for training a hierarchical classifier for classification of documents into a classification hierarchy, the method comprising:

    providing the classification hierarchy in which classifications have sub-classifications except for leaf classifications;

    providing training data for training the classifiers, the training data including documents and classifications of the documents within the classification hierarchy, the classification of a document indicating that the document is in that classification and ancestor classifications of that classification, each classification having a number of documents;

    generating a classifier for each classification within the classification hierarchy by, for each classification within the classification hierarchy, determining a complexity for the classifier for the classification, the complexity of the classifier varying nonlinearly based on the number of documents within the classification;

    identifying by the processor one of a plurality of agents to train the classifier for that classification such that one agent is identified to train one classifier and some of the agents are identified to train multiple classifiers, the agents being identified to balance training load of the agents that is determined based on the determined complexity of the classifiers identified to be trained by each agent wherein the identifying of one of the agents includes:

    when a classifier has not yet been assigned to an agent, assigning the classifier to that agent; and

    when a classifier has already been assigned to each agent, assigning the classifier to an agent based on complexity of the classifier and complexities of classifiers assigned to each agent such that a classifier with the highest complexity is assigned to an agent that has been assigned classifiers with the smallest total complexity; and

    under control of the identified agent, training the classifier for that classification using the documents of the training data that are classified within that classification of the classification hierarchy;

    wherein each agent trains classifiers for a varying number of documents of the training data, wherein the classifiers trained by the multiple agents form the hierarchical classifier, and wherein the agent for a classifier is identified based on number of documents used.
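The counting, complexity, and agent-assignment steps of claim 1 can be sketched in a few lines. This is a minimal, single-process illustration under stated assumptions, not the patented implementation: the toy hierarchy, the document labels, the quadratic cost model in complexity() (the claim only requires that complexity vary nonlinearly with document count), and the tie-breaking by name and agent index are all hypothetical. No classifiers are actually trained and no distributed agents are involved; agents are just integer indices. The assignment loop resembles longest-processing-time-first greedy scheduling: classifiers are taken in decreasing complexity, each still-empty agent receives one, and after that each classifier goes to the agent with the smallest total complexity so far.

```python
# Hypothetical sketch of claim 1's complexity and agent-assignment steps.
# Assumptions (not from the patent): the toy hierarchy and labels below,
# the quadratic cost model, and tie-breaking by name / agent index.

# Hypothetical hierarchy: classification -> parent (None marks a top level).
PARENT = {
    "Science": None,
    "Physics": "Science",
    "Biology": "Science",
    "Sports": None,
    "Soccer": "Sports",
}

# Hypothetical training data: (document id, most specific classification).
DOCS = [
    ("d1", "Physics"), ("d2", "Physics"), ("d3", "Physics"),
    ("d4", "Biology"), ("d5", "Soccer"), ("d6", "Soccer"), ("d7", "Sports"),
]


def ancestors(label, parent):
    """Yield the label itself, then each ancestor up to the root."""
    while label is not None:
        yield label
        label = parent[label]


def training_documents(docs, classification, parent):
    """Documents in `classification`: a document labeled with a
    classification is also in every ancestor of that classification."""
    return [doc for doc, label in docs if classification in ancestors(label, parent)]


def complexity(num_docs):
    """Hypothetical nonlinear cost model; the claim only says 'nonlinear'."""
    return num_docs ** 2


def assign_to_agents(complexities, num_agents):
    """Greedy assignment in decreasing order of classifier complexity."""
    order = sorted(complexities, key=lambda c: (-complexities[c], c))
    assignment = {agent: [] for agent in range(num_agents)}
    load = [0] * num_agents  # total complexity assigned to each agent
    for i, name in enumerate(order):
        if i < num_agents:
            # Some agent has no classifier yet: give it this one.
            agent = i
        else:
            # Every agent has work: pick the smallest total complexity.
            agent = min(range(num_agents), key=lambda a: (load[a], a))
        assignment[agent].append(name)
        load[agent] += complexities[name]
    return assignment, load


counts = {c: len(training_documents(DOCS, c, PARENT)) for c in PARENT}
complexities = {c: complexity(n) for c, n in counts.items()}
assignment, load = assign_to_agents(complexities, num_agents=2)
print(assignment)  # {0: ['Science', 'Soccer'], 1: ['Physics', 'Sports', 'Biology']}
print(load)        # [20, 19]
```

With two agents, "Science" (4 documents, cost 16) and "Physics" (3 documents, cost 9) each seed an empty agent, and the remaining classifiers go to whichever agent currently carries less total complexity, so one agent ends up training several classifiers while the loads stay close (20 vs. 19).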

  • 2 Assignments