Method for multi-class, multi-label categorization using probabilistic hierarchical modeling

  • US 7,139,754 B2
  • Filed: 02/09/2004
  • Issued: 11/21/2006
  • Est. Priority Date: 02/09/2004
  • Status: Active Grant
First Claim

1. A method for categorizing a set of objects, comprising:

  • defining a set of categories in which at least one category in the set is dependent on another category in the set;

    organizing the set of categories in a hierarchy that embodies any dependencies among the categories in the set;

    for each object, assigning to the object one or more categories l1 . . . lP, where l1 ∈ {1 . . . L} from a set {1 . . . L} of possible categories, wherein the assigned categories represent a subset of categories for which the object is relevant;

    defining a new set of labels z comprising all possible combinations of any number of the categories, z ∈ {{1},{2}, . . . {L},{1,2}, . . . {1,L},{2,3}, . . . {1,2,3}, . . . {1,2, . . . L}}, wherein if an object is relevant to several categories, the object must be assigned the unique label z corresponding to the subset of all relevant categories; and

    assigning to the object the several categories and the subcategories of the several categories;

    wherein an object comprises a document d generated by co-occurrence of words within the document;

    wherein the hierarchy is generated by;

    for each document d, choosing a document category α according to the probability P(α|d) ∝ P(d|α)P(α);

    selecting a label v according to the category-conditional probability P(v|α);

    selecting a word in the document according to a label-specific word distribution P(w|v); and

    restricting P(v|α) to give positive probability only to labels that are above the category in the hierarchy.
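
For illustration only, the steps recited in the claim can be sketched in Python. This is a minimal sketch under stated assumptions, not the patented implementation: the function names, the toy hierarchy, and the choice to treat a category as lying on its own path to the root (so that "above" is inclusive) are all hypothetical and do not come from the claim text.

```python
import random
from itertools import combinations

def label_powerset(num_categories):
    """Enumerate every non-empty subset of {1..L}; each subset is one label z."""
    cats = range(1, num_categories + 1)
    return [frozenset(c) for k in cats for c in combinations(cats, k)]

def posterior(likelihood, prior):
    """P(alpha|d), obtained by normalizing P(d|alpha) * P(alpha) over categories."""
    unnorm = {a: likelihood[a] * prior[a] for a in prior}
    total = sum(unnorm.values())
    return {a: u / total for a, u in unnorm.items()}

def ancestors(node, parent):
    """Path from node up to the root. Including node itself is an assumption."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path

def restrict_to_ancestors(weights, alpha, parent):
    """Zero out P(v|alpha) for labels off alpha's path to the root, then renormalize."""
    allowed = set(ancestors(alpha, parent))
    masked = {v: (w if v in allowed else 0.0) for v, w in weights.items()}
    total = sum(masked.values())
    return {v: w / total for v, w in masked.items()}

def sample_word(alpha, p_v_given_alpha, p_w_given_v, rng):
    """Draw a label v from P(v|alpha), then a word w from P(w|v)."""
    dist = p_v_given_alpha[alpha]
    v = rng.choices(list(dist), weights=list(dist.values()))[0]
    words = p_w_given_v[v]
    w = rng.choices(list(words), weights=list(words.values()))[0]
    return v, w

# Hypothetical toy hierarchy: child -> parent (the root has parent None).
PARENT = {"root": None, "sports": "root", "science": "root",
          "tennis": "sports", "physics": "science"}
```

With a uniform starting weight over the five toy nodes, `restrict_to_ancestors(weights, "tennis", PARENT)` leaves probability mass only on `tennis`, `sports`, and `root`, so a word generated for a tennis document can be drawn from those three label-specific word distributions and from no others. Note that `label_powerset(L)` returns 2^L - 1 labels, so exhaustive enumeration is practical only for small L.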
