Method for multi-class, multi-label categorization using probabilistic hierarchical modeling
First Claim
1. A method for categorizing a set of objects, comprising:
- defining a set of categories in which at least one category in the set is dependent on another category in the set;
- organizing the set of categories in a hierarchy that embodies any dependencies among the categories in the set;
- for each object, assigning to the object one or more categories l1 . . . lP, where l1 ∈ {1 . . . L} from a set {1 . . . L} of possible categories, wherein the assigned categories represent a subset of categories for which the object is relevant;
- defining a new set of labels z comprising all possible combinations of any number of the categories, z ∈ {{1},{2}, . . . {L},{1,2}, . . . {1,L},{2,3}, . . . {1,2,3}, . . . {1,2, . . . L}}, wherein if an object is relevant to several categories, the object must be assigned the unique label z corresponding to the subset of all relevant categories; and
- assigning to the object the several categories and the subcategories of the several categories;
wherein an object comprises a document d generated by co-occurrence of words within the document;
wherein the hierarchy is generated by;
- for each document d, choosing a document category α according to the probability P(α|d) ∝ P(d|α)P(α);
- selecting a label v according to the category-conditional probability P(v|α);
- selecting a word in the document according to a label-specific word distribution P(w|v); and
- restricting P(v|α) to give positive probability only to labels that are above the category in the hierarchy.
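
The generative steps recited at the end of the claim can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration and does not come from the patent: the toy hierarchy `PARENT`, the tables `WEIGHTS` and `WORD_DIST`, the function names, the reading of "above the category" as the category itself plus its ancestors, and the form of P(d|α) as a product over words of Σ_v P(w|v)P(v|α).

```python
import math
import random

# Hypothetical toy hierarchy: child -> parent (None marks the root).
PARENT = {"root": None, "science": "root", "sports": "root",
          "physics": "science", "biology": "science"}

# Hypothetical unnormalised label weights and label-specific word tables P(w|v).
WEIGHTS = {v: 1.0 for v in PARENT}
WORD_DIST = {"root": {"the": 1.0}, "science": {"theory": 1.0},
             "sports": {"goal": 1.0}, "physics": {"quark": 1.0},
             "biology": {"cell": 1.0}}


def ancestors_or_self(node, parent=PARENT):
    """Return the node and every node above it, up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def restricted_label_dist(category, weights=WEIGHTS, parent=PARENT):
    """P(v|category): zero for every label not on the path to the root."""
    allowed = ancestors_or_self(category, parent)
    total = sum(weights[v] for v in allowed)
    return {v: (weights[v] / total if v in allowed else 0.0) for v in parent}


def generate_words(category, n_words, rng, word_dist=WORD_DIST):
    """For each word: draw a label v ~ P(v|category), then w ~ P(w|v)."""
    p_v = restricted_label_dist(category)
    labels = [v for v, p in p_v.items() if p > 0]
    probs = [p_v[v] for v in labels]
    words = []
    for _ in range(n_words):
        v = rng.choices(labels, weights=probs, k=1)[0]
        vocab = list(word_dist[v])
        w = rng.choices(vocab, weights=[word_dist[v][t] for t in vocab], k=1)[0]
        words.append(w)
    return words


def category_posterior(doc_words, prior, word_dist=WORD_DIST):
    """P(category|d) proportional to P(d|category) P(category).

    Assumes P(d|category) = prod over words of sum_v P(w|v) P(v|category).
    """
    scores = {}
    for category, p_cat in prior.items():
        p_v = restricted_label_dist(category)
        likelihood = math.prod(
            sum(p_v[v] * word_dist[v].get(w, 0.0) for v in p_v)
            for w in doc_words
        )
        scores[category] = likelihood * p_cat
    total = sum(scores.values())
    if total == 0:
        raise ValueError("document has zero probability under every category")
    return {c: s / total for c, s in scores.items()}
```

In this toy setup, a document drawn from the "physics" category can only contain words owned by "physics", "science", or "root", because the restriction gives zero probability to every label off that path. A real system would estimate the tables from data (the sketch does not attempt this) and would likely work in log space to avoid underflow on long documents.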
Abstract
A method is described for categorizing objects where there can be multiple categories and each object can belong to more than one category. The method defines a set of categories in which at least one category is dependent on another category, and then organizes the categories in a hierarchy that embodies any dependencies among them. Each object is assigned to one or more categories in the set. A set of labels corresponding to all combinations of any number of the categories is defined, wherein if an object is relevant to several categories, the object must be assigned the label corresponding to the subset of all relevant categories. Once the new labels are defined, the multi-category, multi-label problem is reduced to a multi-category, single-label problem, and the categorization task reduces to choosing the single best label set for an object.
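
The reduction described in the abstract, from several categories per object to one label per object, can be sketched as follows. The function names `powerset_labels` and `to_single_label` are hypothetical and the enumeration order is just one convenient choice; the sketch only illustrates the label construction, not the patent's classifier.

```python
from itertools import combinations


def powerset_labels(num_categories):
    """All non-empty subsets of {1..L}, ordered by size, then lexicographically."""
    categories = range(1, num_categories + 1)
    return [frozenset(combo)
            for size in range(1, num_categories + 1)
            for combo in combinations(categories, size)]


def to_single_label(relevant_categories, labels):
    """Index of the unique label z equal to the full set of relevant categories."""
    return labels.index(frozenset(relevant_categories))
```

With L = 3 categories this gives 2^3 - 1 = 7 labels, and an object relevant to categories 1 and 3 receives the single label {1, 3} rather than two separate labels. The number of labels grows exponentially in L, which is one reason to constrain the label space with a hierarchy.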
8 Claims
Specification