Very-large-scale automatic categorizer for web content
Abstract
A method and apparatus are provided for efficiently classifying and categorizing data objects, such as electronic text, graphics, and audio-based documents, within very-large-scale hierarchical classification trees. In accordance with one embodiment of the invention, a first node of a plurality of nodes of a subject hierarchy is selected. Previously classified data objects corresponding to the selected first node, as well as any associated sub-nodes of the selected node, are aggregated to form a content class of data objects. Similarly, data objects corresponding to sibling nodes of the selected node and any associated sub-nodes of the sibling nodes are aggregated to form an anti-content class of data objects. Features are then extracted from each of the content class of data objects and the anti-content class of data objects to facilitate characterization of said previously classified data objects.
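The aggregation step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Node` class, the helper names (`subtree_docs`, `content_class`, `anti_content_class`, `term_counts`), and the use of whitespace-token counts as "features" are all assumptions made for illustration, since the abstract does not specify a data structure or a feature type.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    """Hypothetical node of a subject hierarchy holding classified documents."""
    name: str
    docs: list = field(default_factory=list)
    children: list = field(default_factory=list)
    parent: Optional["Node"] = None

    def add_child(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def subtree_docs(node: Node) -> list:
    """Collect the documents at a node and at all of its sub-nodes."""
    docs = list(node.docs)
    for child in node.children:
        docs.extend(subtree_docs(child))
    return docs


def content_class(node: Node) -> list:
    """Documents of the selected node plus its sub-nodes."""
    return subtree_docs(node)


def anti_content_class(node: Node) -> list:
    """Documents of the selected node's siblings plus their sub-nodes."""
    if node.parent is None:
        return []  # the root has no siblings
    docs = []
    for sibling in node.parent.children:
        if sibling is not node:
            docs.extend(subtree_docs(sibling))
    return docs


def term_counts(docs: list) -> Counter:
    """Assumed stand-in for feature extraction: lowercase token counts."""
    return Counter(tok for doc in docs for tok in doc.lower().split())


# Small illustrative hierarchy (invented data).
root = Node("Top")
arts = root.add_child(Node("Arts", ["painting gallery"]))
arts.add_child(Node("Music", ["guitar chords"]))
sports = root.add_child(Node("Sports", ["football scores"]))
sports.add_child(Node("Soccer", ["football league"]))

print(content_class(arts))
print(anti_content_class(arts))
print(term_counts(anti_content_class(arts)).most_common(1))
```

In this sketch the anti-content class is drawn only from sibling subtrees rather than from the whole hierarchy, which follows the wording of the abstract and keeps the negative examples local to the selected node.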
44 Claims
-
1. A method of training a classifier system by utilizing previously classified data objects organized into a subject hierarchy of a plurality of nodes, the method comprising:
-
selecting one node of the plurality of nodes;
aggregating those of the previously classified data objects corresponding to the selected node and any associated sub-nodes of the selected node, to form a content class of data objects;
aggregating those of the previously classified data objects corresponding to any associated sibling nodes of the selected node and any associated sub-nodes of the sibling nodes to form an anti-content class of data objects; and
extracting features from at least one of the content class of data objects and the anti-content class of data objects to facilitate characterization of said previously classified data objects.
-
-
13. A method of classifying a data object, the method comprising:
-
selecting a first node of a hierarchically organized classifier having a plurality of nodes;
determining if the first node of said plurality of nodes is the parent of one or more child nodes;
upon determining that said first node is the parent of one or more child nodes, selecting a first of said one or more child nodes and classifying said data object at the first of said one or more child nodes to produce a confidence rating;
recursively selecting each of said one or more child nodes that remain and classifying the data object at each selected one or more child nodes to respectively produce a confidence rating for each selected one or more child nodes; and
assigning the data object to each node of said plurality of nodes having produced an acceptable confidence rating.
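The traversal recited in claim 13 can be sketched as a recursive walk over the classifier hierarchy. This is a hedged illustration only: `ClassifierNode`, its keyword-overlap `confidence` method, and the `threshold` parameter are hypothetical stand-ins for whatever per-node classifier and acceptance criterion an implementation would use. Descending only beneath children whose rating is acceptable is also an assumption, since the claim does not say whether rejected branches are explored.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class ClassifierNode:
    """Hypothetical node of a hierarchically organized classifier."""
    name: str
    keywords: set = field(default_factory=set)
    children: list = field(default_factory=list)

    def confidence(self, text: str) -> float:
        """Assumed rating: fraction of this node's keywords found in the text."""
        if not self.keywords:
            return 0.0
        tokens = set(text.lower().split())
        return len(self.keywords & tokens) / len(self.keywords)


def classify(node: ClassifierNode, text: str, threshold: float = 0.5) -> list:
    """Return (node name, rating) pairs for every node that accepts the text."""
    assigned = []
    for child in node.children:  # no children means nothing to classify here
        rating = child.confidence(text)
        if rating >= threshold:
            assigned.append((child.name, rating))
            # Assumption: recurse only beneath accepted children.
            assigned.extend(classify(child, text, threshold))
    return assigned


# Small illustrative hierarchy (invented data).
soccer = ClassifierNode("Soccer", {"football", "league"})
tennis = ClassifierNode("Tennis", {"racket", "serve"})
sports = ClassifierNode("Sports", {"match", "score"}, [soccer, tennis])
arts = ClassifierNode("Arts", {"painting", "gallery"})
top = ClassifierNode("Top", children=[sports, arts])

print(classify(top, "football league match score"))
```

Because the function collects every node with an acceptable rating, a single data object can be assigned to several nodes at once, which matches the final step of the claim.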
-
-
23. An apparatus comprising:
-
a storage medium having stored therein a plurality of programming instructions designed to implement a plurality of functions of a category name service for providing a category name to a data object, including first one or more functions to select a first node of a hierarchically organized classifier having a plurality of nodes and one or more previously classified data objects associated with each of said plurality of nodes, aggregate those of the previously classified data objects corresponding to the selected node and any associated sub-nodes of the selected node to form a content class of data objects, aggregate those of the previously classified data objects corresponding to any associated sibling nodes of the selected node and any associated sub-nodes of the sibling nodes to form an anti-content class of data objects, extract features from at least one of the content class of data objects and the anti-content class of data objects to facilitate characterization of said previously classified data objects; and
a processor coupled to the storage medium to execute the programming instructions.
-
-
35. An apparatus comprising:
-
a storage medium having stored therein a plurality of programming instructions designed to implement a plurality of functions of a category name service for providing a category name to a data object, including first one or more functions to select a first node of a hierarchically organized classifier having a plurality of nodes, determine if the first node of said plurality of nodes is a parent of one or more child nodes, select a first of said one or more child nodes and classify said data object at the first of said one or more child nodes to produce a confidence rating if said first node is the parent of one or more child nodes, select each of said one or more child nodes that remain and classify the data object at each selected one or more child nodes to respectively produce a confidence rating for each selected one or more child nodes, assign the data object to each node of said plurality of nodes having produced an acceptable confidence rating; and
a processor coupled to the storage medium to execute the programming instructions.
-
Specification