METHOD AND APPARATUS TO BUILD A COMMON CLASSIFICATION SYSTEM ACROSS MULTIPLE CONTENT ENTITIES
First Claim
1. A method for classifying documents of a plurality of content entities into a hierarchical discipline structure in a content management system, the method comprising:
accessing a set of taxonomic labels, the taxonomic labels collectively defining a hierarchical taxonomy;
receiving a plurality of documents, each document associated with one of the content entities;
extracting features of the received documents;
generating by a content classification system, a learned model for assigning taxonomic labels to documents associated with a representative content entity using the features extracted from documents associated with the representative content entity;
assigning, by the content classification system, one or more taxonomic labels to each document of the other content entities using the learned model applied to the features extracted from the respective document; and
classifying the documents of the plurality of content entities based on the assigned taxonomic labels.
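The first and last steps of the claim (accessing a hierarchical taxonomy and classifying documents by their assigned labels) can be illustrated with a minimal sketch. The claim does not say how the taxonomy is stored or how the hierarchy is used during classification, so the parent-pointer mapping, the label names, the document ids, and the choice to file each document under every ancestor of its label are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical taxonomy: each label maps to its parent (None marks a root).
TAXONOMY = {
    "Science": None,
    "Physics": "Science",
    "Mechanics": "Physics",
    "Biology": "Science",
}


def path_to_root(label, taxonomy):
    """Return the label followed by its ancestors, most specific first."""
    path = []
    while label is not None:
        path.append(label)
        label = taxonomy[label]
    return path


def classify(assigned, taxonomy):
    """Group document ids under each assigned label and all of its ancestors."""
    index = defaultdict(set)
    for doc_id, labels in assigned.items():
        for label in labels:
            for node in path_to_root(label, taxonomy):
                index[node].add(doc_id)
    return index


# Hypothetical label assignments for two documents.
assigned = {"doc-a": ["Mechanics"], "doc-b": ["Biology"]}
index = classify(assigned, TAXONOMY)
```

Under these assumptions, `index["Science"]` holds both documents while `index["Physics"]` holds only `doc-a`, so a document filed under a narrow label is also reachable from its broader disciplines.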
Abstract
A content classification system classifies documents of a plurality of content entities into a hierarchical discipline structure. The content classification system receives a set of taxonomic labels collectively defining a hierarchical taxonomy and a plurality of documents. Each document is associated with one of the content entities. The content classification system extracts features from the received documents. A learned model is generated for assigning taxonomic labels to documents associated with a representative content entity using the features extracted from documents associated with the representative content entity. The content classification system assigns one or more taxonomic labels to each document of the other content entities using the learned model applied to the features extracted from the respective document. The documents of the plurality of content entities are classified based on the assigned taxonomic labels.
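The flow described in the abstract (extract features, learn a model from a representative content entity, then apply it to documents of the other entities) might look like the sketch below. The abstract names neither the feature type nor the learning algorithm, so the bag-of-words features and the nearest-centroid classifier with cosine similarity are stand-ins chosen for brevity. All function names, sample texts, and labels are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def extract_features(text):
    """Bag-of-words term counts (an assumed, deliberately simple feature set)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(labeled_docs):
    """Learn one summed term-count vector (centroid) per taxonomic label."""
    centroids = defaultdict(Counter)
    for text, label in labeled_docs:
        centroids[label].update(extract_features(text))
    return dict(centroids)


def assign_label(model, text):
    """Pick the label whose centroid is most similar to the document."""
    feats = extract_features(text)
    return max(model, key=lambda label: cosine(model[label], feats))


# Labeled documents from the (hypothetical) representative content entity.
representative = [
    ("force mass acceleration and momentum", "Physics"),
    ("cells genes and proteins in organisms", "Biology"),
]
model = train(representative)

# Unlabeled documents from the other (hypothetical) content entities.
other_entity_docs = {
    "doc-1": "momentum and force in collisions",
    "doc-2": "genes regulate proteins in cells",
}
assigned = {
    doc_id: assign_label(model, text)
    for doc_id, text in other_entity_docs.items()
}
```

With this toy data, `doc-1` is assigned `Physics` and `doc-2` is assigned `Biology`. The point of the design is that only the representative entity needs labeled documents; the other entities reuse the same learned model, which is what gives them a common classification.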
20 Claims
1. A method for classifying documents of a plurality of content entities into a hierarchical discipline structure in a content management system, the method comprising:
accessing a set of taxonomic labels, the taxonomic labels collectively defining a hierarchical taxonomy;
receiving a plurality of documents, each document associated with one of the content entities;
extracting features of the received documents;
generating by a content classification system, a learned model for assigning taxonomic labels to documents associated with a representative content entity using the features extracted from documents associated with the representative content entity;
assigning, by the content classification system, one or more taxonomic labels to each document of the other content entities using the learned model applied to the features extracted from the respective document; and
classifying the documents of the plurality of content entities based on the assigned taxonomic labels.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A non-transitory computer readable storage medium storing computer program instructions for classifying documents of a plurality of content entities into a hierarchical discipline structure, the computer program instructions when executed by a processor causing the processor to:
access a set of taxonomic labels, the taxonomic labels collectively defining a hierarchical taxonomy;
receive a plurality of documents, each document associated with one of the content entities;
extract features of the received documents;
generate a learned model for assigning taxonomic labels to documents associated with a representative content entity using the features extracted from documents associated with the representative content entity;
assign one or more taxonomic labels to each document of the other content entities using the learned model applied to the features extracted from the respective document; and
classify the documents of the plurality of content entities based on the assigned taxonomic labels.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20
Specification