FRAMEWORK FOR LARGE-SCALE MULTI-LABEL CLASSIFICATION
First Claim
1. A method comprising:
accessing an electronic document;
identifying one or more seed labels, the one or more seed labels representing respective one or more preliminary content topics associated with the electronic document;
generating a seed graph, nodes of the seed graph representing the one or more seed labels;
based on co-occurrence of labels in member profiles of an on-line social networking system and based on the one or more seed labels, deriving one or more additional labels;
using at least one processor, generating an expanded graph comprising a first set of nodes representing the one or more additional labels and a second set of nodes representing the one or more seed labels;
applying a clustering algorithm to the expanded graph to generate a labels graph; and
identifying nodes of the labels graph, as a set of resolved content topics associated with the electronic document.
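The steps of the claim can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation: the claim does not name a clustering algorithm, so thresholded connected components stand in for it here, and the function names, the `min_weight` threshold, and the shape of the `cooccurrence` mapping (label pairs mapped to a score) are all hypothetical.

```python
def expand_labels(seeds, cooccurrence, min_weight):
    """Derive additional labels: non-seed labels that co-occur with a seed."""
    additional = set()
    for (a, b), weight in cooccurrence.items():
        if weight < min_weight:
            continue
        if a in seeds and b not in seeds:
            additional.add(b)
        elif b in seeds and a not in seeds:
            additional.add(a)
    return additional


def build_expanded_graph(seed_graph, additional, cooccurrence, min_weight):
    """Extend the seed graph with the additional labels and weighted-enough edges."""
    adjacency = {node: set(neighbors) for node, neighbors in seed_graph.items()}
    for label in additional:
        adjacency.setdefault(label, set())
    for (a, b), weight in cooccurrence.items():
        if a != b and a in adjacency and b in adjacency and weight >= min_weight:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def connected_components(adjacency):
    """Stand-in clustering step (an assumption): split the graph into components."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


def resolve_topics(seeds, cooccurrence, min_weight=0.5):
    """Run the whole sketch: seed graph, expansion, clustering, resolved topics."""
    seeds = set(seeds)
    seed_graph = {label: set() for label in seeds}  # nodes are the seed labels
    additional = expand_labels(seeds, cooccurrence, min_weight)
    expanded = build_expanded_graph(seed_graph, additional, cooccurrence, min_weight)
    components = connected_components(expanded)
    if not components:
        return set()
    # Assumption: the labels graph is the cluster holding the most seed labels,
    # with ties broken by cluster size.
    return max(components, key=lambda c: (len(c & seeds), len(c)))


example_cooccurrence = {
    ("java", "python"): 0.8,
    ("python", "machine learning"): 0.7,
    ("java", "spring"): 0.6,
    ("coffee", "barista"): 0.2,
    ("coffee", "java"): 0.1,
}
resolved = resolve_topics({"java", "python", "coffee"}, example_cooccurrence)
```

In this made-up example the seed label "coffee" has no sufficiently strong co-occurrence with the other labels, so it ends up in its own cluster and is left out of the resolved topics, while "machine learning" and "spring" are pulled in through the seeds they co-occur with.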
Abstract
A framework for large-scale multi-label classification of an electronic document is described. An example multi-label classification system is configured to identify seed labels that represent respective one or more candidate content topics associated with the electronic document and determine additional labels based on the seed labels and label correlation data derived from member profiles maintained by an on-line social network system. The multi-label classification system then constructs a graph comprising nodes that correspond to the seed labels and the additional labels. A clustering algorithm is applied to the constructed graph to produce a labels graph. The labels graph is deemed to include nodes that correspond to topics discussed or referenced in the electronic document.
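One way label correlation data might be derived from member profiles is sketched below. The abstract does not specify a correlation measure, so the Jaccard score used here is an assumption, as are the function name `label_cooccurrence` and the representation of each profile as a list of label strings.

```python
from collections import Counter
from itertools import combinations


def label_cooccurrence(profiles):
    """Score each label pair by how often the two labels share a profile.

    The score is the Jaccard index (an assumed measure): the number of
    profiles containing both labels divided by the number containing either.
    Keys are alphabetically ordered (label, label) tuples.
    """
    label_counts = Counter()
    pair_counts = Counter()
    for labels in profiles:
        unique = sorted(set(labels))  # ignore repeats within one profile
        label_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    scores = {}
    for (a, b), both in pair_counts.items():
        either = label_counts[a] + label_counts[b] - both
        scores[(a, b)] = both / either
    return scores


example_profiles = [
    ["python", "machine learning"],
    ["python", "machine learning", "java"],
    ["java", "spring"],
]
example_scores = label_cooccurrence(example_profiles)
```

A normalized score of this kind keeps very common labels from dominating purely because they appear on many profiles; raw pair counts would be a simpler alternative.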
20 Claims
1. A method comprising:
accessing an electronic document;
identifying one or more seed labels, the one or more seed labels representing respective one or more preliminary content topics associated with the electronic document;
generating a seed graph, nodes of the seed graph representing the one or more seed labels;
based on co-occurrence of labels in member profiles of an on-line social networking system and based on the one or more seed labels, deriving one or more additional labels;
using at least one processor, generating an expanded graph comprising a first set of nodes representing the one or more additional labels and a second set of nodes representing the one or more seed labels;
applying a clustering algorithm to the expanded graph to generate a labels graph; and
identifying nodes of the labels graph, as a set of resolved content topics associated with the electronic document.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A computer-implemented system comprising:
an access module, implemented using at least one processor, to access an electronic document;
an identifying module, implemented using at least one processor, to identify one or more seed labels, the one or more seed labels representing respective one or more preliminary content topics associated with the electronic document;
a graph generator, implemented using at least one processor, to generate a seed graph, nodes of the seed graph representing the one or more seed labels;
an expanded nodes detector, implemented using at least one processor, to derive one or more additional labels based on co-occurrence of labels in member profiles of an on-line social networking system and based on the one or more seed labels;
an expanded graph generator, implemented using at least one processor, to generate an expanded graph comprising a first set of nodes representing the one or more additional labels and a second set of nodes representing the one or more seed labels;
a graph cutting module, implemented using at least one processor, to apply a clustering algorithm to the expanded graph to generate a labels graph, using the at least one processor; and
a resolved labels module, implemented using at least one processor, to identify nodes of the labels graph as a set of resolved content topics associated with the electronic document, using the at least one processor.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19
20. A machine-readable non-transitory storage medium having instruction data to cause a machine to perform operations comprising:
accessing an electronic document;
identifying one or more seed labels, the one or more seed labels representing respective one or more preliminary content topics associated with the electronic document;
generating a seed graph, nodes of the seed graph representing the one or more seed labels;
based on co-occurrence of labels in member profiles of an on-line social networking system and based on the one or more seed labels, deriving one or more additional labels;
generating an expanded graph comprising a first set of nodes representing the one or more additional labels and a second set of nodes representing the one or more seed labels;
applying a clustering algorithm to the expanded graph to generate a labels graph; and
identifying nodes of the labels graph, as a set of resolved content topics associated with the electronic document.
Specification