Content identification
Abstract
Systems, computer program products, and methods can identify a training set of content, and generate one or more clusters from the training set of content, where each of the one or more clusters represent similar features of the training set of content. The one or more clusters can be used to generate a classifier. New content is identified and the classifier is used to associate at least one label with the new content.
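The pipeline summarized in the abstract (cluster the training features, build a classifier from the clusters, label new content) might be sketched roughly as follows. This is only an illustration under stated assumptions: the tiny k-means routine, the majority-vote rule that turns clusters into a classifier, the function names, and the sample feature vectors and labels are all hypothetical and are not taken from the patent.

```python
from collections import Counter

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])))

def kmeans(points, k, iterations=10):
    """Tiny k-means; seeding with the first k points is an assumption made for determinism."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(dim) / len(g) for dim in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids

def train_classifier(points, labels, k):
    """Cluster the training features, then give each cluster its majority label (assumed rule)."""
    centroids = kmeans(points, k)
    votes = [Counter() for _ in range(k)]
    for p, label in zip(points, labels):
        votes[nearest(p, centroids)][label] += 1
    cluster_labels = [v.most_common(1)[0][0] if v else None for v in votes]
    return lambda p: cluster_labels[nearest(p, centroids)]

# Hypothetical 2-D feature vectors and labels for four training items.
training = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9)]
labels = ["speech", "music", "speech", "music"]
classify = train_classifier(training, labels, k=2)
```

Calling `classify` on the feature vector of a new content item returns the label of the cluster it falls into, which corresponds loosely to associating at least one label with new content.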
30 Claims
1. A method, comprising:
generating, by a device comprising a processor, a plurality of distinct clusters from training content, wherein the plurality of clusters represent features of content items in the training content;
identifying one or more conjunctions of the plurality of distinct clusters based on a respective probability of observing a feature of a cluster of the one or more conjunctions in a collection of the content items;
scoring an identified one or more conjunctions based on a conditional probability that the identified one or more conjunctions is associated with a label;
selecting, as a current conjunction, one of the scored identified one or more conjunctions that has a score that meets a defined condition; and
generating one or more higher-order child conjunctions for the current conjunction, wherein at least one of the one or more higher-order child conjunctions is a conjunction of the conjoined clusters of the current conjunction with one or more additional clusters not included in the conjoined clusters, and wherein the generating the one or more higher-order child conjunctions is performed if a stopping condition has not been reached.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A computer-readable storage medium storing instructions that, in response to execution, cause a device comprising a processor to perform operations, comprising:
generating a plurality of distinct clusters from training content, wherein the plurality of clusters represent features of content items in the training content;
identifying one or more conjunctions of the plurality of distinct clusters based on a respective probability of observing a feature of a cluster of the one or more conjunctions in a collection of the content items;
scoring an identified one or more conjunctions based on a conditional probability that the identified one or more conjunctions is associated with a label;
selecting, as a current conjunction, one of the scored identified one or more conjunctions that has a score that meets a defined condition; and
generating one or more higher-order child conjunctions for the current conjunction, wherein at least one of the one or more higher-order child conjunctions is a conjunction of the conjoined clusters of the current conjunction with one or more additional clusters not included in the conjoined clusters, and wherein the generating the one or more higher-order child conjunctions is performed if a stopping condition has not been reached.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
21. A system, comprising:
a processor, communicatively coupled to a memory, that executes or facilitates execution of instructions to at least:
generate a plurality of distinct clusters from training content, wherein the plurality of clusters represent features of content items in the training content;
identify one or more conjunctions of the plurality of distinct clusters based on a respective probability of observing a feature of a cluster of the one or more conjunctions in a collection of the content items;
score an identified one or more conjunctions based on a conditional probability that the identified one or more conjunctions is associated with a label;
select, as a current conjunction, one of the scored identified one or more conjunctions that has a score that meets a defined condition; and
generate one or more higher-order child conjunctions for the current conjunction, wherein at least one of the one or more higher-order child conjunctions is a conjunction of the conjoined clusters of the current conjunction with one or more additional clusters not included in the conjoined clusters, and wherein the generating the one or more higher-order child conjunctions is performed if a stopping condition has not been reached.
Dependent claims: 22, 23, 24, 25, 26, 27, 28, 29, 30
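The identify, score, select, and grow steps recited in the independent claims can be illustrated with a minimal greedy sketch. Everything concrete here is an assumption made for illustration rather than the claimed implementation: each content item is modeled as the set of cluster IDs observed in it plus a label, the observation probability is a simple frequency compared against a hypothetical `min_support`, the "defined condition" is a hypothetical `min_score` threshold on the best-scoring candidate, and the stopping condition is a hypothetical cap `max_order` on conjunction size.

```python
def support(conj, items):
    """Fraction of items in which every cluster of the conjunction is observed."""
    return sum(conj <= clusters for clusters, _ in items) / len(items)

def score(conj, items, label):
    """Estimate P(label | conjunction observed) from co-occurrence counts."""
    matched = [lab for clusters, lab in items if conj <= clusters]
    return sum(lab == label for lab in matched) / len(matched) if matched else 0.0

def grow_conjunctions(items, label, min_support=0.3, min_score=0.5, max_order=3):
    all_clusters = sorted(set().union(*(clusters for clusters, _ in items)))
    # Identify first-order conjunctions that are observed often enough.
    frontier = [frozenset([c]) for c in all_clusters
                if support(frozenset([c]), items) >= min_support]
    kept = {}
    while frontier:
        scored = {conj: score(conj, items, label) for conj in frontier}
        # Select the current conjunction: the best-scoring candidate,
        # provided its score meets the (assumed) threshold condition.
        current = max(scored, key=scored.get)
        if scored[current] < min_score:
            break
        kept[current] = scored[current]
        # Assumed stopping condition: a cap on the order of the conjunction.
        if len(current) >= max_order:
            break
        # Higher-order children: conjoin the current conjunction with one
        # additional cluster it does not already contain.
        frontier = [current | {c} for c in all_clusters
                    if c not in current
                    and support(current | {c}, items) >= min_support]
    return kept

# Hypothetical collection: (clusters observed in the item, item label).
items = [
    ({"A", "B"}, "cat"),
    ({"A", "B"}, "cat"),
    ({"A"}, "dog"),
    ({"B"}, "dog"),
    ({"A", "C"}, "dog"),
]
result = grow_conjunctions(items, "cat")
```

On this toy data the search keeps the single cluster `B` first and then its child conjunction of `A` and `B`, whose estimated conditional probability for the label is higher than that of either cluster alone. A fuller search could expand several scored conjunctions rather than only the best one; the greedy choice here is purely for brevity.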
Specification