System and method for clustering content according to similarity
First Claim
1. A computer implemented method for clustering content according to similarity, the method comprising:
receiving a set of features for a plurality of content items;
calculating, by a processor, a distance matrix for the plurality of content items based on data indicating user behavior relative to at least some of the content items, wherein the data includes information associated with one or more users accessing at least one of the content items;
labeling, by a processor, at least some of the content items as pairwise constraints based on the distance matrix; and
creating, by a processor, a boosted cluster by incorporating the pairwise constraints into a clustering algorithm.
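The calculating and labeling steps of the claim can be illustrated with a minimal sketch. The claim does not name a distance measure or any thresholds, so everything specific here is an assumption made for illustration: the Jaccard distance over the sets of users who accessed each item, the hypothetical helper names `distance_matrix` and `label_constraints`, the cut-offs `must_max` and `cannot_min`, and the sample access log.

```python
from itertools import combinations


def jaccard_distance(users_a, users_b):
    """Return 1 - |A & B| / |A | B|; treated as 1.0 when both sets are empty."""
    union = users_a | users_b
    if not union:
        return 1.0
    return 1.0 - len(users_a & users_b) / len(union)


def distance_matrix(access_log):
    """Build a symmetric item-by-item distance table.

    access_log maps an item id to the set of user ids that accessed it
    (an assumed representation of the user behavior data).
    """
    items = sorted(access_log)
    dist = {a: {b: 0.0 for b in items} for a in items}
    for a, b in combinations(items, 2):
        d = jaccard_distance(access_log[a], access_log[b])
        dist[a][b] = dist[b][a] = d
    return dist


def label_constraints(dist, must_max=0.3, cannot_min=0.9):
    """Label close pairs as must-link and distant pairs as cannot-link.

    The two thresholds are hypothetical; pairs in between stay unlabeled.
    """
    must_link, cannot_link = [], []
    for a, b in combinations(sorted(dist), 2):
        if dist[a][b] <= must_max:
            must_link.append((a, b))
        elif dist[a][b] >= cannot_min:
            cannot_link.append((a, b))
    return must_link, cannot_link


# Hypothetical access data: which users accessed which content item.
access_log = {
    "song_a": {"u1", "u2", "u3"},
    "song_b": {"u1", "u2", "u3", "u4"},
    "song_c": {"u7", "u8"},
}
dist = distance_matrix(access_log)
must_link, cannot_link = label_constraints(dist)
print(must_link)
print(cannot_link)
```

In this sample, the two items with heavily overlapping audiences become a must-link pair, and the item with a disjoint audience is placed in cannot-link pairs with both of them.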
Abstract
Systems and methods for clustering content according to similarity are provided that identify and group similar content using a set of tags associated with the content. A topic model of a group of content is built, producing a probability distribution of topic membership for the content. Individual items of content are then clustered using a clustering algorithm, and a distance matrix from the probability distribution is built. Based on the distance matrix, individual items of content are labeled as “must-link” or “cannot-link” pairs with the group of content. The topic model is then embedded into successively smaller dimensions using a kernel method, until the clustering is stable with respect to both the behavioral and content domains.
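As a rough illustration of how "must-link" and "cannot-link" pairs might steer a clustering algorithm, the sketch below uses a simplified constrained k-means in the style of COP-KMeans. This is a swapped-in technique chosen for brevity, not the method the abstract describes: it omits the topic model and the kernel embedding, checks constraints only against items already assigned in the current pass, and takes hypothetical inputs (two-topic membership probabilities as points, fixed initial centers, and hand-written constraint pairs).

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def violates(item, cluster, assign, must_link, cannot_link):
    """True if putting `item` in `cluster` breaks a constraint, given the
    items that have already been assigned in this pass."""
    for a, b in must_link:
        other = b if a == item else a if b == item else None
        if other is not None and other in assign and assign[other] != cluster:
            return True
    for a, b in cannot_link:
        other = b if a == item else a if b == item else None
        if other is not None and assign.get(other) == cluster:
            return True
    return False


def constrained_kmeans(points, centers, must_link, cannot_link, iters=10):
    """Assign each point to the nearest center that violates no constraint,
    then recompute centers; stop when the centers no longer change."""
    centers = [tuple(c) for c in centers]
    assign = {}
    for _ in range(iters):
        assign = {}
        for i, p in enumerate(points):
            ranked = sorted(range(len(centers)),
                            key=lambda c: sq_dist(p, centers[c]))
            for c in ranked:
                if not violates(i, c, assign, must_link, cannot_link):
                    assign[i] = c
                    break
            else:
                raise ValueError("constraints cannot be satisfied")
        new_centers = []
        for c in range(len(centers)):
            members = [points[i] for i in assign if assign[i] == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return assign, centers


# Hypothetical topic-membership probabilities for four content items.
points = [(0.9, 0.1), (0.8, 0.2), (0.45, 0.55), (0.1, 0.9)]
initial_centers = [(0.9, 0.1), (0.1, 0.9)]
must_link = [(1, 2)]      # behavior suggests items 1 and 2 belong together
cannot_link = [(0, 3)]    # behavior suggests items 0 and 3 do not

assign, centers = constrained_kmeans(points, initial_centers,
                                     must_link, cannot_link)
print(assign)
```

On content alone, item 2 sits closer to the second center, but the must-link pair pulls it into the same cluster as item 1. That is the kind of behavioral correction the pairwise constraints are meant to supply.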
18 Claims
1. A computer implemented method for clustering content according to similarity, the method comprising:
receiving a set of features for a plurality of content items;
calculating, by a processor, a distance matrix for the plurality of content items based on data indicating user behavior relative to at least some of the content items, wherein the data includes information associated with one or more users accessing at least one of the content items;
labeling, by a processor, at least some of the content items as pairwise constraints based on the distance matrix; and
creating, by a processor, a boosted cluster by incorporating the pairwise constraints into a clustering algorithm.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A system for clustering content according to similarity, the system comprising:
a processor configured to:
receive a set of features for a plurality of content items;
calculate a distance matrix for the plurality of content based on data indicating user behavior relative to at least some of the content items, wherein the data includes information associated with one or more users accessing at least one of the content items;
label content items as a pairwise constraint based on the distance matrix; and
create a boosted cluster by incorporating the pairwise constraint into a clustering algorithm; and
a tangible computer readable media configured to store the boosted cluster.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
Specification