Scalable user clustering based on set similarity
First Claim
1. A computer program product, encoded on an information carrier, comprising instructions operable to cause data processing apparatus to:
- obtain a respective interest set for each of multiple users, each interest set representing items in which the respective user has expressed interest through interaction with a data processing system;
for each of the multiple users, determine k hash values of the respective interest set, wherein the i-th hash value is a minimum value in the respective interest set under a corresponding i-th hash function, where i is an integer between 1 and k, and where k is an integer greater than or equal to 1; and
assign each of the multiple users to each of the respective k clusters established for the respective user, the i-th cluster being represented by the i-th hash value, wherein the assignment of each of the multiple users to k clusters is done without regard to the assignment of any of the other users to k clusters.
2 Assignments
0 Petitions
Accused Products
Abstract
Methods and apparatus, including systems and computer program products, to provide clustering of users in which users are each represented as a set of elements representing items, e.g., items selected by users using a system. In one aspect, a program operates to obtain a respective interest set for each of multiple users, each interest set representing items in which the respective user expressed interest; for each of the users, to determine k hash values of the respective interest set, wherein the i-th hash value is a minimum value under a corresponding i-th hash function; and to assign each of the multiple users to each of the respective k clusters established for the respective user, the i-th cluster being represented by the i-th hash value. The assignment of each of the users to k clusters is done without regard to the assignment of any of the other users to k clusters.
148 Citations
32 Claims
-
1. A computer program product, encoded on an information carrier, comprising instructions operable to cause data processing apparatus to:
-
obtain a respective interest set for each of multiple users, each interest set representing items in which the respective user has expressed interest through interaction with a data processing system;
for each of the multiple users, determine k hash values of the respective interest set, wherein the i-th hash value is a minimum value in the respective interest set under a corresponding i-th hash function, where i is an integer between 1 and k, and where k is an integer greater than or equal to 1; and
assign each of the multiple users to each of the respective k clusters established for the respective user, the i-th cluster being represented by the i-th hash value, wherein the assignment of each of the multiple users to k clusters is done without regard to the assignment of any of the other users to k clusters. - View Dependent Claims (2, 3)
-
-
4. A computer program product, encoded on an information carrier, comprising instructions operable to cause data processing apparatus to:
-
obtain an interest set for a user, the interest set representing items in which the user has expressed interest through interaction with a data processing system;
determine k hash values of the interest set, wherein the i-th hash value is a minimum value in the interest set under a corresponding i-th hash function, where i is an integer between 1 and k, and where k is an integer greater than or equal to 1; and
assign the user to each of k clusters, the i-th cluster being represented by the i-th hash value. - View Dependent Claims (5, 6)
-
-
7. A system, comprising:
-
a log of items selected by multiple users using a data processing system;
means for using a fingerprint function and the log of items to assign each of the multiple users to k clusters, where k is an integer greater than or equal to 1; and
a collaborative filtering computer program application operable to provide information to a first user of the multiple users based on the assignment of the first user to one or more of the k clusters. - View Dependent Claims (8)
-
-
9. A computer program product, encoded on an information carrier, comprising instructions operable to cause data processing apparatus to:
use an ordered set of k elements to identify a user of a data processing system, where k is an integer greater than 1, where each of the k elements corresponds to an element in an interest set, each element in the interest set representing an item in which the user has expressed interest through actions by the user using the data processing system. - View Dependent Claims (10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22)
-
23. A method comprising:
-
obtaining a respective interest set for each of multiple users, each interest set representing items in which the respective user has expressed interest through interaction with a data processing system;
for each of the multiple users, determining k hash values of the respective interest set, wherein the i-th hash value is a minimum value in the respective interest set under a corresponding i-th hash function, where i is an integer between 1 and k, and where k is an integer greater than or equal to 1; and
assigning each of the multiple users to each of the respective k clusters established for the respective user, the i-th cluster being represented by the i-th hash value, wherein the assignment of each of the multiple users to k clusters is done without regard to the assignment of any of the other users to k clusters. - View Dependent Claims (24)
-
-
25. A method comprising:
-
obtaining an interest set for a user, the interest set representing items in which the user has expressed interest through interaction with a data processing system;
determining k hash values of the interest set, wherein the i-th hash value is a minimum value in the interest set under a corresponding i-th hash function, where i is an integer between 1 and k, and where k is an integer greater than or equal to 1; and
assigning the user to each of k clusters, the i-th cluster being represented by the i-th hash value. - View Dependent Claims (26, 27)
-
-
28. A method comprising:
using an ordered set of k elements to identify a user of a data processing system, where k is an integer greater than 1, where each of the k elements corresponds to an element in an interest set, each element in the interest set representing an item in which the user has expressed interest through actions by the user using the data processing system. - View Dependent Claims (29, 30, 31, 32)
Specification