Scalable user clustering based on set similarity
First Claim
1. A computer program product, encoded on a machine-readable storage device, comprising instructions that when executed by a processor cause a data processing apparatus to:
- obtain a respective interest set for each of multiple users, each interest set being a set of elements, each element representing a respective item in which the respective user has expressed interest through interaction with a data processing system;
for each of the multiple users, apply an i-th hash function to each element of the interest set of the user to obtain a respective function value corresponding to the respective element, for each integer i between 1 and k, the k hash functions being distinct each from the others, and determine, from the function values obtained from the k hash functions, k hash values of the respective interest set, wherein the i-th hash value of the respective interest set is a minimum value among the function values obtained by applying the i-th hash function to the elements of interest set of the user, and where k is an integer greater than or equal to 1; and
assign each of the multiple users to each of k clusters, the i-th cluster being represented by the i-th hash value of the respective interest set of the respective user, wherein the assignment of each of the multiple users to k clusters is done without regard to the assignment of any of the other users to k clusters.
2 Assignments
0 Petitions
Accused Products
Abstract
Methods and apparatus, including systems and computer program products, to provide clustering of users in which users are each represented as a set of elements representing items, e.g., items selected by users using a system. In one aspect, a program operates to obtain a respective interest set for each of multiple users, each interest set representing items in which the respective user expressed interest; for each of the users, to determine k hash values of the respective interest set, wherein the i-th hash value is a minimum value under a corresponding i-th hash function; and to assign each of the multiple users to each of the respective k clusters established for the respective user, the i-th cluster being represented by the i-th hash value. The assignment of each of the users to k clusters is done without regard to the assignment of any of the other users to k clusters.
38 Citations
33 Claims
-
1. A computer program product, encoded on a machine-readable storage device, comprising instructions that when executed by a processor cause a data processing apparatus to:
-
obtain a respective interest set for each of multiple users, each interest set being a set of elements, each element representing a respective item in which the respective user has expressed interest through interaction with a data processing system; for each of the multiple users, apply an i-th hash function to each element of the interest set of the user to obtain a respective function value corresponding to the respective element, for each integer i between 1 and k, the k hash functions being distinct each from the others, and determine, from the function values obtained from the k hash functions, k hash values of the respective interest set, wherein the i-th hash value of the respective interest set is a minimum value among the function values obtained by applying the i-th hash function to the elements of interest set of the user, and where k is an integer greater than or equal to 1; and assign each of the multiple users to each of k clusters, the i-th cluster being represented by the i-th hash value of the respective interest set of the respective user, wherein the assignment of each of the multiple users to k clusters is done without regard to the assignment of any of the other users to k clusters. - View Dependent Claims (2, 3, 4, 5)
-
-
6. A computer program product, encoded on a machine-readable storage device, comprising instructions that when executed by a processor cause a data processing apparatus to:
-
obtain an interest set for a user, the interest set being a set of elements, each element representing a respective item in which the user has expressed interest through interaction with a data processing system; apply an i-th hash function to each element of the interest set to obtain a respective function value corresponding to the respective element, for each integer i between 1 and k, the k hash functions being distinct each from the others, and determine, from the function values obtained from the k hash functions, k hash values of the interest set, wherein the i-th hash value is a minimum value among the function values obtained by applying the i-th hash function to the elements of interest set of the user, where k is an integer greater than or equal to 1; and assign the user to each of k clusters, the i-th cluster being represented by the i-th hash value. - View Dependent Claims (7, 8)
-
-
9. A system, comprising:
-
one or more machine-readable storage media storing a log of items selected by multiple users using a data processing system, the log of items identifying, for each of the multiple users, multiple items that the respective user has selected through interaction with the data processing system; one or more computers configured to use the log to generate a respective interest set for each of the multiple users, each interest set being a set of elements, each element representing a respective item in which the respective user has selected through interaction with the data processing system; one or more computers configured to apply, for each of the multiple users, an i-th hash function to each element of the interest set of the user to obtain a respective function value corresponding to the respective element, for each integer i between 1 and k, the k hash functions being distinct each from the others, and to determine, from the function values obtained from the k hash functions, k hash values of the respective interest set, wherein the i-th hash value of the respective interest set is a minimum value among the function values obtained by applying the i-th hash function to the elements of interest set of the user, and where k is an integer greater than or equal to 1; one or more computers configured to assign each of the multiple users to each of k clusters, the i-th cluster being represented by the i-th hash value of the respective interest set of the respective user, wherein the assignment of each of the multiple users to k clusters is done without regard to the assignment of any of the other users to k clusters; and one or more computers configured to execute a collaborative filtering computer program application to provide information to a first user of the multiple users based on the assignment of the first user to one or more of the k clusters. - View Dependent Claims (10, 11, 12)
-
-
13. A computer-implemented method comprising:
-
obtaining a respective interest set for each of multiple users, each interest set being a set of elements, each element representing a respective item in which the respective user has expressed interest through interaction with a data processing system; for each of the multiple users, applying an i-th hash function to each element of the interest set of the user to obtain a respective function value corresponding to the respective element, for each integer i between 1 and k, the k hash functions being distinct each from the others, and determining, from the function values obtained from the k hash functions, k hash values of the respective interest set, wherein the i-th hash value of the respective interest set is a minimum value among the function values obtained by applying the i-th hash function to the elements of interest set of the user, where k is an integer greater than or equal to 1; and assigning each of the multiple users to each of k clusters established for the respective user, the i-th cluster being represented by the i-th hash value of the respective interest set of the respective user, wherein the assignment of each of the multiple users to k clusters is done without regard to the assignment of any of the other users to k clusters. - View Dependent Claims (14, 15, 16, 17)
-
-
18. A computer-implemented method comprising:
-
obtaining an interest set for a user, the interest set being a set of elements, each element representing a respective item in which the user has expressed interest through interaction with a data processing system; applying an i-th hash function to each element of the interest set to obtain a respective function value corresponding to the respective element, for each integer i between 1 and k, the k hash functions being distinct each from the others, and determining, from the function values obtained from the k hash functions, k hash values of the interest set, wherein the i-th hash value is a minimum value among the function values obtained by applying the i-th hash function to the elements of interest set of the user, where k is an integer greater than or equal to 1; and assigning the user to each of k clusters, the i-th cluster being represented by the i-th hash value. - View Dependent Claims (19, 20, 21)
-
-
22. A system comprising:
-
one or more computers; and a machine-readable storage medium coupled to the one or more computers having instructions stored thereon which, when executed by the one or more computers, cause the one or more computers to perform operations comprising; obtaining an interest set for a user, the interest set being a set of elements, each element representing a respective item in which the user has expressed interest through interaction with a data processing system; applying an i-th hash function to each element of the interest set to obtain a respective function value corresponding to the respective element, for each integer i between 1 and k, the k hash functions being distinct each from the others, and determine, from the function values obtained from the k hash functions, k hash values of the interest set, wherein the i-th hash value is a minimum value among the function values obtained by applying the i-th hash function to the elements of interest set of the user, where k is an integer greater than or equal to 1; and assigning the user to each of k clusters, the i-th cluster being represented by the i-th hash value. - View Dependent Claims (23, 24, 25)
-
-
26. A system comprising:
-
one or more computers; and a machine-readable storage medium coupled to the one or more computers having instructions stored thereon which, when executed by the one or more computers, cause the one or more computers to perform operations comprising; obtaining a respective interest set for each of multiple users, each interest set being a set of elements, each element representing a respective item in which the respective user has expressed interest through interaction with a data processing system; for each of the multiple users, applying an i-th hash function to each element of the interest set of the user to obtain a respective function value corresponding to the respective element, for each integer i between 1 and k, the k hash functions being distinct each from the others, and determine, from the function values obtained from the k hash functions, k hash values of the respective interest set, wherein the i-th hash value of the respective interest set is a minimum value among the function values obtained by applying the i-th hash function to the elements of interest set of the user, and where k is an integer greater than or equal to 1; and assigning each of the multiple users to each of k clusters, the i-th cluster being represented by the i-th hash value of the respective interest set of the respective user, wherein the assignment of each of the multiple users to k clusters is done without regard to the assignment of any of the other users to k clusters. - View Dependent Claims (27, 28, 29)
-
-
30. A computer program product, encoded on a machine-readable storage device, comprising instructions that when executed by a processor cause a data processing apparatus to:
-
store a log of items selected by multiple users using a data processing system, the log of items identifying, for each of the multiple users, multiple items that the respective user has selected through interaction with the data processing system; one or more computers configured to use the log to generate a respective interest set for each of the multiple users, each interest set being a set of elements, each element representing a respective item in which the respective user has selected through interaction with the data processing system; one or more computers configured to apply, for each of the multiple users, an i-th hash function to each element of the interest set of the user to obtain a respective function value corresponding to the respective element, for each integer i between 1 and k, the k hash functions being distinct each from the others, and to determine, from the function values obtained from the k hash functions, k hash values of the respective interest set, wherein the i-th hash value of the respective interest set is a minimum value among the function values obtained by applying the i-th hash function to the elements of interest set of the user, and where k is an integer greater than or equal to 1; one or more computers configured to assign each of the multiple users to each of k clusters, the i-th cluster being represented by the i-th hash value of the respective interest set of the respective user, wherein the assignment of each of the multiple users to k clusters is done without regard to the assignment of any of the other users to k clusters; and one or more computers configured to execute a collaborative filtering computer program application to provide information to a first user of the multiple users based on the assignment of the first user to one or more of the k clusters. - View Dependent Claims (31, 32, 33)
-
Specification