Fast collaborative filtering through sketch function based approximations
First Claim
Patent Images
1. A system that enables scalable collaborative filtering, the system comprising:
- a processor;
system memory; and
one or more computer storage media having stored thereon;
a collaborative filtering component configured to receive data associated with a plurality of user sessions and to generate a recommendation based on one or more similarities among the user sessions, the data associated with the user sessions including;
first data indicating a plurality of users;
second data indicating a plurality of items; and
third data indicating ratings of the items by the users;
an approximation component configured to approximate the one or more similarities that the collaborative filtering component uses to generate the recommendation, the approximation component comprising;
an adjustable accuracy component configured to receive input indicating a number of sketching functions;
a sketch component configured to generate sketching functions according to the number indicated by the input received by the adjustable accuracy component, the generated sketching functions including a first sketching function in a form f(x)=mx+b mod P, wherein m is a first randomly generated integer, b is a second randomly generated integer, and P is a randomly generated prime number, the sketch component configured to randomly generate the first integer, the second integer, and the prime number;
a mapping component configured to individually map the users to unique integers and to generate individual data sets linked to each of the items, each data set including one or more of the unique integers to indicate that the one or more users which are mapped to the one or more unique integers rated the linked item above a threshold rating;
a min-hashing component configured to;
generate hash values by selecting each data set and applying the generated sketching functions to the one or more unique integers in the selected data set;
for each sketching function applied to the selected data set, identify a min-hash value from among the hash values generated using the applied sketching function and the selected data set; and
associate each identified min-hash value with the item linked to the selected data set;
a matching component configured to;
compare the min-hash values associated with the items;
based on the comparison, calculate a total of min-hash value matches for pairs of items; and
generate an approximate Jaccard coefficient for each pair of items by dividing the total of min-hash value matches for the pair of items by the number of sketching functions generated and applied to the data sets linked to the items; and
a component configured to determine whether the data associated with a plurality of user sessions is sufficiently large as to warrant approximating the one or more similarities that the collaborative filtering component uses to generate the recommendation rather than calculating the one or more similarities.
2 Assignments
0 Petitions
Accused Products
Abstract
The claimed subject matter provides systems and/or methods that enable scalable collaborative filtering. A collaborative filtering component can receive data associated with a plurality of user sessions and data associated with at least one of a user and an item. Additionally, the collaborative filtering component can generate a recommendation based on a similarity. Further, an approximation component can approximate the similarity between at least one of the item and disparate items and the user and disparate users.
-
Citations
11 Claims
-
1. A system that enables scalable collaborative filtering, the system comprising:
-
a processor; system memory; and one or more computer storage media having stored thereon; a collaborative filtering component configured to receive data associated with a plurality of user sessions and to generate a recommendation based on one or more similarities among the user sessions, the data associated with the user sessions including; first data indicating a plurality of users; second data indicating a plurality of items; and third data indicating ratings of the items by the users; an approximation component configured to approximate the one or more similarities that the collaborative filtering component uses to generate the recommendation, the approximation component comprising; an adjustable accuracy component configured to receive input indicating a number of sketching functions; a sketch component configured to generate sketching functions according to the number indicated by the input received by the adjustable accuracy component, the generated sketching functions including a first sketching function in a form f(x)=mx+b mod P, wherein m is a first randomly generated integer, b is a second randomly generated integer, and P is a randomly generated prime number, the sketch component configured to randomly generate the first integer, the second integer, and the prime number; a mapping component configured to individually map the users to unique integers and to generate individual data sets linked to each of the items, each data set including one or more of the unique integers to indicate that the one or more users which are mapped to the one or more unique integers rated the linked item above a threshold rating; a min-hashing component configured to; generate hash values by selecting each data set and applying the generated sketching functions to the one or more unique integers in the selected data set; for each sketching function applied to the selected data set, identify a min-hash value from among the hash values generated using the applied sketching function and the selected data set; and associate each identified min-hash value with the item linked to the selected data set; a matching component configured to; compare the min-hash values associated with the items; based on the comparison, calculate a total of min-hash value matches for pairs of items; and generate an approximate Jaccard coefficient for each pair of items by dividing the total of min-hash value matches for the pair of items by the number of sketching functions generated and applied to the data sets linked to the items; and a component configured to determine whether the data associated with a plurality of user sessions is sufficiently large as to warrant approximating the one or more similarities that the collaborative filtering component uses to generate the recommendation rather than calculating the one or more similarities. - View Dependent Claims (2, 3)
-
-
4. By a computing system including a processor and system memory, a method that facilitates utilizing approximations in connection with collaborative filtering, the method comprising:
-
collecting data associated with a plurality of user sessions, the data associated with the user sessions including; first data indicating a plurality of users; second data indicating a plurality of items; and third data indicating interactions with the items by the users; receiving first input indicating a number of sketching functions; generating a plurality of sketching functions according to the number indicated by the first input, the generated sketching functions including a first randomly generated sketching function in a form f(x)=mx+b mod P, wherein m is a first randomly generated integer, b is a second randomly generated integer, and P is a randomly generated prime number; individually mapping the users to unique integers; generating individual data sets linked to each of the items, each data set for each item including one or more of the unique integers to indicate that the one or more users which are mapped to the one or more unique integers interacted with the item; generate hash values by selecting each data set and applying the generated sketching functions to the one or more unique integers in the data set; for each sketching function applied to a selected data set, identify a min-hash value from among the hash values generated using the applied sketching function and the selected data set, and associate the identified min-hash value with the item linked to the selected data set; compare the min-hash values associated with the items; based on the comparison, calculate a total of min-hash value matches for pairs of items; generate an approximate Jaccard coefficient for each pair of items by dividing the total of min-hash value matches for the pair of items by the number of sketching functions generated and applied to the data sets linked to the pair of items; and generating a recommendation based on at least one approximate Jaccard coefficient generated for a pair of items. - View Dependent Claims (5, 6, 7)
-
-
8. A system that enables scalable collaborative filtering using data associated with a plurality of user sessions, the data associated with the user sessions including first data indicating a plurality of users, second data indicating a plurality of items, and third data indicating interactions with the items by the users, the users being individually mapped to unique identifiers, the system comprising:
-
one or more processors; system memory; and one or more computer storage media having stored thereon computer-executable instructions for performing a method, the method including; receiving first input indicating a number of hashing functions; generating a plurality of hashing functions according to the number indicated by the first input, including randomly generating at least one hashing function in a form f(x) =mx+b mod P, wherein m is a first randomly generated integer, b is a second randomly generated integer, and P is a randomly generated prime number; generating individual data sets linked to each of the items, each data set for each item including one or more of the unique identifiers to indicate that the one or more users which are mapped to the one or more unique identifiers interacted with the item; generate hash values by selecting each data set and applying the generated hashing functions to the one or more unique identifiers in the data set; for each hashing function applied to a selected data set, identify a min-hash value from among the hash values generated using the applied hashing function and the selected data set, and associate the identified min-hash value with the item linked to the selected data set; compare the min-hash values associated with the items; based on the comparison, calculate a total of min-hash value matches for pairs of items; generate an approximate Jaccard coefficient for each pair of items by dividing the total of min-hash value matches for the pair of items by the number of hashing functions generated and applied to the data sets linked to the pair of items; and generating a recommendation based on at least one approximate Jaccard coefficient generated for a pair of items. - View Dependent Claims (9, 10, 11)
-
Specification