Data structures for collaborative filtering systems
First Claim
1. A computer-implemented method comprising:
- accessing an item list that includes one or more items that have been rated by a user; and
creating and storing a sketch of the item list, the sketch being a data structure storing a concise description of the item list, wherein creating each sketch includes:
selecting a hash function from a plurality of hash functions of a type;
using the hash function to generate a permutation of the item list, the permutation including a plurality of hashed values, each of the plurality of hashed values corresponding to at least one item in the item list;
storing a minimum value of the permutation in the sketch, the minimum value being a minimum of the plurality of hashed values;
repeatedly generating other permutations of the item list, using other hash functions selected from the plurality of hash functions of the type;
storing other minimum values of the other permutations in the sketch; and
storing item ratings in the sketch such that there is one stored item rating associated with each stored minimum value, the one stored item rating being a rating, made by the user, associated with an item represented by the stored minimum value.
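The steps of claim 1 can be illustrated with a minimal sketch. It is not the patent's implementation: the helper names `hash_item` and `build_sketch`, the use of seeded SHA-256 as the "plurality of hash functions of a type", and the string item identifiers are all assumptions made for illustration.

```python
import hashlib


def hash_item(item_id: str, seed: int) -> int:
    """One member of a seeded hash family (SHA-256 is an assumption, not from the claim)."""
    digest = hashlib.sha256(f"{seed}:{item_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_sketch(ratings: dict[str, float], num_hashes: int) -> list[tuple[int, float]]:
    """Return one (minimum hashed value, rating of the minimising item) pair per hash function."""
    if not ratings:
        raise ValueError("item list must contain at least one rated item")
    sketch = []
    for seed in range(num_hashes):
        # Hashing every item with this seed plays the role of one permutation of the item list.
        best_item = min(ratings, key=lambda item: hash_item(item, seed))
        # Store the minimum hashed value together with the user's rating of that item.
        sketch.append((hash_item(best_item, seed), ratings[best_item]))
    return sketch
```

Calling `build_sketch({"item-a": 5, "item-b": 3}, num_hashes=16)` would yield 16 pairs regardless of how long the item list is, which is what keeps the sketch concise.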
Abstract
Data structures for collaborative filtering systems are described. In an embodiment sketches which extremely concisely represent a list of items that a user has rated are created and stored for use by a collaborative filtering system to recommend items. For example, the sketches are created by using several versions of a cryptographic hash function to permute the item list and store a minimal value from each permutation in the sketch together with a user rating. In examples the sketches are used to compute estimates of similarity measures between pairs of users such as rank correlations including Spearman's Rho and Kendall's Tau. For example, the similarity measures are used by a collaborative filtering system to accurately and efficiently recommend items to users. For example the sketches are so concise that massive amounts of data can be taken into account in order to give high quality recommendations in a practical manner.
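For reference, the two rank correlations named in the abstract have the following standard textbook forms. The notation is introduced here and does not come from the source: n is the number of items both users have rated, d_i is the difference between the two users' ranks for item i, and C and D are the numbers of concordant and discordant item pairs. The Spearman formula shown assumes no tied ranks, and the Kendall formula is the tau-a variant.

```latex
\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^{2}}{n\,(n^{2} - 1)},
\qquad
\tau = \frac{C - D}{\binom{n}{2}}
```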
17 Citations
20 Claims
1. A computer-implemented method comprising:
- accessing an item list that includes one or more items that have been rated by a user; and
creating and storing a sketch of the item list, the sketch being a data structure storing a concise description of the item list, wherein creating each sketch includes:
selecting a hash function from a plurality of hash functions of a type;
using the hash function to generate a permutation of the item list, the permutation including a plurality of hashed values, each of the plurality of hashed values corresponding to at least one item in the item list;
storing a minimum value of the permutation in the sketch, the minimum value being a minimum of the plurality of hashed values;
repeatedly generating other permutations of the item list, using other hash functions selected from the plurality of hash functions of the type;
storing other minimum values of the other permutations in the sketch; and
storing item ratings in the sketch such that there is one stored item rating associated with each stored minimum value, the one stored item rating being a rating, made by the user, associated with an item represented by the stored minimum value.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
10. A method comprising:
accessing a sketch for each user of a pair of users, each sketch being a data structure holding a description of a list of items rated by a user comprising a plurality of item identifiers and a rating for each item identifier;
each sketch being smaller than its associated item list, each sketch created based on a plurality of permutations of the list of items, and each permutation generated using a hash function selected from a plurality of hash functions of a same type, a size of the sketch being determined based on an amount of memory available;
arranging a processor to identify sketch collisions between the sketches where item identifiers at corresponding positions in the sketches are the same; and
arranging the processor to examine the ratings of each item that occurs on a sketch collision and to use those ratings to compute an estimate of the rank correlation.
- View Dependent Claims (11, 12, 13)
14. A system comprising:
a memory holding a plurality of sketches, each sketch being a data structure holding a description of a list of items rated by a user comprising a plurality of item identifiers and a rating for each item identifier;
each sketch being smaller than its associated item list, each sketch created based on a plurality of permutations of the list of items, and each permutation generated using a hash function selected from a plurality of hash functions of a same type; and
a processor arranged to:
identify sketch collisions between pairs of the sketches where item identifiers at corresponding positions in the sketches are the same;
examine the ratings of items that occur on sketch collisions and to use those ratings to compute estimates of a rank correlation between pairs of users; and
predict the rating a target user would give to an unexamined item using at least some of the rank correlation estimates.
- View Dependent Claims (15, 16, 17, 18, 19, 20)
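The final prediction step of claim 14 does not state a formula in this section. One common choice in neighbourhood-based collaborative filtering, shown here purely as an assumption, is a correlation-weighted average of the neighbours' deviations from their own mean ratings. The function name `predict_rating` and the tuple layout are hypothetical.

```python
def predict_rating(
    target_mean: float,
    neighbours: list[tuple[float, float, float]],
) -> float:
    """Predict a rating from (correlation estimate, neighbour's rating, neighbour's mean) triples."""
    # Each neighbour contributes its deviation from its own mean, weighted by the
    # estimated rank correlation with the target user (assumed weighting scheme).
    numerator = sum(weight * (rating - mean) for weight, rating, mean in neighbours)
    denominator = sum(abs(weight) for weight, _, _ in neighbours)
    if denominator == 0:
        # No informative neighbours: fall back to the target user's own mean rating.
        return target_mean
    return target_mean + numerator / denominator
```

Negative correlation estimates pull the prediction in the opposite direction from the neighbour's deviation, which is why the denominator uses absolute values.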
Specification