Data Structures for Collaborative Filtering Systems
First Claim
1. A computer-implemented method of creating sketches for use by a collaborative filtering system for recommending items to users comprising:
- accessing, for each user, an item list being a list of items that have been rated by that user;
creating and storing a sketch of each item list being a data structure holding a concise description of the item list;
arranging a sketch creation engine to create each sketch by using a hash function to generate a permutation of the item list, storing a minimum value of the permutation in the sketch, repeatedly generating permutations and storing a minimum value in the sketch for more hash functions, and also storing item ratings in the sketch such that there is one stored item rating associated with each stored minimum value.
2 Assignments
0 Petitions
Accused Products
Abstract
Data structures for collaborative filtering systems are described. In an embodiment sketches which extremely concisely represent a list of items that a user has rated are created and stored for use by a collaborative filtering system to recommend items. For example, the sketches are created by using several versions of a cryptographic hash function to permute the item list and store a minimal value from each permutation in the sketch together with a user rating. In examples the sketches are used to compute estimates of similarity measures between pairs of users such as rank correlations including Spearman'"'"'s Rho and Kendall'"'"'s Tau. For example, the similarity measures are used by a collaborative filtering system to accurately and efficiently recommend items to users. For example the sketches are so concise that massive amounts of data can be taken into account in order to give high quality recommendations in a practical manner.
16 Citations
20 Claims
-
1. A computer-implemented method of creating sketches for use by a collaborative filtering system for recommending items to users comprising:
-
accessing, for each user, an item list being a list of items that have been rated by that user; creating and storing a sketch of each item list being a data structure holding a concise description of the item list; arranging a sketch creation engine to create each sketch by using a hash function to generate a permutation of the item list, storing a minimum value of the permutation in the sketch, repeatedly generating permutations and storing a minimum value in the sketch for more hash functions, and also storing item ratings in the sketch such that there is one stored item rating associated with each stored minimum value. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
-
-
10. A method at a collaborative filtering system of estimating a rank correlation between a pair of users comprising:
-
accessing a sketch for each user of the pair, each sketch being a data structure holding a description of a list of items rated by a user comprising a plurality of item identifiers and a rating for each item identifier;
each sketch being smaller than its associated item list;arranging a processing to identify sketch collisions between the sketches where item identifiers at corresponding positions in the sketches are the same; arranging the processor to examine the ratings of each item that occurs on a sketch collision and to use those ratings to compute an estimate of the rank correlation. - View Dependent Claims (11, 12, 13)
-
-
14. A collaborative filtering system comprising:
-
a memory holding a plurality of sketches, each sketch being a data structure holding a description of a list of items rated by a user comprising a plurality of item identifiers and a rating for each item identifier;
each sketch being smaller than its associated item list;a processor arranged to identify sketch collisions between pairs of the sketches where item identifiers at corresponding positions in the sketches are the same; the processor being arranged to examine the ratings of items that occur on sketch collisions and to use those ratings to compute estimates of a rank correlation between pairs of users; the processor being arranged to predict the rating a target user would give to an unexamined item using at least some of the rank correlation estimates. - View Dependent Claims (15, 16, 17, 18, 19, 20)
-
Specification