×

Fast collaborative filtering through sketch function based approximations

  • US 7,624,095 B2
  • Filed: 11/15/2005
  • Issued: 11/24/2009
  • Est. Priority Date: 11/15/2005
  • Status: Expired due to Fees
First Claim
Patent Images

1. A system that enables scalable collaborative filtering, the system comprising:

  • a processor;

    system memory; and

    one or more computer storage media having stored thereon;

    a collaborative filtering component configured to receive data associated with a plurality of user sessions and to generate a recommendation based on one or more similarities among the user sessions, the data associated with the user sessions including;

    first data indicating a plurality of users;

    second data indicating a plurality of items; and

    third data indicating ratings of the items by the users;

    an approximation component configured to approximate the one or more similarities that the collaborative filtering component uses to generate the recommendation, the approximation component comprising;

    an adjustable accuracy component configured to receive input indicating a number of sketching functions;

    a sketch component configured to generate sketching functions according to the number indicated by the input received by the adjustable accuracy component, the generated sketching functions including a first sketching function in a form f(x)=mx+b mod P, wherein m is a first randomly generated integer, b is a second randomly generated integer, and P is a randomly generated prime number, the sketch component configured to randomly generate the first integer, the second integer, and the prime number;

    a mapping component configured to individually map the users to unique integers and to generate individual data sets linked to each of the items, each data set including one or more of the unique integers to indicate that the one or more users which are mapped to the one or more unique integers rated the linked item above a threshold rating;

    a min-hashing component configured to;

    generate hash values by selecting each data set and applying the generated sketching functions to the one or more unique integers in the selected data set;

    for each sketching function applied to the selected data set, identify a min-hash value from among the hash values generated using the applied sketching function and the selected data set; and

    associate each identified min-hash value with the item linked to the selected data set;

    a matching component configured to;

    compare the min-hash values associated with the items;

    based on the comparison, calculate a total of min-hash value matches for pairs of items; and

    generate an approximate Jaccard coefficient for each pair of items by dividing the total of min-hash value matches for the pair of items by the number of sketching functions generated and applied to the data sets linked to the items; and

    a component configured to determine whether the data associated with a plurality of user sessions is sufficiently large as to warrant approximating the one or more similarities that the collaborative filtering component uses to generate the recommendation rather than calculating the one or more similarities.

View all claims
  • 2 Assignments
Timeline View
Assignment View
    ×
    ×