System and method for personalized search, information filtering, and for generating recommendations utilizing statistical latent class models
Abstract
The disclosed system implements a novel method for personalized filtering of information and automated generation of user-specific recommendations. The system uses a statistical latent class model, also known as Probabilistic Latent Semantic Analysis, to integrate data including textual and other content descriptions of items to be searched, user profiles, demographic information, query logs of previous searches, and explicit user ratings of items. The disclosed system learns one or more statistical models from the available data. Learning may be repeated when additional data becomes available. Once learned, the statistical model is used in various ways: to predict item relevance and user preferences for unrated items, to generate recommendation lists of items, to generate personalized search result lists, to disambiguate a user's query, to refine a search, to compute similarities between items or users, and for data mining purposes such as identifying user communities.
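As an illustration of how a learned latent class model could be used to generate a recommendation list, the following minimal sketch ranks items for one user. It assumes a symmetric parameterization with class priors P(z), user probabilities P(u|z), and item probabilities P(i|z); the function name `recommend`, the array names, and the toy numbers are hypothetical and are not taken from the patent.

```python
import numpy as np

def recommend(user, pz, pu_z, pi_z, seen=(), top_n=5):
    """Rank items for `user` by P(u, i) = sum_z P(z) P(u|z) P(i|z).

    pz:   shape (k,)          class priors (assumed layout)
    pu_z: shape (n_users, k)  P(user | class)
    pi_z: shape (n_items, k)  P(item | class)
    seen: items to exclude, e.g. items the user has already rated
    """
    scores = (pz * pu_z[user] * pi_z).sum(axis=1)
    ranked = [int(i) for i in np.argsort(-scores) if int(i) not in seen]
    return ranked[:top_n]

# Toy parameters for 2 latent classes, 2 users, 3 items (made-up values).
pz = np.array([0.5, 0.5])
pu_z = np.array([[0.9, 0.1],
                 [0.1, 0.9]])
pi_z = np.array([[0.7, 0.1],
                 [0.2, 0.2],
                 [0.1, 0.7]])

print(recommend(0, pz, pu_z, pi_z, seen={0}, top_n=2))  # [1, 2]
print(recommend(1, pz, pu_z, pi_z, top_n=2))            # [2, 1]
```

Because the score for a fixed user is proportional to P(i|u), ranking by the joint probability gives the same order as ranking by the conditional.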
557 Citations
13 Claims
-
1. A method in a computer system for training a latent class model, comprising the steps of:
receiving data in the form of a list of tuples of entities;
receiving a list of parameters, including a number of dimensions to be used in the model training, a predetermined termination condition, and a predetermined fraction of hold-out data;
splitting the dataset into training data and hold-out data according to the predetermined fraction of hold-out data;
applying Tempered Expectation Maximization to the data to train a plurality of latent class models according to the following steps:
computing tempered posterior probabilities for each tuple and each possible state of a corresponding latent class variable;
using these posterior probabilities, updating class conditional probabilities for items, descriptors and attributes, and users;
iterating the steps of computing tempered posterior probabilities and updating class conditional probabilities until the predictive performance on the hold-out data degrades; and
adjusting the temperature parameter and continuing at the step of computing tempered posterior probabilities until the predetermined termination condition is met; and
combining the trained models of different dimensionality into a single model by linearly combining their estimated probabilities.
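The steps of claim 1 can be sketched for the simplest case of (user, item) pairs. This is a minimal illustration under stated assumptions, not the patented implementation: the decay factor `beta_decay`, the stopping rule `min_beta`/`max_iter`, the smoothing constant `EPS`, the mixture weights, and all function names are hypothetical choices made for the example.

```python
import numpy as np

EPS = 1e-9  # smoothing constant (assumption) that keeps probabilities nonzero

def split_holdout(pairs, holdout_fraction, rng):
    """Randomly split (user, item) index pairs into training and hold-out data."""
    pairs = np.asarray(pairs)
    perm = rng.permutation(len(pairs))
    n_hold = int(round(holdout_fraction * len(pairs)))
    return pairs[perm[n_hold:]], pairs[perm[:n_hold]]

def mean_log_likelihood(pairs, params):
    """Average log P(u, i), where P(u, i) = sum_z P(z) P(u|z) P(i|z)."""
    pz, pu_z, pi_z = params
    p = (pz * pu_z[pairs[:, 0]] * pi_z[pairs[:, 1]]).sum(axis=1)
    return float(np.log(p).mean())

def tempered_em(train, holdout, n_users, n_items, k, rng,
                beta_decay=0.9, min_beta=0.5, max_iter=200):
    """Train one k-class model; returns (P(z), P(u|z), P(i|z))."""
    params = (np.full(k, 1.0 / k),
              rng.dirichlet(np.ones(n_users), size=k).T,  # shape (n_users, k)
              rng.dirichlet(np.ones(n_items), size=k).T)  # shape (n_items, k)
    u, i = train[:, 0], train[:, 1]
    beta, best = 1.0, mean_log_likelihood(holdout, params)
    for _ in range(max_iter):
        pz, pu_z, pi_z = params
        # Tempered E-step: P(z|u,i) proportional to P(z) [P(u|z) P(i|z)]^beta
        post = pz * (pu_z[u] * pi_z[i]) ** beta
        post /= post.sum(axis=1, keepdims=True)
        # M-step: class-conditional probabilities from expected counts
        cu = np.zeros((n_users, k)); np.add.at(cu, u, post)
        ci = np.zeros((n_items, k)); np.add.at(ci, i, post)
        candidate = (post.mean(axis=0),
                     (cu + EPS) / (cu.sum(axis=0) + EPS * n_users),
                     (ci + EPS) / (ci.sum(axis=0) + EPS * n_items))
        score = mean_log_likelihood(holdout, candidate)
        if score >= best:
            params, best = candidate, score   # hold-out performance did not degrade
        else:
            beta *= beta_decay                # degraded: adjust the temperature
            if beta < min_beta:               # assumed termination condition
                break
    return params

def combined_prob(models, weights, u, i):
    """Linear combination of the joint probabilities estimated by several models."""
    return sum(w * float((pz * pu_z[u] * pi_z[i]).sum())
               for w, (pz, pu_z, pi_z) in zip(weights, models))
```

A usage sketch: split the pairs with `split_holdout(pairs, 0.2, rng)`, call `tempered_em` once per dimensionality (for example k = 2 and k = 3), and pass the resulting models to `combined_prob` with weights that sum to one.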
splitting the training data into a plurality of blocks and updating the tempered posterior probabilities after the posterior probabilities have been computed for all observations in one block.
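One possible reading of this block-wise step, in the spirit of incremental EM, is sketched below: expected counts are kept per block, and the model parameters are refreshed as soon as the posteriors of one block have been recomputed. The function names, the parameter layout `(P(z), P(u|z), P(i|z))`, and the smoothing constant are assumptions made for the illustration.

```python
import numpy as np

def expected_counts(block, params, beta):
    """Tempered posteriors for one block, accumulated into expected counts."""
    pz, pu_z, pi_z = params
    u, i = block[:, 0], block[:, 1]
    post = pz * (pu_z[u] * pi_z[i]) ** beta
    post /= post.sum(axis=1, keepdims=True)
    cu = np.zeros_like(pu_z); np.add.at(cu, u, post)
    ci = np.zeros_like(pi_z); np.add.at(ci, i, post)
    return cu, ci

def normalize(cu, ci, eps=1e-9):
    """Turn expected counts into (P(z), P(u|z), P(i|z))."""
    cu, ci = np.maximum(cu, 0.0), np.maximum(ci, 0.0)  # guard against round-off
    return (cu.sum(axis=0) / cu.sum(),
            (cu + eps) / (cu + eps).sum(axis=0),
            (ci + eps) / (ci + eps).sum(axis=0))

def blockwise_pass(pairs, params, n_blocks, beta=1.0):
    """One sweep over the data, updating parameters after every block."""
    blocks = np.array_split(np.asarray(pairs), n_blocks)
    stats = [expected_counts(b, params, beta) for b in blocks]
    total_u = sum(s[0] for s in stats)
    total_i = sum(s[1] for s in stats)
    for b, block in enumerate(blocks):
        new = expected_counts(block, params, beta)
        # Replace this block's old contribution with the freshly computed one.
        total_u = total_u + new[0] - stats[b][0]
        total_i = total_i + new[1] - stats[b][1]
        stats[b] = new
        params = normalize(total_u, total_i)
    return params
```

Updating after each block lets later blocks benefit from parameters that already reflect earlier blocks, at the cost of storing one set of counts per block.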
-
-
8. The method according to claim 1, wherein the received data consists only of items characterized by text.
-
9. The method according to claim 1, wherein the received data consists only of pairs of users and items.
-
10. The method according to claim 1, wherein the received data consists only of triplets of users, items, and ratings.
-
11. The method according to claim 1, wherein the received data consists only of:
pairs of users and items; and
triplets of users, items, and ratings.
-
12. The method according to claim 1, further comprising:
extracting hierarchical relationships between groups of data.
-
13. The method according to claim 1, wherein the step of receiving data includes receiving similarity matrices for the similarity of at least one of:
items; and
users;
integrating the similarity matrices into the step of updating the tempered posterior probabilities by transforming similarities into probabilities; and
smoothing the estimates of the class conditional probabilities using the transformed similarities.
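The idea of transforming similarities into probabilities and using them for smoothing can be illustrated as follows. This is only a sketch of one plausible scheme: row-normalizing the similarity matrix and mixing with weight `lam` are assumptions, as are the function names. The similarity matrix is assumed to be nonnegative with positive row sums.

```python
import numpy as np

def similarities_to_probabilities(sim):
    """Row-normalize a nonnegative similarity matrix so each row sums to 1."""
    sim = np.asarray(sim, dtype=float)
    return sim / sim.sum(axis=1, keepdims=True)

def smooth_class_conditionals(p_item_given_z, sim, lam=0.2):
    """Mix P(i|z) with mass redistributed to similar items.

    p_item_given_z: shape (n_items, k), each column sums to 1
    sim:            shape (n_items, n_items) item-item similarities
    lam:            smoothing weight in [0, 1] (assumed hyperparameter)
    """
    trans = similarities_to_probabilities(sim)
    # Item j passes a share trans[j, i] of its probability to item i,
    # so every column of the result still sums to 1.
    return (1.0 - lam) * p_item_given_z + lam * (trans.T @ p_item_given_z)

sim = np.array([[1.0, 0.8, 0.0],
                [0.8, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
p = np.array([[1.0, 0.0],
              [0.0, 0.0],
              [0.0, 1.0]])
smoothed = smooth_class_conditionals(p, sim)
```

In this toy example item 1 has no observed mass in either class, but after smoothing it receives probability in class 0 because it is similar to item 0. The same construction applies to users with a user-user similarity matrix.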
Specification