System and method for personalized search, information filtering, and for generating recommendations utilizing statistical latent class models

US 6,687,696 B2
Filed: 07/26/2001
Issued: 02/03/2004
Est. Priority Date: 07/26/2000
Status: Active Grant

Abstract

The disclosed system implements a novel method for personalized filtering of information and automated generation of user-specific recommendations. The system uses a statistical latent class model, also known as Probabilistic Latent Semantic Analysis, to integrate data including textual and other content descriptions of items to be searched, user profiles, demographic information, query logs of previous searches, and explicit user ratings of items. The disclosed system learns one or more statistical models based on available data. The learning may be repeated once additional data is available. The statistical model, once learned, is utilized in various ways: to make predictions about item relevance and user preferences on unrated items, to generate recommendation lists of items, to generate personalized search result lists, to disambiguate a user's query, to refine a search, to compute similarities between items or users, and for data mining purposes such as identifying user communities.
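As a rough illustration of the prediction and recommendation uses listed in the abstract, the sketch below ranks unrated items for one user from an already-trained latent class model. It assumes the standard aspect-model factorization, P(item | user) = sum over z of P(z | user) P(item | z). The function name `recommend`, the array layout, and the toy parameter values are hypothetical and are not taken from the patent.

```python
import numpy as np

def recommend(p_z_given_user, p_item_given_z, rated, top_n=3):
    """Rank unrated items by P(item | user) = sum_z P(z | user) * P(item | z).

    p_z_given_user: shape (n_classes,), latent class weights for one user.
    p_item_given_z: shape (n_classes, n_items), class-conditional item probabilities.
    rated: indices of items the user has already rated (excluded from the result).
    """
    scores = (p_z_given_user @ p_item_given_z).astype(float)
    # Exclude already-rated items so they cannot be recommended again.
    scores[np.array(sorted(rated), dtype=int)] = -np.inf
    order = np.argsort(-scores)
    return [int(i) for i in order[:top_n] if np.isfinite(scores[i])]

# Toy parameters (assumed for illustration): 2 latent classes, 4 items.
p_z_given_user = np.array([0.9, 0.1])
p_item_given_z = np.array([[0.5, 0.3, 0.1, 0.1],
                           [0.1, 0.1, 0.5, 0.3]])

top_items = recommend(p_z_given_user, p_item_given_z, rated={0}, top_n=2)
print(top_items)
```

Here item 0 has the highest score but is skipped because the user has already rated it, so the list is drawn from the remaining items.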

557 Citations

13 Claims

1. A method in a computer system for training a latent class model comprising the steps:
- receiving data in the form of a list of tuples of entities;
- receiving a list of parameters, including a number of dimensions to be used in the model training, a predetermined termination condition, and a predetermined fraction of hold out data;
- splitting the dataset into training data and hold out data according to the predetermined fraction of hold out data;
- applying Tempered Expectation Maximization to the data to train a plurality of latent class models according to the following steps:
  - computing tempered posterior probabilities for each tuple and each possible state of a corresponding latent class variable;
  - using these posterior probabilities, updating class conditional probabilities for items, descriptors and attributes, and users;
  - iterating the steps of computing tempered posterior probabilities and updating class conditional probabilities until the predictive performance on the hold-out data degrades; and
  - adjusting the temperature parameter and continuing at the step of computing tempered posterior probabilities until the predetermined termination condition is met; and
- combining the trained models of different dimensionality into a single model by linearly combining their estimated probabilities.
2. The method according to claim 1, wherein the entities include at least one of:
- items;
- users;
- content descriptors;
- attributes; and
- preferences.
3. The method according to claim 1, further comprising combining the updated class conditional probabilities with a preference value.
4. The method according to claim 1, wherein the step of combining the trained models includes computing the weights for the trained models being combined to maximize the predictive model performance on the hold-out data.
5. The method according to claim 1, further comprising:
- iteratively retraining all models based on both the training data and the holdout data.
6. The method according to claim 1, wherein the step of adjusting the temperature parameter is omitted.
7. The method according to claim 1, further comprising:
8. The method according to claim 1, wherein the received data consists only of items characterized by text.
9. The method according to claim 1, wherein the received data consists only of pairs of users and items.
10. The method according to claim 1, wherein the received data consists only of triplets of users, items, and ratings.
11. The method according to claim 1, wherein the received data consists only of:
- pairs of users and items; and
- triplets of users, items, and ratings.
12. The method according to claim 1, further comprising:
- extracting hierarchical relationships between groups of data.
13. The method according to claim 1, wherein the step of receiving data includes receiving similarity matrices for the similarity of at least one of items and users;
- integrating the similarity matrices into the step of updating the tempered posterior probabilities by transforming similarities into probabilities; and
- smoothing the estimates of the class conditional probabilities using the transformed similarities.
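
The training procedure of claim 1 can be sketched roughly as follows for the simplest case of (user, item) pairs. This is a minimal illustration under stated assumptions, not the patented implementation: the tempered posterior is taken to be proportional to P(z) [P(u|z) P(i|z)]^beta, a step that lowers the hold-out log-likelihood is rejected and followed by a temperature reduction, and training stops once beta falls below a floor. All function names, the parameters `eta` and `min_beta`, the uniform combination weights, and the random toy data are hypothetical.

```python
import numpy as np

def split(pairs, holdout_fraction, seed=0):
    """Randomly split (user, item) pairs into training and hold-out sets."""
    perm = np.random.default_rng(seed).permutation(len(pairs))
    n_hold = int(round(holdout_fraction * len(pairs)))
    return pairs[perm[n_hold:]], pairs[perm[:n_hold]]

def pair_probs(pairs, model):
    """P(u, i) = sum_z P(z) P(u|z) P(i|z) for each pair."""
    pz, pu, pi = model
    return (pz[None, :] * pu[pairs[:, 0]] * pi[pairs[:, 1]]).sum(axis=1)

def loglik(pairs, model):
    return float(np.mean(np.log(pair_probs(pairs, model) + 1e-12)))

def normalize(a, axis):
    return a / a.sum(axis=axis, keepdims=True)

def train_tempered_em(train, heldout, n_users, n_items, k,
                      eta=0.9, min_beta=0.5, max_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    model = (np.full(k, 1.0 / k),
             normalize(rng.random((n_users, k)) + 0.1, axis=0),
             normalize(rng.random((n_items, k)) + 0.1, axis=0))
    u, i = train[:, 0], train[:, 1]
    beta, best = 1.0, loglik(heldout, model)
    for _ in range(max_iter):
        pz, pu, pi = model
        # E-step: tempered posterior over latent classes for every training pair.
        post = normalize(pz[None, :] * (pu[u] * pi[i]) ** beta + 1e-12, axis=1)
        # M-step: re-estimate class-conditional probabilities from the posteriors.
        new_pu = np.full((n_users, k), 1e-9)
        new_pi = np.full((n_items, k), 1e-9)
        np.add.at(new_pu, u, post)
        np.add.at(new_pi, i, post)
        cand = (post.mean(axis=0), normalize(new_pu, 0), normalize(new_pi, 0))
        score = loglik(heldout, cand)
        if score > best:
            best, model = score, cand   # hold-out performance improved: keep step
        else:
            beta *= eta                 # performance degraded: adjust temperature
            if beta < min_beta:         # assumed termination condition
                break
    return model

def combine(models, weights, pairs):
    """Linearly combine the probabilities estimated by several models."""
    return sum(w * pair_probs(pairs, m) for w, m in zip(weights, models))

# Random toy data (assumed): 500 pairs over 20 users and 15 items.
rng = np.random.default_rng(1)
pairs = np.column_stack([rng.integers(0, 20, 500), rng.integers(0, 15, 500)])
train, heldout = split(pairs, holdout_fraction=0.2)
models = [train_tempered_em(train, heldout, 20, 15, k) for k in (2, 4)]
weights = [0.5, 0.5]  # uniform here; claim 4 would fit these on hold-out data
mixed = combine(models, weights, heldout)
```

Because each trained model defines a proper joint distribution over (user, item) pairs and the weights sum to one, the combined estimate is again a probability distribution over pairs.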

Current Assignee: Recommind Incorporated (Open Text Corporation)
Original Assignee: Recommind Incorporated (Open Text Corporation)
Inventors: Hofmann, Thomas; Puzicha, Jan Christian
Primary Examiner(s): Rones, Charles
Assistant Examiner(s): ABEL JALIL, NEVEEN
Application Number: US09/915,755
Publication Number: US 20020107853A1
Time in Patent Office: 922 Days
Field of Search: 707/1, 707/100, 707/101, 707/104.1, 707/500, 707/3, 707/4, 707/10, 707/200, 707/201, 707/6, 709/203, 709/217, 704/1, 704/9, 704/10, 703/22, 703/10, 705/26
US Class Current: 1/1
CPC Class Codes:
G06F 16/335   Filtering based on addition...
Y10S 707/99934   Query formulation, input pr...
Y10S 707/99936   Pattern matching access
Y10S 707/99942   Manipulating data structure...