System and method for providing recommendations based on multi-modal user clusters
9 Assignments
0 Petitions
Abstract
A system and method for browsing, retrieving, and recommending information from a collection uses multi-modal features of the documents in the collection, as well as an analysis of users' prior browsing and retrieval behavior. The system and method are premised on various disclosed methods for quantitatively representing documents in a document collection as vectors in multi-dimensional vector spaces, quantitatively determining similarity between documents, and clustering documents according to those similarities. The system and method also rely on methods for quantitatively representing users in a user population, quantitatively determining similarity between users, clustering users according to those similarities, and visually representing clusters of users by analogy to clusters of documents.
460 Citations
26 Claims
1. A method for providing document recommendations from a document collection based on multi-modal user clusters, comprising the steps of:
identifying an initial set of users representing a subset of all possible users;
identifying documents in the collection accessed by the initial set of users;
for each user of the initial set of users, extrapolating from the documents accessed by the user to the content of the documents accessed by the user;
clustering the initial set of users into a plurality of user clusters by representing each user of the initial set using the content of the documents accessed by the user;
identifying a new user;
collecting information about documents accessed by the new user;
extrapolating from the documents accessed by the new user to the content of the documents accessed by the new user; and
assigning the new user to a user cluster based upon similarity between the content of the documents accessed by the new user and the content of the documents accessed by other users included in the user cluster.
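Claim 1 does not tie the method to a particular content representation or clustering algorithm. The following minimal sketch assumes that a document's content is a bag-of-words count vector, that a user's content is the sum of the vectors of the documents the user accessed, and that clustering is a simple k-means over cosine similarity. The k-means choice, the seeding with the first k users, and all function names and toy data are illustrative assumptions, not the patent's disclosed implementation.

```python
import math
from collections import Counter


def doc_vector(text, vocab):
    """Bag-of-words count vector for one document over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]


def user_profile(accessed_ids, doc_vecs, dim):
    """Extrapolate from accessed documents to their content: sum the document vectors."""
    profile = [0.0] * dim
    for d in accessed_ids:
        for i, x in enumerate(doc_vecs[d]):
            profile[i] += x
    return profile


def cosine(a, b):
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def kmeans_cosine(profiles, k, iters=10):
    """k-means-style clustering with cosine similarity; seeds are the first k users."""
    names = list(profiles)
    centroids = [list(profiles[n]) for n in names[:k]]
    assign = {}
    for _ in range(iters):
        # assignment step: each user joins its most similar centroid
        assign = {n: max(range(k), key=lambda c: cosine(profiles[n], centroids[c]))
                  for n in names}
        # update step: each centroid becomes the mean of its members
        for c in range(k):
            members = [profiles[n] for n in names if assign[n] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


def assign_new_user(profile, centroids):
    """Assign a new user to the cluster whose centroid is most similar."""
    return max(range(len(centroids)), key=lambda c: cosine(profile, centroids[c]))


# Toy collection and access log (hypothetical data).
vocab = ["jazz", "guitar", "stock", "bond"]
docs = {"d1": "jazz guitar jazz", "d2": "guitar solo",
        "d3": "stock bond", "d4": "bond yield stock"}
doc_vecs = {d: doc_vector(t, vocab) for d, t in docs.items()}
accesses = {"alice": ["d1", "d2"], "bob": ["d3"], "carol": ["d2"], "dave": ["d4", "d3"]}
profiles = {u: user_profile(ds, doc_vecs, len(vocab)) for u, ds in accesses.items()}
assign, centroids = kmeans_cosine(profiles, k=2)

newcomer = user_profile(["d4"], doc_vecs, len(vocab))
newcomer_cluster = assign_new_user(newcomer, centroids)
```

Seeding with the first k users keeps the example deterministic; a production system would more likely use randomized or better-spread initial centroids.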
2. The method of claim 1, wherein:
the clustering step employs demographic information to cluster the initial set of users; and
the collecting step further comprises collecting demographic information from the new user.
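One hedged way to fold demographic information into clustering is to append one-hot encoded demographic attributes to each user's content vector before clustering, and to do the same for the new user. The attribute schema, the `weight` parameter, and the function names are hypothetical.

```python
def one_hot(value, categories):
    """1.0 in the position of the matching category, 0.0 elsewhere (all zeros if missing)."""
    return [1.0 if value == c else 0.0 for c in categories]


def augmented_profile(content_profile, demographics, schema, weight=1.0):
    """Append weighted one-hot demographic features to a content profile.

    schema maps each attribute name to its ordered list of categories (hypothetical).
    """
    extra = []
    for attr, categories in schema.items():
        extra.extend(weight * x for x in one_hot(demographics.get(attr), categories))
    return list(content_profile) + extra


schema = {"age_band": ["<30", "30-50", ">50"], "region": ["EU", "US"]}
vec = augmented_profile([2.0, 1.0], {"age_band": "30-50", "region": "US"}, schema, weight=0.5)
```

The `weight` controls how strongly demographics pull users together relative to document content.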
3. The method of claim 1, further comprising the step of identifying at least one user in the user cluster.
4. The method of claim 1, wherein the initial set of users includes all possible users.
5. The method of claim 1, wherein the step of collecting information about user document accesses comprises receiving an initial text query.
6. The method of claim 1, further comprising the step of recommending the at least one document to the new user.
7. The method of claim 6, wherein the at least one document comprises a plurality of identified documents, and wherein the recommending step comprises the steps of:
ranking the plurality of identified documents;
selecting a plurality of recommended documents from the plurality of identified documents, wherein the plurality of recommended documents comprise the most popular documents accessed by the user cluster; and
generating a list of recommended documents from the selected plurality of recommended documents.
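The ranking and selection steps of claim 7 can be sketched by counting how many members of the user cluster accessed each document and keeping the most popular ones. Counting each user at most once per document, breaking ties by document id, and skipping documents the new user has already seen are assumptions of this sketch, as are the names and data.

```python
from collections import Counter


def recommend(cluster_members, accesses, new_user_seen, top_n=3):
    """Rank documents by how many cluster members accessed them; return the top_n unseen."""
    popularity = Counter()
    for user in cluster_members:
        popularity.update(set(accesses[user]))  # count each user at most once per document
    # descending popularity, then document id as a stable tie-break
    ranked = sorted(popularity, key=lambda d: (-popularity[d], d))
    return [d for d in ranked if d not in new_user_seen][:top_n]


accesses = {"alice": ["d1", "d2", "d2"], "carol": ["d2", "d5"], "erin": ["d1", "d2", "d7"]}
recs = recommend(["alice", "carol", "erin"], accesses, new_user_seen={"d1"}, top_n=2)
```

Here d2 (three members) outranks d1 (two), but d1 is dropped because the new user has already seen it.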
8. The method of claim 7, wherein the recommending step further comprises the step of displaying the list of recommended documents.
9. The method of claim 1, wherein the assigning step comprises the steps of:
identifying a similar user cluster, wherein the similar user cluster is closer to the new user than any other user cluster; and
assigning the new user to the similar user cluster.
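The nearest-cluster test of claim 9 needs some notion of distance between a user and a cluster. This sketch assumes each cluster is summarized by the centroid of its members' vectors and uses Euclidean distance; both choices, and the toy cluster names, are illustrative.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def nearest_cluster(new_user, clusters):
    """Return the id of the cluster whose centroid is closest (Euclidean) to new_user."""
    centroids = {cid: centroid(members) for cid, members in clusters.items()}
    return min(centroids, key=lambda cid: math.dist(new_user, centroids[cid]))


clusters = {"music": [[2.0, 2.0, 0.0], [0.0, 1.0, 0.0]],
            "finance": [[0.0, 0.0, 1.0], [0.0, 0.0, 2.0]]}
best = nearest_cluster([1.0, 1.0, 0.2], clusters)
```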
10. The method of claim 3, further comprising the step of identifying the at least one user to the new user.
11. The method of claim 1, wherein each document in the document collection has a first feature and a second feature, the first feature being a text feature, the second feature being one of a URL feature, an inlink feature, an outlink feature, an image feature, a user information feature and a text genre feature, and
wherein the step of extrapolating from documents accessed by the user to the content of the documents accessed by the user comprises using both the first feature and second feature associated with each document to determine the content of the document.
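Claim 11 combines a text feature with a second feature such as a URL feature. A minimal sketch, assuming the combination is a weighted sum of per-feature cosine similarities and that the URL feature is the bag of alphanumeric tokens in the URL, follows; the weights `w_text` and `w_url`, the tokenizer, and the example pages are hypothetical.

```python
import math
import re
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two sparse count vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def text_feature(text):
    """First feature: word counts of the page text."""
    return Counter(text.lower().split())


def url_feature(url):
    """Second feature (hypothetical URL feature): alphanumeric tokens of the URL."""
    return Counter(t for t in re.split(r"[^a-z0-9]+", url.lower()) if t)


def multimodal_similarity(doc_a, doc_b, w_text=0.7, w_url=0.3):
    """Weighted combination of per-feature cosines; the weights are assumptions."""
    return (w_text * cosine(text_feature(doc_a["text"]), text_feature(doc_b["text"]))
            + w_url * cosine(url_feature(doc_a["url"]), url_feature(doc_b["url"])))


a = {"text": "jazz guitar lessons", "url": "http://example.com/music/jazz"}
c = {"text": "bond yields fall", "url": "http://example.com/finance/bonds"}
sim_ac = multimodal_similarity(a, c)
```

Pages a and c share no words, yet the shared URL tokens (`http`, `example`, `com`) still give them a small nonzero similarity, which is the kind of extra signal a second feature contributes.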
12. The method of claim 11, wherein the clustering step utilizes a multi-dimensional vector similarity metric to identify similarities between users.
13. A computer-readable medium storing instructions for providing document recommendations from a document collection based on multi-modal user clusters, each document in the collection being represented by a first and second feature, the first feature being a text feature, and the second feature being one of a URL feature, an inlink feature, an outlink feature, an image feature, a user information feature and a text genre feature, the instructions comprising:
identifying an initial set of users representing a subset of all possible users;
determining the documents in the collection accessed by the initial set of users;
for each user of the initial set of users, extrapolating from the documents accessed by the user to the content of the documents accessed by the user;
clustering the initial set of users into a plurality of user clusters by representing each user of the initial set using the content of the documents accessed by the user;
identifying a new user;
collecting information about documents accessed by the new user; and
assigning the new user to a user cluster based upon similarity between documents accessed by the new user and the content of the documents accessed by other users included in the user cluster.
recommending to the new user a plurality of identified documents;
ranking the plurality of identified documents;
selecting a plurality of recommended documents from the plurality of identified documents, wherein the plurality of recommended documents comprise the most popular documents accessed by the user cluster; and
generating a list of recommended documents from the selected plurality of recommended documents.
16. The computer readable medium of claim 15, wherein the instructions further comprise displaying the list of recommended documents.
17. The computer readable medium of claim 16, wherein the assigning instruction comprises:
identifying a similar user cluster, wherein the similar user cluster is closer to the new user than any other user cluster; and
assigning the new user to the similar user cluster.
18. A signal representing instructions for providing page recommendations from a page collection based on multi-modal user clusters, each page in the collection being represented by a first and a second feature vector, each of the first and second feature vectors being a multi-dimensional vector, the first feature vector being representative of a first feature of the pages and the second feature vector being representative of a second feature of the pages, the first feature being a text feature, and the second feature being one of a set of multi-modal features including a URL feature, an inlink feature, an outlink feature, an image feature, a user information feature and a text genre feature, the instructions comprising:
identifying an initial set of users representing a subset of all possible users;
determining the pages in the collection accessed by each of the initial set of users;
for each user of the initial set of users, extrapolating from the pages accessed by the user to the content of the pages accessed by the user;
clustering the initial set of users into a plurality of user clusters by representing each user of the initial set using the content of the pages accessed by the user;
identifying a new user;
collecting information about page accesses by the new user; and
assigning the new user to a user cluster based upon similarity between pages accessed by the new user and the user cluster and the content of the pages accessed by other users included in the user cluster.
recommending to the new user a plurality of identified pages;
ranking the plurality of identified pages;
selecting a plurality of recommended pages from the plurality of identified pages, wherein the plurality of recommended pages comprise the most popular pages accessed by the user cluster; and
generating a list of recommended pages from the selected plurality of recommended pages.
20. The signal of claim 19, wherein the instructions further comprise displaying the list of recommended pages.
21. The signal of claim 18, wherein the assigning instruction comprises:
identifying a similar user cluster, wherein the similar user cluster is closer to the new user than any other user cluster; and
assigning the new user to the similar user cluster.
22. The signal of claim 18, wherein the clustering instruction utilizes a multi-dimensional vector similarity metric to identify similarities between the content of pages accessed by the users.
23. The signal of claim 18, further comprising the instructions of:
generating a matrix P of page accesses using the pages accessed by the users of the initial set, matrix P having a number of rows n_p equal to the total number of pages accessed and a number of columns n_d equal to the number of users in the initial set of users;
generating a matrix T of text using the pages accessed by the users of the initial set, matrix T having a number of rows n_t equal to the number of words in the page collection and a number of columns n_p equal to the number of pages;
combining matrices P and T to extrapolate to the textual content of the pages accessed by the users of the initial set.
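The matrices P and T can be built from an access log and the page texts. In this sketch the input layouts (`accesses` mapping each user to the page ids they accessed, `page_text` mapping page ids to text) are assumptions, entries are raw counts, and T's columns are restricted to the accessed pages so that they line up with the rows of P; the vocabulary is taken from the whole page collection.

```python
def build_matrices(accesses, page_text):
    """Build access matrix P (n_p x n_d) and text matrix T (n_t x n_p) as lists of rows."""
    users = sorted(accesses)
    pages = sorted({p for ps in accesses.values() for p in ps})
    words = sorted({w for text in page_text.values() for w in text.lower().split()})
    # P[i][j] = number of times user j accessed page i
    P = [[float(accesses[u].count(p)) for u in users] for p in pages]
    # T[k][i] = number of occurrences of word k on page i
    T = [[float(page_text[p].lower().split().count(w)) for p in pages] for w in words]
    return P, T, users, pages, words


accesses = {"u1": ["p1", "p2", "p1"], "u2": ["p2"]}
page_text = {"p1": "jazz guitar", "p2": "guitar chords guitar", "p3": "bond market"}
P, T, users, pages, words = build_matrices(accesses, page_text)
```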
24. The signal of claim 23 wherein the combining instruction comprises multiplying matrices P and T to generate a matrix PT, matrix PT being a textual representation of the users of the initial set, matrix PT having a number of columns n_d equal to the number of users in the initial set of users, each column of matrix PT being a textual representation of the pages accessed by a single user of the initial set.
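For the dimensions in claims 23 and 24 to agree, the product that yields n_d columns is T × P: T is n_t × n_p and P is n_p × n_d, so the result is n_t × n_d and column j is the word-count profile of user j. A plain-Python sketch with illustrative values (the `matmul` helper is hypothetical, not from the patent):

```python
def matmul(A, B):
    """Plain-Python matrix product A @ B for matrices stored as lists of rows."""
    if len(A[0]) != len(B):
        raise ValueError("inner dimensions do not match")
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# T: n_t x n_p (words x pages), P: n_p x n_d (pages x users); values are illustrative.
T = [[0.0, 0.0], [0.0, 1.0], [1.0, 2.0], [1.0, 0.0], [0.0, 0.0]]
P = [[2.0, 0.0], [1.0, 1.0]]
PT = matmul(T, P)  # n_t x n_d: column j is the textual profile of user j
```

Because user 1 visited the first page twice, the words on that page are counted twice in that user's column, so repeated visits weight a user's profile toward those pages.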
25. The signal of claim 24 wherein the assigning instruction comprises determining the similarity between the new user and the users of the initial set using a cosine distance metric.
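Claim 25 compares users with a cosine metric. Cosine distance is commonly taken as one minus cosine similarity, so ranking initial users by descending similarity is equivalent to ranking them by ascending distance. This sketch assumes the new user's text vector was built the same way as a column of PT; the function names and values are illustrative.

```python
import math


def cosine(a, b):
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def most_similar_users(new_user_vec, PT, user_ids):
    """Rank initial users by cosine similarity of their PT column to the new user's vector."""
    columns = [list(col) for col in zip(*PT)]  # one column per initial user
    scores = {u: cosine(new_user_vec, c) for u, c in zip(user_ids, columns)}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


PT = [[0.0, 0.0], [1.0, 1.0], [4.0, 2.0], [2.0, 0.0], [0.0, 0.0]]
new_user = [0.0, 0.0, 1.0, 1.0, 0.0]
ranking = most_similar_users(new_user, PT, ["u1", "u2"])
```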
26. The signal of claim 24 further comprising the instruction of:
generating a matrix O associated with the second feature vectors associated with the pages accessed by the initial set of users; and
wherein the instruction of extrapolating to the content of the pages accessed by the initial set of users includes multiplying the matrices O and P.
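Claim 26 applies the same multiplication to a second feature. As an illustration, assume the second feature is the outlink feature and O is a targets × pages indicator matrix; O × P then gives each user's outlink profile. The outlink data, the `outlink_matrix` helper, and the indicator weighting are hypothetical.

```python
def matmul(A, B):
    """Plain-Python matrix product A @ B for matrices stored as lists of rows."""
    if len(A[0]) != len(B):
        raise ValueError("inner dimensions do not match")
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def outlink_matrix(pages, outlinks):
    """Second-feature matrix O (targets x pages): O[k][i] = 1 if page i links to target k."""
    targets = sorted({t for p in pages for t in outlinks.get(p, [])})
    O = [[1.0 if t in outlinks.get(p, []) else 0.0 for p in pages] for t in targets]
    return O, targets


pages = ["p1", "p2"]
outlinks = {"p1": ["jazz.org"], "p2": ["jazz.org", "tabs.net"]}
O, targets = outlink_matrix(pages, outlinks)
P = [[2.0, 0.0], [1.0, 1.0]]  # pages x users, rows in the same order as `pages`
OP = matmul(O, P)  # targets x users: outlink profile of each user
```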
Specification