System and method for visually representing the contents of a multiple data object cluster
Abstract
A system and method for browsing, retrieving, and recommending information from a collection uses multi-modal features of the documents in the collection, as well as an analysis of users' prior browsing and retrieval behavior. The system and method are premised on various disclosed methods for quantitatively representing documents in a document collection as vectors in multi-dimensional vector spaces, quantitatively determining similarity between documents, and clustering documents according to those similarities. The system and method also rely on methods for quantitatively representing users in a user population, quantitatively determining similarity between users, clustering users according to those similarities, and visually representing clusters of users by analogy to clusters of documents.
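To make the idea of "quantitatively determining similarity" between vectors concrete, the following is a minimal sketch that compares two vectors with cosine similarity. The abstract does not name a specific similarity measure, so cosine similarity is an assumption chosen for illustration, as is the function name `cosine_similarity`; the same function could be applied to document feature vectors or to per-user vectors.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (assumed measure)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction; treat it as dissimilar to everything.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Pairwise similarities of this kind could then feed any standard clustering procedure; the choice of procedure is left open here.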
12 Claims
1. A method for visualizing clusters of users represented by way of documents selected from a collection of documents, comprising the steps of:
identifying a selected plurality of users in a user population, wherein the plurality of users share an interest as determined through multi-modal collection use analysis;
for each user in the plurality of users, for each document in the collection, determining a corresponding access probability representing the frequency with which the user has accessed the document;
for each document in the collection, calculating an aggregate access probability across the users in the selected plurality of users, corresponding to the likelihood that a user in the selected plurality will access the document;
displaying a Disk Tree having a plurality of nodes, each node representing a document in the collection; and
highlighting each node in the Disk Tree having an aggregate access probability greater than a desired threshold.
Dependent claims: 2, 3
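The computational steps of claim 1 can be sketched as follows. The claim does not say how access frequencies are normalized or how they are aggregated across users, so this sketch assumes that each user's access counts are normalized to sum to one and that the aggregate is a simple mean over the selected users. All names (`access_probabilities`, `aggregate_probability`, `nodes_to_highlight`) and the sample data are hypothetical, and rendering the Disk Tree itself is not shown.

```python
from typing import Dict, List

# Hypothetical input shape: counts[user][document] = number of accesses.
Counts = Dict[str, Dict[str, int]]


def access_probabilities(counts: Counts, documents: List[str]) -> Dict[str, Dict[str, float]]:
    """Normalize each user's access counts into per-document probabilities (assumed rule)."""
    probs: Dict[str, Dict[str, float]] = {}
    for user, per_doc in counts.items():
        total = sum(per_doc.get(d, 0) for d in documents)
        probs[user] = {d: (per_doc.get(d, 0) / total if total else 0.0) for d in documents}
    return probs


def aggregate_probability(probs: Dict[str, Dict[str, float]],
                          users: List[str],
                          documents: List[str]) -> Dict[str, float]:
    """Average the per-user probabilities over the selected users (assumed aggregation)."""
    if not users:
        raise ValueError("at least one user must be selected")
    return {d: sum(probs[u][d] for u in users) / len(users) for d in documents}


def nodes_to_highlight(aggregate: Dict[str, float], threshold: float) -> List[str]:
    """Return the documents whose aggregate probability exceeds the threshold."""
    return [d for d, p in aggregate.items() if p > threshold]


# Illustrative data only.
documents = ["index.html", "a.html", "b.html"]
counts = {
    "u1": {"index.html": 2, "a.html": 2},
    "u2": {"index.html": 3, "b.html": 1},
}
probs = access_probabilities(counts, documents)
aggregate = aggregate_probability(probs, ["u1", "u2"], documents)
print(nodes_to_highlight(aggregate, threshold=0.2))  # ['index.html', 'a.html']
```

A visualization layer would then highlight the tree nodes corresponding to the returned documents.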
4. A method for visualizing clusters of users represented by way of documents selected from a collection of documents, each document in the collection having at least one corresponding feature vector, the method comprising:
identifying a selected plurality of users in a user population, wherein the plurality of users share an interest;
for each user in the plurality of users, for each document in the collection, determining a corresponding access probability representing the frequency with which the user has accessed the document;
for each document in the collection, calculating an aggregate access probability across the users in the selected plurality of users, corresponding to the likelihood that a user in the selected plurality will access the document; and
visually displaying the aggregate access probability for each document in the collection by:
calculating an aggregate feature vector by summing over all documents in the collection the feature vectors weighted by the corresponding aggregate access probability;
isolating the salient terms of the aggregate feature vector having the largest magnitudes;
determining the salient dimensions of the aggregate feature vector that correspond to the salient terms; and
listing the salient dimensions.
Dependent claims: 5, 6
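The feature-vector steps of claim 4 can be sketched as follows. The sketch assumes dense feature vectors of equal length, a list of human-readable names for the dimensions, aggregate access probabilities that have already been computed and are passed in as a dictionary, and a caller-chosen number `k` of salient terms. None of these details are fixed by the claim, and the function names and sample data are hypothetical.

```python
from typing import Dict, List


def aggregate_feature_vector(vectors: Dict[str, List[float]],
                             aggregate: Dict[str, float]) -> List[float]:
    """Sum the document feature vectors, each weighted by its aggregate access probability."""
    dims = len(next(iter(vectors.values())))
    total = [0.0] * dims
    for doc, vec in vectors.items():
        weight = aggregate.get(doc, 0.0)  # documents without a probability contribute nothing
        for i, value in enumerate(vec):
            total[i] += weight * value
    return total


def salient_dimensions(vector: List[float], dimension_names: List[str], k: int) -> List[str]:
    """Return the names of the k dimensions whose terms have the largest magnitudes."""
    ranked = sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)
    return [dimension_names[i] for i in ranked[:k]]


# Illustrative data only.
vectors = {
    "index.html": [1.0, 0.0, 2.0],
    "a.html": [0.0, 4.0, 0.0],
    "b.html": [0.0, 0.0, 8.0],
}
aggregate = {"index.html": 0.625, "a.html": 0.25, "b.html": 0.125}
names = ["sports", "news", "music"]

combined = aggregate_feature_vector(vectors, aggregate)
print(combined)                                # [0.625, 1.0, 2.25]
print(salient_dimensions(combined, names, 2))  # ['music', 'news']
```

Ranking by absolute value follows the claim's wording of "largest magnitudes"; for non-negative feature vectors this is the same as ranking by value.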
7. A computer-readable medium storing instructions for visualizing clusters of users represented by way of documents selected from a collection of documents, each document in the collection having at least one corresponding feature vector, the instructions comprising:
identifying a selected plurality of users in a user population, wherein the plurality of users share an interest;
for each user in the plurality of users, for each document in the collection, identifying a corresponding access probability representing the frequency with which the user has accessed the document;
for each document in the collection, calculating an aggregate access probability across the users in the selected plurality of users, corresponding to the likelihood that a user in the selected plurality will access the document;
calculating an aggregate feature vector by summing over all documents in the collection the feature vectors weighted by the corresponding aggregate access probability;
isolating the salient terms of the aggregate feature vector having the largest magnitudes;
determining the salient dimensions of the aggregate feature vector that correspond to the salient terms; and
visually displaying the listing of the salient dimensions for a subset of the documents.
Dependent claims: 8, 9
10. A signal for transmitting computer instructions for visualizing clusters of users represented by way of documents selected from a collection of documents, each document in the collection having at least one corresponding feature vector, the instructions comprising:
identifying a selected plurality of users in a user population, wherein the plurality of users share an interest;
for each user in the plurality of users, for each document in the collection, determining a corresponding access probability representing the frequency with which the user has accessed the document;
for each document in the collection, calculating an aggregate access probability across the users in the selected plurality of users, corresponding to the likelihood that a user in the selected plurality will access the document;
calculating an aggregate feature vector by summing over all documents in the collection the feature vectors weighted by the corresponding aggregate access probability;
isolating the salient terms of the aggregate feature vector having the largest magnitudes;
determining the salient dimensions of the aggregate feature vector that correspond to the salient terms; and
listing the salient dimensions.
Dependent claims: 11, 12
Specification