System and method for visually representing the contents of a multiple data object cluster

US 6,564,202 B1
Filed: 10/19/1999
Issued: 05/13/2003
Est. Priority Date: 01/26/1999
Status: Expired due to Term

9 Assignments

0 Petitions

Abstract

A system and method for browsing, retrieving, and recommending information from a collection uses multi-modal features of the documents in the collection, as well as an analysis of users' prior browsing and retrieval behavior. The system and method are premised on various disclosed methods for quantitatively representing documents in a document collection as vectors in multi-dimensional vector spaces, quantitatively determining similarity between documents, and clustering documents according to those similarities. The system and method also rely on methods for quantitatively representing users in a user population, quantitatively determining similarity between users, clustering users according to those similarities, and visually representing clusters of users by analogy to clusters of documents.
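
As an illustration of "quantitatively determining similarity" between vector representations, the sketch below computes cosine similarity between two feature vectors. The abstract does not name a similarity measure, so the choice of cosine similarity and the function name `cosine_similarity` are assumptions made for this example only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors.

    Assumption: cosine similarity stands in for the unspecified
    similarity measure mentioned in the abstract.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)
```

The same function could be applied to user vectors as well as document vectors, since the abstract describes both as quantitative representations that are compared and clustered.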

12 Claims

1. A method for visualizing clusters of users represented by way of documents selected from a collection of documents, comprising the steps of:
- identifying a selected plurality of users in a user population, wherein the plurality of users share an interest as determined through multi-modal collection use analysis;
  
  for each user in the plurality of users, for each document in the collection, determining a corresponding access probability representing the frequency with which the user has accessed the document;
  
  for each document in the collection, calculating an aggregate access probability across the users in the selected plurality of users, corresponding to the likelihood that a user in the selected plurality will access the document;
  
  displaying a Disk Tree having a plurality of nodes, each node representing a document in the collection; and
  
  highlighting each node in the Disk Tree having an aggregate access probability greater than a desired threshold.

2. The method of claim 1, further comprising the step of ranking the documents in order of decreasing aggregate probability.

3. The method of claim 1, wherein the threshold is zero.
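
The computational steps of claims 1 to 3 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that a user's access probability is that user's access count for a document divided by the user's total accesses, and that the aggregate is the mean over the users in the cluster. The claims fix neither choice. All function names are hypothetical, and drawing the Disk Tree itself is out of scope; the sketch only decides which nodes would be highlighted.

```python
def access_probabilities(access_counts):
    """Turn one user's per-document access counts into probabilities.

    Assumption: probability = count / total accesses by that user.
    """
    total = sum(access_counts.values())
    if total == 0:
        return {doc: 0.0 for doc in access_counts}
    return {doc: n / total for doc, n in access_counts.items()}


def aggregate_access_probability(cluster_counts, documents):
    """Aggregate per-user probabilities across the users in a cluster.

    Assumption: the aggregate is the mean over users.
    """
    per_user = [access_probabilities(c) for c in cluster_counts.values()]
    if not per_user:
        return {doc: 0.0 for doc in documents}
    return {
        doc: sum(p.get(doc, 0.0) for p in per_user) / len(per_user)
        for doc in documents
    }


def highlighted_nodes(aggregate, threshold=0.0):
    """Documents whose aggregate probability exceeds the threshold (claims 1 and 3)."""
    return {doc for doc, p in aggregate.items() if p > threshold}


def rank_documents(aggregate):
    """Documents in order of decreasing aggregate probability (claim 2)."""
    return sorted(aggregate, key=aggregate.get, reverse=True)


# Hypothetical cluster of two users and a four-document collection.
cluster_counts = {"u1": {"a": 3, "b": 1}, "u2": {"a": 1, "c": 1}}
documents = ["a", "b", "c", "d"]
aggregate = aggregate_access_probability(cluster_counts, documents)
```

With a threshold of zero, as in claim 3, every document accessed by at least one user in the cluster is highlighted; in the sample data that is every document except `d`.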

4. A method for visualizing clusters of users represented by way of documents selected from a collection of documents, each document in the collection having at least one corresponding feature vector, the method comprising:
- identifying a selected plurality of users in a user population, wherein the plurality of users share an interest;
  
  for each user in the plurality of users, for each document in the collection, determining a corresponding access probability representing the frequency with which the user has accessed the document;
  
  for each document in the collection, calculating an aggregate access probability across the users in the selected plurality of users, corresponding to the likelihood that a user in the selected plurality will access the document; and
  
  visually displaying the aggregate access probability for each document in the collection by:
  
  calculating an aggregate feature vector by summing over all documents in the collection the feature vectors weighted by the corresponding aggregate access probability;
  
  isolating the salient terms of the aggregate feature vector having the largest magnitudes;
  
  determining the salient dimensions of the aggregate feature vector that correspond to the salient terms; and
  
  listing the salient dimensions.

5. The method of claim 4, wherein the isolating step identifies all terms having a magnitude greater than a magnitude threshold.

6. The method of claim 4, wherein the isolating step identifies a predetermined number of terms.
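
The display steps of claims 4 to 6 can be sketched in the same spirit. The example below assumes that feature vectors are plain lists of numbers of equal length and that each dimension has a human-readable label. The function names, the labels, and the sample data are hypothetical.

```python
def aggregate_feature_vector(feature_vectors, aggregate):
    """Sum document feature vectors, each weighted by the document's
    aggregate access probability."""
    if not feature_vectors:
        return []
    dims = len(next(iter(feature_vectors.values())))
    total = [0.0] * dims
    for doc, vec in feature_vectors.items():
        weight = aggregate.get(doc, 0.0)
        for i, x in enumerate(vec):
            total[i] += weight * x
    return total


def salient_dimensions(vector, labels, top_k=None, min_magnitude=None):
    """List the labels of the largest-magnitude terms.

    min_magnitude keeps terms strictly above a threshold (claim 5);
    top_k keeps a predetermined number of terms (claim 6).
    """
    ranked = sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)
    if min_magnitude is not None:
        ranked = [i for i in ranked if abs(vector[i]) > min_magnitude]
    if top_k is not None:
        ranked = ranked[:top_k]
    return [labels[i] for i in ranked]


# Hypothetical three-dimensional feature space and aggregate probabilities.
labels = ["search", "java", "tree"]
feature_vectors = {
    "a": [1.0, 0.0, 2.0],
    "b": [0.0, 2.0, 0.0],
    "c": [4.0, 0.0, 0.0],
}
aggregate = {"a": 0.5, "b": 0.25, "c": 0.25}
vector = aggregate_feature_vector(feature_vectors, aggregate)
```

For this sample data the aggregate vector is `[1.5, 0.5, 1.0]`, so both `top_k=2` and `min_magnitude=0.75` list `search` and `tree`.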

7. A computer-readable medium storing instructions for visualizing clusters of users represented by way of documents selected from a collection of documents, each document in the collection having at least one corresponding feature vector, the instructions comprising:
- identifying a selected plurality of users in a user population, wherein the plurality of users share an interest;
  
  for each user in the plurality of users, for each document in the collection, identifying a corresponding access probability representing the frequency with which the user has accessed the document;
  
  for each document in the collection, calculating an aggregate access probability across the users in the selected plurality of users, corresponding to the likelihood that a user in the selected plurality will access the document;
  
  calculating an aggregate feature vector by summing over all documents in the collection the feature vectors weighted by the corresponding aggregate access probability;
  
  isolating the salient terms of the aggregate feature vector having the largest magnitudes;
  
  determining the salient dimensions of the aggregate feature vector that correspond to the salient terms; and
  
  visually displaying the listing of the salient dimensions for a subset of the documents.

8. The computer readable medium of claim 7, wherein the isolating instruction identifies all terms having a magnitude greater than a magnitude threshold.

9. The computer readable medium of claim 7, wherein the instruction identifies a predetermined number of terms.

10. A signal for transmitting computer instructions for visualizing clusters of users represented by way of documents selected from a collection of documents, each document in the collection having at least one corresponding feature vector, the instructions comprising:
- identifying a selected plurality of users in a user population, wherein the plurality of users share an interest;
  
  for each user in the plurality of users, for each document in the collection, determining a corresponding access probability representing the frequency with which the user has accessed the document;
  
  for each document in the collection, calculating an aggregate access probability across the users in the selected plurality of users, corresponding to the likelihood that a user in the selected plurality will access the document;
  
  calculating an aggregate feature vector by summing over all documents in the collection the feature vectors weighted by the corresponding aggregate access probability;
  
  isolating the salient terms of the aggregate feature vector having the largest magnitudes;
  
  determining the salient dimensions of the aggregate feature vector that correspond to the salient terms; and
  
  listing the salient dimensions.

11. The signal of claim 10, wherein the isolating instruction identifies all terms having a magnitude greater than a magnitude threshold.

12. The signal of claim 10, wherein the isolating instruction identifies a predetermined number of terms.

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Xerox Corporation (Xerox Holdings Corp.)
Inventors: Li, Jun; Chi, Ed H.; Pirolli, Peter L.; Pitkow, James E.; Schuetze, Hinrich
Primary Examiner(s): Mizrahi, Diane D.
Assistant Examiner(s): Mofiz, Apu M
Application Number: US09/421,419
Time in Patent Office: 1,302 Days
Field of Search: 707/1, 707/2, 707/3, 707/5, 707/6, 707/10, 707/102, 707/104.1, 707/513, 711/121
US Class Current: 1/1
CPC Class Codes:
G06F 16/30 of unstructured textual dat...
Y10S 707/99932 Access augmentation or opti...
