SYSTEM AND METHOD FOR IDENTIFYING SIMILARITIES AMONG OBJECTS IN A COLLECTION

US 20030074369A1
Filed: 10/19/1999
Published: 04/17/2003
Est. Priority Date: 01/26/1999
Status: Active Grant

9 Assignments

0 Petitions

Abstract

A system and method for browsing, retrieving, and recommending information from a collection uses multi-modal features of the documents in the collection, as well as an analysis of users' prior browsing and retrieval behavior. The system and method are premised on various disclosed methods for quantitatively representing documents in a document collection as vectors in multi-dimensional vector spaces, quantitatively determining similarity between documents, and clustering documents according to those similarities. The system and method also rely on methods for quantitatively representing users in a user population, quantitatively determining similarity between users, clustering users according to those similarities, and visually representing clusters of users by analogy to clusters of documents.

218 Citations

27 Claims

1. A method for calculating the similarity between two objects in a collection of objects, wherein each object is associated with at least one multi-dimensional vector representative of a feature of the object, comprising the steps of:
- identifying a first vector corresponding to a first feature of a first object and a second vector corresponding to a first feature of a second object; and
  
  computing a first distance metric between the first vector and the second vector.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12):
  - 2. The method of claim 1, wherein each object corresponds to a document in a collection of documents.
  - 3. The method of claim 2, wherein the first feature comprises a text feature, and wherein the first distance metric comprises a cosine similarity measure between the first vector and the second vector.
  - 4. The method of claim 2, wherein the first feature comprises a URL feature, and wherein the first distance metric comprises a cosine similarity measure between the first vector and the second vector.
  - 5. The method of claim 2, wherein the first feature comprises an inlink feature, and wherein the first distance metric comprises a cosine similarity measure between the first vector and the second vector.
  - 6. The method of claim 2, wherein the first feature comprises an outlink feature, and wherein the first distance metric comprises a cosine similarity measure between the first vector and the second vector.
  - 7. The method of claim 2, wherein the first feature comprises a text genre feature, and wherein the first distance metric comprises a cosine similarity measure between the first vector and the second vector.
  - 8. The method of claim 2, wherein the first feature comprises a color histogram feature, and wherein the first distance metric comprises a cosine similarity measure between the first vector and the second vector.
  - 9. The method of claim 2, wherein the first feature comprises a color histogram feature, and wherein the first distance metric comprises a normalized intersection measure between the first vector and the second vector.
  - 10. The method of claim 2, wherein the first feature comprises a color complexity feature, and wherein the first distance metric comprises a cosine similarity measure between the first vector and the second vector.
  - 11. The method of claim 1, further comprising the steps of:
    - identifying a third vector corresponding to a second feature of the first object and a fourth vector corresponding to a second feature of the second object;
      
      computing a second distance metric between the third vector and the fourth vector; and
      
      computing a sum of the first distance metric and the second distance metric.
  - 12. The method of claim 11, wherein the step of computing a sum uses a first weighting factor for the first distance metric and a second weighting factor for the second distance metric.
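The measures named in claims 3 through 12 can be illustrated with a minimal sketch. The claims do not give formulas, so everything below is an assumption: the function names are hypothetical, cosine similarity uses its standard definition, the histogram intersection is normalized by the smaller histogram mass (one common convention among several), and the weighted sum of claim 12 is taken to be a plain linear combination.

```python
import math


def cosine_similarity(u, v):
    """Standard cosine of the angle between two equal-length vectors.

    Returns 0.0 when either vector is all zeros (an assumed convention).
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def normalized_intersection(h1, h2):
    """Histogram intersection divided by the smaller total bin mass.

    The normalization is an assumption; the claims only name the measure.
    """
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    overlap = sum(min(a, b) for a, b in zip(h1, h2))
    denom = min(sum(h1), sum(h2))
    return overlap / denom if denom > 0 else 0.0


def combined_metric(first, second, w1=1.0, w2=1.0):
    """Sum of two per-feature metrics, optionally weighted (claims 11 and 12)."""
    return w1 * first + w2 * second
```

With these helpers, a text feature and a color histogram feature of two documents could be compared separately and then merged, for example `combined_metric(cosine_similarity(t1, t2), normalized_intersection(c1, c2), 0.7, 0.3)`, where the vectors and weights are placeholders.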

13. A method for calculating the similarity between two documents in a collection of documents, wherein each document is associated with at least two multi-dimensional vectors representative of a color complexity feature of the object, comprising the steps of:
- identifying a first horizontal complexity vector corresponding to a first document, a first vertical complexity vector corresponding to the first document, a second horizontal complexity vector corresponding to a second document, and a second vertical complexity vector corresponding to the second document; and
  
  computing a distance metric between the first document and the second document, wherein the distance metric comprises a normalized sum of a cosine similarity measure between the first horizontal complexity vector and the second horizontal complexity vector, and between the first vertical complexity vector and the second vertical complexity vector.
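One way to read the "normalized sum" in claim 13 is as the average of the two cosine measures, so that the result stays in the same range as a single cosine. That reading is an assumption, as are the function names in the sketch below.

```python
import math


def cosine_similarity(u, v):
    """Standard cosine similarity; 0.0 if either vector is all zeros."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def complexity_similarity(horiz_1, vert_1, horiz_2, vert_2):
    """Average of the horizontal and vertical cosine measures.

    Dividing by 2 is the assumed normalization; the claim does not fix it.
    """
    horizontal = cosine_similarity(horiz_1, horiz_2)
    vertical = cosine_similarity(vert_1, vert_2)
    return (horizontal + vertical) / 2.0
```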

14. A method for calculating the similarity between two objects in a collection of objects, wherein each object is associated with a plurality of multi-dimensional vectors representative of a plurality of corresponding features of the object, comprising the steps of:
- for each feature, identifying a first vector corresponding to a first object and a second vector corresponding to a second object;
  
  for each feature, computing a distance metric between the first vector and the second vector; and
  
  summing the distance metrics for each feature into an aggregate distance metric.
- Dependent claims (15):
  - 15. The method of claim 14, wherein the step of summing the distance metric uses a distinct weighting factor for each distance metric.
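The per-feature aggregation of claims 14 and 15 might look like the following sketch. It assumes each object is stored as a dictionary from feature name to vector, that cosine similarity serves as the metric for every feature, and that a missing weight defaults to 1.0. None of these choices come from the claims, and all names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Standard cosine similarity; 0.0 if either vector is all zeros."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def aggregate_metric(features_a, features_b, weights=None):
    """Sum one metric per feature into an aggregate value.

    features_a, features_b: dict mapping feature name -> vector.
    weights: optional dict mapping feature name -> weighting factor.
    """
    if features_a.keys() != features_b.keys():
        raise ValueError("both objects need the same set of features")
    weights = weights or {}
    total = 0.0
    for name in features_a:
        metric = cosine_similarity(features_a[name], features_b[name])
        total += weights.get(name, 1.0) * metric
    return total
```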

16. A method for calculating the similarity between two objects in a collection of objects, wherein each object is associated with a plurality of multi-dimensional vectors representative of a feature of the object, comprising the steps of:
- identifying a first set of vectors corresponding to a first object and a second set of vectors corresponding to a second object, wherein the number of vectors in the first set is equal to the number of vectors in the second set;
  
  computing a distance metric between each vector in the first set and a corresponding vector in the second set; and
  
  summing the distance metrics into a composite distance metric.
- Dependent claims (17):
  - 17. The method of claim 16, wherein the step of summing the distance metrics uses a distinct weighting factor for each vector in the first set of vectors and corresponding vector in the second set of vectors.

18. A method for calculating the similarity between two users in a user population, wherein each user is associated with a multi-dimensional vector representative of a user feature, comprising the steps of:
- identifying a first vector corresponding to a first user and a second vector corresponding to a second user; and
  
  computing a first distance metric between the first vector and the second vector.
- Dependent claims (19, 20, 21, 22, 23, 24, 25, 26, 27):
  - 19. The method of claim 18, wherein the user feature comprises demographic information.
  - 20. The method of claim 18, wherein the user feature comprises group membership information.
  - 21. The method of claim 18, wherein the user feature comprises the user's access of documents within a collection of documents.
  - 22. The method of claim 21, wherein each document in the collection of documents is associated with at least one multi-dimensional vector representative of a document feature, and wherein:
    - the first vector represents a mediated representation of the first user through the document feature corresponding to the documents in the collection accessed by the first user; and
      
      the second vector represents a mediated representation of the second user through the document feature corresponding to the documents in the collection accessed by the second user.
  - 23. The method of claim 22, wherein the mediated representation is calculated by multiplying a first matrix and a second matrix, wherein:
    - the first matrix comprises a first plurality of column vectors each representing a document in the collection by way of the document feature; and
      
      the second matrix comprises a second plurality of column vectors each representing a user in the user population by way of document accesses.
  - 24. The method of claim 23, wherein the document feature comprises the text represented by documents in the collection.
  - 25. The method of claim 23, wherein the document feature comprises the outlinks represented by documents in the collection.
  - 26. The method of claim 23, wherein the document feature comprises the inlinks represented by documents in the collection.
  - 27. The method of claim 23, wherein the document feature comprises the URLs represented by documents in the collection.
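The matrix product in claim 23 can be sketched as follows. The layout is an assumption: the first matrix has one row per feature dimension and one column per document, the second has one row per document and one column per user (for instance, access counts), so each column of the product is a user expressed in the document feature space. The helper names are hypothetical, and cosine similarity is used for the user comparison only as an example metric.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def column(matrix, j):
    """Extract column j of a matrix as a flat list."""
    return [row[j] for row in matrix]


def cosine_similarity(u, v):
    """Standard cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Illustrative data only: 3 feature dimensions x 2 documents.
doc_features = [[1, 0], [0, 1], [1, 1]]
# 2 documents x 2 users; entries are assumed access counts.
user_access = [[2, 0], [0, 3]]

# Each column of the product is one user's mediated representation.
mediated = matmul(doc_features, user_access)
user_similarity = cosine_similarity(column(mediated, 0), column(mediated, 1))
```

Swapping `doc_features` for a text, outlink, inlink, or URL matrix would correspond to the variants in claims 24 through 27.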

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Xerox Corporation (Xerox Holdings Corp.)
Inventors: Li, Jun; Chi, Ed H.; Chen, Francine R.; Pirolli, Peter L.; Pitkow, James E.; Schuetze, Hinrich
Granted Patent: US 6,941,321 B2
US Class Current: 707/103R
CPC Class Codes:
- G06F 16/355   Class or cluster creation o...
- G06F 16/904   Browsing; Visualisation the...
- Y10S 707/99944   Object-oriented database st...
- Y10S 707/99945   Object-oriented database st...
