System and method for identifying similarities among objects in a collection
Abstract
A system and method for browsing, retrieving, and recommending information from a collection uses multi-modal features of the documents in the collection, as well as an analysis of users' prior browsing and retrieval behavior. The system and method are premised on various disclosed methods for quantitatively representing documents in a document collection as vectors in multi-dimensional vector spaces, quantitatively determining similarity between documents, and clustering documents according to those similarities. The system and method also rely on methods for quantitatively representing users in a user population, quantitatively determining similarity between users, clustering users according to those similarities, and visually representing clusters of users by analogy to clusters of documents.
30 Claims
1. A computer-implemented method for calculating the similarity between two objects in a collection of objects, wherein each object is associated with at least a first feature vector and a second feature vector, each of the first and second feature vectors being a multi-dimensional vector, the first feature vector being representative of a first feature of the objects and the second feature vector being representative of a second feature of the objects, the first feature being a one of a first set of multi-modal features including a text feature, a URL feature, an inlink feature and an outlink feature, and the second feature being an image feature, comprising the steps of:
identifying the first feature vector for a first object and the first feature vector of a second object;
computing a first distance metric between the first feature vector for the first object and the first feature vector for the second object;
identifying, without reference to textual information, the second feature vector of the first object and the second feature vector of the second object;
computing a second distance metric between the second feature vector for the first object and the second feature vector for the second object; and
computing a sum of the first distance metric and the second distance metric.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11.
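The steps of claim 1 can be illustrated with a minimal sketch. The claim does not name a particular distance metric, so cosine distance is used here purely as an assumption, and the dictionary keys `"text"` and `"image"`, along with the function names, are hypothetical.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros (assumed convention)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)

def combined_distance(obj_a, obj_b):
    """Sum a first-feature distance and an image-feature distance for two objects."""
    # First feature: here a text vector (could equally be URL, inlink, or outlink).
    d_first = cosine_distance(obj_a["text"], obj_b["text"])
    # Second feature: an image vector, taken as given and not derived from text.
    d_second = cosine_distance(obj_a["image"], obj_b["image"])
    return d_first + d_second

page_a = {"text": [1.0, 0.0, 2.0], "image": [0.5, 0.5]}
page_b = {"text": [1.0, 0.0, 2.0], "image": [0.5, -0.5]}
print(combined_distance(page_a, page_b))
```

In this example the two pages have identical text vectors and orthogonal image vectors, so the sum is dominated by the image term.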
12. A computer-readable medium storing instructions for calculating the similarity between two documents in a collection of documents, wherein each document is associated with at least two multi-dimensional vectors representative of a color complexity feature of an image included in the document, the instructions comprising:
identifying a first horizontal complexity vector corresponding to a first document without reference to textual information, a first vertical complexity vector corresponding to the first document without reference to textual information, a second horizontal complexity vector corresponding to a second document without reference to textual information, and a second vertical complexity vector corresponding to the second document without reference to textual information; and
computing a distance metric between the first document and the second document, wherein the distance metric comprises a normalized sum of a cosine similarity measure between the first horizontal complexity vector and the second horizontal complexity vector, and between the first vertical complexity vector and the second vertical complexity vector.
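One possible reading of the "normalized sum" in claim 12 is the average of the two cosine similarity measures. The sketch below takes that reading as an assumption; the division by 2 and all function names are illustrative and not taken from the claim.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros (assumed convention)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def complexity_similarity(h1, v1, h2, v2):
    """Average the horizontal and vertical cosine similarities of two documents.

    h1, v1: horizontal and vertical complexity vectors of the first document.
    h2, v2: horizontal and vertical complexity vectors of the second document.
    Dividing by 2 is one assumed way to normalize the sum.
    """
    return (cosine_similarity(h1, h2) + cosine_similarity(v1, v2)) / 2.0

# Identical horizontal vectors, orthogonal vertical vectors.
score = complexity_similarity([1.0, 2.0], [3.0, 0.0], [1.0, 2.0], [0.0, 3.0])
print(score)
```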
13. A computer-readable medium for transmitting computer instructions calculating the similarity between two objects in a collection of objects, wherein each object is associated with at least a first set of feature vectors and a second set of feature vectors, each of the feature vectors of the first and second set of feature vectors being a multi-dimensional vector representative of a feature of an object, the features of the first set of feature vectors being a one of a text feature, a URL feature, an inlink feature and an outlink feature, and the features of the second set of feature vectors being an image feature, the instructions comprising:
identifying the first set of feature vectors corresponding to a first object and the first set of feature vectors corresponding to the second object;
identifying without reference to textual information the second set of feature vectors corresponding to the first object and the second set of feature vectors corresponding to the second object;
computing a distance metric between each vector in the sets of feature vectors associated with the first object and each vector in the sets of feature vectors associated with the second object; and
summing the distance metrics into a composite distance metric.
Dependent claims: 14.
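A hedged sketch of the composite metric in claim 13 follows. It assumes that vectors are paired by feature name (text with text, image with image, and so on) and that each per-feature distance is Euclidean; neither choice is stated in the claim, and the dictionary layout is hypothetical.

```python
import math

def composite_distance(features_a, features_b):
    """Sum per-feature Euclidean distances over the features both objects share.

    features_a, features_b: dicts mapping a feature name to its vector.
    Pairing by name and using Euclidean distance are assumptions of this sketch.
    """
    shared = sorted(features_a.keys() & features_b.keys())
    return sum(math.dist(features_a[name], features_b[name]) for name in shared)

obj_a = {"text": [0.0, 0.0], "url": [1.0, 1.0], "image": [3.0, 4.0]}
obj_b = {"text": [3.0, 4.0], "url": [1.0, 1.0], "image": [0.0, 0.0]}
print(composite_distance(obj_a, obj_b))
```

Here the text and image vectors each differ by a distance of 5 and the URL vectors are identical, so the composite distance is 10.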
15. A computer-implemented method for calculating the similarity between characteristics of two users in a population of users of a document collection, wherein each user is associated with a multi-dimensional vector representative of a user feature, and each document in the collection of documents is associated with at least one multi-dimensional vector representative of a document feature, the user feature representing for each user at least a document browsing history comprising the steps of:
identifying a first vector corresponding to a first user and a second vector corresponding to a second user; and
wherein the first vector represents a mediated representation of the first user through the document feature corresponding to the documents in the collection accessed by the first user; and
the second vector represents a mediated representation of the second user through the document feature corresponding to the documents in the collection accessed by the second user;
computing a first distance metric between the first vector and the second vector.
Dependent claims: 16, 17, 18, 19, 20, 21, 22.
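The "mediated representation" of a user in claim 15 can be sketched by building each user's vector from the feature vectors of the documents that user accessed. Averaging those document vectors and comparing users by Euclidean distance are both assumptions made for illustration; the claim does not specify the aggregation or the metric, and the names below are hypothetical.

```python
import math

def user_vector(accessed_doc_ids, doc_vectors):
    """Represent a user as the mean of the feature vectors of the documents they accessed."""
    if not accessed_doc_ids:
        raise ValueError("user has no browsing history")
    dims = len(doc_vectors[accessed_doc_ids[0]])
    total = [0.0] * dims
    for doc_id in accessed_doc_ids:
        for i, x in enumerate(doc_vectors[doc_id]):
            total[i] += x
    return [t / len(accessed_doc_ids) for t in total]

def user_distance(history_a, history_b, doc_vectors):
    """Euclidean distance between two users' document-mediated vectors."""
    return math.dist(user_vector(history_a, doc_vectors),
                     user_vector(history_b, doc_vectors))

# Hypothetical document feature vectors (for example, text features).
doc_vectors = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [1.0, 1.0]}
print(user_vector(["d1", "d2"], doc_vectors))
print(user_distance(["d1", "d2"], ["d3"], doc_vectors))
```

Under this sketch, two users who accessed the same documents end up at distance zero, even if they accessed them in a different order.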
23. A computer-readable medium storing instructions for calculating the similarity between two documents in a collection of documents, wherein each document is associated with at least a first feature vector, a second feature vector, a third feature vector and a fourth feature vector, each of the first, second, third and fourth feature vectors each being a multi-dimensional vector, the first feature vector being representative of a text feature of the documents, the second feature vector being representative of an image feature of the documents, the third feature vector being representative of a text genre feature of the documents, and the fourth feature vector being representative of a link feature of the documents, the instructions comprising:
identifying the first, second, third and fourth feature vectors for the first document and the first, second, third and fourth feature vectors of a second document;
computing a first distance metric between the first feature vector for the first document and the first feature vector for the second document;
computing a second distance metric between the second feature vector for the first document and the second feature vector for the second document;
computing a third distance metric between the third feature vector for the first document and the third feature vector for the second document; and
computing a fourth distance metric between the fourth feature vector for the first document and the fourth feature vector for the second document.
Dependent claims: 24, 25, 26, 27, 28.
29. A computer-implemented system for calculating the similarity between two objects in a collection of objects, wherein each object is associated with at least a first feature vector and a second feature vector, each of the first and second feature vectors being a multi-dimensional vector, the first feature vector being representative of a first feature of the objects and the second feature vector being representative of a second feature of the objects, the first feature being a one of a first set of multi-modal features including a text feature, a URL feature, an inlink feature and an outlink feature, and the second feature being an image feature, comprising:
means for identifying the first feature vector for a first object and the first feature vector of a second object;
means for computing a first distance metric between the first feature vector for the first object and the first feature vector for the second object;
means for identifying, without reference to textual information, the second feature vector of the first object and the second feature vector of the second object;
means for computing a second distance metric between the second feature vector for the first object and the second feature vector for the second object; and
means for computing a sum of the first distance metric and the second distance metric.
30. A computer-implemented system for calculating the similarity between two objects in a collection of objects, wherein each object is associated with at least a first feature vector and a second feature vector, each of the first and second feature vectors being a multi-dimensional vector, the first feature vector being representative of a first feature of the objects and the second feature vector being representative of a second feature of the objects, the first feature being a one of a first set of multi-modal features including a text feature, a URL feature, an inlink feature and an outlink feature, and the second feature being an image feature, comprising:
a processor adapted to execute instructions; and
a computer-readable memory storing instructions for causing the processor to calculate the similarity between two objects in a collection of objects;
identifying the first feature vector for a first object and the first feature vector of a second object;
computing a first distance metric between the first feature vector for the first object and the first feature vector for the second object;
identifying, without reference to textual information, the second feature vector of the first object and the second feature vector of the second object;
computing a second distance metric between the second feature vector for the first object and the second feature vector for the second object; and
computing a sum of the first distance metric and the second distance metric.
Specification