System and method for quantitatively representing data objects in vector space
First Claim
1. A method for quantitatively representing digital documents in a vector space, comprising the steps of:
identifying a first digital document to be processed from a plurality of digital documents;
extracting a first feature corresponding to the first document from the plurality of digital documents, the first feature comprising text surrounding an image included in the digital document, the text surrounding the image not being anchor text;
converting the first feature to a first vector; and
associating the first vector with the first digital document.
Abstract
A system and method for browsing, retrieving, and recommending information from a collection uses multi-modal features of the documents in the collection, as well as an analysis of users' prior browsing and retrieval behavior. The system and method are premised on various disclosed methods for quantitatively representing documents in a document collection as vectors in multi-dimensional vector spaces, quantitatively determining similarity between documents, and clustering documents according to those similarities. The system and method also rely on methods for quantitatively representing users in a user population, quantitatively determining similarity between users, clustering users according to those similarities, and visually representing clusters of users by analogy to clusters of documents.
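The abstract refers to quantitatively determining similarity between documents represented as vectors, without naming a measure in this excerpt. As an illustrative sketch only, the following assumes cosine similarity, a common choice for vector-space comparison; the function names `cosine_similarity` and `rank_by_similarity` and the sample vectors are hypothetical and not taken from the patent.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 when either vector is all zeros (an assumption made
    here to avoid division by zero).
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(query_id, vectors):
    """Return the other document ids, most similar to query_id first."""
    query = vectors[query_id]
    others = [doc_id for doc_id in vectors if doc_id != query_id]
    return sorted(
        others,
        key=lambda doc_id: cosine_similarity(query, vectors[doc_id]),
        reverse=True,
    )


# Hypothetical document vectors, for illustration only.
vectors = {"a": [1, 0, 1], "b": [1, 0, 0.9], "c": [0, 1, 0]}
ranking = rank_by_similarity("a", vectors)
```

A clustering step could group documents whose pairwise similarity is high; that step is omitted here because the excerpt does not specify a clustering procedure.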
44 Claims
1. A method for quantitatively representing digital documents in a vector space, comprising the steps of:
identifying a first digital document to be processed from a plurality of digital documents;
extracting a first feature corresponding to the first document from the plurality of digital documents, the first feature comprising text surrounding an image included in the digital document, the text surrounding the image not being anchor text;
converting the first feature to a first vector; and
associating the first vector with the first digital document.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
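The steps of claim 1 (identify a document, extract the text surrounding an image while excluding anchor text, convert it to a vector, associate the vector with the document) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes HTML documents, treats "surrounding" as a fixed window of words on either side of each `<img>` tag, and uses a plain term-count vector over a fixed vocabulary. The names `SurroundingTextParser`, `surrounding_text_vector`, `window`, and the sample document are all hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class SurroundingTextParser(HTMLParser):
    """Collects non-anchor words and records where each <img> occurs."""

    def __init__(self):
        super().__init__()
        self.words = []            # words outside <a>...</a>, in document order
        self.image_positions = []  # index into self.words at each <img>
        self._anchor_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._anchor_depth += 1
        elif tag == "img":
            self.image_positions.append(len(self.words))

    def handle_endtag(self, tag):
        if tag == "a" and self._anchor_depth > 0:
            self._anchor_depth -= 1

    def handle_data(self, data):
        # Text inside <a> is anchor text and is skipped.
        if self._anchor_depth == 0:
            self.words.extend(re.findall(r"[a-z0-9]+", data.lower()))


def surrounding_text_vector(html, vocabulary, window=5):
    """Term counts of the words within `window` words of any image."""
    parser = SurroundingTextParser()
    parser.feed(html)
    parser.close()
    counts = Counter()
    for pos in parser.image_positions:
        start = max(0, pos - window)
        counts.update(parser.words[start:pos + window])
    return [counts[term] for term in vocabulary]


# Hypothetical document and vocabulary, for illustration only.
html = ('<p>A red fox runs <a href="/more">click here</a> '
        '<img src="fox.jpg"> across the snowy field</p>')
vocabulary = ["fox", "snowy", "click"]

# Associate the vector with the document by keying on a document id.
document_vectors = {"doc-1": surrounding_text_vector(html, vocabulary)}
```

In this sketch the word "click" never contributes to the vector because it appears only inside the link. Windows around nearby images may overlap, in which case shared words are counted once per image.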
15. A signal representing instructions for quantitatively representing in a vector space users of a collection of digital documents, the instructions comprising:
identifying a first user to be processed from the users of the collection of digital documents;
extracting from the collection of digital documents a first feature representing a first sub-set of digital documents of the collection that have been accessed by the first user;
converting the first feature to a first vector; and
associating the first vector with the first user.
Dependent claims: 16, 17
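Claim 15 represents a user by a feature derived from the sub-set of documents that user has accessed. One simple way to picture this, offered as an assumption rather than the claimed encoding, is a binary vector with one dimension per document in the collection. The access-log format and the names `user_access_vector`, `access_log`, and `document_ids` are hypothetical.

```python
def user_access_vector(user_id, access_log, document_ids):
    """Binary vector: 1 where the user accessed the document, else 0.

    access_log is an iterable of (user_id, document_id) pairs;
    document_ids fixes the order of the vector's dimensions.
    """
    accessed = {doc for user, doc in access_log if user == user_id}
    return [1 if doc_id in accessed else 0 for doc_id in document_ids]


# Hypothetical access log, for illustration only.
access_log = [
    ("alice", "doc-1"),
    ("bob", "doc-2"),
    ("alice", "doc-3"),
    ("alice", "doc-1"),  # repeat visits do not change a binary vector
]
document_ids = ["doc-1", "doc-2", "doc-3"]

# Associate each vector with its user by keying on the user id.
user_vectors = {
    user: user_access_vector(user, access_log, document_ids)
    for user in ("alice", "bob")
}
```

Other encodings, such as access counts or an average of the accessed documents' own vectors, would fit the same outline.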
18. A computer-readable medium containing instructions for causing a computer-system to quantitatively represent digital documents in a vector space, by the steps of:
identifying a digital document to be processed from a plurality of digital documents;
selecting an image feature as a first feature, the image feature being associated with the non-text content of an image included in the digital document;
extracting from the document information associated with the first feature;
converting information associated with the first feature into a first vector;
associating the first vector with the digital document;
selecting a second feature from a set of multi-modal features including a user information feature and a genre feature;
extracting from the document information associated with the second feature;
converting the information associated with the second feature into a second vector; and
associating the second vector with the digital document.
Dependent claims: 19, 20, 21, 22
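Claim 18 pairs an image feature drawn from an image's non-text content with a second feature such as genre, each converted to its own vector and associated with the same document. The sketch below is illustrative only: it assumes a coarse joint RGB color histogram as the image feature and a one-hot encoding as the genre feature, neither of which is specified in this excerpt. The names `color_histogram`, `one_hot`, `GENRES`, and the sample pixel data are hypothetical.

```python
def color_histogram(pixels, bins_per_channel=2):
    """Normalized joint RGB histogram.

    pixels is an iterable of (r, g, b) tuples with values in 0..255.
    Returns a list of bins_per_channel ** 3 proportions.
    """
    size = bins_per_channel ** 3
    hist = [0] * size
    step = 256 / bins_per_channel
    total = 0
    for r, g, b in pixels:
        ri, gi, bi = int(r // step), int(g // step), int(b // step)
        hist[(ri * bins_per_channel + gi) * bins_per_channel + bi] += 1
        total += 1
    if total == 0:
        return [0.0] * size
    return [count / total for count in hist]


def one_hot(label, labels):
    """Vector with a 1 at the position of label in labels, 0 elsewhere."""
    return [1 if candidate == label else 0 for candidate in labels]


# Hypothetical genre inventory and image data, for illustration only.
GENRES = ["news", "review", "manual"]
pixels = [(255, 0, 0), (250, 10, 5), (0, 0, 255)]

# Each document keeps one vector per feature type (modality).
document_features = {
    "doc-1": {
        "image": color_histogram(pixels),
        "genre": one_hot("review", GENRES),
    }
}
```

Keeping the vectors separate per modality, rather than concatenating them, mirrors the claim's wording of a first vector and a second vector each associated with the document.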
23. A method for quantitatively representing digital documents in a vector space, comprising the steps of:
identifying a first digital document to be processed from a plurality of digital documents;
extracting a first feature corresponding to the first digital document from the plurality of digital documents, the first feature comprising an image feature associated with non-text content of an image included in the first digital document;
converting the first feature to a first vector;
associating the first vector with the first digital document;
extracting a second feature corresponding to the digital document, the second feature comprising a one of a user feature and a text genre feature;
converting the second feature into a second vector; and
associating the second vector with the first digital document.
Dependent claims: 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42
43. A system for quantitatively representing digital documents in a vector space, comprising:
a processor adapted to execute instructions; and
a computer-readable memory storing instructions for causing the processor to quantitatively represent digital documents in a vector space;
identifying a first digital document to be processed from a plurality of digital documents;
extracting a first feature corresponding to the first document from the plurality of digital documents, the first feature comprising text surrounding an image included in the digital document, the text surrounding the image not being anchor text;
converting the first feature to a first vector; and
associating the first vector with the first digital document.
44. A system for quantitatively representing digital documents in a vector space, comprising:
a processor adapted to execute instructions; and
a computer-readable memory storing instructions for causing the processor to quantitatively represent digital documents in a vector space;
identifying a first digital document to be processed from a plurality of digital documents;
extracting a first feature corresponding to the first digital document from the plurality of digital documents, the first feature comprising an image feature associated with non-text content of an image included in the first digital document;
converting the first feature to a first vector;
associating the first vector with the first digital document;
extracting a second feature corresponding to the digital document, the second feature comprising a one of a user feature and a text genre feature;
converting the second feature into a second vector; and
associating the second vector with the first digital document.
Specification