Dynamic reduction of dimensions of a document vector in a document search and retrieval system
First Claim
1. A method for reducing dimensions of a document vector used to determine the similarity of a first document to a plurality of other documents in a computer, the method comprising:
receiving a document that is input to the computer for determining the similarity of the document to the plurality of other documents;
preprocessing the document to generate a document vector;
reducing a number of dimensions in the document vector;
comparing the document vector to at least one document vector for the plurality of documents to determine a similarity of the document to the plurality of other documents; and
displaying a measure of similarity of the document to the other documents to a human observer.
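The steps recited in claim 1 can be sketched roughly as follows. This is only an illustrative reading, not the patented implementation: the claim does not say how the vector is built, reduced, or compared, so the term-count vector, the dimension-dropping reduction, the cosine measure, and every name here (`preprocess`, `reduce_dimensions`, `cosine`, `vocabulary`, `keep`) are assumptions made for the example.

```python
import math
import re
from collections import Counter


def preprocess(text, vocabulary):
    """Turn raw text into a term-count vector over a fixed vocabulary (assumed representation)."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [float(counts[term]) for term in vocabulary]


def reduce_dimensions(vector, keep):
    """Keep only the listed dimensions; a simple stand-in for the claimed reduction step."""
    return [vector[i] for i in keep]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


# Hypothetical vocabulary and corpus, chosen only to make the example concrete.
vocabulary = ["vector", "document", "search", "the", "cat"]
keep = [0, 1, 2, 4]  # drop the uninformative term "the"
corpus = {
    "doc_a": "the document search uses a document vector",
    "doc_b": "the cat sat on the cat",
}

# Reduced vectors for the plurality of other documents.
reduced_corpus = {
    name: reduce_dimensions(preprocess(text, vocabulary), keep)
    for name, text in corpus.items()
}

# Receive, preprocess, and reduce the incoming document, then compare it.
incoming = reduce_dimensions(preprocess("search the document vector", vocabulary), keep)
scores = {name: cosine(incoming, vec) for name, vec in reduced_corpus.items()}

# Display a measure of similarity to a human observer.
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.3f}")
```

In this toy setup the incoming document scores high against `doc_a`, which shares its terms, and zero against `doc_b`, which shares none of the retained dimensions.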
Abstract
The method and system of the invention involve processing each new document (20) coming into the system into a document vector (16), and creating a document vector with reduced dimensionality (17) for comparison with the data model (15) without recomputing the data model (15). These operations are carried out by a first computer (11) while a second computer (12) updates the data model (18), which can be comprised of an initial large group of documents (19) and is premised on computing an initial data model (13, 14, 15) to provide a reference point for determining document vectors from documents processed from the data stream (20).
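One way to picture the arrangement in the abstract is a reader that projects each incoming document vector against whatever data model is currently published, while a separate updater swaps in a new model when it is ready. The sketch below is a single-process approximation under stated assumptions: the abstract does not specify the form of the data model, so representing it as a projection matrix is an assumption, and `DataModel`, `ModelHolder`, `project`, and `publish` are hypothetical names, not terms from the patent.

```python
import threading


class DataModel:
    """Assumed form of the data model: a k-by-n projection matrix plus a version tag."""

    def __init__(self, version, projection):
        self.version = version
        self.projection = projection  # k rows, each with n weights

    def project(self, doc_vector):
        """Map an n-dimensional document vector to k dimensions without touching the model."""
        return [sum(w * x for w, x in zip(row, doc_vector)) for row in self.projection]


class ModelHolder:
    """Holds the currently published model; the updater replaces it atomically."""

    def __init__(self, model):
        self._lock = threading.Lock()
        self._model = model

    def current(self):
        with self._lock:
            return self._model

    def publish(self, model):
        with self._lock:
            self._model = model


# Role of the first computer: reduce incoming vectors against the current model.
holder = ModelHolder(DataModel(1, [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]]))
doc_vector = [2.0, 4.0, 0.0]
reduced_v1 = holder.current().project(doc_vector)

# Role of the second computer: publish an updated model when it is ready.
holder.publish(DataModel(2, [[0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]))
reduced_v2 = holder.current().project(doc_vector)

print(reduced_v1, reduced_v2)
```

The point of the design is that projecting a new document is cheap and never recomputes the model; only the updater does the expensive work, and readers pick up the new model on their next call to `current()`.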
18 Claims
1. A method for reducing dimensions of a document vector used to determine the similarity of a first document to a plurality of other documents in a computer, the method comprising:
receiving a document that is input to the computer for determining the similarity of the document to the plurality of other documents;
preprocessing the document to generate a document vector;
reducing a number of dimensions in the document vector;
comparing the document vector to at least one document vector for the plurality of documents to determine a similarity of the document to the plurality of other documents; and
displaying a measure of similarity of the document to the other documents to a human observer.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A computer system for reducing dimensions of a document vector used to determine the similarity of a first document to a plurality of other documents in the computer system, the system comprising:
means for receiving a document that is input to the computer for determining the similarity of the document to the plurality of other documents;
means for preprocessing the document to generate a document vector; and
means for reducing a number of dimensions in the document vector;
means for comparing the document vector to at least one document vector for the plurality of documents to determine a similarity of the document to the plurality of other documents; and
means for displaying a measure of similarity of the document to the other documents to a human observer.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
Specification