Clustering system and method
Abstract
In order to cluster documents, a document vector is formed for each of a plurality of documents of a corpus, and a plurality of reference vectors is generated. The document vectors are then compared to the reference vectors to generate similarity values for each of the document vectors. The document vectors are then sorted based on their similarity values to form a sorted list. Clusters are then formed based on the similarity between adjacent document vectors in the sorted list.
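The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the text above does not say how the vectors are built, which similarity measure is used, how the reference vectors are chosen, or how adjacent vectors are grouped. The sketch assumes term-count vectors, cosine similarity, randomly generated reference vectors, a lexicographic sort on the tuple of similarity values, and a fixed similarity threshold for joining neighbours. All function and parameter names (`document_vector`, `cosine`, `cluster_documents`, `num_refs`, `threshold`, `seed`) are hypothetical.

```python
import math
import random
from collections import Counter


def document_vector(text, vocabulary):
    """Term-count vector over a fixed vocabulary (an assumed representation)."""
    counts = Counter(text.lower().split())
    return [float(counts[term]) for term in vocabulary]


def cosine(a, b):
    """Cosine similarity (an assumed measure); returns 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_documents(docs, num_refs=2, threshold=0.5, seed=0):
    """Return clusters as lists of document indices."""
    # 1. Generate a document vector for each document of the corpus.
    vocabulary = sorted({term for d in docs for term in d.lower().split()})
    doc_vecs = [document_vector(d, vocabulary) for d in docs]

    # 2. Generate a plurality of reference vectors (random here: an assumption).
    rng = random.Random(seed)
    refs = [[rng.random() for _ in vocabulary] for _ in range(num_refs)]

    # 3. Compare each document vector to each reference vector.
    sims = [tuple(cosine(v, r) for r in refs) for v in doc_vecs]

    # 4. Sort the documents by their similarity values (lexicographic on the
    #    tuple of values: an assumed ordering).
    order = sorted(range(len(docs)), key=lambda i: sims[i])

    # 5. Form clusters from adjacent entries of the sorted list: a document
    #    joins the current cluster if it is similar enough to its predecessor
    #    (the threshold rule is an assumption).
    clusters = []
    for i in order:
        if clusters and cosine(doc_vecs[clusters[-1][-1]], doc_vecs[i]) >= threshold:
            clusters[-1].append(i)
        else:
            clusters.append([i])
    return clusters
```

In this sketch, only neighbours in the sorted list are compared with each other, so the number of document-to-document comparisons grows linearly with the corpus size rather than quadratically.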
22 Claims
1. A system for clustering documents, the system comprising:
a document vector generator operable to generate a document vector for each of a plurality of documents of a corpus;
a reference vector generator operable to generate a plurality of reference vectors;
a comparator operable to compare the document vectors to each of the reference vectors to generate similarity values for each of the document vectors;
a sorter operable to sort the document vectors based on the similarity values for the document vectors to form a sorted list; and
a cluster generator operable to form clusters of documents based on the similarity between adjacent document vectors in the sorted list.
11. A computer implemented method of clustering documents, the method comprising:
generating document vectors for each of a plurality of documents of a corpus;
generating a plurality of reference vectors;
comparing the document vectors to each of the reference vectors to generate similarity values for each of the document vectors;
sorting the document vectors based on the similarity values for the document vectors to form a sorted list; and
forming clusters of documents based on the similarity between adjacent document vectors in the sorted list.
21. A computer program product comprising program instructions, the program instructions being operable to implement a method of clustering documents, the method comprising:
generating document vectors for terms of each of a plurality of documents of a corpus;
generating a plurality of reference vectors;
comparing the document vectors to each of the reference vectors to generate similarity values for each of the document vectors;
sorting the document vectors based on the similarity values for the document vectors to form a sorted list; and
forming clusters of documents based on the similarity between adjacent document vectors in the sorted list.
22. Apparatus for clustering documents, the apparatus comprising:
means for generating document vectors for each of a plurality of documents of a corpus;
means for generating a plurality of reference vectors;
means for comparing the document vectors to each of the reference vectors to generate similarity values for each of the document vectors;
means for sorting the document vectors based on the similarity values for the document vectors to form a sorted list; and
means for forming clusters of documents based on the similarity between adjacent document vectors in the sorted list.
Specification