Computer method and apparatus for clustering documents and automatic generation of cluster keywords
Abstract
A computer method and apparatus determine keywords of documents. An initial document-by-term matrix is formed, each document being represented by a respective M-dimensional vector, where M represents the number of terms or words in a predetermined domain of documents. The dimensionality of the initial matrix is reduced to form resultant vectors of the documents. The resultant vectors are then clustered such that correlated documents are grouped into respective clusters. For each cluster, the terms having greatest impact on the documents in that cluster are identified. The identified terms represent keywords of each document in that cluster. Further, the identified terms form a cluster summary indicative of the documents in that cluster.
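The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation, and several choices in it are assumptions: the abstract does not name a reduction technique, so the sketch projects documents onto the leading eigenvectors of the term co-occurrence matrix (found by power iteration); it does not name a clustering algorithm, so a small k-means with farthest-point initialization stands in; and "greatest impact" is approximated by summed term counts within each cluster. The function names, the vocabulary, and the sample documents are all hypothetical.

```python
import math

def term_matrix(docs, vocab):
    """One M-dimensional count vector per document, where M = len(vocab)."""
    index = {term: j for j, term in enumerate(vocab)}
    rows = []
    for text in docs:
        row = [0.0] * len(vocab)
        for word in text.lower().split():
            if word in index:
                row[index[word]] += 1.0
        rows.append(row)
    return rows

def top_directions(rows, k, iters=200):
    """Assumed reducer: top-k eigenvectors of A^T A via power iteration
    with deflation. A sketch only; a uniform start vector can miss a
    direction it happens to be orthogonal to."""
    m = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    directions = []
    for _ in range(k):
        v = [1.0 / math.sqrt(m)] * m
        for _step in range(iters):
            w = [sum(gram[i][j] * v[j] for j in range(m)) for i in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        gv = [sum(gram[i][j] * v[j] for j in range(m)) for i in range(m)]
        lam = sum(v[i] * gv[i] for i in range(m))
        # Deflate: remove the direction just found before seeking the next.
        gram = [[gram[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]
        directions.append(v)
    return directions

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Assumed clusterer: k-means with deterministic farthest-point init."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_keywords(rows, labels, vocab, top_n=2):
    """Assumed impact measure: total count of each term within a cluster."""
    result = {}
    for c in sorted(set(labels)):
        totals = [sum(r[j] for r, lab in zip(rows, labels) if lab == c)
                  for j in range(len(vocab))]
        ranked = sorted(range(len(vocab)), key=lambda j: -totals[j])
        result[c] = [vocab[j] for j in ranked[:top_n]]
    return result

def cluster_documents(docs, vocab, dims=2, clusters=2):
    rows = term_matrix(docs, vocab)                      # initial matrix
    dirs = top_directions(rows, dims)                    # reduce dimensionality
    reduced = [[sum(a * b for a, b in zip(r, d)) for d in dirs] for r in rows]
    labels = kmeans(reduced, clusters)                   # cluster resultant vectors
    return labels, cluster_keywords(rows, labels, vocab)

# Hypothetical term domain and documents, for illustration only.
VOCAB = ["engine", "wheel", "brake", "flour", "sugar", "oven"]
DOCS = [
    "engine engine wheel brake",
    "engine wheel wheel brake",
    "engine wheel brake brake",
    "flour sugar oven",
    "flour flour sugar",
]
labels, keywords = cluster_documents(DOCS, VOCAB)
print(labels)
print(keywords)
```

In this toy run the six-dimensional count vectors are reduced to two dimensions before clustering, and the keyword list for each cluster doubles as the cluster summary the abstract mentions. A production system would more likely rely on a tested linear-algebra library for the reduction step.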
20 Claims
1. A computer method for clustering documents comprising the steps of:
in a digital processor, representing each document by a respective M dimensional vector in a matrix, where M equals a number of words in a predefined domain of document terms, such that an initial matrix of documents is formed;
reducing dimensionality of the initial matrix to form resultant vectors of the documents; and
clustering the resultant vectors such that different respective documents are grouped into a plurality of clusters.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 20.
11. A computer apparatus for clustering documents comprising:
a memory for storing documents, each one of said documents represented by a respective M dimensional vector in a matrix where M equals a number of words in a predefined domain of document terms such that an initial matrix of documents is stored;
a dimensionality reducer for reducing the dimensionality of the initial matrix to form resultant vectors of the documents, the reducer executed on a processor coupled to the memory;
a clusterer coupled to the reducer for receiving and clustering the resultant vectors such that different respective documents are grouped into different clusters.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19.
Specification