System and method for clustering unstructured documents
First Claim
1. A system for clustering unstructured documents, comprising:
- a selection module that selects documents having terms with frequencies of occurrence of the terms that satisfy upper edge conditions less than 100% and lower edge conditions greater than 0% from a set of documents;
- a concept module that generates concepts based on one or more of the terms for the selected documents; and
- a cluster module that groups the selected documents into clusters, comprising:
  - an evaluation module that evaluates a weight for each of the clusters;
  - a determination module that determines, for each of the selected documents, inner products of that selected document and each cluster from the frequencies of occurrence for at least one of the terms from the concepts and the cluster weights; and
  - an assignment module that assigns each selected document into one such cluster based on the inner products of the selected document; and
- a processor to execute each of the modules, which are stored on a computer-readable storage medium.
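The selection element of the claim can be illustrated with a minimal sketch. It rests on one possible reading, which the claim language does not confirm: that "frequency of occurrence" means the fraction of documents in the set that contain a term, and that a document is selected when it keeps at least one term inside the edge conditions. The function name `select_documents` and the default thresholds `lower` and `upper` are hypothetical.

```python
from collections import Counter


def select_documents(docs, lower=0.1, upper=0.9):
    """Keep terms whose document frequency lies within the edge conditions.

    docs: dict mapping a document id to a list of terms.
    lower, upper: hypothetical edge conditions, as fractions strictly
    between 0 and 1 (that is, greater than 0% and less than 100%).
    Returns (selected_docs, kept_terms).
    """
    if not (0.0 < lower < upper < 1.0):
        raise ValueError("edge conditions must satisfy 0 < lower < upper < 1")

    n_docs = len(docs)
    # Count, for each term, the number of documents that contain it.
    doc_freq = Counter()
    for terms in docs.values():
        doc_freq.update(set(terms))

    # Assumed reading: a term qualifies if its share of documents is in range.
    kept_terms = {
        term for term, count in doc_freq.items()
        if lower <= count / n_docs <= upper
    }

    # A document is selected if at least one of its terms qualifies.
    selected = {}
    for doc_id, terms in docs.items():
        qualifying = [t for t in terms if t in kept_terms]
        if qualifying:
            selected[doc_id] = qualifying
    return selected, kept_terms
```

Under this reading, a term that appears in every document (such as a common stop word) fails the upper edge condition, and a term that appears in very few documents fails the lower one.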
Abstract
A system and method for clustering unstructured documents are provided. Documents having terms with frequencies of occurrence that satisfy upper and lower edge conditions are selected. Concepts are generated for the selected documents. The selected documents are grouped into clusters. A weight is evaluated for each of the clusters. For each selected document, a similarity value is determined from the frequencies of occurrence of at least one of the terms from the concepts and from the cluster weights. Each selected document is assigned to one such cluster based on its similarity value.
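The similarity and assignment steps in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented method: each document is assumed to be a sparse vector of concept frequencies, each cluster a sparse vector of per-concept weights, the similarity value is taken to be their inner product (as in claim 1), and a document is assumed to go to the cluster with the largest inner product. The names `inner_product` and `assign_to_clusters` are hypothetical.

```python
def inner_product(doc_freqs, cluster_weights):
    """Inner product of two sparse vectors stored as dicts.

    doc_freqs: concept -> frequency of occurrence in the document.
    cluster_weights: concept -> weight of that concept in the cluster.
    Concepts missing from the cluster contribute zero.
    """
    return sum(
        freq * cluster_weights.get(concept, 0.0)
        for concept, freq in doc_freqs.items()
    )


def assign_to_clusters(doc_vectors, cluster_vectors):
    """Assign each document to the cluster with the largest inner product.

    Choosing the maximum is an assumption; the source only says the
    assignment is "based on" the inner products.
    """
    if not cluster_vectors:
        raise ValueError("at least one cluster is required")

    assignments = {}
    for doc_id, freqs in doc_vectors.items():
        scores = {
            cluster_id: inner_product(freqs, weights)
            for cluster_id, weights in cluster_vectors.items()
        }
        # Ties resolve to the first cluster in iteration order.
        assignments[doc_id] = max(scores, key=scores.get)
    return assignments
```

A short usage example with made-up data:

```python
clusters = {"pets": {"cat": 1.0, "dog": 1.0}, "sea": {"fish": 1.0}}
documents = {"d1": {"cat": 2}, "d2": {"fish": 3, "cat": 1}}
print(assign_to_clusters(documents, clusters))
```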
16 Claims
1. A system for clustering unstructured documents, comprising:
- a selection module that selects documents having terms with frequencies of occurrence of the terms that satisfy upper edge conditions less than 100% and lower edge conditions greater than 0% from a set of documents;
- a concept module that generates concepts based on one or more of the terms for the selected documents; and
- a cluster module that groups the selected documents into clusters, comprising:
  - an evaluation module that evaluates a weight for each of the clusters;
  - a determination module that determines, for each of the selected documents, inner products of that selected document and each cluster from the frequencies of occurrence for at least one of the terms from the concepts and the cluster weights; and
  - an assignment module that assigns each selected document into one such cluster based on the inner products of the selected document; and
- a processor to execute each of the modules, which are stored on a computer-readable storage medium.
Dependent Claims: 2, 3, 4, 5, 6, 7, 8
9. A computer-implemented method for clustering unstructured documents, comprising the steps of:
- selecting documents having terms with frequencies of occurrence of the terms that satisfy upper edge conditions less than 100% and lower edge conditions greater than 0% from a set of documents;
- generating concepts based on one or more of the terms for the selected documents; and
- grouping the selected documents into clusters, comprising:
  - evaluating a weight for each of the clusters;
  - determining, for each of the selected documents, inner products of that selected document and each cluster from the frequencies of occurrence for at least one of the terms from the concepts and the cluster weights; and
  - assigning each selected document into one such cluster based on the inner products of the selected document,
- wherein all the steps are performed on a suitably programmed computer.
Dependent Claims: 10, 11, 12, 13, 14, 15, 16
Specification