System And Method For Clustering Unstructured Documents
Abstract
A system and method for clustering unstructured documents is provided. Documents having terms with frequencies of occurrence that satisfy upper and lower edge conditions are selected. Concepts are generated for the selected documents. The selected documents are grouped into clusters of the documents. A weight for each of the clusters is evaluated. A similarity value is determined from the frequencies of occurrence for at least one of the terms from the concepts and the cluster weights for each selected document. Each selected document is assigned into one such cluster based on the similarity value of the selected document.
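The steps summarized in the abstract can be illustrated with a minimal sketch. It is not the patented implementation, and several details are assumptions made only for illustration: the edge conditions are read as inclusive lower and upper bounds on a term's count within a document, each cluster weight is modeled as a running sum of its members' term counts, the similarity value is taken to be a cosine similarity, and a document that is not similar enough to any existing cluster starts a new one. The names `term_frequencies`, `passes_edge_conditions`, `cosine`, `cluster`, and `min_similarity` are hypothetical.

```python
import math
from collections import Counter

def term_frequencies(text):
    """Count lowercase whitespace-separated tokens (a stand-in for real term extraction)."""
    return Counter(text.lower().split())

def passes_edge_conditions(freqs, lower, upper):
    """Assumed reading: select a document if at least one term count lies in [lower, upper]."""
    return any(lower <= n <= upper for n in freqs.values())

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts (assumed similarity measure)."""
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster(docs, lower=1, upper=3, min_similarity=0.5):
    """Assign each selected document to exactly one cluster, creating clusters as needed."""
    clusters = []  # each cluster: {"weights": term -> weight, "members": [doc ids]}
    for doc_id, text in docs.items():
        freqs = term_frequencies(text)
        if not passes_edge_conditions(freqs, lower, upper):
            continue  # document fails the edge conditions and is not selected
        # Keep only the terms that satisfy the edge conditions.
        vec = {t: float(n) for t, n in freqs.items() if lower <= n <= upper}
        best, best_sim = None, 0.0
        for c in clusters:
            sim = cosine(vec, c["weights"])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is None or best_sim < min_similarity:
            clusters.append({"weights": dict(vec), "members": [doc_id]})
        else:
            best["members"].append(doc_id)
            # Update the cluster weights with the new member's term counts.
            for t, v in vec.items():
                best["weights"][t] = best["weights"].get(t, 0.0) + v
    return clusters
```

For example, calling `cluster` on three short documents, two about the same subject and one unrelated, returns two clusters under these assumptions. A document whose only term occurs more often than `upper` is never selected.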
24 Claims
1. A system for clustering unstructured documents, comprising:
a selection module to select documents having terms with frequencies of occurrence that satisfy upper and lower edge conditions;
a concept module to generate concepts for the selected documents; and
a cluster module to group the selected documents into clusters of the documents, comprising:
an evaluation module to evaluate a weight for each of the clusters;
a determination module to determine a similarity value from the frequencies of occurrence for at least one of the terms from the concepts and the cluster weights for each selected document; and
an assignment module to assign each selected document into one such cluster based on the similarity value of the selected document.
7. A method for clustering unstructured documents, comprising:
selecting documents having terms with frequencies of occurrence that satisfy upper and lower edge conditions;
generating concepts for the selected documents; and
grouping the selected documents into clusters of the documents, comprising:
evaluating a weight for each of the clusters;
determining a similarity value from the frequencies of occurrence for at least one of the terms from the concepts and the cluster weights for each selected document; and
assigning each selected document into one such cluster based on the similarity value of the selected document.
13. A system for providing thematically-grouped documents, comprising:
a retrieval manager to extract terms from documents and to tabulate frequencies of occurrence for the terms in the documents;
a text analyzer to generate themes from the terms, comprising:
a selection module to select those terms with the frequencies of occurrence that satisfy upper and lower edge conditions; and
a theme module to group the selected terms into the themes; and
a cluster module to form clusters of the documents based on the themes, each cluster comprising a cluster weight, comprising:
a correlation module to correlate one or more of the themes with each cluster;
a similarity value module to determine a similarity value for each document derived from the frequencies of occurrence of the selected terms in that document and the cluster weight of the frequencies of occurrence of the selected terms for each document grouped in the theme for one such cluster; and
an assignment module to assign each document to one of the clusters based on the similarity value for that document.
19. A method for providing thematically-grouped documents, comprising:
extracting terms from documents and tabulating frequencies of occurrence for the terms in the documents;
generating themes from the terms, comprising:
selecting those terms with the frequencies of occurrence that satisfy upper and lower edge conditions; and
grouping the selected terms into the themes; and
forming clusters of the documents based on the themes, each cluster comprising a cluster weight, comprising:
correlating one or more of the themes with each cluster;
determining a similarity value for each document derived from the frequencies of occurrence of the selected terms in that document and the cluster weight of the frequencies of occurrence of the selected terms for each document grouped in the theme for one such cluster; and
assigning each document to one of the clusters based on the similarity value for that document.
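The term-level steps recited in claims 13 and 19 (extracting terms, tabulating frequencies of occurrence, selecting terms that satisfy the edge conditions, and grouping the selected terms into themes) can be sketched as follows. The claims do not state how terms are grouped into themes, so the co-occurrence rule used here is purely an assumption for illustration, as is the reading of the edge conditions as inclusive bounds on a term's total count across all documents. The function names `tabulate`, `select_terms`, and `group_into_themes` are hypothetical.

```python
from collections import Counter

def tabulate(docs):
    """Per-document term counts; whitespace tokenization stands in for real term extraction."""
    return {doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()}

def select_terms(counts, lower, upper):
    """Keep terms whose total count across all documents lies in [lower, upper] (assumed reading)."""
    totals = Counter()
    for c in counts.values():
        totals.update(c)
    return {t for t, n in totals.items() if lower <= n <= upper}

def group_into_themes(counts, selected):
    """Hypothetical grouping rule: selected terms that co-occur in a document share a theme."""
    parent = {t: t for t in selected}  # union-find forest over the selected terms

    def find(t):
        # Follow parent links to the root, shortening the path along the way.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for c in counts.values():
        present = sorted(t for t in c if t in selected)
        for other in present[1:]:
            parent[find(other)] = find(present[0])  # merge the two themes

    themes = {}
    for t in selected:
        themes.setdefault(find(t), set()).add(t)
    return sorted(sorted(group) for group in themes.values())
```

Under these assumptions, a very common term such as "the" exceeds the upper edge condition and is dropped, while the remaining terms fall into themes according to the documents they share.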
Specification