System and method for thematically grouping documents into clusters
First Claim
1. A system for thematically grouping documents into clusters, comprising:
an extraction module to extract from a plurality of documents, concepts comprising at least one of nouns and noun phrases;
a frequency determination module to determine a number of occurrences for each concept within each document;
a threshold module to select the documents having the concepts with the occurrences that satisfy a bounded range comprising upper edge conditions and lower edge conditions;
a theme generator module to generate themes for the selected documents from the subset of concepts by identifying two or more concepts with common semantic meaning; and
a cluster module to generate clusters of the selected documents based on the themes; and
a processor to execute the modules.
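The frequency determination and threshold modules recited above can be illustrated with a minimal Python sketch. It assumes the concepts have already been extracted as a list of noun or noun-phrase strings per document, and it treats the bounded range as simple inclusive integer bounds; the claim does not define the edge conditions, so both choices, along with the names `count_occurrences` and `select_documents`, are hypothetical.

```python
from collections import Counter

def count_occurrences(doc_concepts):
    """Map each document id to a Counter of concept -> occurrences in that document."""
    return {doc_id: Counter(concepts) for doc_id, concepts in doc_concepts.items()}

def select_documents(counts, lower, upper):
    """Keep documents that have at least one concept whose count lies in [lower, upper].

    Assumption: the bounded range is an inclusive interval on per-document counts.
    """
    selected = {}
    for doc_id, counter in counts.items():
        in_range = {c: n for c, n in counter.items() if lower <= n <= upper}
        if in_range:
            selected[doc_id] = in_range
    return selected

# Illustrative input: concepts already extracted per document (made-up data).
doc_concepts = {
    "d1": ["engine", "engine", "fuel pump", "engine"],
    "d2": ["contract", "contract", "clause"],
    "d3": ["memo"],
}
counts = count_occurrences(doc_concepts)
selected = select_documents(counts, lower=2, upper=3)
```

With these bounds, `d3` is dropped because its only concept occurs once, below the lower edge.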
Abstract
A system and method for thematically grouping documents into clusters is provided. Concepts are extracted from a plurality of documents. The concepts include nouns or noun phrases. A number of occurrences for each concept is determined within each document. A bounded range is applied to the concepts and a subset of the concepts is selected by removing the concepts that fall outside the bounded range. The bounded range includes upper edge conditions and lower edge conditions. Themes are generated from the subset of concepts by identifying two or more concepts with common semantic meaning. Clusters of the documents are generated based on the themes.
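The last three steps of the abstract (concept filtering, theme generation, clustering) might be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the `MEANING` table is a hypothetical stand-in for whatever semantic resource identifies common meaning, the bounded range is applied to each concept's total count across documents as an inclusive interval, and all function names are invented for the example.

```python
from collections import defaultdict

# Hypothetical lookup from concept to a shared meaning label. The source does
# not name the semantic resource, so this table is purely illustrative.
MEANING = {
    "car": "vehicle", "automobile": "vehicle", "truck": "vehicle",
    "contract": "agreement", "lease": "agreement",
    "memo": "note",
}

def filter_concepts(counts, lower, upper):
    """Return concepts whose total occurrences lie in [lower, upper] (assumed rule)."""
    totals = defaultdict(int)
    for counter in counts.values():
        for concept, n in counter.items():
            totals[concept] += n
    return {c for c, n in totals.items() if lower <= n <= upper}

def generate_themes(concepts, meaning):
    """Group concepts by meaning label; a theme needs two or more concepts."""
    groups = defaultdict(set)
    for concept in concepts:
        if concept in meaning:
            groups[meaning[concept]].add(concept)
    return {label: members for label, members in groups.items() if len(members) >= 2}

def cluster_documents(counts, themes):
    """Assign each document to every theme whose concepts it contains."""
    clusters = defaultdict(set)
    for doc_id, counter in counts.items():
        for label, members in themes.items():
            if members.intersection(counter):
                clusters[label].add(doc_id)
    return dict(clusters)

# Made-up per-document concept counts.
counts = {
    "d1": {"car": 2, "truck": 1},
    "d2": {"automobile": 3, "memo": 1},
    "d3": {"contract": 2, "lease": 2},
    "d4": {"page": 40},
}
subset = filter_concepts(counts, lower=2, upper=10)
themes = generate_themes(subset, MEANING)
clusters = cluster_documents(counts, themes)
```

Here "page" is removed by the upper edge and "truck" and "memo" by the lower edge, leaving two themes: "vehicle" (documents `d1` and `d2`) and "agreement" (document `d3`).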
20 Claims
1. A system for thematically grouping documents into clusters, comprising:
an extraction module to extract from a plurality of documents, concepts comprising at least one of nouns and noun phrases;
a frequency determination module to determine a number of occurrences for each concept within each document;
a threshold module to select the documents having the concepts with the occurrences that satisfy a bounded range comprising upper edge conditions and lower edge conditions;
a theme generator module to generate themes for the selected documents from the subset of concepts by identifying two or more concepts with common semantic meaning; and
a cluster module to generate clusters of the selected documents based on the themes; and
a processor to execute the modules.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A method for thematically grouping documents into clusters, comprising the steps of:
extracting from a plurality of documents, concepts comprising at least one of nouns and noun phrases;
determining a number of occurrences for each concept within each document;
selecting the documents having the concepts with the occurrences that satisfy a bounded range comprising upper edge conditions and lower edge conditions;
generating themes for the selected documents from the subset of concepts by identifying two or more concepts with common semantic meaning; and
generating clusters of the selected documents based on the themes,
wherein the steps are performed by a suitably programmed computer.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
Specification