APPARATUS FOR AUTOMATIC THEME DETECTION FROM UNSTRUCTURED DATA
First Claim
1. A system comprising:
a repository of unstructured documents;
a theme detection component configured to:
process the unstructured documents;
discover themes;
assign labels to each discovered theme;
identify patterns that describe each theme; and
organize the themes in a hierarchy; and
a user interface configured to:
allow an operator to initiate theme detection by the theme detection component; and
allow an operator to view and interact with the results of the theme detection, wherein the results comprise at least one of the assigned labels, the patterns, and the hierarchy.
Abstract
This apparatus provides a system and method for determining significant repeating themes in a collection of documents. The apparatus operates unsupervised and leverages a natural language processing mechanism, supported by lexicon, synonym, and taxonomy dictionaries, to determine themes and establish their relevance using a two-level hierarchical structure. The apparatus also assigns meaningful names to the identified themes and determines a set of rules that describe each theme, so that the rules can also be applied to categorize documents outside the collection.
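To make the last point of the abstract concrete, the following is a minimal Python sketch of how rules attached to a two-level theme hierarchy might be applied to a document outside the original collection. It is an illustration only, not the patented implementation: the `SYNONYMS` dictionary, the `THEMES` hierarchy, and the functions `normalize` and `categorize` are all hypothetical, and each "rule" is simplified to a set of canonical terms.

```python
import re

# Hypothetical synonym dictionary: surface form -> canonical term.
SYNONYMS = {
    "invoice": "bill",
    "charge": "bill",
    "charges": "bill",
    "crash": "failure",
    "crashes": "failure",
}

# Hypothetical two-level hierarchy: top-level theme -> sub-theme -> rule terms.
# A sub-theme's rule fires when the document contains any of its terms.
THEMES = {
    "Billing": {"Bill Dispute": {"bill", "refund"}},
    "Reliability": {"App Failure": {"failure", "freeze"}},
}


def normalize(text):
    """Lowercase, tokenize, and map each token to its canonical synonym."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {SYNONYMS.get(token, token) for token in tokens}


def categorize(text):
    """Return the (top-level theme, sub-theme) pairs whose rule matches the text."""
    terms = normalize(text)
    return [
        (top, sub)
        for top, subs in THEMES.items()
        for sub, rule_terms in subs.items()
        if terms & rule_terms
    ]


matches = categorize("The app crashes whenever I open my invoice")
no_matches = categorize("Great service")
```

Here the unseen document matches both sub-themes because "crashes" and "invoice" are first mapped to the canonical terms "failure" and "bill"; a document with no rule terms matches nothing.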
79 Citations
32 Claims
1. A system comprising:
a repository of unstructured documents;
a theme detection component configured to:
process the unstructured documents;
discover themes;
assign labels to each discovered theme;
identify patterns that describe each theme; and
organize the themes in a hierarchy; and
a user interface configured to:
allow an operator to initiate theme detection by the theme detection component; and
allow an operator to view and interact with the results of the theme detection, wherein the results comprise at least one of the assigned labels, the patterns, and the hierarchy.
Dependent claims: 2, 3, 4, 5, 6
7. A method of determining themes from a collection of unstructured text documents, the method comprising:
receiving a set of unstructured text documents to process;
determining, by a computing system, frequently occurring terms within the set of unstructured text documents;
determining, by the computing system, a label for each term in the frequently occurring terms;
determining, by the computing system, one or more text patterns, wherein the one or more text patterns are used to identify if the term is contained within a document; and
creating, by the computing system, a category model to organize the identified terms as themes of top level themes.
Dependent claims: 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
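The steps of the method claim can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the claimed implementation: "frequently occurring" is approximated by document frequency with a hypothetical `min_docs` threshold, the label is just the capitalized term, each text pattern is a word-boundary regular expression, and top-level themes come from a small hypothetical `TAXONOMY` lookup. All names (`STOPWORDS`, `TAXONOMY`, `frequent_terms`, `build_category_model`, `categorize`) are invented for the example.

```python
import re
from collections import Counter

# Hypothetical stopword list and taxonomy (term -> top-level theme).
STOPWORDS = {"the", "a", "an", "is", "and", "to", "of", "on", "after"}
TAXONOMY = {"battery": "Hardware", "screen": "Hardware", "refund": "Billing"}


def frequent_terms(docs, min_docs=2):
    """Return terms that appear in at least `min_docs` documents."""
    doc_freq = Counter()
    for doc in docs:
        tokens = set(re.findall(r"[a-z]+", doc.lower()))
        doc_freq.update(tokens - STOPWORDS)
    frequent = [term for term, count in doc_freq.items() if count >= min_docs]
    # Sort by descending frequency, then alphabetically, for a stable order.
    return sorted(frequent, key=lambda term: (-doc_freq[term], term))


def build_category_model(docs, min_docs=2):
    """Build {top-level theme: {label: compiled text pattern}}."""
    model = {}
    for term in frequent_terms(docs, min_docs):
        label = term.title()
        # Pattern matches the term, optionally with a trailing "s".
        pattern = re.compile(rf"\b{re.escape(term)}s?\b", re.IGNORECASE)
        top_level = TAXONOMY.get(term, "Other")
        model.setdefault(top_level, {})[label] = pattern
    return model


def categorize(model, doc):
    """Return (top-level theme, label) pairs whose pattern occurs in `doc`."""
    return [
        (top, label)
        for top, themes in model.items()
        for label, pattern in themes.items()
        if pattern.search(doc)
    ]


docs = [
    "The battery drains quickly after the update.",
    "Battery life is poor and the screen flickers.",
    "Screen cracked on delivery; refund requested.",
    "Refund took weeks to arrive.",
]
model = build_category_model(docs)
result = categorize(model, "The battery died and the screen is dim")
```

With these four sample documents, only "battery", "refund", and "screen" occur in two or more documents, so the model holds a "Hardware" top-level theme with two themes and a "Billing" top-level theme with one.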
20. A computer readable storage medium comprising instructions that if executed enables a computing system to:
receive a set of unstructured text documents to process;
determine frequently occurring terms within the set of unstructured text documents;
determine a label for each term in the frequently occurring terms;
determine one or more text patterns, wherein the one or more text patterns are used to identify if the term is contained within a document; and
create a category model to organize the identified terms as themes of top level themes.
Dependent claims: 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32
Specification