System and method for dynamically evaluating latent concepts in unstructured documents
First Claim
1. A computer-implemented system for analyzing unstructured documents for conceptual relationships, comprising:
a histogram module determining a frequency of occurrences of concepts in a set of unstructured documents, each concept representing an element occurring in one or more of the unstructured documents;
a selection module selecting a subset of concepts out of the frequency of occurrences, grouping one or more concepts from the concepts subset, and assigning weights to one or more clusters of concepts for each group of concepts; and
a best fit module calculating a best fit approximation for each document indexed by each such group of concepts between the frequency of occurrences and the weighted cluster for each such concept grouped into the group of concepts.
Abstract
A system and method for dynamically evaluating latent concepts in unstructured documents is disclosed. A multiplicity of concepts are extracted from a set of unstructured documents into a lexicon. The lexicon uniquely identifies each concept and a frequency of occurrence. A frequency of occurrence representation is created for the documents set. The frequency representation provides an ordered corpus of the frequencies of occurrence of each concept. A subset of concepts is selected from the frequency of occurrence representation filtered against a pre-defined threshold. A group of weighted clusters of concepts selected from the concepts subset is generated. A matrix of best fit approximations is determined for each document weighted against each group of weighted clusters of concepts.
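The first steps of the abstract (lexicon, ordered frequency representation, threshold filter) can be sketched as follows. This is a minimal illustration, not the patented implementation. It assumes that a "concept" is simply a lowercase alphabetic token and that the pre-defined threshold is a minimum total count. The function names `extract_concepts`, `build_lexicon`, `frequency_representation` and `select_concepts` are hypothetical.

```python
import re
from collections import Counter


def extract_concepts(text):
    # Assumption: a "concept" is a lowercase alphabetic token.
    return re.findall(r"[a-z]+", text.lower())


def build_lexicon(documents):
    """Return (ids, freq): a unique id per concept and its total frequency."""
    freq = Counter()
    for text in documents:
        freq.update(extract_concepts(text))
    # Ids are assigned in alphabetical order so they are reproducible.
    ids = {concept: i for i, concept in enumerate(sorted(freq))}
    return ids, freq


def frequency_representation(freq):
    # Ordered by descending frequency, ties broken alphabetically.
    return sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))


def select_concepts(ordered, min_count):
    # Assumption: the threshold is a minimum frequency of occurrence.
    return [concept for concept, count in ordered if count >= min_count]


docs = ["cat sat on the mat", "the cat ate", "dog sat"]
ids, freq = build_lexicon(docs)
ordered = frequency_representation(freq)
subset = select_concepts(ordered, min_count=2)
print(subset)
```

With these three toy documents, only `cat`, `sat` and `the` occur at least twice, so they form the selected subset.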
44 Claims
1. A computer-implemented system for analyzing unstructured documents for conceptual relationships, comprising:
a histogram module determining a frequency of occurrences of concepts in a set of unstructured documents, each concept representing an element occurring in one or more of the unstructured documents;
a selection module selecting a subset of concepts out of the frequency of occurrences, grouping one or more concepts from the concepts subset, and assigning weights to one or more clusters of concepts for each group of concepts; and
a best fit module calculating a best fit approximation for each document indexed by each such group of concepts between the frequency of occurrences and the weighted cluster for each such concept grouped into the group of concepts.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A computer-implemented method for analyzing unstructured documents for conceptual relationships, comprising:
determining a frequency of occurrences of concepts in a set of unstructured documents, each concept representing an element occurring in one or more of the unstructured documents;
selecting a subset of concepts out of the frequency of occurrences;
grouping one or more concepts from the concepts subset;
assigning weights to one or more clusters of concepts for each group of concepts; and
calculating a best fit approximation for each document indexed by each such group of concepts between the frequency of occurrences and the weighted cluster for each such concept grouped into the group of concepts.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17
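The grouping, weighting and best fit steps of claim 9 can be illustrated with a small sketch. The claim text does not say how the best fit approximation is measured, so the code below assumes a Euclidean distance between a document's concept frequencies and the cluster weights, where a smaller value means a closer fit. The groups, weights and function names are hypothetical.

```python
import math
from collections import Counter


def doc_frequencies(text, concepts):
    # Frequency of each listed concept in one document (whitespace tokens).
    counts = Counter(text.lower().split())
    return [counts[c] for c in concepts]


def best_fit(doc_vec, weights):
    # Assumed measure: Euclidean distance; smaller means a closer fit.
    return math.sqrt(sum((f - w) ** 2 for f, w in zip(doc_vec, weights)))


def fit_table(documents, groups):
    """One row per document, one column per group of weighted concepts."""
    table = []
    for text in documents:
        row = []
        for cluster in groups.values():
            concepts = list(cluster)
            vec = doc_frequencies(text, concepts)
            row.append(best_fit(vec, [cluster[c] for c in concepts]))
        table.append(row)
    return table


# Hypothetical groups: each maps a concept to its assigned weight.
groups = {
    "pets": {"cat": 1.0, "dog": 1.0},
    "seating": {"sat": 1.0, "mat": 1.0},
}
docs = ["cat sat on the mat", "cat dog cat"]
table = fit_table(docs, groups)
for text, row in zip(docs, table):
    print(text, [round(v, 3) for v in row])
```

Under this assumed measure, the first document matches the "seating" group exactly (distance 0), while the second is closer to "pets" than to "seating".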
18. A computer-implemented system for dynamically evaluating latent concepts in unstructured documents, comprising:
an extraction module extracting a multiplicity of concepts from a set of unstructured documents into a lexicon uniquely identifying each concept and a frequency of occurrence;
a frequency mapping module creating a frequency of occurrence representation for each documents set, the representation providing an ordered corpus of the frequencies of occurrence of each concept;
a concept selection module selecting a subset of concepts from the frequency of occurrence representation filtered against a minimal set of concepts each referenced in at least two documents with no document in the corpus being unreferenced;
a group generation module generating a group of weighted clusters of concepts selected from the concepts subset; and
a best fit module determining a matrix of best fit approximations for each document weighted against each group of weighted clusters of concepts.
Dependent claims: 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
31. A computer-implemented method for dynamically evaluating latent concepts in unstructured documents, comprising:
extracting a multiplicity of concepts from a set of unstructured documents into a lexicon uniquely identifying each concept and a frequency of occurrence;
creating a frequency of occurrence representation for each documents set, the representation providing an ordered corpus of the frequencies of occurrence of each concept;
selecting a subset of concepts from the frequency of occurrence representation filtered against a minimal set of concepts each referenced in at least two documents with no document in the corpus being unreferenced;
generating a group of weighted clusters of concepts selected from the concepts subset; and
determining a matrix of best fit approximations for each document weighted against each group of weighted clusters of concepts.
Dependent claims: 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44
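The selection filter in claims 18 and 31 (a minimal set of concepts, each referenced in at least two documents, with no document left unreferenced) resembles a set cover problem. The claims do not name an algorithm, so the sketch below assumes a greedy set cover heuristic, which yields a small covering set but is not guaranteed to be the true minimum. The function names are hypothetical, and concepts are again assumed to be whitespace tokens.

```python
def concept_documents(documents):
    """Map each concept to the set of document indices that reference it."""
    index = {}
    for doc_id, text in enumerate(documents):
        for concept in set(text.lower().split()):
            index.setdefault(concept, set()).add(doc_id)
    return index


def select_covering_concepts(documents):
    index = concept_documents(documents)
    # Keep only concepts referenced in at least two documents.
    candidates = {c: d for c, d in index.items() if len(d) >= 2}
    uncovered = set(range(len(documents)))
    chosen = []
    while uncovered:
        # Greedy step: pick the concept covering the most uncovered documents
        # (alphabetical order breaks ties deterministically).
        best = max(
            sorted(candidates),
            key=lambda c: len(candidates[c] & uncovered),
            default=None,
        )
        if best is None or not candidates[best] & uncovered:
            break  # The remaining documents cannot be covered.
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen, uncovered


docs = ["cat sat mat", "cat ate", "dog sat", "dog ran"]
chosen, uncovered = select_covering_concepts(docs)
print(chosen, uncovered)
```

Here `cat` and `dog` each appear in two documents and together reference all four, so no document is left uncovered. A non-empty second return value would signal that the corpus cannot satisfy the "no document unreferenced" condition with the available candidates.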
Specification