Methods and systems for the analysis of large text corpora
First Claim
1. A computerized method for the analysis of textual data, comprising:
receiving, from one or more memories at one or more processors, textual data to be analyzed;
using the one or more processors, formatting the textual data for subsequent analysis;
using the one or more processors, applying a probabilistic topic model to the textual data to extract a set of semantically meaningful topics that collectively describe all or a portion of the textual data;
using a keyword weighting module executed on the one or more processors, generating a topic cloud view representing the topics as a tagcloud with each being associated with a plurality of keywords;
using a topic ordering module executed on the one or more processors, generating a document distribution view representing a distribution of all or a portion of the textual data across multiple topics;
using a document entropy calculation module executed on the one or more processors, generating a document scatterplot view representing how many topics are attributable to all or a portion of the textual data;
using a temporal topic trend calculation module executed on the one or more processors, generating a temporal view representing changes in the occurrence of topics over time in relation to all or a portion of the textual data; and
displaying one or more of the topic cloud view, the document distribution view, the document scatterplot view, and the temporal view to a user in the analysis of all or a portion of the textual data.
Abstract
Computerized methods and systems for the analysis of textual data, including: receiving, from one or more memories at one or more processors, textual data; using the processors, formatting the textual data for analysis and applying a probabilistic topic model to the textual data to extract semantically meaningful topics that collectively describe it; using a keyword weighting module, generating a topic cloud view representing the topics as a tagcloud with each being associated with a plurality of keywords; using a topic ordering module, generating a document distribution view representing a distribution of the textual data across multiple topics; using a document entropy calculation module, generating a document scatterplot view representing how many topics are attributable to the textual data; using a temporal topic trend calculation module, generating a temporal view representing changes in the occurrence of topics over time; and displaying one or more of the views to a user.
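As an illustration of the document entropy calculation named above, the following is a minimal sketch, not the patented implementation. It assumes each document is already described by a vector of topic proportions from the topic model, and it uses Shannon entropy in bits as the measure of how many topics a document draws on. The function name `document_entropy` and the sample vectors are hypothetical.

```python
import math


def document_entropy(topic_probs):
    """Return the Shannon entropy (in bits) of one document's topic proportions.

    Assumption: topic_probs holds non-negative topic weights for a single
    document. They are normalized here so they sum to 1.
    """
    total = sum(topic_probs)
    if total <= 0:
        raise ValueError("topic_probs must contain at least one positive weight")
    entropy = 0.0
    for p in topic_probs:
        q = p / total
        if q > 0:  # zero-weight topics contribute nothing to the entropy
            entropy -= q * math.log2(q)
    return entropy


# Hypothetical examples: a single-topic document and an evenly mixed one.
focused = [1.0, 0.0, 0.0, 0.0]
mixed = [0.25, 0.25, 0.25, 0.25]
print(document_entropy(focused))  # low entropy: one topic dominates
print(document_entropy(mixed))    # high entropy: spread across four topics
```

Under this assumption, a low value marks a document concentrated on one topic and a high value marks a document spread over many, which is one plausible axis for a document scatterplot.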
20 Claims
1. A computerized method for the analysis of textual data, comprising:
receiving, from one or more memories at one or more processors, textual data to be analyzed;
using the one or more processors, formatting the textual data for subsequent analysis;
using the one or more processors, applying a probabilistic topic model to the textual data to extract a set of semantically meaningful topics that collectively describe all or a portion of the textual data;
using a keyword weighting module executed on the one or more processors, generating a topic cloud view representing the topics as a tagcloud with each being associated with a plurality of keywords;
using a topic ordering module executed on the one or more processors, generating a document distribution view representing a distribution of all or a portion of the textual data across multiple topics;
using a document entropy calculation module executed on the one or more processors, generating a document scatterplot view representing how many topics are attributable to all or a portion of the textual data;
using a temporal topic trend calculation module executed on the one or more processors, generating a temporal view representing changes in the occurrence of topics over time in relation to all or a portion of the textual data; and
displaying one or more of the topic cloud view, the document distribution view, the document scatterplot view, and the temporal view to a user in the analysis of all or a portion of the textual data.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A computerized system for the analysis of textual data, comprising:
one or more memories operable for storing and one or more processors operable for receiving textual data to be analyzed;
an algorithm executed on the one or more processors operable for formatting the textual data for subsequent analysis;
an algorithm executed on the one or more processors operable for applying a probabilistic topic model to the textual data to extract a set of semantically meaningful topics that collectively describe all or a portion of the textual data;
a keyword weighting module executed on the one or more processors operable for generating a topic cloud view representing the topics as a tagcloud with each being associated with a plurality of keywords;
a topic ordering module executed on the one or more processors operable for generating a document distribution view representing a distribution of all or a portion of the textual data across multiple topics;
a document entropy calculation module executed on the one or more processors operable for generating a document scatterplot view representing how many topics are attributable to all or a portion of the textual data;
a temporal topic trend calculation module executed on the one or more processors operable for generating a temporal view representing changes in the occurrence of topics over time in relation to all or a portion of the textual data; and
a display operable for displaying one or more of the topic cloud view, the document distribution view, the document scatterplot view, and the temporal view to a user in the analysis of all or a portion of the textual data.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
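To make the temporal topic trend element more concrete, here is a hedged sketch of one way such a calculation could work. It is not taken from the patent: it assumes each document carries a time period label and a vector of topic proportions, and it defines the trend as the mean topic proportion per period. The function name `topic_trends` and the sample data are hypothetical.

```python
def topic_trends(docs):
    """Return {period: [mean weight of each topic]} with periods in sorted order.

    Assumption: docs is an iterable of (period, topic_probs) pairs, and every
    topic_probs vector has the same length (one entry per topic).
    """
    sums = {}
    counts = {}
    for period, probs in docs:
        if period not in sums:
            sums[period] = [0.0] * len(probs)
            counts[period] = 0
        for k, p in enumerate(probs):
            sums[period][k] += p
        counts[period] += 1
    # Average the accumulated weights so periods with more documents
    # do not dominate the view.
    return {
        period: [s / counts[period] for s in sums[period]]
        for period in sorted(sums)
    }


# Hypothetical corpus: three documents, two topics, two yearly periods.
docs = [
    (2001, [0.8, 0.2]),
    (2001, [0.6, 0.4]),
    (2002, [0.1, 0.9]),
]
for period, weights in topic_trends(docs).items():
    print(period, weights)
```

Plotting each topic's mean weight against the period would give one line per topic, which is one simple reading of a view that shows changes in topic occurrence over time.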
Specification