Methods and systems for the analysis of large text corpora

  • US 9,135,242 B1
  • Filed: 03/15/2013
  • Issued: 09/15/2015
  • Est. Priority Date: 10/10/2011
  • Status: Active Grant
First Claim

1. A computerized method for the analysis of textual data, comprising:

  • receiving, from one or more memories at one or more processors, textual data to be analyzed;

    using the one or more processors, formatting the textual data for subsequent analysis;

    using the one or more processors, applying a probabilistic topic model to the textual data to extract a set of semantically meaningful topics that collectively describe all or a portion of the textual data;

    using a keyword weighting module executed on the one or more processors, generating a topic cloud view representing the topics as a tagcloud with each being associated with a plurality of keywords;

    using a topic ordering module executed on the one or more processors, generating a document distribution view representing a distribution of all or a portion of the textual data across multiple topics;

    using a document entropy calculation module executed on the one or more processors, generating a document scatterplot view representing how many topics are attributable to all or a portion of the textual data;

    using a temporal topic trend calculation module executed on the one or more processors, generating a temporal view representing changes in the occurrence of topics over time in relation to all or a portion of the textual data; and

    displaying one or more of the topic cloud view, the document distribution view, the document scatterplot view, and the temporal view to a user in the analysis of all or a portion of the textual data.
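
As an illustration only, the quantities behind three of the claimed views can be sketched in Python. The claim names the modules but gives no formulas, so everything here is an assumption: Shannon entropy stands in for the document entropy calculation, a per-period mean of topic proportions stands in for the temporal topic trend calculation, and a simple weight ranking stands in for the keyword weighting. The function names and the toy data are hypothetical, and the per-document topic proportions are assumed to have already been produced by some probabilistic topic model.

```python
import math
from collections import defaultdict


def document_entropy(topic_probs):
    """Shannon entropy (bits) of one document's topic proportions.

    Assumption: low entropy means the document is dominated by a single
    topic; high entropy means it is spread across many topics.
    """
    return -sum(p * math.log2(p) for p in topic_probs if p > 0)


def temporal_topic_trend(doc_topics, doc_periods):
    """Mean topic proportion per time period (hypothetical trend measure)."""
    sums = {}
    counts = defaultdict(int)
    for probs, period in zip(doc_topics, doc_periods):
        if period not in sums:
            sums[period] = [0.0] * len(probs)
        for k, p in enumerate(probs):
            sums[period][k] += p
        counts[period] += 1
    return {
        period: [s / counts[period] for s in sums[period]]
        for period in sorted(sums)
    }


def top_keywords(word_weights, n=3):
    """The n highest-weighted keywords of one topic, e.g. for a tag cloud."""
    ranked = sorted(word_weights.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked[:n]]


# Toy input: three documents, two topics, one time period per document.
doc_topics = [[1.0, 0.0], [0.5, 0.5], [0.25, 0.75]]
doc_periods = [2011, 2011, 2012]

entropies = [document_entropy(d) for d in doc_topics]
trend = temporal_topic_trend(doc_topics, doc_periods)
keywords = top_keywords({"corpus": 0.4, "topic": 0.3, "view": 0.2, "the": 0.1}, n=2)
```

In this sketch a single-topic document has entropy 0 and an evenly split two-topic document has entropy 1 bit, which is the kind of value a scatterplot of "how many topics are attributable" to each document could place on one axis.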

  • 4 Assignments