Data analytics system and methods for text data
First Claim
1. A device, comprising:
a processing system including a processor; and
a memory that stores executable instructions that, when executed by the processing system, facilitate performance of operations, comprising:
performing a statistical natural language processing analysis on a plurality of text documents to determine a plurality of topics, wherein prior to performing the statistical natural language processing analysis, a training is performed on sample documents to determine parameters for the statistical natural language processing analysis;
creating a proper subset of topics from the plurality of topics, based on user input;
mapping a topic in the proper subset of topics to each document in the plurality of text documents, thereby creating a plurality of topic-document pairs;
for each topic-document pair of the plurality of topic-document pairs, identifying a bias from text in a corresponding document of the topic-document pair;
creating clusters of topics from the proper subset of topics, wherein each cluster of topics is determined from the bias of each topic-document pair and a frequency of occurrence of each topic in the document identified by the topic-document pair, and wherein the clusters of topics have an image configuration based on the bias and the frequency of occurrence that distinguishes one cluster from another; and
generating presentable content depicting each cluster of the clusters of topics according to a corresponding image configuration, wherein the image configuration specifies that an area for each cluster of topics is subdivided into separate sub-areas for each topic, wherein the sub-area for each topic represents a frequency of occurrence of that topic.
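The final element of the claim describes a nested layout: each cluster gets an area, and that area is subdivided into per-topic sub-areas that represent frequency of occurrence. A minimal sketch of one way to compute such a layout follows. It uses a simple slice-style treemap, which is an illustrative choice and not something the claim prescribes. The names `Rect`, `slice_layout`, and `cluster_treemap`, the sample data, and the decision to size each cluster by its total topic frequency are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    label: str
    x: float
    y: float
    width: float
    height: float

    @property
    def area(self) -> float:
        return self.width * self.height


def slice_layout(weights, x, y, width, height, vertical=True):
    """Split one rectangle into strips whose areas are proportional to weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    rects = []
    offset = 0.0
    for label, weight in weights.items():
        share = weight / total
        if vertical:
            # Side-by-side strips: each strip takes a share of the width.
            rects.append(Rect(label, x + offset, y, width * share, height))
            offset += width * share
        else:
            # Stacked strips: each strip takes a share of the height.
            rects.append(Rect(label, x, y + offset, width, height * share))
            offset += height * share
    return rects


def cluster_treemap(clusters, width=100.0, height=100.0):
    """Map {cluster: {topic: frequency}} to {cluster: (cluster Rect, [topic Rects])}."""
    # Assumption: a cluster's area is proportional to the summed frequency of its topics.
    cluster_totals = {name: sum(topics.values()) for name, topics in clusters.items()}
    layout = {}
    for area in slice_layout(cluster_totals, 0.0, 0.0, width, height, vertical=True):
        sub_areas = slice_layout(
            clusters[area.label], area.x, area.y, area.width, area.height, vertical=False
        )
        layout[area.label] = (area, sub_areas)
    return layout


# Hypothetical clusters, named here by their dominant bias.
clusters = {
    "negative": {"billing": 30, "outages": 10},
    "positive": {"support": 40, "pricing": 20},
}
layout = cluster_treemap(clusters)
for name, (area, sub_areas) in layout.items():
    print(name, round(area.area, 1), [(r.label, round(r.area, 1)) for r in sub_areas])
```

With a 100 by 100 canvas and frequencies that sum to 100, each topic's sub-area works out to 100 times its frequency, so relative size can be read directly as relative frequency. A renderer could additionally encode bias through color, which the sketch leaves out.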
Abstract
Aspects of the subject disclosure may include, for example, a computer that performs a statistical natural language processing analysis on a plurality of text documents to determine a plurality of topics, creates a proper subset of topics from the plurality of topics, based on user input, maps one or more topics in the proper subset of topics to each document in the plurality of text documents, thereby creating a plurality of topic-document pairs, identifies n-dimensions of bias for each topic-document pair from the text, creates clusters of topics from the proper subset of topics, and generates presentable content depicting each cluster of the clusters of topics according to a corresponding image configuration. The topics and n-dimensions of bias data can be further analyzed with co-collected structured data for statistical relationships. The topics and n-dimensions of bias data can be used for a publisher-subscriber network that uses content-driven routing when delivering raw data and summarized data via the network. Other embodiments are disclosed.
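The mapping and bias steps summarized in the abstract can be illustrated with a toy example. This is a sketch under loud assumptions: it swaps in keyword matching for the statistical topic analysis and a tiny word lexicon for bias detection, and it produces only a single bias dimension. The keyword sets, the lexicon, the sample documents, and the function names are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical topic keywords standing in for statistically derived topics.
TOPIC_KEYWORDS = {
    "billing": {"bill", "invoice", "charge"},
    "network": {"signal", "coverage", "outage"},
}
# Hypothetical one-dimensional bias lexicon (positive versus negative wording).
POSITIVE = {"great", "helpful", "fast"}
NEGATIVE = {"wrong", "slow", "terrible"}


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def bias_score(text):
    """Return a score in [-1, 1]; 0.0 when no lexicon word appears."""
    tokens = tokenize(text)
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def analyze(documents, selected_topics):
    """Build (topic, doc_id, frequency, bias) tuples for the selected topics."""
    pairs = []
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        for topic in selected_topics:
            frequency = sum(counts[word] for word in TOPIC_KEYWORDS[topic])
            if frequency:  # only pair a topic with documents that mention it
                pairs.append((topic, doc_id, frequency, bias_score(text)))
    return pairs


documents = {
    "d1": "The invoice was wrong and the charge was wrong again.",
    "d2": "Great coverage and a fast signal, but one slow outage.",
}
result = analyze(documents, ["billing", "network"])
for pair in result:
    print(pair)
```

Here the `selected_topics` argument plays the role of the user-chosen proper subset, and each tuple corresponds to one topic-document pair annotated with its frequency and bias.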
20 Claims
1. A device, comprising: (recited in full under First Claim above)
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A non-transitory computer-readable storage medium, comprising executable instructions that, when executed by a processing system including a processor, facilitate performance of operations, comprising:
performing training on a plurality of sample documents to determine parameters for further analysis in order to control a number of topics determined by the further analysis, the further analysis comprising:
determining a plurality of topics from a plurality of text documents;
mapping one or more topics in the plurality of topics to each document in the plurality of text documents;
reducing the plurality of topics into a proper subset of topics based on a frequency of occurrence of each topic in the plurality of text documents;
identifying n-dimensions of bias for each topic in the proper subset of topics, the n-dimensions of bias identified from text in a corresponding document mapped to the topic;
creating clusters of topics from the proper subset of topics, wherein each cluster of topics in the clusters of topics is determined from a latent semantic analysis comprising singular value decomposition into orthogonal dimensions, wherein each cluster of topics has an image configuration based on the n-dimensions of bias and the frequency of occurrence for topics in the clusters of topics that distinguishes one cluster from another; and
generating presentable content illustrating each cluster of the clusters of topics according to a corresponding image configuration.
Dependent claims: 13, 14, 15, 16, 17
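Unlike claim 1, claim 12 reduces the topics to a proper subset by frequency of occurrence rather than by user input. A minimal sketch of that reduction step is shown below. It assumes that frequency means the number of documents a topic is mapped to, and that the cut is made by keeping the top `keep` topics; the claim fixes neither choice. The function name `reduce_topics` and the sample mapping are hypothetical.

```python
from collections import Counter


def reduce_topics(doc_topics, keep):
    """Keep the `keep` most frequent topics from a {doc_id: [topics]} mapping.

    Frequency is counted as the number of documents mapped to a topic
    (an assumption for this sketch). Ties are broken alphabetically.
    """
    frequency = Counter(
        topic for topics in doc_topics.values() for topic in set(topics)
    )
    # A proper subset must omit at least one topic, and keeping zero is pointless.
    if not 0 < keep < len(frequency):
        raise ValueError("keep must be at least 1 and smaller than the topic count")
    ranked = sorted(frequency, key=lambda topic: (-frequency[topic], topic))
    return ranked[:keep], frequency


# Hypothetical topic-to-document mapping.
doc_topics = {
    "d1": ["billing", "network"],
    "d2": ["billing"],
    "d3": ["billing", "roaming"],
    "d4": ["network"],
}
subset, frequency = reduce_topics(doc_topics, keep=2)
print(subset, dict(frequency))
```

The bounds check enforces the "proper subset" wording by rejecting any `keep` value that would retain every topic.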
18. A method, comprising:
performing, by a system comprising a processor, a latent Dirichlet allocation of a plurality of text documents to determine a plurality of topics, wherein the plurality of topics are determined according to parameters, wherein the parameters are determined by training on a sample of documents in order to control a number of the plurality of topics that are determined;
creating, by the system, a proper subset of topics from the plurality of topics, based on user input;
mapping, by the system, one or more topics in the proper subset of topics to each document in the plurality of text documents, thereby creating a plurality of topic-document pairs;
performing, by the system, a latent semantic analysis of text in the document associated with each topic-document pair to determine n-dimensions of bias for each topic-document pair;
creating, by the system, clusters of topics from the proper subset of topics, wherein each cluster of topics is determined from a value for each bias dimension of each topic-document pair and a frequency of occurrence of each topic in the plurality of text documents; and
generating, by the system, presentable content that illustrates each cluster of the clusters of topics according to a corresponding image configuration, wherein the image configuration is based on all or a subset of the bias dimensions and the frequency of occurrence of topics in a cluster that distinguishes the cluster from other clusters.
Dependent claims: 19, 20
Specification