DATA ANALYTICS SYSTEM AND METHODS FOR TEXT DATA
First Claim
1. A device, comprising:
a processing system including a processor; and
a memory that stores executable instructions that, when executed by the processing system, facilitate performance of operations, the operations comprising:
performing a statistical, natural-language processing analysis on a plurality of text documents to determine a plurality of topics;
associating a topic of the plurality of topics with each document in the plurality of text documents, thereby creating a plurality of topic-document pairs;
for each topic-document pair of the plurality of topic-document pairs, identifying a bias from text in a document identified by the topic-document pair; and
creating a plurality of topic clusters,
wherein the plurality of topic clusters are determined from the bias of each topic-document pair and a frequency of occurrence of each topic in the document identified by the topic-document pair, and
wherein the plurality of topic clusters have an image configuration based on the bias and the frequency of occurrence that distinguishes one topic cluster of the plurality of topic clusters from another.
Abstract
Aspects of the subject disclosure may include, for example, a process that performs a statistical, natural-language processing analysis on a group of text documents to determine a group of topics. The topics are determined according to parameters obtained by training on a sample of documents. One or more topics in a subset of topics are associated to each document, resulting in topic-document pairs. A bias is identified for each topic-document pair, and clusters of topics are created from the subset of topics. Each cluster of topics is determined from a value for each bias of each topic-document pair and from a frequency of occurrence of each topic. Each cluster is presentable according to a corresponding image configuration based on all or a subset of the bias dimensions and the frequency of occurrence of topics in a cluster that distinguishes the cluster from other clusters. Other embodiments are disclosed.
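The flow described in the abstract (topics, topic-document pairs, a bias per pair, then clusters built from bias and frequency) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the text above does not say which topic model or bias measure is used, so a hypothetical keyword lexicon (`TOPIC_KEYWORDS`) stands in for the statistical natural-language processing analysis, a hypothetical polarity lexicon (`POLARITY`) stands in for the bias, and grouping by the sign of the mean bias stands in for the clustering step.

```python
from collections import Counter

# Hypothetical stand-ins: neither the topic model nor the bias measure is
# specified above, so a keyword lexicon and a polarity lexicon are assumed.
TOPIC_KEYWORDS = {
    "billing": {"bill", "charge", "invoice"},
    "coverage": {"signal", "coverage", "dropped"},
}
POLARITY = {"great": 1, "love": 1, "good": 1, "bad": -1, "terrible": -1, "wrong": -1}


def tokenize(text):
    """Lowercase the text and strip basic punctuation from each word."""
    return [w.strip(".,!?").lower() for w in text.split()]


def topic_document_pairs(documents):
    """Return (topic, doc_id, frequency, bias) for every topic found in a document."""
    pairs = []
    for doc_id, text in enumerate(documents):
        tokens = tokenize(text)
        counts = Counter(tokens)
        # Bias: mean polarity of the scored words in the document (assumed measure).
        scored = [POLARITY[t] for t in tokens if t in POLARITY]
        bias = sum(scored) / len(scored) if scored else 0.0
        for topic, keywords in TOPIC_KEYWORDS.items():
            freq = sum(counts[k] for k in keywords)
            if freq:
                pairs.append((topic, doc_id, freq, bias))
    return pairs


def cluster_topics(pairs):
    """Group topics by the sign of their mean bias, keeping total frequency per topic."""
    totals = {}
    for topic, _doc_id, freq, bias in pairs:
        f, b, n = totals.get(topic, (0, 0.0, 0))
        totals[topic] = (f + freq, b + bias, n + 1)
    clusters = {"negative": [], "neutral": [], "positive": []}
    for topic, (freq, bias_sum, n) in sorted(totals.items()):
        mean_bias = bias_sum / n
        label = "positive" if mean_bias > 0 else "negative" if mean_bias < 0 else "neutral"
        clusters[label].append((topic, freq, mean_bias))
    return clusters


docs = [
    "The bill was wrong and the extra charge is terrible.",
    "Great coverage, I love the signal at home.",
    "Another bad invoice this month.",
]
pairs = topic_document_pairs(docs)
clusters = cluster_topics(pairs)
```

With these three sample documents, `billing` appears in two documents with negative bias and `coverage` in one with positive bias, so the two topics land in different groups. A real system would replace both lexicons with trained models and the sign test with an actual clustering method.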
20 Claims
1. A device, comprising:
a processing system including a processor; and
a memory that stores executable instructions that, when executed by the processing system, facilitate performance of operations, the operations comprising:
performing a statistical, natural-language processing analysis on a plurality of text documents to determine a plurality of topics;
associating a topic of the plurality of topics with each document in the plurality of text documents, thereby creating a plurality of topic-document pairs;
for each topic-document pair of the plurality of topic-document pairs, identifying a bias from text in a document identified by the topic-document pair; and
creating a plurality of topic clusters,
wherein the plurality of topic clusters are determined from the bias of each topic-document pair and a frequency of occurrence of each topic in the document identified by the topic-document pair, and
wherein the plurality of topic clusters have an image configuration based on the bias and the frequency of occurrence that distinguishes one topic cluster of the plurality of topic clusters from another.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A non-transitory, machine-readable storage medium, comprising executable instructions that, when executed by a processing system including a processor, facilitate performance of operations, the operations comprising:
determining a plurality of topics from a plurality of text documents according to a statistical, natural-language processing analysis, wherein the analysis uses parameters determined from a plurality of sample documents;
associating one or more topics in the plurality of topics to each document in the plurality of text documents, wherein the plurality of topics is reduced into a proper subset of topics based on a frequency of occurrence of each topic in the plurality of text documents;
identifying a bias for each topic in the proper subset of topics, wherein the bias is identified according to text in a corresponding document mapped to the topic; and
creating clusters of topics from the proper subset of topics,
wherein each cluster of topics in the clusters of topics is determined from a latent semantic analysis comprising singular value decomposition into orthogonal dimensions,
wherein each cluster of topics has an image configuration based on the bias and the frequency of occurrence for topics in the clusters of topics that distinguishes one cluster from another.
Dependent claims: 13, 14, 15, 16, 17
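Claim 12 recites a latent semantic analysis comprising singular value decomposition into orthogonal dimensions. A minimal sketch of that idea follows, using NumPy's `numpy.linalg.svd`. The topic names, the frequency matrix, the number of retained dimensions `k`, and the rule that assigns each topic to its dominant latent dimension are all assumptions chosen for illustration; the claim does not specify them.

```python
import numpy as np

# Toy topic-by-document matrix (rows: topics, columns: documents). Each entry
# is a frequency of occurrence. Names and values are illustrative only.
topics = ["billing", "refund", "coverage", "signal"]
counts = np.array([
    [3.0, 2.0, 0.0, 0.0],
    [2.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 6.0, 2.0],
    [0.0, 0.0, 2.0, 6.0],
])

# Singular value decomposition: counts = U @ diag(s) @ Vt. The columns of U
# are orthonormal, which gives the orthogonal latent dimensions.
U, s, Vt = np.linalg.svd(counts, full_matrices=False)

k = 2  # number of latent dimensions kept (assumed)
topic_coords = U[:, :k] * s[:k]  # coordinates of each topic in latent space

# Assumed assignment rule: a topic joins the cluster of the latent dimension
# on which it has the largest absolute coordinate.
cluster_ids = np.abs(topic_coords).argmax(axis=1)
clusters = {}
for name, cid in zip(topics, cluster_ids):
    clusters.setdefault(int(cid), []).append(name)
```

In this toy matrix the two billing-related topics share documents with each other but not with the two network-related topics, so the two strongest latent dimensions separate them into two clusters. Other assignment rules, such as running k-means on `topic_coords`, would be equally consistent with the claim language.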
18. A method, comprising:
performing, by a system comprising a processor, a statistical natural language processing analysis on a plurality of text documents to determine a plurality of topics, wherein the plurality of topics are determined according to parameters that are determined by training on a sample of documents in order to control a number of the plurality of topics that are determined;
associating, by the system, one or more topics in a subset of topics to each document in the plurality of text documents, thereby creating a plurality of topic-document pairs;
identifying, by the system, a bias for each topic-document pair of the plurality of topic-document pairs; and
creating, by the system, clusters of topics from the subset of topics,
wherein each cluster of topics is determined from a value for each bias dimension of each topic-document pair and a frequency of occurrence of each topic in the plurality of text documents,
wherein each cluster of the clusters of topics is presentable according to a corresponding image configuration,
wherein the image configuration is based on all or a subset of the bias dimensions and the frequency of occurrence of topics in a cluster that distinguishes the cluster from other clusters.
Dependent claims: 19, 20
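The image configuration recited in the claims depends on bias and on frequency of occurrence, but the claims do not say how either is rendered. The sketch below shows one hypothetical mapping: a single scalar bias in [-1, 1] selects a color between red and green, and the cluster's frequency selects a marker radius. The `ImageConfig` type, the `image_configuration` function, the color scale, and the radius limits are all assumptions, and the use of one scalar bias is a simplification of the "bias dimensions" language in claim 18.

```python
from dataclasses import dataclass


@dataclass
class ImageConfig:
    """Hypothetical visual style for one topic cluster."""
    color: str     # hex RGB string derived from bias
    radius: float  # marker size derived from frequency


def image_configuration(bias, frequency, max_frequency, min_radius=4.0, max_radius=40.0):
    """Map a cluster's bias in [-1, 1] and its topic frequency to a marker style."""
    b = max(-1.0, min(1.0, bias))  # clamp out-of-range bias values
    # Negative bias shades toward red, positive toward green, neutral is yellow.
    red = round(255 * min(1.0, 1.0 - b))
    green = round(255 * min(1.0, 1.0 + b))
    # Radius grows linearly with the cluster's share of the largest frequency.
    share = frequency / max_frequency if max_frequency else 0.0
    radius = min_radius + (max_radius - min_radius) * share
    return ImageConfig(color=f"#{red:02x}{green:02x}00", radius=radius)


negative_cluster = image_configuration(bias=-1.0, frequency=3, max_frequency=3)
positive_cluster = image_configuration(bias=1.0, frequency=0, max_frequency=3)
neutral_cluster = image_configuration(bias=0.0, frequency=1.5, max_frequency=3)
```

Because color and size are driven by different inputs, two clusters with the same frequency but opposite bias remain visually distinct, which is the distinguishing property the claims call for. With several bias dimensions, each could drive its own visual channel, for example hue, saturation, or shape.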
Specification