DATA ANALYTICS SYSTEM AND METHODS FOR TEXT DATA

US 20190272320A1
Filed: 03/12/2019
Published: 09/05/2019
Est. Priority Date: 07/15/2016
Status: Active Grant

First Claim

Patent Images

1. A device, comprising:

a processing system including a processor; and

a memory that stores executable instructions that, when executed by the processing system, facilitate performance of operations, the operations comprising;

performing a statistical, natural-language processing analysis on a plurality of text documents to determine a plurality of topics;

associating a topic of the plurality of topics with each document in the plurality of text documents, thereby creating a plurality of topic-document pairs;

for each topic-document pair of the plurality of topic-document pairs, identifying a bias from text in a document identified by the topic-document pair; and

creating a plurality of topic clusters,wherein the plurality of topic clusters are determined from the bias of each topic-document pair and a frequency of occurrence of each topic in the document identified by the topic-document pair, andwherein the plurality of topic clusters have an image configuration based on the bias and the frequency of occurrence that distinguishes one topic cluster of the plurality of topic clusters from another.

View all claims

1 Assignment

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

Aspects of the subject disclosure may include, for example, a process that performs a statistical, natural-language processing analysis on a group of text documents to determine a group of topics. The topics are determined according to parameters obtained by training on a sample of documents. One or more topics in a subset of topics are associated to each document, resulting in topic-document pairs. A bias is identified for each topic-document pair, and clusters of topics are created from the subset of topics. Each cluster of topics is determined from a value for each bias of each topic-document pair and from a frequency of occurrence of each topic. Each cluster is presentable according to a corresponding image configuration based on all or a subset of the bias dimensions and the frequency of occurrence of topics in a cluster that distinguishes the cluster from other clusters. Other embodiments are disclosed.

Citations

20 Claims

1. A device, comprising:
- a processing system including a processor; and
  
  a memory that stores executable instructions that, when executed by the processing system, facilitate performance of operations, the operations comprising;
  
  performing a statistical, natural-language processing analysis on a plurality of text documents to determine a plurality of topics;
  
  associating a topic of the plurality of topics with each document in the plurality of text documents, thereby creating a plurality of topic-document pairs;
  
  for each topic-document pair of the plurality of topic-document pairs, identifying a bias from text in a document identified by the topic-document pair; and
  
  creating a plurality of topic clusters,wherein the plurality of topic clusters are determined from the bias of each topic-document pair and a frequency of occurrence of each topic in the document identified by the topic-document pair, andwherein the plurality of topic clusters have an image configuration based on the bias and the frequency of occurrence that distinguishes one topic cluster of the plurality of topic clusters from another.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
- - 2. The device of claim 1, wherein the image configuration comprises size, shape, color coding, or any combination thereof.
  - 3. The device of claim 1, wherein the image configuration comprises an area for each topic cluster in the plurality of topic clusters.
  - 4. The device of claim 3, wherein a size of the area for each topic cluster in the plurality of topic clusters represents the frequency of occurrence of each topic in the plurality of topic clusters.
  - 5. The device of claim 1, wherein the image configuration comprises an area for a topic cluster in the plurality of topic clusters, wherein the area is subdivided into separate areas for each topic in a topic cluster in the plurality of topic clusters.
  - 6. The device of claim 1, wherein the image configuration comprises an area for a topic cluster in the plurality of topic clusters, and wherein the area is subdivided into separate areas for each topic in a topic cluster in the plurality of topic clusters, the separate areas for each topic further comprising a color that represents the bias of the topic.
  - 7. The device of claim 1, wherein identifying the bias of each topic-document pair comprises a latent semantic analysis.
  - 8. The device of claim 1, wherein identifying the bias comprises one of positive bias, neutral, or negative bias.
  - 9. The device of claim 1, wherein the statistical, natural-language processing analysis comprises a latent Dirichlet allocation.
  - 10. The device of claim 1, wherein the operations further comprise generating presentable content depicting each topic cluster of the plurality of topic clusters according to a corresponding image configuration.
  - 11. The device of claim 10, wherein the presentable content includes a Pareto analysis of bias associated with each topic in each topic cluster of the plurality of topic clusters.

12. A non-transitory, machine-readable storage medium, comprising executable instructions that, when executed by a processing system including a processor, facilitate performance of operations, the operations comprising:
- determining a plurality of topics from a plurality of text documents according to a statistical, natural-language processing analysis, wherein the analysis uses parameters determined from a plurality of sample documents;
  
  associating one or more topics in the plurality of topics to each document in the plurality of text documents, wherein the plurality of topics is reduced into a proper subset of topics based on a frequency of occurrence of each topic in the plurality of text documents;
  
  identifying a bias for each topic in the proper subset of topics, wherein the bias is identified according to text in a corresponding document mapped to the topic; and
  
  creating clusters of topics from the proper subset of topics, wherein each cluster of topics in the clusters of topics is determined from a latent semantic analysis comprising singular value decomposition into orthogonal dimensions, wherein each cluster of topics has an image configuration based on the bias and the frequency of occurrence for topics in the clusters of topics that distinguishes one cluster from another.
- View Dependent Claims (13, 14, 15, 16, 17)
- - 13. The non-transitory, machine-readable storage medium of claim 12, wherein the image configuration comprises a geometric area for a topic in a cluster of topics.
  - 14. The non-transitory, machine-readable storage medium of claim 13, wherein a size of the geometric area for the topic in the cluster of topics represents a summary statistic in the plurality of text documents comprising the topic.
  - 15. The non-transitory, machine-readable storage medium of claim 14, wherein the bias comprises n-dimension of bias, and wherein a color of the geometric area for the topic in the cluster of topics represents the n-dimensions of bias of the topic from the plurality of text documents associated with the topic.
  - 16. The non-transitory, machine-readable storage medium of claim 12, wherein the processing system includes a plurality of processors operating in a distributed processing environment.
  - 17. The non-transitory, machine-readable storage medium of claim 12, wherein identifying the bias comprises one of positive bias, neutral, or negative bias.

18. A method, comprising:
- performing, by a system comprising a processor, a statistical natural language processing analysis on a plurality of text documents to determine a plurality of topics, wherein the plurality of topics are determined according to parameters that are determined by training on a sample of documents in order to control a number of the plurality of topics that are determined;
  
  associating, by the system, one or more topics in a subset of topics to each document in the plurality of text documents, thereby creating a plurality of topic-document pairs;
  
  identifying, by the system, a bias for each topic-document pair of the plurality of topic-document pairs; and
  
  creating, by the system, clusters of topics from the subset of topics, wherein each cluster of topics is determined from a value for each bias dimension of each topic-document pair and a frequency of occurrence of each topic in the plurality of text documents, wherein each cluster of the clusters of topics is presentable according to a corresponding image configuration, wherein the image configuration is based on all or a subset of the bias dimensions and the frequency of occurrence of topics in a cluster that distinguishes the cluster from other clusters.
- View Dependent Claims (19, 20)
- - 19. The method of claim 18, wherein the identifying the bias further comprises performing, by the system, a latent semantic analysis of text in the document associated with each topic-document pair, and wherein the latent semantic analysis comprises singular value decomposition into orthogonal dimensions.
  - 20. The method of claim 18, wherein the subset of topics is determined from the plurality of topics according to user input, and wherein the user input comprises merging two topics.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
AT&T Intellectual Property I LP (AT&T, Inc.)
Original Assignee
AT&T Intellectual Property I LP (AT&T, Inc.)
Inventors
Bogdan, Pamela A. M., Gressel, Gary, Reser, Gary, Rubarkh, Alex, Shirley, Kenneth

Granted Patent

US 10,642,932 B2
Time in Patent Office

Days
Field of Search
US Class Current
CPC Class Codes

G06F 16/285   Clustering or classification

G06F 16/355   Class or cluster creation o...

G06F 16/358   Browsing; Visualisation the...

G06F 40/216   using statistical methods

G06F 40/30   Semantic analysis

DATA ANALYTICS SYSTEM AND METHODS FOR TEXT DATA

First Claim

1 Assignment

0 Petitions

Accused Products

Abstract

Citations

20 Claims

Specification

Solutions

Use Cases

Quick Links

DATA ANALYTICS SYSTEM AND METHODS FOR TEXT DATA

First Claim

1 Assignment

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

Citations

20 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links