Realtime data stream cluster summarization and labeling system

US 20170255536A1
Filed: 12/08/2016
Published: 09/07/2017
Est. Priority Date: 03/15/2013
Status: Active Grant

Abstract

A method is provided for automatically discovering topics in electronic posts, such as social media posts. The method includes receiving a corpus that includes a plurality of electronic posts. The method further includes identifying a plurality of candidate terms within the corpus and selecting, as a trimmed lexicon, a subset of the plurality of candidate terms using predefined criteria. The method further includes clustering at least a subset of the plurality of electronic posts according to a plurality of clusters using the lexicon to produce a plurality of statistical topic models. The method further includes storing information corresponding to the statistical topic models.
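The abstract does not say what the "predefined criteria" for trimming the lexicon are. As a minimal sketch, assuming whitespace tokenization and document-frequency bounds as the criteria (the function name `trim_lexicon` and the parameters `min_df` and `max_df_ratio` are hypothetical and do not come from the patent), the term-selection step might look like this:

```python
from collections import Counter

def trim_lexicon(posts, min_df=2, max_df_ratio=0.8):
    """Select candidate terms whose document frequency lies within bounds.

    Assumption: the bounds stand in for the patent's unspecified
    "predefined criteria"; they are illustrative only.
    """
    doc_freq = Counter()
    for post in posts:
        # Count each term once per post (document frequency, not raw count).
        doc_freq.update(set(post.lower().split()))
    upper = max_df_ratio * len(posts)
    return {term for term, df in doc_freq.items() if min_df <= df <= upper}

posts = [
    "the new phone battery lasts long",
    "the phone screen cracked",
    "battery life on the tablet",
    "the tablet screen is bright",
]
lexicon = trim_lexicon(posts)

# Posts restricted to the trimmed lexicon, ready for a clustering step.
restricted = [[t for t in p.lower().split() if t in lexicon] for p in posts]
```

The clustering step that produces the statistical topic models is not shown; it would operate on posts restricted to the trimmed lexicon, as in `restricted`.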

5 Claims

1. A method for generating topic labels from statistical topic models, comprising:
  - receiving a collection of topics, associated topic word probabilities for a given topic in conjunction with a statistical topic model and a set of documents associated with each topic;
  - truncating a document set to include documents having an aggregate topic word probability that meets truncation criteria;
  - reweighting the probabilities for each topic word in the truncated document set for a given topic based on the frequency that the topic word appears across the collection of topics;
  - determining, for each document in the truncated document set for the given topic, an aggregate topic word probability;
  - identifying topic fragments in each document in a truncated document set based on the topic words to create user friendly and highly descriptive topic labels.
2. The method of claim 1, wherein the statistical topic models is generated by an LDA process.
3. The method of claim 1, wherein truncation criteria includes a criterion that is met when the aggregate topic word probability for a document exceeds a truncation threshold.
4. The method of claim 1, wherein determining, for each document in the truncated document set for the given topic, an aggregate topic word probability is based on the aggregate of the reweighted probabilities for each topic word for a given topic.
5. The method of claim 1, wherein identifying topic fragments in each document includes pre and post annotating the topic fragments by iterating backwards and forwards from a topic word through the document and storing stopwords positioned relative to the topic word until a non-stopword is identified.
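
The steps of claims 1 and 3 to 5 can be illustrated with a short sketch. It is not the patented implementation, and several details are assumptions the claims leave open: the reweighting formula (here, dividing a word's probability by the number of topics that contain it), the stopword list, whitespace tokenization, and all function names. How fragments are combined into a final label is not specified in the claims and is not attempted here.

```python
# Assumption: a tiny illustrative stopword list, not one from the patent.
STOPWORDS = {"the", "of", "a", "in", "is", "on", "for", "and", "to"}

def aggregate(tokens, weights):
    """Sum the topic word probabilities of the tokens in one document."""
    return sum(weights.get(tok, 0.0) for tok in tokens)

def reweight(topics, topic):
    """Down-weight words that appear in many topics.

    Assumption: divide by the number of topics containing the word.
    The claim only says reweighting is based on that frequency.
    """
    return {
        word: prob / sum(1 for words in topics.values() if word in words)
        for word, prob in topics[topic].items()
    }

def fragments(tokens, weights):
    """Claim 5 style: extend each topic word over adjacent stopwords."""
    found = []
    for i, tok in enumerate(tokens):
        if tok not in weights:
            continue
        start = i
        while start > 0 and tokens[start - 1] in STOPWORDS:
            start -= 1  # walk backwards until a non-stopword is hit
        end = i
        while end + 1 < len(tokens) and tokens[end + 1] in STOPWORDS:
            end += 1  # walk forwards until a non-stopword is hit
        found.append(" ".join(tokens[start:end + 1]))
    return found

def label_candidates(topics, docs, topic, threshold):
    """Truncate, reweight, score, and extract fragments for one topic."""
    tokenized = [doc.lower().split() for doc in docs[topic]]
    # Claim 3: keep documents whose aggregate probability exceeds a threshold.
    kept = [t for t in tokenized if aggregate(t, topics[topic]) > threshold]
    weights = reweight(topics, topic)
    # Claim 4: score each kept document with the reweighted probabilities.
    ranked = sorted(kept, key=lambda t: aggregate(t, weights), reverse=True)
    return [(aggregate(t, weights), fragments(t, weights)) for t in ranked]

topics = {
    "power": {"battery": 0.5, "charger": 0.3, "phone": 0.2},
    "display": {"screen": 0.6, "phone": 0.4},
}
docs = {
    "power": [
        "the battery of the phone is great",
        "phone arrived today",
        "charger for the battery",
    ]
}
result = label_candidates(topics, docs, "power", threshold=0.4)
```

In this toy input, "phone arrived today" falls below the threshold and is dropped, and "phone" is halved during reweighting because it appears in both topics. The fragments returned for the top-ranked document are "charger for the" and "for the battery".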

Current Assignee: Target Brands Inc. (Target Corporation)
Original Assignee: Uda LLC (Anexinet Corp.)
Inventors: Weissinger, Steve; Stevens, Luis; Schiavone, Vincent
Granted Patent: US 10,204,026 B2
CPC Class Codes

G06F 11/3409   for performance assessment

G06F 16/24568   Data stream processing; Con...

G06F 16/313   Selection or weighting of t...

G06F 16/35   Clustering; Classification

G06F 16/9024   Graphs; Linked lists G06F16...

G06F 16/9535   Search customisation based ...

G06Q 30/0201   Market modelling; Market an...

G06Q 50/01   Social networking

H04L 65/60   Network streaming of media ...
