Realtime data stream cluster summarization and labeling system
First Claim
1. A method for generating topic labels from statistical topic models, comprising:
receiving a collection of topics, associated topic word probabilities for a given topic in conjunction with a statistical topic model and a set of documents associated with each topic;
truncating a document set to include documents having an aggregate topic word probability that meets truncation criteria;
reweighting the probabilities for each topic word in the truncated document set for a given topic based on the frequency that the topic word appears across the collection of topics;
determining, for each document in the truncated document set for the given topic, an aggregate topic word probability;
identifying topic fragments in each document in a truncated document set based on the topic words to create user friendly and highly descriptive topic labels.
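The steps recited in claim 1 can be sketched roughly as follows. This is only an illustrative reading, not the patented implementation: the claim does not define the truncation criteria, the reweighting formula, or what counts as a topic fragment. The sketch assumes a top-N truncation, an IDF-style reweighting across topics, and fragments defined as maximal runs of consecutive topic words. All function names and the sample data are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokenizer (an assumption; the claim does not specify one)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def aggregate_probability(doc, word_weights):
    """Sum the topic word weights of every token in the document."""
    return sum(word_weights.get(tok, 0.0) for tok in tokenize(doc))


def truncate(docs, word_probs, keep=2):
    """Assumed truncation criterion: keep the `keep` highest-scoring documents."""
    ranked = sorted(docs, key=lambda d: aggregate_probability(d, word_probs), reverse=True)
    return ranked[:keep]


def reweight(topic, topics):
    """Assumed IDF-style reweighting: words shared by many topics are down-weighted."""
    n_topics = len(topics)
    topic_count = Counter(w for probs in topics.values() for w in probs)
    return {w: p * math.log(1 + n_topics / topic_count[w]) for w, p in topics[topic].items()}


def fragments(doc, weights):
    """Assumed fragment definition: maximal runs of consecutive topic words."""
    runs, current = [], []
    for tok in tokenize(doc):
        if tok in weights:
            current.append(tok)
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return [(" ".join(run), sum(weights[t] for t in run)) for run in runs]


def label_topic(topic, topics, docs_by_topic, keep=2):
    """Truncate, reweight, re-score each kept document, then pick the best fragment."""
    kept = truncate(docs_by_topic[topic], topics[topic], keep)
    weights = reweight(topic, topics)
    kept.sort(key=lambda d: aggregate_probability(d, weights), reverse=True)
    candidates = [frag for doc in kept for frag in fragments(doc, weights)]
    return max(candidates, key=lambda f: f[1])[0] if candidates else ""


# Hypothetical input: two topics with word probabilities and their documents.
topics = {
    "t0": {"solar": 0.4, "panel": 0.3, "new": 0.3},
    "t1": {"football": 0.5, "match": 0.2, "new": 0.3},
}
docs_by_topic = {
    "t0": ["A new solar panel was installed today", "Solar prices fall", "Nothing relevant here"],
    "t1": ["The football match ended late", "New stadium opens"],
}
labels = {t: label_topic(t, topics, docs_by_topic) for t in topics}
```

In this toy input the word "new" occurs in both topics, so the assumed reweighting gives it a lower weight than "panel" even though the two start with the same probability; this is the effect the reweighting step of the claim appears to be after.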
Abstract
A method is provided for automatically discovering topics in electronic posts, such as social media posts. The method includes receiving a corpus that includes a plurality of electronic posts. The method further includes identifying a plurality of candidate terms within the corpus and selecting, as a trimmed lexicon, a subset of the plurality of candidate terms using predefined criteria. The method further includes clustering at least a subset of the plurality of electronic posts according to a plurality of clusters using the lexicon to produce a plurality of statistical topic models. The method further includes storing information corresponding to the statistical topic models.
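The pipeline summarized in the abstract (candidate terms, trimmed lexicon, clustering, stored topic models) might look something like the sketch below. It is a simplified stand-in under stated assumptions: the "predefined criteria" are assumed to be document-frequency bounds, and a greedy single-pass Jaccard clustering replaces whatever statistical topic modeling technique the specification actually uses. Each cluster's relative term frequencies serve as a crude topic word distribution. All names, thresholds, and sample posts are hypothetical.

```python
import json
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokenizer (an assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def trimmed_lexicon(posts, min_df=2, max_df_ratio=0.8):
    """Assumed criteria: keep terms that are neither too rare nor too common."""
    df = Counter(t for p in posts for t in set(tokenize(p)))
    limit = max_df_ratio * len(posts)
    return {t for t, c in df.items() if min_df <= c <= limit}


def cluster_posts(posts, lexicon, threshold=0.3):
    """Greedy single-pass clustering on Jaccard similarity of lexicon terms."""
    clusters = []
    for i, post in enumerate(posts):
        terms = set(tokenize(post)) & lexicon
        if not terms:
            continue  # posts with no lexicon terms are left unclustered
        best, best_sim = None, 0.0
        for c in clusters:
            sim = len(terms & c["terms"]) / len(terms | c["terms"])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            best["terms"] |= terms
            best["posts"].append(i)
        else:
            clusters.append({"terms": set(terms), "posts": [i]})
    return clusters


def topic_models(posts, clusters, lexicon):
    """Per-cluster relative term frequencies, used here as topic word probabilities."""
    models = []
    for c in clusters:
        counts = Counter(t for i in c["posts"] for t in tokenize(posts[i]) if t in lexicon)
        total = sum(counts.values())
        probs = {t: n / total for t, n in sorted(counts.items())}
        models.append({"posts": c["posts"], "word_probs": probs})
    return models


# Hypothetical corpus of electronic posts.
posts = [
    "solar panel prices drop",
    "cheap solar panel deals",
    "football match tonight",
    "big football match result",
    "the weather is nice",
]
lexicon = trimmed_lexicon(posts)
models = topic_models(posts, cluster_posts(posts, lexicon), lexicon)
stored = json.dumps(models)  # "storing information corresponding to the topic models"
```

Serializing to JSON is just one convenient way to persist the result; the abstract does not say how the information is stored.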
77 Citations
5 Claims
1. A method for generating topic labels from statistical topic models (independent claim; full text appears under First Claim above). Dependent claims: 2, 3, 4, 5.
Specification