Method and system for social media burst classifications
Abstract
The present invention is directed to a method, system, and article of manufacture for systematically and automatically identifying abnormal or collective behavior patterns in microblogging messages that produce burst phenomena, such as Twitter storms. A microblogging storm engine in a storm detection server is configured to detect and classify the volume, shape, and type of a Twitter storm when keying on topics such as, but not limited to, a brand, an event, a person, an entity, a country, or a controversial issue. The microblogging storm engine comprises a storm detection module, a storm classification module, a database interface module, and a sentiment process module. The storm detection module is configured to detect different patterns of microblogging storms by capturing the volume of a particular storm, which assists in producing statistical analysis output. The storm classification module is configured to classify the storms into different types of a particular storm category.
16 Claims
-
1. A computer-implemented method for analyzing social media events, comprising:
-
gathering a plurality of social media data received from one or more social media sources, each social media data including a post;
extracting one or more features associated with each social media data;
identifying substantially similar features among the plurality of social media data;
clustering the plurality of social media posts that share substantially similar features, thereby identifying one or more clustered feature patterns; and
detecting a burst of clustered social media posts that have similar characteristics, wherein the characteristics of each social media post are determined by the composition of the associated features,
wherein the detecting step comprises detecting the burst of clustered social media posts that have similar characteristics by the following equation;
-
2. The method of claim 1, wherein one of the feature patterns comprises extracting at least the first character of each token in the text to create an abbreviated sequence of characters.
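The abbreviation described in claim 2 might be sketched as follows. This is only an illustration, assuming tokens are whitespace-separated and lowercased; the function name `abbreviate` and the `prefix_len` parameter are hypothetical and do not come from the patent text.

```python
def abbreviate(text: str, prefix_len: int = 1) -> str:
    """Join the first `prefix_len` characters of each token into one string."""
    # Assumption: tokens are whitespace-separated and case is ignored.
    return "".join(token[:prefix_len] for token in text.lower().split())


if __name__ == "__main__":
    print(abbreviate("Flight delayed again thanks airline"))
```

Posts that differ only in small details tend to map to the same short string, which is one plausible reason such a sequence is useful as a clustering feature.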
-
3. The method of claim 1, wherein the burst of clustered social media posts is a function of the number of related social media posts within a time period.
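One way to read claim 3 is that burst detection starts from a count of related posts per time window. The sketch below is a rough illustration under that reading. The window width, the `factor` parameter, and the mean-based threshold are all assumptions; they stand in for the claimed equation, which is not reproduced in this text.

```python
from collections import Counter
from datetime import datetime, timedelta


def posts_per_window(timestamps, window=timedelta(minutes=10)):
    """Count posts in consecutive fixed-width windows, starting at the earliest post."""
    if not timestamps:
        return []
    start = min(timestamps)
    # Integer division of two timedeltas gives the window index of each post.
    counts = Counter((ts - start) // window for ts in timestamps)
    return [counts.get(i, 0) for i in range(max(counts) + 1)]


def burst_windows(counts, factor=3.0):
    """Return indices of windows whose count exceeds `factor` times the mean count."""
    # Assumption: a simple mean-based threshold, not the patent's equation.
    if not counts:
        return []
    mean = sum(counts) / len(counts)
    return [i for i, c in enumerate(counts) if c > factor * mean]
```

For example, the per-window counts `[1, 1, 1, 1, 1, 1, 1, 20]` have a mean of 3.375, so with `factor=3.0` only the last window (index 7) is flagged.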
-
4. The method of claim 1, wherein one of the feature patterns comprises clustering the plurality of social media posts based on the distance of a number of like matching characters divided by the length of the longer item of a compared pair in social media posts.
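The ratio in claim 4 could be computed along the following lines. The claim does not say how "like matching characters" are counted, so this sketch assumes a multiset overlap of characters; `char_match_similarity` is a hypothetical name, and other matching rules (for example, position-wise comparison) would give different values.

```python
from collections import Counter


def char_match_similarity(a: str, b: str) -> float:
    """Shared character count divided by the length of the longer string."""
    if not a and not b:
        return 1.0  # Assumption: two empty strings are treated as identical.
    # Counter intersection keeps the minimum count of each shared character.
    shared = sum((Counter(a) & Counter(b)).values())
    return shared / max(len(a), len(b))


if __name__ == "__main__":
    print(char_match_similarity("fdata", "fdat"))
```

The result lies between 0 and 1; if a distance rather than a similarity is wanted, `1 - char_match_similarity(a, b)` is one option.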
-
5. The method of claim 1, wherein one of the feature patterns comprises clustering the plurality of social media posts based on the number of matching tokens divided by the total number of unique tokens.
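The ratio in claim 5, matching tokens over total unique tokens, has the same form as the Jaccard index on token sets. A minimal sketch, assuming whitespace tokenization and case-insensitive comparison (the function name `token_overlap` is hypothetical):

```python
def token_overlap(a: str, b: str) -> float:
    """Number of shared tokens divided by the number of unique tokens in both posts."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    union = tokens_a | tokens_b
    if not union:
        return 1.0  # Assumption: two empty posts are treated as identical.
    return len(tokens_a & tokens_b) / len(union)


if __name__ == "__main__":
    print(token_overlap("boycott brand now", "boycott brand today"))
```

In the usage line, the two posts share 2 tokens out of 4 unique tokens, giving 0.5.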
-
6. The method of claim 1, wherein one of the feature patterns comprises a threshold-based clustering mechanism where the threshold is used to group the storms into different categories;
- the threshold can be determined by learning or estimating from previously grouped storms.
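Claim 6 does not specify how the threshold is learned. As one possible illustration, the sketch below estimates a threshold as the midpoint between the mean scores of two previously grouped storm categories. The scalar "score" per storm, the midpoint rule, and the names `estimate_threshold` and `categorize` are all assumptions made for this example.

```python
def estimate_threshold(scores_a, scores_b):
    """Midpoint between the mean scores of two previously grouped categories."""
    # Assumption: each storm has been reduced to a single numeric score.
    mean_a = sum(scores_a) / len(scores_a)
    mean_b = sum(scores_b) / len(scores_b)
    return (mean_a + mean_b) / 2


def categorize(score, threshold, low_label="category A", high_label="category B"):
    """Assign a storm to one of two categories by comparing its score to the threshold."""
    return high_label if score >= threshold else low_label


if __name__ == "__main__":
    t = estimate_threshold([1, 3], [7, 9])
    print(t, categorize(6, t))
```

A midpoint is the simplest choice; with more labeled storms, a threshold chosen to minimize misclassification on the earlier groups would be a natural refinement.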
-
7. The method of claim 1, wherein one of the feature patterns is based on histogram analysis of the repeating phrases and the sentiment orientation in the plurality of social media posts.
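The histogram analysis and sentiment orientation named in claim 7 could look roughly like this. The sketch assumes a "repeating phrase" is an n-token sequence that appears in more than one post, and it uses a tiny hypothetical word list for sentiment; neither choice comes from the patent, and a real system would use a proper sentiment lexicon or model.

```python
from collections import Counter

# Hypothetical toy lexicon, for illustration only.
POSITIVE = {"love", "great", "thanks"}
NEGATIVE = {"hate", "awful", "boycott"}


def phrase_histogram(posts, n=2):
    """Count n-token phrases that occur in more than one post."""
    counts = Counter()
    for post in posts:
        tokens = post.lower().split()
        # Use a set so each phrase is counted at most once per post.
        counts.update({" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)})
    return Counter({phrase: c for phrase, c in counts.items() if c > 1})


def sentiment_orientation(posts):
    """Positive word count minus negative word count; the sign gives the orientation."""
    score = 0
    for post in posts:
        for token in post.lower().split():
            score += (token in POSITIVE) - (token in NEGATIVE)
    return score
```

For the posts `["boycott this brand", "boycott this brand now", "love this brand"]`, the histogram keeps "boycott this" (2 posts) and "this brand" (3 posts), and the orientation score is -1.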
-
8. The method of claim 1, wherein one of the feature patterns comprises re-tweeting at least one social media post in addition to the histogram analysis and sentiment orientation to identify whether the burst constitutes a real storm or a spam.
-
9. The method of claim 1, wherein one of the feature patterns comprises a sorted histogram of token prefix strings, the strings being created by taking the first character of each token appearing in a corpus and using a sorting algorithm to group like prefix tokens.
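The sorted histogram of prefix strings in claim 9 might be built as sketched below. The example assumes one prefix string per post, whitespace tokenization, and lowercase comparison; `prefix_string` and `sorted_prefix_histogram` are hypothetical names, and the final ordering (most frequent first) is a presentation choice rather than something the claim states.

```python
from itertools import groupby


def prefix_string(post: str) -> str:
    """First character of each whitespace-separated token, joined together."""
    return "".join(token[0] for token in post.lower().split())


def sorted_prefix_histogram(corpus):
    """Sort prefix strings so equal ones are adjacent, then count each group."""
    prefixes = sorted(prefix_string(post) for post in corpus)
    histogram = [(prefix, len(list(group))) for prefix, group in groupby(prefixes)]
    # Most frequent prefix strings first; ties are broken alphabetically.
    return sorted(histogram, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    print(sorted_prefix_histogram(["Boycott this brand", "boycott THIS brand", "love this brand"]))
```

Sorting first is what lets `groupby` collect identical prefix strings in a single pass, which mirrors the claim's use of a sorting algorithm to group like prefixes.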
-
10. The method of claim 1, wherein each social media data is selected from a group consisting of a text, an image, a video, an audio, or any combination thereof.
-
11. The method of claim 1, wherein each social media data includes a multimedia file.
-
12. A computer program product comprising a non-transitory computer readable storage medium structured to store instructions executable by a processor, the instructions, when executed, cause the processor to:
-
gather a plurality of social media data received from one or more social media sources, each social media data including a post;
extract one or more features associated with each social media data;
identify substantially similar features among the plurality of social media data;
cluster the plurality of social media posts that share substantially similar features, thereby identifying one or more clustered feature patterns; and
detect a burst of clustered social media posts that have similar characteristics, wherein the characteristics of each social media post are determined by the composition of the associated features,
wherein the detecting comprises detecting the burst of clustered social media posts that have similar characteristics by the following equation;
-
13. The computer program product of claim 12, wherein one of the feature patterns comprises extracting at least the first character of each token in the text to create an abbreviated sequence of characters.
-
14. The computer program product of claim 12, wherein the burst of clustered social media posts is a function of the number of related social media posts within a time period.
-
15. The computer program product of claim 12, wherein one of the feature patterns comprises clustering the plurality of social media posts based on the distance of a number of like matching characters divided by the length of the longer item of a compared pair in social media posts.
-
16. The computer program product of claim 12, wherein one of the feature patterns comprises clustering the plurality of social media posts based on the distance of a number of like matching characters divided by the length of the longer item of a compared pair in social media posts.
Specification