Predicting future trending topics
First Claim
1. A method for identifying future trending n-grams, comprising:
for at least one particular content item of multiple content items:
extracting text from the particular content item;
identifying a plurality of classifications for the particular content item, the plurality of classifications including a geographical region classification and a subject-based classification;
organizing the extracted text into one or more n-grams;
adding the one or more n-grams to a cumulative set of n-grams, wherein each n-gram in the cumulative set is associated with a time-based value for the particular content item;
sorting the n-grams in the cumulative set into groups by the plurality of classifications of the content item that the n-gram originated from;
computing a frequency value, within each group, for each unique n-gram in that group;
selecting unique n-grams, for at least one of the groups, that have a frequency value above a frequency threshold;
computing a predicted change in frequency value for the selected unique n-grams, the computing for a given unique n-gram comprising fitting a polynomial to the time-based values for the n-grams that have the same sequence of words as the given unique n-gram and that are in the same group as the given unique n-gram, wherein the computed predicted change in frequency is a slope of the polynomial at a point corresponding to a current time; and
selecting, as the future trending n-grams, for a geographical region specified in the geographical region classification, one or more n-grams with a predicted change in frequency value above a predicted change threshold.
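The extraction, grouping, and frequency-threshold steps recited above can be illustrated with a minimal sketch. It is not the patented implementation: the whitespace tokenizer, the use of a raw count as the "frequency value", the dictionary keys `text`, `region`, `subject`, and `time`, and the helper names `ngrams`, `build_cumulative_set`, and `frequent_ngrams` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return every contiguous n-token tuple in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_cumulative_set(posts, n=2):
    """Tag each n-gram with its post's region, subject, and time value."""
    records = []
    for post in posts:
        # Naive whitespace tokenizer (assumption; the claim does not specify one).
        tokens = post["text"].lower().split()
        for gram in ngrams(tokens, n):
            records.append((gram, post["region"], post["subject"], post["time"]))
    return records


def frequent_ngrams(records, threshold):
    """Group by (region, subject), count unique n-grams, keep counts above threshold."""
    counts = defaultdict(Counter)
    for gram, region, subject, _time in records:
        counts[(region, subject)][gram] += 1
    return {
        group: {gram: c for gram, c in counter.items() if c > threshold}
        for group, counter in counts.items()
    }


# Hypothetical content items.
posts = [
    {"text": "world cup final tonight", "region": "US", "subject": "sports", "time": 1},
    {"text": "watching the world cup", "region": "US", "subject": "sports", "time": 2},
    {"text": "world cup tickets", "region": "BR", "subject": "sports", "time": 2},
]
records = build_cumulative_set(posts)
selected = frequent_ngrams(records, threshold=1)
print(selected)
```

In this sketch the same bigram is counted separately in each region/subject group, so an n-gram can pass the threshold in one group and fail it in another.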
Abstract
A prediction system can predict future trending topics. The prediction system can classify social media posts by region and vertical, extract text from the posts, tokenize the extracted text, and organize the tokens into n-grams. The prediction system can store the n-grams from the posts in a cumulative set of n-grams, with each n-gram tagged with the originating post's identified region, vertical, and a time value. The prediction system can compute, for each n-gram, a frequency within each category defined by a region/vertical pair. The prediction system can fit occurrence data for n-grams to a polynomial and identify the slope of the polynomial at the point corresponding to the current time. The slope can be used as a prediction of growth or decline for the n-gram. The prediction system can identify n-grams with a comparatively large slope within that region/vertical as likely to be trending in the future.
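The polynomial-fit step described in the abstract can be sketched as follows. The abstract does not say how the polynomial is fitted or how occurrences are binned, so this sketch assumes an ordinary least-squares fit over per-day counts, solved through the normal equations. The function names `fit_polynomial` and `slope_at`, the degree of 2, the sample counts, and the threshold of 1.0 are hypothetical.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit; returns coefficients [a0, a1, ..., a_degree]."""
    m = degree + 1
    # Normal equations (V^T V) a = V^T y, where V[i][k] = xs[i] ** k.
    A = [[sum(x ** (r + c) for x in xs) for c in range(m)] for r in range(m)]
    b = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for r in range(m - 1, -1, -1):
        tail = sum(A[r][c] * coeffs[c] for c in range(r + 1, m))
        coeffs[r] = (b[r] - tail) / A[r][r]
    return coeffs


def slope_at(coeffs, t):
    """Derivative of the fitted polynomial evaluated at time t."""
    return sum(k * a * t ** (k - 1) for k, a in enumerate(coeffs) if k > 0)


# Hypothetical per-day occurrence counts for two n-grams in one region/vertical.
days = [0, 1, 2, 3, 4]
series = {
    ("world", "cup"): [0, 1, 4, 9, 16],    # accelerating growth
    ("good", "morning"): [5, 5, 5, 5, 5],  # steady usage
}
now = days[-1]
slopes = {g: slope_at(fit_polynomial(days, c, 2), now) for g, c in series.items()}
# Keep n-grams whose predicted change exceeds an assumed threshold.
trending = [g for g, s in slopes.items() if s > 1.0]
print(trending)
```

Note that the steady n-gram has the higher count on most days but a slope near zero, so only the accelerating n-gram is selected. This matches the abstract's use of the slope, and not the raw frequency, as the growth prediction.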
18 Claims
1. A method for identifying future trending n-grams, as recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8.
9. A non-transitory computer-readable storage medium storing instructions that, when executed by a computing system, cause the computing system to perform operations for identifying one or more future trending n-grams, the operations comprising:
for at least one particular content item of multiple content items:
identifying a plurality of classifications for the particular content item, the plurality of classifications includes a geographical region classification and a subject-based classification;
organizing text associated with the particular content item into one or more n-grams;
adding, from the one or more n-grams to a cumulative set of n-grams, at least one n-gram;
computing a frequency value for each unique n-gram in the cumulative set of n-grams, the frequency value computed for a frequency within the group of n-grams in the cumulative set of n-grams that have the same one or more classifications;
computing a predicted change in frequency value for at least some of the unique n-grams, the computing for a given n-gram comprising fitting a polynomial to time-based values associated with the n-grams in the cumulative set that have the same sequence of words as the given unique n-gram and that have the same one or more classifications as the given unique n-gram, wherein the computed change in frequency is a slope of the polynomial at a point corresponding to a current time; and
selecting, as the future trending n-grams, for a geographical region specified in the geographical region classification, one or more n-grams with a predicted change in frequency value above a predicted change threshold.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17, 18.
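The slope criterion recited in claims 1 and 9 can be written out as a short worked formula. The symbols below ($c_g$, $p_g$, $a_k$, $d$, $\theta$) and the least-squares objective are assumptions introduced for illustration; the claims themselves only require that a polynomial be fitted and that its slope at the current time be compared with a threshold.

```latex
% Counts c_g(t_i) of n-gram g within its classification group at times t_1, ..., t_m.
% Fit a degree-d polynomial (least squares is an assumed fitting criterion):
p_g(t) = \sum_{k=0}^{d} a_k t^k,
\qquad
(a_0, \dots, a_d) = \arg\min_{a} \sum_{i=1}^{m} \bigl( p_g(t_i) - c_g(t_i) \bigr)^2 .

% Predicted change in frequency: the slope at the current time.
\Delta_g = p_g'(t_{\mathrm{now}}) = \sum_{k=1}^{d} k \, a_k \, t_{\mathrm{now}}^{\,k-1} .

% Selection: g is a future trending n-gram for the region when
\Delta_g > \theta .

% Worked example: counts 2, 3, 6, 11 at t = 0, 1, 2, 3 lie exactly on
% p_g(t) = t^2 + 2, so with d = 2 the fit gives (a_0, a_1, a_2) = (2, 0, 1) and
\Delta_g = 2 \cdot 1 \cdot 3 = 6 \quad \text{at } t_{\mathrm{now}} = 3 ,
% which exceeds an assumed threshold \theta = 5.
```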
Specification