System and method for classification of microblog posts based on identification of topics
Abstract
A method for assigning a topic to a collection of microblog posts may include: by an acquisition module, receiving from at least one messaging service server a plurality of posts, wherein each of the plurality of posts comprises post content; by a generation module, analyzing the posts and extracting, from at least one of the posts, a link with an address to an external document; and, by the acquisition module, accessing the external document that is associated with the address and fetching external content associated with the document. The method may also include, by the generation module: analyzing the post content to identify at least one label for each post; for each post that includes a link, analyzing the external content to identify a topic; and using a topic modeling technique to generate a trained topic model comprising a plurality of topics and a plurality of associated words.
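The acquisition and labeling steps summarized in the abstract can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: hashtags are assumed to serve as post labels, links are found with a simple regular expression, and `fetch` is a hypothetical caller-supplied function standing in for the network access to the external server. All function names are invented for this example. The topic modeling step itself (L2-HDP) is not implemented here; the sketch only groups post content and linked external content by label, which is the kind of input a labeled topic model would consume.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

URL_RE = re.compile(r"https?://\S+")
# Assumption: a hashtag in the post content acts as that post's label.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_links(post: str) -> List[str]:
    """Return the URLs in a post, trimming trailing punctuation."""
    return [url.rstrip(".,;:!?)") for url in URL_RE.findall(post)]


def extract_labels(text: str) -> List[str]:
    """Return lower-cased hashtags found in the text."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


def build_labeled_corpus(
    posts: Iterable[str], fetch: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Group post content and fetched external content under each label.

    `fetch` is a hypothetical stand-in for the component that retrieves
    the external document behind a link.
    """
    corpus: Dict[str, List[str]] = defaultdict(list)
    for post in posts:
        links = extract_links(post)
        # Remove URLs first so URL fragments are not mistaken for hashtags.
        text = URL_RE.sub(" ", post).strip()
        labels = extract_labels(text) or ["unlabeled"]
        documents = [text] + [fetch(url) for url in links]
        for label in labels:
            corpus[label].extend(documents)
    return dict(corpus)


# Usage with an in-memory stand-in for the external server.
fake_pages = {"http://example.com/a": "Full match report"}
posts = [
    "Great match tonight #Football http://example.com/a.",
    "No link here #football #news",
]
corpus = build_labeled_corpus(posts, lambda url: fake_pages.get(url, ""))
```

Passing `fetch` as a parameter keeps the sketch free of real network calls; an actual system would substitute an HTTP client and handle failures and timeouts.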
18 Claims
1. A method for assigning a topic to a collection of microblog posts, the method comprising:
receiving from at least one messaging service server, a plurality of posts, wherein:
  each of the plurality of posts comprise post content, and
  one or more of the posts comprise one or more links with an address to an external document;
causing a communications hardware component to initiate a communication via a communications network that accesses the external document at an external server that is associated with the address and fetches external content associated with the external document; and
by a generation module:
  analyzing the post content included in each post to identify at least one label for that post,
  for each post that includes a link, analyzing external content corresponding to the external document associated with that link to identify a topic, and
  using a topic modeling technique to generate a trained topic model, wherein the trained topic model comprises, for each identified label, a plurality of topics and a plurality of words associated with each of the plurality of topics; and
saving the trained topic model to a computer-readable memory device,
wherein, the topic modeling technique comprises a User-Labeled Linked Hierarchical Dirichlet Process (L2-HDP) technique, that utilizes a user's interest variable vector to generate the trained topic model, wherein the user's interest variable vector is indicative of a topic composition of the set of documents associated with a user.
10. A system for assigning a topic to a collection of microblog posts, the system comprising:
a processing device; and
a computer-readable memory containing programming instructions that, when executed by the processing device, cause the processing device to:
by an acquisition module, receive from at least one messaging service server, a plurality of posts, wherein:
  each of the plurality of posts comprise post content, and
  one or more of the posts comprise one or more links with an address to an external document;
by the acquisition module, cause a communications hardware component to initiate a communication via a communications network that accesses the external document at an external server that is associated with the address and fetches external content associated with the document;
by the generation module:
  analyze the post content included in each post to identify at least one label for that post,
  for each post that includes a link, analyze external content corresponding to the external document associated with that link to identify a topic, and
  use a topic modeling technique to generate a trained topic model, wherein the trained topic model comprises, for each identified label, a plurality of topics and a plurality of words associated with each of the plurality of topics; and
by the generation module, save the trained topic model to a computer-readable memory device
wherein, the topic modeling technique comprises a User-Labeled Linked Hierarchical Dirichlet Process (L2-HDP) technique, that utilizes a user's interest variable vector to generate the trained topic model, wherein the user's interest variable vector is indicative of a topic composition of the set of documents associated with a user.
Specification