Consumer insights analysis using word embeddings
Abstract
In one embodiment, a method includes receiving a request to generate a visualization of public sentiments regarding a particular subject by a plurality of clusters, where each cluster includes a plurality of words semantically close to each other, constructing a first corpus of text by collecting text containing the input n-gram from a plurality of user-created content objects in the online social network, identifying a list of unique n-grams appearing in the first corpus of text, generating a table comprising unique n-grams in the list and their corresponding word vectors using a word embedding model, classifying word vectors in the table into a plurality of clusters based on semantic similarities of the word vectors, and sending, as a response to the request, instructions to display n-grams in the table in a two-dimensional display space, where n-grams corresponding to word vectors that belong to a cluster are displayed together.
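The corpus-building steps named in the abstract (collecting text that contains the input n-gram, then listing the unique n-grams in it) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the toy POSTS and USERS data, the whitespace tokenizer, the age condition, and the helper names build_first_corpus and unique_ngrams do not come from the patent text.

```python
from collections import Counter

# Hypothetical toy data: (author_id, text) pairs standing in for content objects.
POSTS = [
    ("u1", "I love the new phone camera"),
    ("u2", "The phone battery drains fast"),
    ("u3", "Great weather for a hike today"),
    ("u1", "Phone screen cracked again"),
]
# Hypothetical user attributes used to evaluate the request's conditions.
USERS = {"u1": {"age": 25}, "u2": {"age": 41}, "u3": {"age": 30}}


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, simplistic tokenizer)."""
    return text.lower().split()


def build_first_corpus(posts, users, condition, input_ngram):
    """Keep texts whose author satisfies `condition` and that contain the input n-gram."""
    target = tokenize(input_ngram)
    n = len(target)
    corpus = []
    for author, text in posts:
        if not condition(users[author]):
            continue
        tokens = tokenize(text)
        # Slide a window of length n over the tokens and look for the target.
        windows = [tokens[i:i + n] for i in range(len(tokens) - n + 1)]
        if target in windows:
            corpus.append(tokens)
    return corpus


def unique_ngrams(corpus, max_n=2):
    """Count every n-gram of length 1..max_n that appears in the corpus."""
    counts = Counter()
    for tokens in corpus:
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Assumed request: users younger than 35, subject "phone".
corpus = build_first_corpus(POSTS, USERS, lambda u: u["age"] < 35, "phone")
ngram_counts = unique_ngrams(corpus)
```

The keys of ngram_counts play the role of the list of unique n-grams; the counts are kept only because a later scoring step would likely need them.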
17 Claims
1. A method comprising:
by a computing device in an online social network, receiving a request to generate a visualization of public sentiments regarding a particular subject by a plurality of clusters, wherein each cluster comprises a plurality of words semantically close to each other, wherein the request comprises one or more conditions characterizing a group of users, and wherein the request comprises an input n-gram representing the particular subject;
by the computing device, identifying users of the online social network who satisfy the one or more conditions;
by the computing device, constructing a first corpus of text by collecting text containing the input n-gram from a plurality of content objects in the online social network created by the identified users;
by the computing device, identifying a list of unique n-grams appearing in the first corpus of text;
by the computing device, generating, using a word embedding model, a table comprising unique n-grams in the list and their corresponding word vectors, wherein the word embedding model was trained using a second corpus of text collected from a plurality of user-created content objects in the online social network as training data, wherein each of the word vectors represents a semantic context of a corresponding n-gram as a point in a d-dimensional embedding space;
by the computing device, classifying word vectors in the table into a plurality of clusters based on semantic similarities of the word vectors;
by the computing device, calculating, for each of the plurality of clusters, a Term Frequency-Inverse Document Frequency (TF-IDF) score associated with the cluster;
by the computing device, determining k most relevant clusters to the particular subject based on the calculated TF-IDF scores; and
by the computing device, sending, as a response to the request, instructions to display n-grams associated with the determined k clusters in a two-dimensional display space, wherein n-grams corresponding to word vectors that belong to a cluster are displayed together.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
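The classifying step of claim 1 groups word vectors by semantic similarity but does not name an algorithm. The sketch below uses a greedy threshold clustering on cosine similarity as a stand-in. The TABLE contents are hand-written placeholder vectors (not output of a trained word embedding model), and the 0.8 threshold and the function names are assumptions.

```python
import math

# Hypothetical embedding table: n-gram -> word vector (d = 3 for readability).
TABLE = {
    "camera": [0.9, 0.1, 0.0],
    "photo": [0.8, 0.2, 0.1],
    "battery": [0.1, 0.9, 0.0],
    "charger": [0.0, 0.8, 0.2],
    "price": [0.1, 0.1, 0.9],
}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_by_similarity(table, threshold=0.8):
    """Assign each n-gram to the first cluster whose seed vector is similar enough.

    If no existing cluster qualifies, the n-gram starts a new cluster and its
    vector becomes that cluster's seed.
    """
    clusters = []  # each entry: {"seed": vector, "members": [n-grams]}
    for ngram, vec in table.items():
        for c in clusters:
            if cosine(vec, c["seed"]) >= threshold:
                c["members"].append(ngram)
                break
        else:
            clusters.append({"seed": vec, "members": [ngram]})
    return [c["members"] for c in clusters]


clusters = cluster_by_similarity(TABLE)
```

With these placeholder vectors the result is three groups: camera with photo, battery with charger, and price on its own. A production system would more plausibly use k-means or a hierarchical method over vectors from a trained model; the greedy pass is chosen here only because it fits in a few dependency-free lines.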
11. One or more computer-readable non-transitory storage media embodying software that is operable when executed to:
receive a request to generate a visualization of public sentiments regarding a particular subject by a plurality of clusters, wherein each cluster comprises a plurality of words semantically close to each other, wherein the request comprises one or more conditions characterizing a group of users, and wherein the request comprises an input n-gram representing the particular subject;
identify users of the online social network who satisfy the one or more conditions;
construct a first corpus of text by collecting text containing the input n-gram from a plurality of content objects in an online social network created by the identified users;
identify a list of unique n-grams appearing in the first corpus of text;
generate, using a word embedding model, a table comprising unique n-grams in the list and their corresponding word vectors, wherein the word embedding model was trained using a second corpus of text collected from a plurality of user-created content objects in the online social network as training data, wherein each of the word vectors represents a semantic context of a corresponding n-gram as a point in a d-dimensional embedding space;
classify word vectors in the table into a plurality of clusters based on semantic similarities of the word vectors;
calculate, for each of the plurality of clusters, a Term Frequency-Inverse Document Frequency (TF-IDF) score associated with the cluster;
determine k most relevant clusters to the particular subject based on the calculated TF-IDF scores; and
send, as a response to the request, instructions to display n-grams associated with the determined k clusters in a two-dimensional display space, wherein n-grams corresponding to word vectors that belong to a cluster are displayed together.
Dependent claims: 12, 13, 14, 15, 16
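Claim 11 recites a TF-IDF score per cluster and a selection of the k most relevant clusters, without fixing the formula. One possible reading is sketched below under loudly stated assumptions: a cluster's term frequency is the share of first-corpus tokens that are cluster members, its document frequency is the number of background documents containing any member, and the IDF uses a common smoothed form. The cluster names, both toy corpora, and the restriction to unigram members are all hypothetical.

```python
import math

# Hypothetical clusters of unigrams.
CLUSTERS = {
    "imaging": ["camera", "photo"],
    "power": ["battery", "charger"],
    "filler": ["the", "a"],
}
# Hypothetical first corpus (text about the subject), already tokenized.
FIRST_CORPUS = [
    ["the", "camera", "takes", "a", "great", "photo"],
    ["battery", "and", "charger", "failed"],
    ["camera", "and", "photo", "quality"],
]
# Hypothetical background corpus used for document frequencies.
BACKGROUND = [
    ["the", "weather", "is", "nice"],
    ["a", "walk", "in", "the", "park"],
    ["the", "battery", "died"],
    ["a", "photo", "of", "the", "dog"],
]


def cluster_tfidf(members, first_corpus, background):
    """Score one cluster: (member share of first-corpus tokens) * smoothed IDF."""
    members = set(members)
    total = sum(len(doc) for doc in first_corpus)
    hits = sum(1 for doc in first_corpus for tok in doc if tok in members)
    tf = hits / total if total else 0.0
    # Document frequency: background documents containing at least one member.
    df = sum(1 for doc in background if members & set(doc))
    idf = math.log((1 + len(background)) / (1 + df)) + 1  # assumed smoothing
    return tf * idf


def top_k_clusters(clusters, first_corpus, background, k):
    """Return the names of the k highest-scoring clusters and all scores."""
    scores = {
        name: cluster_tfidf(members, first_corpus, background)
        for name, members in clusters.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


top, scores = top_k_clusters(CLUSTERS, FIRST_CORPUS, BACKGROUND, k=2)
```

In this toy data the "power" and "filler" clusters have the same term frequency, but "filler" appears in every background document, so its IDF is lower and it drops out of the top two. That is the behavior the TF-IDF step is presumably meant to provide: clusters of generic words rank below clusters specific to the subject.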
17. A system comprising:
one or more processors; and
one or more computer-readable non-transitory storage media coupled to one or more of the processors and comprising instructions operable when executed by one or more of the processors to cause the system to:
receive a request to generate a visualization of public sentiments regarding a particular subject by a plurality of clusters, wherein each cluster comprises a plurality of words semantically close to each other, wherein the request comprises one or more conditions characterizing a group of users, and wherein the request comprises an input n-gram representing the particular subject;
identify users of the online social network who satisfy the one or more conditions;
construct a first corpus of text by collecting text containing the input n-gram from a plurality of content objects in an online social network created by the identified users;
identify a list of unique n-grams appearing in the first corpus of text;
generate, using a word embedding model, a table comprising unique n-grams in the list and their corresponding word vectors, wherein the word embedding model was trained using a second corpus of text collected from a plurality of user-created content objects in the online social network as training data, wherein each of the word vectors represents a semantic context of a corresponding n-gram as a point in a d-dimensional embedding space;
classify word vectors in the table into a plurality of clusters based on semantic similarities of the word vectors;
calculate, for each of the plurality of clusters, a Term Frequency-Inverse Document Frequency (TF-IDF) score associated with the cluster;
determine k most relevant clusters to the particular subject based on the calculated TF-IDF scores; and
send, as a response to the request, instructions to display n-grams associated with the determined k clusters in a two-dimensional display space, wherein n-grams corresponding to word vectors that belong to a cluster are displayed together.
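The final step of claim 17 sends instructions to display the selected n-grams in a two-dimensional space with members of a cluster shown together. The sketch below is one assumed way to produce such instructions: each cluster gets a center on a large circle, and its n-grams are placed on a small ring around that center. The layout rule, the radii, the JSON shape, and the "cluster_plot" type tag are all hypothetical; the claims do not specify them, and a real system might instead project the d-dimensional vectors down to two dimensions.

```python
import json
import math


def layout_clusters(clusters, cluster_radius=10.0, member_radius=2.0):
    """Place each cluster on a large circle and its n-grams on a small ring around it."""
    items = []
    for ci, members in enumerate(clusters):
        # Center of this cluster on the large circle.
        angle = 2 * math.pi * ci / len(clusters)
        cx = cluster_radius * math.cos(angle)
        cy = cluster_radius * math.sin(angle)
        for mi, ngram in enumerate(members):
            # Position of this n-gram on the small ring around the center.
            theta = 2 * math.pi * mi / len(members)
            items.append({
                "ngram": ngram,
                "cluster": ci,
                "x": round(cx + member_radius * math.cos(theta), 3),
                "y": round(cy + member_radius * math.sin(theta), 3),
            })
    return items


# Hypothetical top-k clusters selected by an earlier step.
instructions = layout_clusters([["camera", "photo"], ["battery", "charger"], ["price"]])
# Serialized response body a client could render (assumed format).
payload = json.dumps({"type": "cluster_plot", "items": instructions})
```

Because member_radius is much smaller than the spacing between cluster centers, n-grams of one cluster always land closer to each other than to any other cluster, which is the only display property the claim language requires.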
Specification