Consumer insights analysis using word embeddings

US 10,685,183 B1
Filed: 01/04/2018
Issued: 06/16/2020
Est. Priority Date: 01/04/2018
Status: Active Grant

Abstract

In one embodiment, a method includes receiving a request to generate a visualization of public sentiments regarding a particular subject by a plurality of clusters, where each cluster includes a plurality of words semantically close to each other, constructing a first corpus of text by collecting text containing the input n-gram from a plurality of user-created content objects in the online social network, identifying a list of unique n-grams appearing in the first corpus of text, generating a table comprising unique n-grams in the list and their corresponding word vectors using a word embedding model, classifying word vectors in the table into a plurality of clusters based on semantic similarities of the word vectors, and sending, as a response to the request, instructions to display n-grams in the table in a two-dimensional display space, where n-grams corresponding to word vectors that belong to a cluster are displayed together.
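The steps summarized in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the sample `POSTS`, the tiny `EMBEDDINGS` table (standing in for a trained word embedding model), the whitespace tokenizer, the restriction to unigrams, and the greedy cosine-threshold grouping in `cluster` are all hypothetical choices made for the example.

```python
import math

# Hypothetical toy data: user-created posts and a tiny embedding table
# standing in for a trained word embedding model.
POSTS = [
    "the new phone has a great camera",
    "phone battery drains fast",
    "loved the pasta at lunch",
    "camera and battery on this phone are great",
]
EMBEDDINGS = {
    "camera": (0.9, 0.1), "battery": (0.8, 0.3),
    "great": (0.1, 0.9), "fast": (0.2, 0.8),
}

def build_corpus(posts, input_ngram):
    """First corpus: keep only posts containing the input n-gram as whole words."""
    return [p for p in posts if f" {input_ngram} " in f" {p} "]

def unique_ngrams(corpus):
    """Unique unigrams in order of first appearance (assumed tokenizer: split on spaces)."""
    seen = {}
    for post in corpus:
        for token in post.split():
            seen.setdefault(token, None)
    return list(seen)

def vector_table(ngrams, embeddings):
    """Table of n-grams and their word vectors; n-grams unknown to the model are skipped."""
    return {g: embeddings[g] for g in ngrams if g in embeddings}

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cluster(table, threshold=0.8):
    """Greedy grouping (an assumed stand-in for the clustering step):
    an n-gram joins the first cluster whose first member is similar enough,
    otherwise it starts a new cluster."""
    clusters = []
    for gram, vec in table.items():
        for members in clusters:
            if cosine(table[members[0]], vec) >= threshold:
                members.append(gram)
                break
        else:
            clusters.append([gram])
    return clusters

corpus = build_corpus(POSTS, "phone")
table = vector_table(unique_ngrams(corpus), EMBEDDINGS)
print(cluster(table))
```

With this toy data, words describing product features and words describing opinions end up in separate groups, which is the kind of grouping the display step would then render together.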

17 Claims

1. A method comprising:
- by a computing device in an online social network, receiving a request to generate a visualization of public sentiments regarding a particular subject by a plurality of clusters, wherein each cluster comprises a plurality of words semantically close to each other, wherein the request comprises one or more conditions characterizing a group of users, and wherein the request comprises an input n-gram representing the particular subject;
  
  by the computing device, identifying users of the online social network who satisfy the one or more conditions;
  
  by the computing device, constructing a first corpus of text by collecting text containing the input n-gram from a plurality of content objects in the online social network created by the identified users;
  
  by the computing device, identifying a list of unique n-grams appearing in the first corpus of text;
  
  by the computing device, generating, using a word embedding model, a table comprising unique n-grams in the list and their corresponding word vectors, wherein the word embedding model was trained using a second corpus of text collected from a plurality of user-created content objects in the online social network as training data, wherein each of the word vectors represents a semantic context of a corresponding n-gram as a point in a d-dimensional embedding space;
  
  by the computing device, classifying word vectors in the table into a plurality of clusters based on semantic similarities of the word vectors;
  
  by the computing device, calculating, for each of the plurality of clusters, a Term Frequency-Inverse Document Frequency (TF-IDF) score associated with the cluster;
  
  by the computing device, determining k most relevant clusters to the particular subject based on the calculated TF-IDF scores; and
  
  by the computing device, sending, as a response to the request, instructions to display n-grams associated with the determined k clusters in a two-dimensional display space, wherein n-grams corresponding to word vectors that belong to a cluster are displayed together.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
  - 2. The method of claim 1, wherein the word embedding model is a word2vec model.
  - 3. The method of claim 1, wherein the classifying word vectors comprises performing a hierarchical clustering on the word vectors in the table.
  - 4. The method of claim 1, further comprising determining, for each n-gram in the table, a Term Frequency-Inverse Document Frequency (TF-IDF) score.
  - 5. The method of claim 4, wherein the instructions comprise instructions to adjust a font size for an n-gram based at least on a respective TF-IDF score assigned to the n-gram.
  - 6. The method of claim 4, wherein calculating a TF-IDF score associated with a cluster comprises taking an average of determined TF-IDF scores for n-grams corresponding to word vectors that belong to the cluster.
  - 7. The method of claim 4, wherein calculating a TF-IDF score associated with a cluster comprises taking a maximum TF-IDF score among the determined TF-IDF scores for n-grams corresponding to word vectors that belong to the cluster.
  - 8. The method of claim 1, wherein the instructions comprise instructions to assign a font color for n-grams in a semantic cluster.
  - 9. The method of claim 1, wherein the second corpus of text is constructed by collecting texts from a plurality of content objects in the online social network created by the group of users that satisfy the one or more conditions.
  - 10. The method of claim 9, wherein the content objects were created within a pre-determined period of time.
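Claims 4, 6, and 7 describe scoring each n-gram with TF-IDF and then scoring a cluster by the average or the maximum of its members' scores, and claim 1 keeps the k highest-scoring clusters. A minimal sketch of that arithmetic follows. The claims do not fix a TF-IDF formula, so the smoothed variant below, the use of a separate `background_docs` collection for document frequencies, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter

def tfidf_scores(corpus_docs, background_docs):
    """Per-n-gram TF-IDF (assumed variant): term frequency from the subject
    corpus, smoothed inverse document frequency from a background collection."""
    tokens = [t for doc in corpus_docs for t in doc.split()]
    counts = Counter(tokens)
    doc_sets = [set(doc.split()) for doc in background_docs]
    n_docs = len(doc_sets)
    scores = {}
    for term, count in counts.items():
        df = sum(term in s for s in doc_sets)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[term] = (count / len(tokens)) * idf
    return scores

def cluster_score(cluster, scores, how="mean"):
    """Cluster score as the average (claim 6) or maximum (claim 7) member score."""
    values = [scores[g] for g in cluster]
    return max(values) if how == "max" else sum(values) / len(values)

def top_k_clusters(clusters, scores, k, how="mean"):
    """The k clusters with the highest cluster scores, best first."""
    ranked = sorted(clusters, key=lambda c: cluster_score(c, scores, how), reverse=True)
    return ranked[:k]

# Hypothetical per-n-gram scores, chosen so the two aggregations disagree.
example_scores = {"a": 0.1, "b": 0.5, "c": 0.4}
example_clusters = [["a", "b"], ["c"]]
print(top_k_clusters(example_clusters, example_scores, k=1, how="mean"))
print(top_k_clusters(example_clusters, example_scores, k=1, how="max"))
```

The two aggregations can rank clusters differently: an average favors clusters whose members are uniformly distinctive, while a maximum favors clusters containing a single standout n-gram.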

11. One or more computer-readable non-transitory storage media embodying software that is operable when executed to:
- receive a request to generate a visualization of public sentiments regarding a particular subject by a plurality of clusters, wherein each cluster comprises a plurality of words semantically close to each other, wherein the request comprises one or more conditions characterizing a group of users, and wherein the request comprises an input n-gram representing the particular subject;
  
  identify users of the online social network who satisfy the one or more conditions;
  
  construct a first corpus of text by collecting text containing the input n-gram from a plurality of content objects in an online social network created by the identified users;
  
  identify a list of unique n-grams appearing in the first corpus of text;
  
  generate, using a word embedding model, a table comprising unique n-grams in the list and their corresponding word vectors, wherein the word embedding model was trained using a second corpus of text collected from a plurality of user-created content objects in the online social network as training data, wherein each of the word vectors represents a semantic context of a corresponding n-gram as a point in a d-dimensional embedding space;
  
  classify word vectors in the table into a plurality of clusters based on semantic similarities of the word vectors;
  
  calculate, for each of the plurality of clusters, a Term Frequency-Inverse Document Frequency (TF-IDF) score associated with the cluster;
  
  determine k most relevant clusters to the particular subject based on the calculated TF-IDF scores; and
  
  send, as a response to the request, instructions to display n-grams associated with the determined k clusters in a two-dimensional display space, wherein n-grams corresponding to word vectors that belong to a cluster are displayed together.
- Dependent claims (12, 13, 14, 15, 16)
  - 12. The media of claim 11, wherein the word embedding model is a word2vec model.
  - 13. The media of claim 11, wherein the classifying word vectors comprises performing a hierarchical clustering on the word vectors in the table.
  - 14. The media of claim 11, wherein the software is further operable when executed to determine, for each n-gram in the table, a Term Frequency-Inverse Document Frequency (TF-IDF) score.
  - 15. The media of claim 14, wherein the instructions comprise instructions to adjust a font size for an n-gram based at least on a respective TF-IDF score assigned to the n-gram.
  - 16. The media of claim 11, wherein the instructions comprise instructions to assign a font color for n-grams in a semantic cluster.
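Claims 15 and 16 tie font size to an n-gram's TF-IDF score and font color to its semantic cluster. One way to produce such display instructions is sketched below. The linear scaling, the point-size range, the color palette, and the dictionary layout of each instruction are hypothetical choices, not details taken from the claims.

```python
def display_instructions(clusters, scores, min_pt=10, max_pt=32,
                         palette=("#1f77b4", "#d62728", "#2ca02c")):
    """Build one display instruction per n-gram.

    Font size grows linearly with the n-gram's TF-IDF score (assumed mapping);
    every n-gram in the same cluster gets the same color from the palette.
    """
    shown = [scores[g] for cluster in clusters for g in cluster]
    lowest, highest = min(shown), max(shown)
    span = (highest - lowest) or 1.0  # avoid dividing by zero when all scores match
    instructions = []
    for index, cluster in enumerate(clusters):
        color = palette[index % len(palette)]
        for gram in cluster:
            size = min_pt + (scores[gram] - lowest) / span * (max_pt - min_pt)
            instructions.append({
                "ngram": gram,
                "cluster": index,
                "font_size": round(size),
                "color": color,
            })
    return instructions

# Hypothetical clusters and scores for illustration.
for item in display_instructions([["great", "fast"], ["camera"]],
                                 {"great": 2.0, "fast": 1.0, "camera": 3.0}):
    print(item)
```

A client could lay out each cluster's n-grams adjacent to one another in the two-dimensional display space and apply the size and color fields when rendering.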

17. A system comprising:
- one or more processors; and
  
  one or more computer-readable non-transitory storage media coupled to one or more of the processors and comprising instructions operable when executed by one or more of the processors to cause the system to:
  
  receive a request to generate a visualization of public sentiments regarding a particular subject by a plurality of clusters, wherein each cluster comprises a plurality of words semantically close to each other, wherein the request comprises one or more conditions characterizing a group of users, and wherein the request comprises an input n-gram representing the particular subject;
  
  identify users of the online social network who satisfy the one or more conditions;
  
  construct a first corpus of text by collecting text containing the input n-gram from a plurality of content objects in an online social network created by the identified users;
  
  identify a list of unique n-grams appearing in the first corpus of text;
  
  generate, using a word embedding model, a table comprising unique n-grams in the list and their corresponding word vectors, wherein the word embedding model was trained using a second corpus of text collected from a plurality of user-created content objects in the online social network as training data, wherein each of the word vectors represents a semantic context of a corresponding n-gram as a point in a d-dimensional embedding space;
  
  classify word vectors in the table into a plurality of clusters based on semantic similarities of the word vectors;
  
  calculate, for each of the plurality of clusters, a Term Frequency-Inverse Document Frequency (TF-IDF) score associated with the cluster;
  
  determine k most relevant clusters to the particular subject based on the calculated TF-IDF scores; and
  
  send, as a response to the request, instructions to display n-grams associated with the determined k clusters in a two-dimensional display space, wherein n-grams corresponding to word vectors that belong to a cluster are displayed together.

Current Assignee: Meta Platforms, Inc. (f/k/a Facebook, Inc.)
Original Assignee: Meta Platforms, Inc. (f/k/a Facebook, Inc.)
Inventors: Arfa, Jonathan Michael; Nawathe, Nikhil Girish; Kauder, Bryan; Subramanian, Shriram
Primary Examiner(s): Le, Thuykhanh
Application Number: US15/862,070
Time in Patent Office: 894 Days
Field of Search: None
CPC Class Codes:
G06F 40/284   Lexical analysis, e.g. toke...
G06F 40/295   Named entity recognition
G06F 40/30   Semantic analysis
G06N 20/00   Machine learning
G06N 3/08   Learning methods
H04L 51/214   using selective forwarding
H04L 51/52   for supporting social netwo...
