Embedding-based parsing of search queries on online social networks
First Claim
1. A method comprising, by one or more computer systems:
- receiving, from a client system of a user of an online social network, a query inputted by the user, wherein the query comprises a plurality of n-grams;
parsing the query to identify a subset of n-grams of the plurality of n-grams;
generating, for each identified n-gram, an embedding of the n-gram, wherein embeddings correspond to points in a d-dimensional embedding space;
determining, for each identified n-gram, one or more word senses corresponding to one or more embeddings of the word senses, respectively;
calculating, for each determined word sense for each identified n-gram, a relatedness-score for the word sense based on one or more similarity metrics of the embedding of the word sense and the embeddings of each of the one or more other word senses corresponding to the other identified n-grams, respectively;
selecting, for each identified n-gram, one of the one or more word senses determined for the identified n-gram having one or more relatedness-scores, respectively, the selected word sense having a highest relatedness-score of the one or more respective relatedness-scores;
identifying one or more objects matching at least a portion of the query;
ranking each identified object based on a quality of matching of the object to one or more selected word senses; and
sending, to the client system in response to the query, a search-results interface for display, wherein the search-results interface comprises one or more search results corresponding to one or more of the identified objects, respectively, each identified object corresponding to a search result having a rank greater than a threshold rank.
2 Assignments
0 Petitions
Accused Products
Abstract
In one embodiment, a method includes receiving a query including multiple n-grams; parsing the query to identify a subset of n-grams; generating, for each identified n-gram, an embedding of the n-gram; determining, for each identified n-gram, one or more word senses; calculating, for each word sense for each identified n-gram, a relatedness-score for the word sense based similarity metrics of the embedding of the word sense and the embeddings of each of the other word senses corresponding to the other identified n-grams; selecting, for each identified n-gram, one of the word senses determined for the identified n-gram having a highest relatedness-score; identifying objects matching at least a portion of the query; ranking each identified object based on a quality of matching of the object to selected word senses; and sending search results corresponding to one or more of the identified objects and having a rank greater than a threshold rank.
227 Citations
20 Claims
-
1. A method comprising, by one or more computer systems:
-
receiving, from a client system of a user of an online social network, a query inputted by the user, wherein the query comprises a plurality of n-grams; parsing the query to identify a subset of n-grams of the plurality of n-grams; generating, for each identified n-gram, an embedding of the n-gram, wherein embeddings correspond to points in a d-dimensional embedding space; determining, for each identified n-gram, one or more word senses corresponding to one or more embeddings of the word senses, respectively; calculating, for each determined word sense for each identified n-gram, a relatedness-score for the word sense based on one or more similarity metrics of the embedding of the word sense and the embeddings of each of the one or more other word senses corresponding to the other identified n-grams, respectively; selecting, for each identified n-gram, one of the one or more word senses determined for the identified n-gram having one or more relatedness-scores, respectively, the selected word sense having a highest relatedness-score of the one or more respective relatedness-scores; identifying one or more objects matching at least a portion of the query; ranking each identified object based on a quality of matching of the object to one or more selected word senses; and sending, to the client system in response to the query, a search-results interface for display, wherein the search-results interface comprises one or more search results corresponding to one or more of the identified objects, respectively, each identified object corresponding to a search result having a rank greater than a threshold rank.
-
-
2. The method of claim 1, wherein parsing the query comprises:
-
rewriting the query to remove one or more of the plurality of n-grams; and identifying the subset of n-grams as the remaining n-grams.
-
-
3. The method of claim 1, wherein the one or more similarity metrics are calculated based on one or more cosine similarities of the embedding of the word sense and the embeddings of each of the one or more other word senses corresponding to the other identified n-grams, respectively.
-
4. The method of claim 1, wherein for each identified n-gram, the one or more word senses are determined based on one or more similarity metrics of the embedding of the n-gram and the embeddings of the one or more word senses, respectively.
-
5. The method of claim 4, wherein the one or more word senses comprise a specified number of the one or more word senses with the highest similarity metrics of the embedding of the n-gram and the embeddings of the one or more word senses, respectively.
-
6. The method of claim 4, wherein the one or more word senses comprise each word sense having a similarity metric greater than or equal to a threshold similarity metric of the embedding of the n-gram and the embedding of the word sense.
-
7. The method of claim 1, wherein at least one of the word senses comprises a plurality of terms, and wherein the embedding of the word sense is an average pooling of embeddings of each term of the plurality of terms.
-
8. The method of claim 1, wherein calculating the relatedness-score comprises summing an initial-score for the word sense and a context-score for the word sense, and wherein the context-score is a sum of the one or more similarity metrics of the embedding of the word sense and the embeddings of each of the one or more other word senses corresponding to the other identified n-grams, respectively.
-
9. The method of claim 8, wherein, for each determined word sense for each identified n-gram, the initial-score based on a similarity metric of the embedding of the n-gram and the embedding of the word sense.
-
10. The method of claim 1, wherein each object is associated with one or more word senses, and wherein each identified object is associated with at least one word sense matching one of the one or more selected word senses.
-
11. The method of claim 1, wherein each object is associated with one or more topics, and wherein ranking each identified object is further based on a quality of matching of the one or more topics of the identified object to one or more selected word senses.
-
12. The method of claim 1, wherein each identified object is of a particular object-type, and wherein ranking each identified object is further based on a quality of matching of the object-type of the identified object to one or more selected word senses.
-
13. The method of claim 1, wherein each identified object is of a particular item-type, and wherein ranking each identified object is further based on a quality of matching of the item-type of the identified object to one or more selected word senses.
-
14. The method of claim 1, wherein the embeddings are determined based on a machine learning model.
-
15. The method of claim 1, wherein parsing the query to identify the subset of n-grams of the plurality of n-grams comprises:
-
determining a plurality of candidate subsets of n-grams of the plurality of n-grams, wherein each n-gram of each candidate subset of n-grams has at least one corresponding word sense; identifying one or more candidate subsets of n-grams with a number of corresponding n-grams less than or equal to each other candidate subset of n-grams; and if there is exactly one identified candidate subset of n-grams, then; identifying as the subset of n-grams the identified candidate subset of n-grams; and if there is a plurality of identified candidate subsets of n-grams, then; calculating, for each identified candidate subset of n-grams, a relatedness-score for the identified candidate subset of n-grams based on one or more similarity metrics of an embedding of each n-gram of the identified candidate subset of n-grams and embeddings of each other n-gram of the identified candidate subset of n-grams; and identifying as the subset of n-grams an identified candidate subset with a highest relatedness-score.
-
-
16. One or more computer-readable non-transitory storage media embodying software that is operable when executed to:
-
receive, from a client system of a user of an online social network, a query inputted by the user, wherein the query comprises a plurality of n-grams; parse the query to identify a subset of n-grams of the plurality of n-grams; generate, for each identified n-gram, an embedding of the n-gram, wherein embeddings correspond to points in a d-dimensional embedding space; determine, for each identified n-gram, one or more word senses corresponding to one or more embeddings of the word senses, respectively; calculate, for each determined word sense for each identified n-gram, a relatedness-score for the word sense based on one or more similarity metrics of the embedding of the word sense and the embeddings of each of the one or more other word senses corresponding to the other identified n-grams, respectively; select, for each identified n-gram, one of the one or more word senses determined for the identified n-gram having one or more relatedness-scores, respectively, the selected word sense having a highest relatedness-score of the one or more respective relatedness-scores; identify one or more objects matching at least a portion of the query; rank each identified object based on a quality of matching of the object to one or more selected word senses; and send, to the client system in response to the query, a search-results interface for display, wherein the search-results interface comprises one or more search results corresponding to one or more of the identified objects, respectively, each identified object corresponding to a search result having a rank greater than a threshold rank.
-
-
17. The media of claim 16, wherein parsing the query comprises:
-
rewriting the query to remove one or more of the plurality of n-grams; and identifying the subset of n-grams as the remaining n-grams.
-
-
18. The media of claim 16, wherein the one or more similarity metrics are calculated based on one or more cosine similarities of the embedding of the word sense and the embeddings of each of the one or more other word senses corresponding to the other identified n-grams, respectively.
-
19. The media of claim 16, wherein for each identified n-gram, the one or more word senses are determined based on one or more similarity metrics of the embedding of the n-gram and the embeddings of the one or more word senses, respectively.
-
20. A system comprising:
- one or more processors; and
a non-transitory memory coupled to the processors comprising instructions executable by the processors, the processors operable when executing the instructions to;receive, from a client system of a user of an online social network, a query inputted by the user, wherein the query comprises a plurality of n-grams; parse the query to identify a subset of n-grams of the plurality of n-grams; generate, for each identified n-gram, an embedding of the n-gram, wherein embeddings correspond to points in a d-dimensional embedding space; determine, for each identified n-gram, one or more word senses corresponding to one or more embeddings of the word senses, respectively; calculate, for each determined word sense for each identified n-gram, a relatedness-score for the word sense based on one or more similarity metrics of the embedding of the word sense and the embeddings of each of the one or more other word senses corresponding to the other identified n-grams, respectively; select, for each identified n-gram, one of the one or more word senses determined for the identified n-gram having one or more relatedness-scores, respectively, the selected word sense having a highest relatedness-score of the one or more respective relatedness-scores; identify one or more objects matching at least a portion of the query; rank each identified object based on a quality of matching of the object to one or more selected word senses; and send, to the client system in response to the query, a search-results interface for display, wherein the search-results interface comprises one or more search results corresponding to one or more of the identified objects, respectively, each identified object corresponding to a search result having a rank greater than a threshold rank.
- one or more processors; and
Specification