Contextual speller models on online social networks
Abstract
In one embodiment, a method includes receiving a search query including one or more n-grams, determining for each n-gram if a contextual speller model indicates the n-gram is misspelled, identifying for each misspelled n-gram one or more variant-tokens based at least on the search query and a contextual speller model, generating one or more unique combinations of the n-grams and variant-tokens, where each unique combination includes a variant-token corresponding to each misspelled n-gram, calculating a relevance-score for each unique combination based at least in part on the search query and the contextual speller model, generating one or more corrected queries, where each corrected query includes a unique combination having a relevance-score greater than a threshold relevance-score, and sending one or more of the corrected queries to a user for display.
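The steps named in the abstract can be sketched as a small pipeline. This is an illustrative sketch only, not the patented implementation: the toy probability table, the use of `difflib.get_close_matches` as a stand-in candidate generator, the product-of-probabilities score, and both threshold values are assumptions made for the example.

```python
import difflib
from itertools import product

# Hypothetical toy model: token -> probability of appearing in a query.
# A real contextual speller model would combine a standard and a personal model.
TOY_MODEL = {"photos": 0.30, "of": 0.40, "my": 0.35, "friends": 0.25, "fiends": 0.01}

MISSPELLED_BELOW = 0.05  # assumed: tokens rarer than this count as misspelled
SCORE_THRESHOLD = 0.001  # assumed threshold relevance-score


def is_misspelled(token, model):
    return model.get(token, 0.0) < MISSPELLED_BELOW


def variant_tokens(token, model, limit=3):
    # Candidate variants are the well-attested tokens that look similar.
    candidates = [t for t in model if not is_misspelled(t, model)]
    return difflib.get_close_matches(token, candidates, n=limit, cutoff=0.6)


def corrected_queries(query, model=TOY_MODEL):
    tokens = query.lower().split()
    # One slot per token: the token itself if it looks fine, else its variants.
    slots = [
        variant_tokens(tok, model) if is_misspelled(tok, model) else [tok]
        for tok in tokens
    ]
    scored = []
    for combo in product(*slots):  # every unique combination
        score = 1.0
        for tok in combo:
            score *= model[tok]
        if score > SCORE_THRESHOLD:
            scored.append((score, " ".join(combo)))
    # Highest-scoring corrected queries first.
    return [text for _, text in sorted(scored, reverse=True)]
```

In this sketch a misspelled token with no similar candidate yields an empty slot, so no combination (and no corrected query) is produced for that query.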
37 Claims
-
1. A method comprising, by one or more computing devices:
-
receiving, from a client system of a first user of an online social network, a search query comprising one or more n-grams;
determining, based on a contextual speller model, that at least one n-gram of the one or more n-grams is misspelled, wherein the contextual speller model is based at least on a standard language model and a personal language model customized for the first user based on social-networking data associated with the first user;
identifying, for each misspelled n-gram, one or more variant-tokens based at least on the search query and the contextual speller model;
generating one or more unique combinations of the n-grams and variant-tokens, wherein each unique combination comprises a variant-token corresponding to each misspelled n-gram;
calculating a relevance-score for each unique combination based at least in part on the search query and the contextual speller model, wherein the relevance-score for a unique combination is based on a comparison of a probability associated with the n-grams or variant tokens of the unique combination in the standard language model of the contextual speller model to a probability associated with the n-grams or variant tokens of the unique combination in the personal language model of the contextual speller model;
generating one or more corrected queries, each corrected query comprising a unique combination having a relevance-score greater than a threshold relevance-score; and
sending, to the client system of the first user for display in response to receiving the search query, one or more of the corrected queries.
-
-
2. The method of claim 1, further comprising:
-
receiving from the first user a selection of one of the corrected queries;
identifying one or more objects matching the selected query; and
sending, to the client system of the first user, a search-result page responsive to the selected query, the search-results page comprising one or more references to one or more of the identified objects, respectively.
-
-
3. The method of claim 1, wherein identifying one or more variant-tokens for each misspelled n-gram comprises:
accessing, for each misspelled n-gram, the contextual speller model to identify the variant-tokens having probabilities of appearing in the search query greater than a threshold probability.
-
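The filtering step of claim 3 can be illustrated with a short sketch. The function name, the candidate list, the probability table, and the default threshold are all hypothetical; the claim itself does not say how candidates are produced or what the threshold probability is.

```python
def variants_above_threshold(candidates, probability, threshold=0.05):
    """Keep candidate variant-tokens whose probability exceeds the threshold.

    `probability` maps a token to its (assumed) probability of appearing in
    the search query; unknown tokens are treated as probability 0.
    """
    kept = [c for c in candidates if probability.get(c, 0.0) > threshold]
    # Most probable variant first.
    return sorted(kept, key=lambda c: probability[c], reverse=True)
```

For example, with probabilities `{"friends": 0.25, "fiends": 0.01, "trends": 0.08}`, the candidate "fiends" is dropped and "friends" is ranked ahead of "trends".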
4. The method of claim 1, wherein calculating the relevance-score for each unique combination based at least in part on the search query and the contextual speller model comprises:
-
accessing, for each variant-token or n-gram of the unique combination, the contextual speller model to retrieve a probability of the variant-token or n-gram appearing in the search query; and
calculating the relevance-score for the unique combination based at least on one or more of the retrieved probabilities.
-
-
5. The method of claim 1, wherein calculating the relevance-score for each unique combination based at least in part on the search query and the contextual speller model comprises:
-
accessing, for each variant-token of the unique combination, the contextual speller model to determine a probability of the variant-token being correctly-spelled; and
calculating the relevance-score for the unique combination based at least on one or more of the determined probabilities corresponding to the variant-tokens of the unique combination.
-
-
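Claims 4 and 5 compute the relevance-score from per-token probabilities without fixing a formula. One common choice, assumed here purely for illustration, is to sum log-probabilities over the tokens of a combination; the function name, the probability table, and the `floor` value for unseen tokens are hypothetical.

```python
import math


def relevance_score(combination, token_probability, floor=1e-9):
    """Sum of log-probabilities over the tokens of one unique combination.

    Tokens missing from the table fall back to `floor` so that the logarithm
    is defined; higher (less negative) scores mean more plausible queries.
    """
    return sum(
        math.log(max(token_probability.get(tok, 0.0), floor))
        for tok in combination
    )
```

Under this scoring, a combination containing a rare token such as "fiends" scores lower than the same combination with "friends", so a fixed threshold relevance-score would tend to keep the latter.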
6. The method of claim 1, wherein the standard language model comprises a plurality of n-grams corresponding to social-networking data of all users or entities of the online social network.
-
7. The method of claim 1, wherein the personal language model comprises a plurality of n-grams and associated metadata, the metadata associated with each n-gram comprising one or more of:
-
a frequency of use of the n-gram in the data forming a basis for the personal language model;
a time context associated with the n-gram; or
a social context associated with the n-gram.
-
-
8. The method of claim 1, wherein the personal language model comprises a plurality of n-grams extracted from one or more of:
-
one or more feed searches of the first user on the online social network;
one or more posts viewed by the first user on the online social network;
one or more posts viewed by a second user on the online social network, wherein the posts as viewed by the second user are associated with the first user of the online social network;
one or more likes of the first user on the online social network;
one or more previous search results of the first user on the online social network;
a profile of the first user on the online social network; or
any combination thereof.
-
-
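The personal language model of claims 7 and 8 can be pictured as a table of n-grams with attached metadata. The sketch below is one possible data structure, not the one used in the patent: the class names, the string labels for sources and contexts, and the choice to keep a single time context and a single social context per n-gram are simplifying assumptions.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class NGramEntry:
    ngram: str
    frequency: int = 0                              # frequency of use
    time_context: Optional[str] = None              # assumed label, e.g. "2014-06"
    social_context: Optional[str] = None            # assumed label, e.g. "group:hiking"
    sources: Counter = field(default_factory=Counter)  # e.g. "feed_search", "like"


@dataclass
class PersonalLanguageModel:
    entries: Dict[str, NGramEntry] = field(default_factory=dict)

    def observe(self, text, source, time_context=None, social_context=None, n=1):
        """Extract n-grams from one piece of user data and update metadata."""
        tokens = text.lower().split()
        for i in range(len(tokens) - n + 1):
            ngram = " ".join(tokens[i:i + n])
            entry = self.entries.setdefault(ngram, NGramEntry(ngram))
            entry.frequency += 1
            entry.sources[source] += 1
            # Keep the previous context when none is supplied.
            entry.time_context = time_context or entry.time_context
            entry.social_context = social_context or entry.social_context
```

A caller would invoke `observe` once per post, like, search, or profile field, passing a source label of its own choosing.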
9. The method of claim 1, wherein the personal language model is time-invariant.
-
10. The method of claim 1, wherein calculating the relevance-score for each unique combination based at least in part on the contextual speller model comprises modifying the calculated relevance-score of each unique combination comprising an n-gram having a frequency of use in the personal language model different from a frequency of use in the standard language model greater than a threshold frequency of use.
-
11. The method of claim 10, wherein modifying the calculated relevance-score of each unique combination comprising the n-gram having the frequency of use in the personal language model different from the frequency of use in the standard language model greater than the threshold frequency of use comprises:
-
increasing a probability of the n-gram appearing in the search query, the n-gram having the frequency of use in the personal language model different from the frequency of use in the standard language model greater than or equal to the threshold frequency of use; or
decreasing the probability of the n-gram appearing in the search query, the n-gram having the frequency of use in the personal language model different from the frequency of use in the standard language model less than the threshold frequency of use.
-
-
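The adjustment in claims 10 and 11 compares how often an n-gram is used in the personal language model with how often it is used in the standard language model. A minimal sketch follows, assuming relative frequencies, an absolute difference, and fixed multiplicative factors; the function name and the `threshold`, `boost`, and `damp` values are hypothetical, and the claims do not specify any of them.

```python
def adjust_probability(ngram, base_probability, personal_freq, standard_freq,
                       threshold=0.01, boost=2.0, damp=0.5):
    """Raise or lower an n-gram's probability of appearing in the query.

    `personal_freq` and `standard_freq` map n-grams to relative frequencies.
    If the two differ by at least `threshold`, the probability is increased;
    otherwise it is decreased. The result is capped at 1.0.
    """
    difference = abs(personal_freq.get(ngram, 0.0) - standard_freq.get(ngram, 0.0))
    factor = boost if difference >= threshold else damp
    return min(1.0, base_probability * factor)
```

With these assumed settings, a nickname the user types often but that is rare network-wide would be boosted, while a word used at about the same rate in both models would be damped.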
12. The method of claim 1, wherein the personal language model is further customized based on social-networking data associated with a first group of users.
-
13. The method of claim 1, wherein the social-networking data associated with the first user comprises data associated with the first user retrieved from the online social network within a pre-determined time range.
-
14. The method of claim 1, wherein the social-networking data associated with the first user comprises:
-
demographic information of the first user; or
one or more concepts of the online social network connected to the first user.
-
-
15. The method of claim 1, wherein the contextual speller model comprises a plurality of speller sub-models.
-
16. The method of claim 15, wherein one of the plurality of speller sub-models corresponds to social-networking data associated with a particular time context associated with the first user.
-
17. The method of claim 15, wherein one of the plurality of speller sub-models corresponds to social-networking data associated with a particular social context associated with the first user.
-
18. The method of claim 15, wherein the plurality of speller sub-models are based at least on one or more levels of aggregation, and wherein each level of aggregation differentiates the first user from global users of the online social network.
-
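Claims 15 to 18 describe a contextual speller model built from several sub-models at different levels of aggregation. How the sub-models are combined is not stated in the claims; the sketch below assumes simple linear interpolation, with sub-models ordered from most specific (for example, the first user) to most general (all users). The function name and weights are hypothetical.

```python
def interpolated_probability(ngram, sub_models, weights):
    """Weighted average of an n-gram's probability across speller sub-models.

    `sub_models` is a list of dicts mapping n-grams to probabilities, and
    `weights` gives one non-negative weight per sub-model.
    """
    if len(sub_models) != len(weights):
        raise ValueError("need exactly one weight per sub-model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(w * m.get(ngram, 0.0) for m, w in zip(sub_models, weights))
    return weighted / total
```

Giving the user-level sub-model the largest weight lets a term that is common for this user but unseen globally still receive a meaningful probability.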
19. One or more computer-readable non-transitory storage media embodying software that is operable when executed to:
-
receive, from a client system of a first user of an online social network, a search query comprising one or more n-grams;
determine, based on a contextual speller model, that at least one n-gram of the one or more n-grams is misspelled, wherein the contextual speller model is based at least on a standard language model and a personal language model customized for the first user based on social-networking data associated with the first user;
identify, for each misspelled n-gram, one or more variant-tokens based at least on the search query and the contextual speller model;
generate one or more unique combinations of the n-grams and variant-tokens, wherein each unique combination comprises a variant-token corresponding to each misspelled n-gram;
calculate a relevance-score for each unique combination based at least in part on the search query and the contextual speller model, wherein the relevance-score for a unique combination is based on a comparison of a probability associated with the n-grams or variant tokens of the unique combination in the standard language model of the contextual speller model to a probability associated with the n-grams or variant tokens of the unique combination in the personal language model of the contextual speller model;
generate one or more corrected queries, each corrected query comprising a unique combination having a relevance-score greater than a threshold relevance-score; and
send, to the client system of the first user for display in response to receiving the search query, one or more of the corrected queries.
-
-
20. A system comprising:
- one or more processors; and
a memory coupled to the processors comprising instructions executable by the processors, the processors operable when executing the instructions to:
receive, from a client system of a first user of an online social network, a search query comprising one or more n-grams;
determine, based on a contextual speller model, that at least one n-gram of the one or more n-grams is misspelled, wherein the contextual speller model is based at least on a standard language model and a personal language model customized for the first user based on social-networking data associated with the first user;
identify, for each misspelled n-gram, one or more variant-tokens based at least on the search query and the contextual speller model;
generate one or more unique combinations of the n-grams and variant-tokens, wherein each unique combination comprises a variant-token corresponding to each misspelled n-gram;
calculate a relevance-score for each unique combination based at least in part on the search query and the contextual speller model, wherein the relevance-score for a unique combination is based on a comparison of a probability associated with the n-grams or variant tokens of the unique combination in the standard language model of the contextual speller model to a probability associated with the n-grams or variant tokens of the unique combination in the personal language model of the contextual speller model;
generate one or more corrected queries, each corrected query comprising a unique combination having a relevance-score greater than a threshold relevance-score; and
send, to the client system of the first user for display in response to receiving the search query, one or more of the corrected queries.
-
21. The system of claim 20, wherein the processors are further operable when executing the instructions to:
-
receive from the first user a selection of one of the corrected queries;
identify one or more objects matching the selected query; and
send, to the client system of the first user, a search-result page responsive to the selected query, the search-results page comprising one or more references to one or more of the identified objects, respectively.
-
-
22. The system of claim 20, wherein the instructions to identify one or more variant-tokens for each misspelled n-gram further comprise instructions to:
access, for each misspelled n-gram, the contextual speller model to identify the variant-tokens having probabilities of appearing in the search query greater than a threshold probability.
-
23. The system of claim 20, wherein the instructions to calculate the relevance-score for each unique combination based at least in part on the search query and the contextual speller model further comprise instructions to:
-
access, for each variant-token or n-gram of the unique combination, the contextual speller model to retrieve a probability of the variant-token or n-gram appearing in the search query; and
calculate the relevance-score for the unique combination based at least on one or more of the retrieved probabilities.
-
-
24. The system of claim 20, wherein the instructions to calculate the relevance-score for each unique combination based at least in part on the search query and the contextual speller model further comprise instructions to:
-
access, for each variant-token of the unique combination, the contextual speller model to determine a probability of the variant-token being correctly-spelled; and
calculate the relevance-score for the unique combination based at least on one or more of the determined probabilities corresponding to the variant-tokens of the unique combination.
-
-
25. The system of claim 20, wherein the standard language model comprises a plurality of n-grams corresponding to social-networking data of all users or entities of the online social network.
-
26. The system of claim 20, wherein the personal language model comprises a plurality of n-grams and associated metadata, the metadata associated with each n-gram comprising one or more of:
-
a frequency of use of the n-gram in the data forming a basis for the personal language model;
a time context associated with the n-gram; or
a social context associated with the n-gram.
-
-
27. The system of claim 20, wherein the personal language model comprises a plurality of n-grams extracted from one or more of:
-
one or more feed searches of the first user on the online social network;
one or more posts viewed by the first user on the online social network;
one or more posts viewed by a second user on the online social network, wherein the posts as viewed by the second user are associated with the first user of the online social network;
one or more likes of the first user on the online social network;
one or more previous search results of the first user on the online social network;
a profile of the first user on the online social network; or
any combination thereof.
-
-
28. The system of claim 20, wherein the personal language model is time-invariant.
-
29. The system of claim 20, wherein the instructions to calculate the relevance-score for each unique combination based at least in part on the contextual speller model further comprise instructions to:
modify the calculated relevance-score of each unique combination comprising an n-gram having a frequency of use in the personal language model different from a frequency of use in the standard language model greater than a threshold frequency of use.
-
30. The system of claim 29, wherein the instructions to modify the calculated relevance-score of each unique combination comprising the n-gram having the frequency of use in the personal language model different from the frequency of use in the standard language model greater than the threshold frequency of use further comprise instructions to:
-
increase a probability of the n-gram appearing in the search query, the n-gram having the frequency of use in the personal language model different from the frequency of use in the standard language model greater than or equal to the threshold frequency of use; or
decrease the probability of the n-gram appearing in the search query, the n-gram having the frequency of use in the personal language model different from the frequency of use in the standard language model less than the threshold frequency of use.
-
-
31. The system of claim 20, wherein the personal language model is further customized based on social-networking data associated with a first group of users.
-
32. The system of claim 20, wherein the social-networking data associated with the first user comprises data associated with the first user retrieved from the online social network within a pre-determined time range.
-
33. The system of claim 20, wherein the social-networking data comprises:
-
demographic information of the first user; or
one or more concepts of the online social network connected to the first user.
-
-
34. The system of claim 20, wherein the contextual speller model comprises a plurality of speller sub-models.
-
35. The system of claim 34, wherein one of the plurality of speller sub-models corresponds to social-networking data associated with a particular time context associated with the first user.
-
36. The system of claim 34, wherein one of the plurality of speller sub-models corresponds to social-networking data associated with a particular social context associated with the first user.
-
37. The system of claim 34, wherein the plurality of speller sub-models are based at least on one or more levels of aggregation, and wherein each level of aggregation differentiates the first user from global users of the online social network.
Specification