Character and word level language models for out-of-vocabulary text input
First Claim
1. A method comprising:
receiving, by a computing device, an indication of user-inputted text;
storing, by the computing device, a lexicon that includes a set of in-lexicon candidate strings and does not include a set of out-of-lexicon candidate strings, wherein each of the in-lexicon candidate strings and each of the out-of-lexicon candidate strings is a respective word in a particular language;
for each respective in-lexicon candidate string of the set of in-lexicon candidate strings, determining, by the computing device, a respective score for the respective in-lexicon candidate string, wherein:
the respective score is based at least in part on a probability of the respective in-lexicon candidate string as a whole being entered, and
the probability of the respective in-lexicon candidate string being entered is affected by a word-level context of the respective in-lexicon candidate string that includes one or more character strings that precede the respective in-lexicon candidate string in the user-inputted text;
for each respective out-of-lexicon candidate string of the set of out-of-lexicon candidate strings, determining, by the computing device, a respective score for the respective out-of-lexicon candidate string, wherein the respective score for the respective out-of-lexicon candidate string is based at least in part on respective probabilities of individual characters in the respective out-of-lexicon candidate string being entered and is not based on word-level probabilities;
determining, by the computing device and based at least in part on the scores for the in-lexicon candidate strings and the scores for the out-of-lexicon candidate strings, a combined set of candidate strings from the set of in-lexicon candidate strings and the set of out-of-lexicon candidate strings, the combined set of candidate strings including at least one in-lexicon candidate string from the set of in-lexicon candidate strings and at least one out-of-lexicon candidate string from the set of out-of-lexicon candidate strings;
outputting, by the computing device, at least a portion of the combined set of candidate strings for display; and
responsive to an indication of a selection of a candidate string from the combined set of candidate strings, outputting, by the computing device, for display in place of the user-inputted text, the selected candidate string.
Abstract
A computing device determines, based at least in part on indications of user input, scores for a first set of candidate strings and a second set of candidate strings. Each candidate string from the first set of candidate strings is in a lexicon. Candidate strings from the second set of candidate strings are not necessarily in the lexicon. The computing device determines the scores for the first set of candidate strings based on probabilities of the candidate strings being entered. For each candidate string from the second set of candidate strings, the computing device determines the score for the candidate string based on probabilities of characters of the candidate string being entered. The computing device selects a candidate string based on the scores for the first and second sets of candidate strings and outputs, for display at a display device, the selected candidate string.
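The two scoring paths in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the toy `WORD_LOGPROB` table, the uniform per-character probability, and the function names do not come from the patent, which does not state how the probabilities are estimated.

```python
import math

# Hypothetical word-level log-probabilities for strings that are in the lexicon.
WORD_LOGPROB = {"hello": math.log(0.02), "help": math.log(0.01)}

# Hypothetical character-level model: every character is assumed equally likely.
CHAR_LOGPROB = math.log(1 / 27)


def score_in_lexicon(candidate):
    """Score a first-set candidate from the probability of the whole string."""
    return WORD_LOGPROB[candidate]


def score_by_characters(candidate):
    """Score a second-set candidate from per-character probabilities only."""
    return len(candidate) * CHAR_LOGPROB


def select_candidate(first_set, second_set):
    """Score both sets, then pick the single best candidate across them."""
    scored = [(c, score_in_lexicon(c)) for c in first_set]
    scored += [(c, score_by_characters(c)) for c in second_set]
    best, _ = max(scored, key=lambda item: item[1])
    return best


print(select_candidate(["hello", "help"], ["helo"]))
```

Working in log space keeps the two kinds of score on one comparable scale, so a single `max` can choose across both sets.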
18 Claims
17. A computing device comprising one or more processors configured to:
receive an indication of user-inputted text;
store a lexicon that includes a set of in-lexicon candidate strings and does not include a set of out-of-lexicon candidate strings, wherein each of the in-lexicon candidate strings and each of the out-of-lexicon candidate strings is a respective word in a particular language;
for each respective in-lexicon candidate string of the set of in-lexicon candidate strings, determine a respective score for the respective in-lexicon candidate string, wherein:
the respective score for the respective in-lexicon candidate string is based at least in part on locations indicated by indications of user input and a probability of the respective in-lexicon candidate string as a whole being entered, and
the probability of the respective in-lexicon candidate string being entered is affected by a word-level context of the respective in-lexicon candidate string that includes one or more words that precede the respective in-lexicon candidate string in the user-inputted text;
for each respective out-of-lexicon candidate string of the set of out-of-lexicon candidate strings, determine a respective score for the respective out-of-lexicon candidate string, wherein the respective score for the respective out-of-lexicon candidate string is based at least in part on the locations indicated by the indications of user input and respective probabilities of individual characters in the respective out-of-lexicon candidate string being entered and is not based on word-level probabilities;
determine, based at least in part on the scores for the in-lexicon candidate strings and the out-of-lexicon candidate strings, a combined set of candidate strings from among the set of in-lexicon candidate strings and the set of out-of-lexicon candidate strings, the combined set of candidate strings including at least one in-lexicon candidate string from the set of in-lexicon candidate strings and at least one out-of-lexicon candidate string from the set of out-of-lexicon candidate strings;
output at least a portion of the combined set of candidate strings for display; and
responsive to an indication of a selection of a candidate string from the combined set of candidate strings, output, for display in place of the user-inputted text, the selected candidate string.
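Claim 17 bases both kinds of score partly on the locations indicated by user input. One way to picture that is a spatial term added to a language term, sketched below. The key coordinates in `KEY_CENTERS`, the Gaussian-style distance penalty, the `SIGMA` value, and all function names are hypothetical. The claim does not say how location enters the score.

```python
import math

# Hypothetical key centers on an on-screen keyboard, in arbitrary units.
KEY_CENTERS = {"h": (5.5, 1.0), "i": (7.0, 0.0), "u": (6.0, 0.0), "o": (8.0, 0.0)}
SIGMA = 0.5  # assumed spread of touch locations around a key center


def spatial_logprob(taps, candidate):
    """Unnormalized log-likelihood of tap locations given the candidate's keys."""
    if len(taps) != len(candidate):
        return -math.inf
    total = 0.0
    for (x, y), ch in zip(taps, candidate):
        kx, ky = KEY_CENTERS[ch]
        squared_distance = (x - kx) ** 2 + (y - ky) ** 2
        total += -squared_distance / (2 * SIGMA ** 2)
    return total


def combined_score(taps, candidate, language_logprob):
    """Add the spatial term to a word-level or character-level log-probability."""
    return spatial_logprob(taps, candidate) + language_logprob


taps = [(5.5, 1.0), (6.9, 0.1)]
print(combined_score(taps, "hi", math.log(0.01)))
print(combined_score(taps, "hu", math.log(0.01)))
```

In this sketch the caller supplies `language_logprob` from a word-level model for in-lexicon candidates and from a character-level model for out-of-lexicon candidates, so the same spatial term serves both paths.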
18. A non-transitory computer-readable storage medium comprising instructions that, when executed by one or more processors of a computing device, configure the computing device to:
receive an indication of user-inputted text;
store a lexicon that includes a set of in-lexicon candidate strings and does not include a set of out-of-lexicon candidate strings, wherein each of the in-lexicon candidate strings and each of the out-of-lexicon candidate strings is a respective word in a particular language;
for each respective in-lexicon candidate string of the set of in-lexicon candidate strings, determine a respective score for the respective in-lexicon candidate string, wherein:
the respective score for the respective in-lexicon candidate string is based at least in part on locations indicated by indications of user input and a probability of the respective in-lexicon candidate string as a whole being entered, and
the probability of the respective in-lexicon candidate string being entered is affected by a word-level context of the respective in-lexicon candidate string that includes one or more words that precede the respective in-lexicon character string in the user-inputted text;
for each respective out-of-lexicon candidate string of the set of out-of-lexicon candidate strings, determine a respective score for the respective out-of-lexicon candidate string, wherein:
the respective score for the respective out-of-lexicon candidate string is based at least in part on one or more of the locations indicated by the indications of user input and probabilities of respective individual characters in the respective out-of-lexicon candidate string being entered,
for each respective character of the respective out-of-lexicon candidate string, the respective probability of the respective character being entered is based at least in part on one or more characters that precede the respective character, and
the respective score for the out-of-lexicon candidate string is not based on a word-level context of the respective out-of-lexicon candidate string and is not based on word-level probabilities;
generate a combined set of candidate strings including at least one candidate string from the set of in-lexicon candidate strings and at least one candidate string from the set of out-of-lexicon candidate strings, wherein generating the combined set of candidate strings comprises:
ranking, based at least in part on the scores for the in-lexicon candidate strings and the out-of-lexicon candidate strings, candidate strings from the set of in-lexicon candidate strings and the set of out-of-lexicon candidate strings;
selecting one or more highest-ranked candidate strings from among the set of in-lexicon candidate strings and the set of out-of-lexicon candidate strings; and
output the one or more highest-ranked candidate strings for display; and
responsive to an indication of a selection of a candidate string from the combined set of candidate strings, output, for display in place of the user-inputted text, the selected candidate string.
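Claim 18 adds two details: each character's probability depends on the characters that precede it, and the combined set is built by ranking and selecting the highest-ranked candidates. The sketch below illustrates both with a character bigram model and a top-k merge. The bigram order, add-one smoothing, the `^` start marker, and every function name are assumptions for illustration, since the claim does not fix a model or context length.

```python
import math
from collections import Counter


def train_char_bigram(words):
    """Count character pairs; '^' is an assumed start-of-string marker."""
    pairs, contexts, alphabet = Counter(), Counter(), set()
    for word in words:
        prev = "^"
        for ch in word:
            pairs[(prev, ch)] += 1
            contexts[prev] += 1
            alphabet.add(ch)
            prev = ch
    return pairs, contexts, alphabet


def char_logprob(candidate, model):
    """Sum log P(char | previous char) with add-one smoothing (an assumption)."""
    pairs, contexts, alphabet = model
    vocab = len(alphabet) + 1  # one extra slot reserves mass for unseen characters
    total, prev = 0.0, "^"
    for ch in candidate:
        total += math.log((pairs[(prev, ch)] + 1) / (contexts[prev] + vocab))
        prev = ch
    return total


def rank_top_k(in_lexicon_scores, out_of_lexicon_scores, k):
    """Merge both scored sets, rank by score, and keep the k highest-ranked."""
    merged = list(in_lexicon_scores.items()) + list(out_of_lexicon_scores.items())
    merged.sort(key=lambda item: item[1], reverse=True)
    return [candidate for candidate, _ in merged[:k]]


model = train_char_bigram(["the", "then", "that"])
print(char_logprob("th", model), char_logprob("tx", model))
print(rank_top_k({"the": -1.0, "then": -3.0}, {"thx": -2.0}, 2))
```

A plain top-k merge does not by itself guarantee that the result contains a candidate from each set, as the claim recites. An implementation would need an additional rule for that, which this sketch leaves out.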
Specification