TECHNIQUES FOR TRANSLITERATING INPUT TEXT FROM A FIRST CHARACTER SET TO A SECOND CHARACTER SET
First Claim
1. A computer-implemented method comprising:
receiving, at a computing device having one or more processors, input text in a first character set;
determining, at the computing device, a set of possible transliterations of the input text based on a plurality of mapping standards, each possible transliteration of the set of possible transliterations corresponding to a transliteration of the input text into a second character set corresponding to a target language, each mapping standard of the plurality of mapping standards defining a mapping of characters in the first character set to characters in the second character set, and each mapping standard having an associated transliteration probability, each transliteration probability being indicative of a likelihood that its corresponding mapping standard is appropriate for transliterating the input text to the second character set;
determining a transliteration score for each of the possible transliterations based on the transliteration probabilities, the transliteration score being indicative of a likelihood that its corresponding possible transliteration is an accurate transliteration of the input text;
determining, at the computing device, a set of candidate words in the target language based on the set of possible transliterations and a text corpus of the target language, wherein the set of candidate words includes words in the text corpus that match one of the set of possible transliterations, that are similar to one of the set of possible transliterations, and sound similar to one of the set of possible transliterations;
determining, at the computing device, a likelihood score for each one of the set of candidate words based on a language model in the target language and one or more previous words received, each likelihood score being indicative of a probability that a corresponding candidate word corresponds to the input text;
providing, from the computing device, one or more candidate words of the set of candidate words based on the likelihood scores;
receiving a user selection indicating one of the candidate words;
determining, at the computing device, a particular mapping standard of the plurality of mapping standards on which the selected candidate word was based; and
adjusting, at the computing device, the transliteration probabilities based on the determination of the particular mapping standard.
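The mapping-standard steps recited above (possible transliterations, transliteration scores, and the probability adjustment after a user selection) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the two mapping tables, the names `std_x` and `std_y`, the starting probabilities, the greedy longest-match tokenization, and the additive update with rate `0.1` are all hypothetical.

```python
# Hypothetical mapping standards (Latin -> Cyrillic); tables are illustrative only.
STANDARDS = {
    "std_x": {"ja": "я", "j": "й", "a": "а", "m": "м"},
    "std_y": {"j": "ж", "a": "а", "m": "м"},
}
# Assumed starting transliteration probabilities, one per mapping standard.
PROBS = {"std_x": 0.5, "std_y": 0.5}


def transliterate(text, table):
    """Greedy longest-match transliteration; returns None if some text is unmapped."""
    out, i = [], 0
    max_len = max(len(key) for key in table)
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in table:
                out.append(table[chunk])
                i += n
                break
        else:
            return None  # no key of any length matched at position i
    return "".join(out)


def possible_transliterations(text, standards, probs):
    """Map each distinct transliteration to (score, names of standards producing it)."""
    results = {}
    for name, table in standards.items():
        t = transliterate(text, table)
        if t is None:
            continue
        score, sources = results.get(t, (0.0, set()))
        # Assumption: the score is the summed probability of the producing standards.
        results[t] = (score + probs[name], sources | {name})
    return results


def adjust_probabilities(probs, chosen_standards, rate=0.1):
    """Boost the standards behind the user's selection, then renormalize to sum to 1."""
    updated = {n: p + (rate if n in chosen_standards else 0.0) for n, p in probs.items()}
    total = sum(updated.values())
    return {n: p / total for n, p in updated.items()}


results = possible_transliterations("ja", STANDARDS, PROBS)
# Suppose the user picks a word based on "я"; shift weight to the standard behind it.
new_probs = adjust_probabilities(PROBS, results["я"][1])
```

Summing the probabilities of all standards that yield the same string is one simple way to score a transliteration; the claim only requires that the score be based on the transliteration probabilities.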
Abstract
Computer-implemented techniques for performing transliteration of input text in a first character set to a second character set are disclosed. The techniques include receiving input text and determining a set of possible transliterations of the input text based on a plurality of mapping standards. Each mapping standard defines a mapping of characters in the first character set to characters in the second character set. The techniques further include determining a set of candidate words in the target language based on the possible transliterations and a text corpus. The techniques also include determining a likelihood score for each one of the candidate words based on a language model in the target language and previously received words. The techniques also include providing one or more candidate words based on the likelihood scores and receiving a user selection indicating one of the candidate words.
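As a rough illustration of the likelihood-scoring step, the sketch below ranks candidate words with a bigram language model conditioned on the single previous word. The bigram model, the add-one smoothing, the toy English corpus, and all function names are assumptions made for the example; the source does not specify the form of the language model.

```python
from collections import Counter


def train_bigram_counts(corpus_sentences):
    """Count unigrams and adjacent word pairs in a whitespace-tokenized corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus_sentences:
        words = sentence.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def likelihood_score(candidate, previous_word, unigrams, bigrams):
    """Add-one smoothed estimate of P(candidate | previous_word)."""
    vocab_size = len(unigrams)
    return (bigrams[(previous_word, candidate)] + 1) / (unigrams[previous_word] + vocab_size)


def rank_candidates(candidates, previous_word, unigrams, bigrams):
    """Order candidate words from most to least likely given the previous word."""
    return sorted(
        candidates,
        key=lambda w: likelihood_score(w, previous_word, unigrams, bigrams),
        reverse=True,
    )


# Toy corpus standing in for a target-language text corpus.
unigrams, bigrams = train_bigram_counts(
    ["good morning friend", "good morning all", "good night"]
)
ranked = rank_candidates(["night", "morning"], "good", unigrams, bigrams)
```

The top of `ranked` would be offered to the user first; a production system would likely use a longer context and a better smoothing scheme.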
20 Claims
2. A computer-implemented method comprising:
receiving, at a computing device having one or more processors, input text in a first character set;
determining, at the computing device, a set of possible transliterations of the input text based on a plurality of mapping standards, each possible transliteration of the set of possible transliterations corresponding to a transliteration of the input text into a second character set corresponding to a target language, each mapping standard of the plurality of mapping standards defining a mapping of characters in the first character set to characters in the second character set;
determining, at the computing device, a set of candidate words in the target language based on the set of possible transliterations and a text corpus of the target language, the text corpus corresponding to a set of known words in the target language;
determining, at the computing device, a likelihood score for each one of the set of candidate words based on a language model in the target language and one or more previous words received, each likelihood score being indicative of a probability that a corresponding candidate word corresponds to the input text;
providing, from the computing device, to a user device one or more candidate words of the set of candidate words based on the likelihood scores; and
receiving, at the computing device, a user selection indicating one of the candidate words.
Dependent claims: 3, 4, 5, 6, 7, 8, 10, 11
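One way to picture the candidate-word step is to look up each possible transliteration in the set of known words, keeping exact matches and close spellings. The sketch below is only an assumption-laden illustration: it uses Python's standard `difflib.get_close_matches` as a stand-in similarity measure, a hypothetical cutoff of `0.75`, and toy Latin-script words in place of a real target-language corpus.

```python
import difflib


def candidate_words(possible_transliterations, corpus_words, cutoff=0.75):
    """Collect corpus words that equal, or are spelled similarly to, a transliteration."""
    corpus = set(corpus_words)
    candidates = set()
    for t in possible_transliterations:
        if t in corpus:
            candidates.add(t)  # exact match against a known word
        # Near matches by SequenceMatcher ratio; cutoff is an illustrative choice.
        candidates.update(difflib.get_close_matches(t, corpus_words, n=5, cutoff=cutoff))
    return candidates


found = candidate_words(["hallo"], ["hello", "hall", "world"])
```

The resulting set would then be passed to the likelihood-scoring step, so a loose cutoff mainly costs ranking time rather than accuracy.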
9. The computer-implemented method of claim 9, wherein the words that sound similar to one or more possible transliterations are determined by a Soundex algorithm.
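For reference, a compact sketch of the classic American Soundex code follows. It assumes Latin-script input, so applying it to a target-language corpus would require romanized word forms or a language-specific variant; the function name and the choice to treat unrecognized characters like vowels are assumptions of this example.

```python
# Digit codes for consonant groups in American Soundex.
CODES = {
    letter: digit
    for digit, letters in {
        "1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r",
    }.items()
    for letter in letters
}


def soundex(word):
    """Return the four-character Soundex code of a word, or "" if it has no letters."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")  # the first letter's code also suppresses repeats
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not separate letters that share a code
        code = CODES.get(c, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # vowels reset prev, so the same code may repeat across a vowel
    return (first.upper() + "".join(digits) + "000")[:4]


def sound_alike(word, corpus_words):
    """Return corpus words whose Soundex code equals that of the given word."""
    target = soundex(word)
    return [w for w in corpus_words if soundex(w) == target]
```

For example, `soundex("Robert")` and `soundex("Rupert")` both yield `R163`, so each would be returned as a sound-alike of the other.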
12. A computing device, comprising:
an input device that receives user input indicating input text in a first character set;
a transliteration determination module configured to receive the input text and determine a set of possible transliterations of the input text based on a plurality of mapping standards and the input text, each possible transliteration of the set of possible transliterations corresponding to a transliteration of the input text into a second character set corresponding to a target language, each mapping standard of the plurality of mapping standards defining a mapping of characters in the first character set to characters in the second character set;
a candidate word determination module configured to determine a set of candidate words in the target language based on the set of possible transliterations and a text corpus of the target language, the text corpus corresponding to a set of known words in the target language; and
a word selection module configured to:
a) determine a likelihood score for each candidate in the set of candidate words based on a language model in the target language and one or more previous words received, each likelihood score being indicative of a probability that its corresponding candidate word corresponds to the input text,
b) provide one or more candidate words of the set of candidate words based on the likelihood scores of the one or more candidate words, and
c) receive a user selection indicating one of the candidate words.
Dependent claims: 13, 14, 15, 16, 17, 18, 20
19. The computing device of claim 19, wherein the candidate word determination module determines the words that sound similar to one or more possible transliterations based on a Soundex algorithm.
Specification