Machine Learning For Transliteration
Abstract
Methods, systems, and apparatus, including computer program products, for performing transliteration between text in different scripts. In one aspect, a method includes generating a transliteration model based on statistical information derived from parallel text having first text in an input script and corresponding second text in an output script; and using the transliteration model to transliterate input characters in the input script to output characters in the output script. In another aspect, a method includes performing word-level transliterations. In another aspect, a method includes using an entry-aligned dictionary of source and target script pairs, in which, whenever a particular source word is mapped to multiple target words, the dictionary includes an entry for each target word including the same source word repeated in each entry. In another aspect, a method includes using phonetic scores of words in different scripts to identify corresponding parallel text.
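The entry-aligned dictionary described in the abstract can be illustrated with a minimal Python sketch. It assumes the one-to-many mapping is already available as a dict from each source word to a list of its target words; the function name `build_entry_aligned_dictionary` and the sample word pairs are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple


def build_entry_aligned_dictionary(
    mapping: Dict[str, List[str]],
) -> List[Tuple[str, str]]:
    """Flatten a one-to-many source->target mapping into entry-aligned pairs.

    Every entry holds exactly one source word and one target word, so a
    source word with several target words is repeated in each entry.
    (Hypothetical helper, not taken from the patent text.)
    """
    entries: List[Tuple[str, str]] = []
    for source, targets in mapping.items():
        for target in targets:
            entries.append((source, target))
    return entries


# Hypothetical one-to-many mapping: "ram" has two candidate target words.
one_to_many = {"ram": ["राम", "रम"], "sita": ["सीता"]}
for source, target in build_entry_aligned_dictionary(one_to_many):
    print(source, target)
```

Because each entry holds a single pair, the resulting list can be iterated as aligned (source, target) pairs without special handling for one-to-many cases.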
105 Claims
1. A method comprising:
receiving from a user an input of a sequence of multiple input characters entered in an input script, the sequence being terminated by entry of a word-break character, the word-break character not being part of the sequence; and using a transliteration model after entry of the word-break character to determine an output word in an output script from the sequence of multiple input characters.
Dependent claims: 2-9.
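A minimal Python sketch of the input handling recited in claim 1, under stated assumptions: the set of word-break characters (`WORD_BREAKS`) and the toy dictionary-backed model are hypothetical, since the claim names neither. Characters are buffered until a word-break character arrives; only then is the buffered sequence, excluding the word-break character, passed to the transliteration model.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical word-break characters; claim 1 does not enumerate them.
WORD_BREAKS = frozenset({" ", ".", ",", "?", "!", "\n"})


def transliterate_stream(
    keystrokes: Iterable[str],
    model: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (committed output, pending input-script characters)."""
    output: List[str] = []
    buffer: List[str] = []
    for ch in keystrokes:
        if ch in WORD_BREAKS:
            if buffer:
                # The word-break character terminates the sequence but is
                # not part of it, so only the buffered characters are sent.
                output.append(model("".join(buffer)))
                buffer.clear()
            # Emit the word-break character itself unchanged.
            output.append(ch)
        else:
            buffer.append(ch)
    return "".join(output), "".join(buffer)


# Toy stand-in for a transliteration model (hypothetical word pairs).
TOY_MODEL = {"namaste": "नमस्ते", "duniya": "दुनिया"}

committed, pending = transliterate_stream(
    "namaste duniya! ka", lambda word: TOY_MODEL.get(word, word)
)
print(committed)
print(pending)  # ka
```

Characters typed after the last word-break character stay pending, because the claim ties transliteration to entry of the word-break character.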
10-31. (canceled)
32. A method comprising:
generating a transliteration model based on statistical information derived from a corpus of parallel text having first text in an input script and corresponding second text in an output script; and using the transliteration model to transliterate a sequence of input characters in the input script to a sequence of output characters in the output script.
Dependent claims: 33-35.
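Claim 32 does not fix the form of the statistical model. As one hedged illustration, the sketch below assumes the parallel corpus has already been aligned into (input-script word, output-script word) pairs and estimates P(output word | input word) by relative frequency. This word-level estimate is a simplification; a practical model might instead work at the character or segment level. The names `train_word_model` and `transliterate` and the sample pairs are hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple


def train_word_model(
    parallel_pairs: Iterable[Tuple[str, str]],
) -> Dict[str, Dict[str, float]]:
    """Estimate P(output word | input word) by relative frequency."""
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for source, target in parallel_pairs:
        counts[source][target] += 1
    model: Dict[str, Dict[str, float]] = {}
    for source, target_counts in counts.items():
        total = sum(target_counts.values())
        model[source] = {t: n / total for t, n in target_counts.items()}
    return model


def transliterate(model: Dict[str, Dict[str, float]], word: str) -> str:
    """Return the most probable output word, or the input if unseen."""
    candidates = model.get(word)
    if not candidates:
        return word
    return max(candidates, key=candidates.__getitem__)


# Hypothetical aligned pairs standing in for a parallel corpus.
pairs = [("ram", "राम"), ("ram", "राम"), ("ram", "रम"), ("sita", "सीता")]
model = train_word_model(pairs)
print(transliterate(model, "ram"))
print(transliterate(model, "xyz"))  # xyz (unseen words pass through)
```

Passing unseen words through unchanged is a design choice of this sketch, not something the claim specifies.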
36. A computer program product, encoded on a computer-readable medium, operable to cause data processing apparatus to perform operations comprising:
receiving from a user an input of a sequence of multiple input characters entered in an input script, the sequence being terminated by entry of a word-break character, the word-break character not being part of the sequence; and using a transliteration model after entry of the word-break character to determine an output word in an output script from the sequence of multiple input characters.
Dependent claims: 37-44.
45-66. (canceled)
67. A computer program product, encoded on a computer-readable medium, operable to cause data processing apparatus to perform operations comprising:
generating a transliteration model based on statistical information derived from a corpus of parallel text having first text in an input script and corresponding second text in an output script; and using the transliteration model to transliterate a sequence of input characters in the input script to a sequence of output characters in the output script.
Dependent claims: 68-70.
71. A system comprising:
means for receiving from a user an input of a sequence of multiple input characters entered in an input script, the sequence being terminated by entry of a word-break character, the word-break character not being part of the sequence; and means for using a transliteration model after entry of the word-break character to determine an output word in an output script from the sequence of multiple input characters.
Dependent claims: 72-79.
80-101. (canceled)
102. A system comprising:
means for generating a transliteration model based on statistical information derived from a corpus of parallel text having first text in an input script and corresponding second text in an output script; and means for using the transliteration model to transliterate a sequence of input characters in the input script to a sequence of output characters in the output script.
Dependent claims: 103-105.
Specification