Transliterating Semitic languages including diacritics
First Claim
1. A method of transliterating text, the method comprising:
- receiving a Romanized input text of a natural language that comprises a native character set that includes diacritics;
- setting a threshold probability;
- selecting at least one candidate transliteration rule in response to determining that a probability that the at least one candidate transliteration rule should apply is at least equal to the threshold probability;
- applying each selected candidate transliteration rule to the Romanized input text to transliterate the Romanized input text into at least one corresponding candidate diacritized text in the native character set of the natural language;
- computing a confidence score for each candidate diacritized text;
- ranking each candidate diacritized text based at least on the computed confidence scores; and
- outputting at least one candidate diacritized text based at least on the ranking.
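The steps of claim 1 can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the patented implementation: the `Rule` shape, the left-to-right substring matching in `expand`, the use of a product of rule probabilities as the confidence score, and the four toy Arabic rules.

```python
from dataclasses import dataclass


# Hypothetical rule shape: the claim does not say how a rule is represented.
@dataclass(frozen=True)
class Rule:
    source: str         # non-empty Romanized substring the rule matches
    target: str         # native-script output, diacritics included
    probability: float  # estimated probability that the rule should apply


def select_rules(rules, threshold):
    """Keep rules whose probability is at least the threshold."""
    return [r for r in rules if r.probability >= threshold]


def expand(text, rules):
    """Return every (candidate, score) reachable by consuming text left to right."""
    if not text:
        return [("", 1.0)]
    results = []
    for rule in rules:
        if text.startswith(rule.source):
            for tail, tail_score in expand(text[len(rule.source):], rules):
                # Assumed confidence score: product of applied rule probabilities.
                results.append((rule.target + tail, rule.probability * tail_score))
    return results


def transliterate(text, rules, threshold, top_n=3):
    """Select rules, apply them, score and rank candidates, return the best."""
    selected = select_rules(rules, threshold)
    ranked = sorted(expand(text, selected), key=lambda c: c[1], reverse=True)
    return ranked[:top_n]


FATHA = "\u064e"  # ARABIC FATHA (short "a" vowel mark)
RULES = [  # toy rules and probabilities, invented for this example
    Rule("ka", "\u0643" + FATHA, 0.9),  # KAF + fatha
    Rule("ta", "\u062a" + FATHA, 0.8),  # TEH + fatha
    Rule("ta", "\u0637" + FATHA, 0.3),  # TAH + fatha, a less likely reading
    Rule("ba", "\u0628" + FATHA, 0.9),  # BEH + fatha
]

strict = transliterate("kataba", RULES, threshold=0.5)
loose = transliterate("kataba", RULES, threshold=0.2)
```

With the higher threshold the low-probability rule is never selected, so `strict` holds a single candidate; lowering the threshold admits it, and `loose` holds two candidates ranked by score.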
Abstract
The present disclosure describes a system and method of transliterating Semitic languages with support for diacritics. An input module receives and pre-processes Romanized characters and forwards the pre-processed Romanized characters to a transliteration engine. The transliteration engine selects candidate transliteration rules, applies the rules, and scores and ranks the results for output. To optimize the search for candidate transliteration rules, the transliteration engine may apply word-stemming strategies to process inflections indicated by affixes. The present disclosure further describes optimizations such as pre-processing emphasis text, caching, dynamic transliteration rule pruning, and buffering/throttling input. The system and methods are suitable for multiple applications including but not limited to web applications, Windows applications, client-server applications, and input method editors such as those using the Microsoft Text Services Framework (TSF™).
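As a rough illustration of the word-stemming idea mentioned in the abstract, the sketch below strips known affixes from a Romanized word, looks up the remaining stem, and reattaches the transliterated affixes. The `PREFIXES`, `SUFFIXES`, and `STEMS` tables are toy data invented for this example, not a vetted lexicon, and the helper names are hypothetical; the abstract does not describe the actual stemming strategy.

```python
# Toy tables, invented for illustration only.
PREFIXES = {"al": "\u0627\u0644"}          # Romanized article -> ALEF + LAM
SUFFIXES = {"an": "\u064e\u0627\u0646"}    # fatha + ALEF + NOON
STEMS = {"kitab": "\u0643\u0650\u062a\u064e\u0627\u0628"}  # kaf+kasra, teh+fatha, alef, beh


def split_affixes(word):
    """Yield (prefix, stem, suffix) splits, starting with the no-affix split."""
    for prefix in [""] + [p for p in PREFIXES if word.startswith(p)]:
        rest = word[len(prefix):]
        for suffix in [""] + [s for s in SUFFIXES if rest.endswith(s)]:
            stem = rest[:len(rest) - len(suffix)] if suffix else rest
            yield prefix, stem, suffix


def transliterate_word(word):
    """Return the native-script form if some split has a known stem, else None."""
    for prefix, stem, suffix in split_affixes(word):
        if stem in STEMS:
            return PREFIXES.get(prefix, "") + STEMS[stem] + SUFFIXES.get(suffix, "")
    return None


with_article = transliterate_word("alkitab")
with_suffix = transliterate_word("kitaban")
unknown = transliterate_word("xyz")
```

The point of the design is that one stem entry covers several inflected forms, so the lookup table, and the set of rules to search, stays small.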
20 Claims
1. A method of transliterating text, the method comprising:
- receiving a Romanized input text of a natural language that comprises a native character set that includes diacritics;
- setting a threshold probability;
- selecting at least one candidate transliteration rule in response to determining that a probability that the at least one candidate transliteration rule should apply is at least equal to the threshold probability;
- applying each selected candidate transliteration rule to the Romanized input text to transliterate the Romanized input text into at least one corresponding candidate diacritized text in the native character set of the natural language;
- computing a confidence score for each candidate diacritized text;
- ranking each candidate diacritized text based at least on the computed confidence scores; and
- outputting at least one candidate diacritized text based at least on the ranking.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A method of inputting text into a transliteration engine, the method comprising:
- receiving input text;
- buffering at least some of the received input text;
- detecting a first condition;
- terminating the buffering based at least on the detecting of the first condition;
- attempting to detect a second condition;
- setting a threshold probability;
- forwarding the buffered input text to the transliteration engine if the second condition is detected, wherein at least one transliteration rule is applied to the input text in response to determining that a probability that the at least one candidate transliteration rule should apply is at least equal to the threshold probability; and
- recommencing buffering if the second condition is not detected.
Dependent claims: 14, 15, 16, 17
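The buffering flow of claim 13 can be sketched as follows. The claim does not say what the two conditions are, so this example assumes a boundary character (a space) as the first condition and a minimum amount of buffered text as the second. `InputBuffer` and `fake_engine` are hypothetical names, and the threshold-based rule selection that the claim places inside the transliteration engine is replaced here by a stub.

```python
class InputBuffer:
    """Buffers keystrokes and forwards them to an engine callable in chunks."""

    def __init__(self, engine, boundary=" ", min_chars=2):
        self.engine = engine        # stand-in for the transliteration engine
        self.boundary = boundary    # assumed first condition: this character arrives
        self.min_chars = min_chars  # assumed second condition: enough buffered text
        self.pending = []

    def feed(self, ch):
        """Accept one character; return the engine result if text was forwarded."""
        self.pending.append(ch)               # buffer the received input
        if ch != self.boundary:               # first condition not detected
            return None
        text = "".join(self.pending).strip()  # buffering stops at the boundary
        if len(text) >= self.min_chars:       # second condition detected
            self.pending.clear()
            return self.engine(text)          # forward the buffered text
        return None                           # recommence buffering, text is kept


forwarded = []


def fake_engine(text):
    """Stub engine: records what it receives and returns a placeholder result."""
    forwarded.append(text)
    return text.upper()


buf = InputBuffer(fake_engine)
results = [buf.feed(ch) for ch in "a salam "]
```

In this run the first space arrives when only one letter is buffered, so nothing is forwarded and buffering resumes; the second space triggers a single forward of the whole buffered text, which is the throttling effect the claim is after.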
18. A system to transliterate text, the system comprising:
- a first processor, a computer readable memory containing computer executable instructions, the instructions to perform acts that include;
  - receiving a Romanized input text of a natural language that comprises a native character set that includes diacritics;
  - selecting at least one candidate transliteration rule in response to determining that an absolute probability that the at least one candidate transliteration rule should apply is at least equal to a threshold probability;
  - applying each selected candidate transliteration rule to the Romanized input text to transliterate the Romanized input text into at least one corresponding candidate diacritized text in the native character set of the natural language;
  - computing an accuracy score for each candidate diacritized text;
  - ranking each candidate diacritized text based at least on the computed accuracy scores; and
  - outputting at least one candidate diacritized text based at least on the ranking.
Dependent claims: 19, 20
Specification