CONTEXT AWARE BACK-TRANSLITERATION AND TRANSLATION OF NAMES AND COMMON PHRASES USING WEB RESOURCES
First Claim
1. A system providing at least one analysis selected from the group comprising translating, transliterating and back-transliterating, the system comprising:
an information extraction engine structured and arranged to receive an input from at least one electronic source document;
a language detection module structured and arranged to classify the input based on an origin of the input;
a vocabulary module structured and arranged to map the input to an electronic database derived from words based on the origin of the input;
a transliteration module structured and arranged to produce multiple back-transliterated and translated forms for the input in a Romanized language and calculate confidence scores for the multiple back-transliterated and translated forms; and
an output module to provide the multiple translated and back-transliterated forms and the confidence scores in a format compatible with machine translation systems for the input.
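The Python sketch below is a rough illustration of the language detection and vocabulary modules in claim 1. It classifies an input by its dominant Unicode script, which stands in here for the input's "origin", and then looks the token up in an origin-specific vocabulary. The names `script_of`, `detect_origin`, `vocabulary_lookup` and the `VOCABULARIES` entries are hypothetical and do not come from the patent. A real vocabulary module would be built from far larger resources.

```python
import unicodedata
from collections import Counter
from typing import Optional

# Hypothetical origin-specific vocabularies: native-script token -> known Romanized forms.
# Illustrative entries only; a real module would derive these from large resources.
VOCABULARIES = {
    "ARABIC": {"محمد": ["Muhammad", "Mohammed", "Mohamed"]},
    "CYRILLIC": {"Москва": ["Moskva", "Moscow"]},
}


def script_of(ch: str) -> Optional[str]:
    """Return the first word of a letter's Unicode name (e.g. 'ARABIC'), or None."""
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else None


def detect_origin(text: str) -> str:
    """Classify text by its most frequent script, used here as a proxy for origin."""
    counts = Counter(s for s in map(script_of, text) if s)
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"


def vocabulary_lookup(token: str, origin: str) -> list[str]:
    """Return known Romanized forms of a token from the vocabulary for its origin."""
    return VOCABULARIES.get(origin, {}).get(token, [])
```

For example, `detect_origin("محمد")` returns `"ARABIC"`, and the lookup for that token then yields its three listed Romanizations. Script is only a coarse proxy for origin. Russian and Ukrainian share Cyrillic, and Arabic and Persian share the Arabic script, so separating them would require additional cues.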
5 Assignments
0 Petitions
Abstract
Described herein are systems and methods for transliterating and translating source non-Romanized language text strings from a plurality of electronic sources to Romanized target language text strings by converting the source non-Romanized language text strings to a standard document encoding format, splitting the source non-Romanized language text strings into smaller units, transforming the smaller units into entity profiles, processing the entity profiles with data from external databases, translating the entities in the entity profiles into a Romanized target language, and outputting the entities into a plurality of data formats for external systems.
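The preprocessing steps in the abstract (converting to a standard encoding, splitting into smaller units, and building entity profiles) can be sketched as follows. This sketch assumes that Unicode NFC serves as the "standard document encoding format" and uses a simple whitespace-and-punctuation tokenizer. `EntityProfile`, `to_standard_encoding`, `split_units` and `build_profiles` are hypothetical names. The `attributes` field is only a placeholder for data merged in later from external databases.

```python
import unicodedata
from dataclasses import dataclass, field


@dataclass
class EntityProfile:
    """Hypothetical record for one extracted unit; not the patent's data model."""
    surface: str          # unit text after encoding normalization
    source: str           # identifier of the originating document
    position: int         # index of the unit within that document
    attributes: dict = field(default_factory=dict)  # placeholder for external-database data


def to_standard_encoding(raw: bytes, declared_encoding: str) -> str:
    """Decode bytes from their declared encoding and normalize to Unicode NFC."""
    return unicodedata.normalize("NFC", raw.decode(declared_encoding))


def _is_punct(ch: str) -> bool:
    return unicodedata.category(ch).startswith("P")


def split_units(text: str) -> list[str]:
    """Split on whitespace and trim leading/trailing punctuation from each chunk."""
    units = []
    for chunk in text.split():
        start, end = 0, len(chunk)
        while start < end and _is_punct(chunk[start]):
            start += 1
        while end > start and _is_punct(chunk[end - 1]):
            end -= 1
        if start < end:
            units.append(chunk[start:end])
    return units


def build_profiles(raw: bytes, encoding: str, source: str) -> list[EntityProfile]:
    """Normalize, tokenize, and wrap each unit in an EntityProfile."""
    text = to_standard_encoding(raw, encoding)
    return [EntityProfile(unit, source, i) for i, unit in enumerate(split_units(text))]
```

For example, KOI8-R bytes for "Москва, Россия." become two profiles, "Москва" and "Россия", at positions 0 and 1. A whitespace tokenizer works for scripts that separate words with spaces. Chinese or Japanese input would need a dedicated word segmenter.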
329 Citations
18 Claims
1. A system providing at least one analysis selected from the group comprising translating, transliterating and back-transliterating, the system comprising:
an information extraction engine structured and arranged to receive an input from at least one electronic source document;
a language detection module structured and arranged to classify the input based on an origin of the input;
a vocabulary module structured and arranged to map the input to an electronic database derived from words based on the origin of the input;
a transliteration module structured and arranged to produce multiple back-transliterated and translated forms for the input in a Romanized language and calculate confidence scores for the multiple back-transliterated and translated forms; and
an output module to provide the multiple translated and back-transliterated forms and the confidence scores in a format compatible with machine translation systems for the input.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11.
12. A computer-based method providing at least one analysis selected from the group comprising translating, transliterating and back-transliterating, the method comprising:
receiving an input from at least one electronic document source;
classifying the input based on an origin of the input;
splitting the input into at least one smaller unit with a tokenizer;
converting the at least one smaller unit from a first encoding format into a second encoding format;
transforming the at least one smaller unit in the second encoding format into at least one entity profile;
processing the at least one entity profile with data from external databases;
generating multiple back-transliterations and translations for each of the at least one entity profile;
computing a confidence score for the multiple back-transliterations and translations; and
outputting the multiple back-transliterations and translations and the confidence score into at least one format compatible with other external systems.
Dependent claims: 13, 14, 15, 16, 17, 18.
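The sketch below illustrates the generation and scoring steps of claim 12. It enumerates Romanized candidates for a Cyrillic name from a small one-to-many mapping table and gives each candidate a prior equal to the product of its mapping weights. It can optionally reweight the priors by web frequency counts, in the spirit of the title's "web resources". Finally, it normalizes the scores into confidences that sum to 1 and emits JSON. The mapping table, weights, add-one smoothing, counts, and function names are all illustrative assumptions. The claim does not specify how the confidence score is computed.

```python
import itertools
import json
from typing import Optional

# Hypothetical one-to-many Cyrillic -> Latin table with prior weights per option.
# Illustrative only; the patent does not specify mapping tables or weights.
TABLE = {
    "Ю": [("Yu", 0.6), ("Iu", 0.2), ("Ju", 0.2)],
    "р": [("r", 1.0)],
    "и": [("i", 1.0)],
    "й": [("y", 0.5), ("", 0.3), ("i", 0.1), ("j", 0.1)],
}


def generate_candidates(word: str, table: dict = TABLE) -> dict[str, float]:
    """Enumerate every Romanization the table allows; prior = product of option weights."""
    options = [table.get(ch, [(ch, 1.0)]) for ch in word]  # unmapped characters pass through
    priors: dict[str, float] = {}
    for combo in itertools.product(*options):
        form = "".join(piece for piece, _ in combo)
        weight = 1.0
        for _, w in combo:
            weight *= w
        priors[form] = priors.get(form, 0.0) + weight  # merge paths that yield the same string
    return priors


def score_candidates(priors: dict[str, float],
                     web_counts: Optional[dict[str, int]] = None) -> list[tuple[str, float]]:
    """Reweight priors by (hypothetical) web counts with add-one smoothing, then
    normalize so confidences sum to 1. Returns (form, confidence), best first."""
    counts = web_counts or {}
    raw = {form: p * (counts.get(form, 0) + 1) for form, p in priors.items()}
    total = sum(raw.values())
    return sorted(((form, s / total) for form, s in raw.items()),
                  key=lambda fc: (-fc[1], fc[0]))


def to_json(word: str, ranked: list[tuple[str, float]], top_n: int = 3) -> str:
    """Serialize the top-n candidates as JSON for a downstream system."""
    return json.dumps(
        {"input": word,
         "candidates": [{"form": f, "confidence": round(c, 4)} for f, c in ranked[:top_n]]},
        ensure_ascii=False,
    )


if __name__ == "__main__":
    ranked = score_candidates(generate_candidates("Юрий"), {"Yuri": 900, "Yuriy": 300})
    print(to_json("Юрий", ranked))
```

Without web counts, the highest-prior form for "Юрий", "Yuriy", ranks first. With hypothetical counts such as `{"Yuri": 900, "Yuriy": 300}`, "Yuri" overtakes it. This shows how external evidence can override the mapping table's preference. Normalizing over all candidates keeps the confidences comparable across inputs with different numbers of candidates.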
Specification