STATISTICAL MACHINE TRANSLATION SYSTEM AND METHOD FOR TRANSLATION OF TEXT INTO LANGUAGES WHICH PRODUCE CLOSED COMPOUND WORDS
Abstract
A translation system and method for translating source text from a first language to target text in a second language are disclosed. A library of bi-phrases is accessed to retrieve bi-phrases which each match a part of the source text. Each of the bi-phrases includes respective text fragments from the first and second language. Words of some (or all) of the bi-phrases are tagged with restricted part of speech (RPOS) tags. At least one of the RPOS tags is configured for identifying a word from the second language as being one which also forms a part of a closed compound word in the library. At least one target hypothesis is generated from the bi-phrases, which includes text fragments in the second language. The target hypothesis or hypotheses are evaluated, based at least in part on combinations of the restricted part of speech tags. Based on the evaluation, one of the at least one target hypothesis is output as the optimal hypothesis for forming the translation.
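As an illustration of the library lookup described above, the following is a minimal sketch in Python, not the disclosed implementation. The class names `TaggedWord` and `BiPhrase`, the function `retrieve`, the tag `X` for "any other word", and the toy English-to-German entries are all assumptions made for the example; only the tags NP (non-head part of a closed compound) and N (head or other noun) come from the claims.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TaggedWord:
    text: str
    rpos: str  # restricted part of speech tag: "NP", "N", or "X" (assumed catch-all)


@dataclass(frozen=True)
class BiPhrase:
    source: Tuple[str, ...]          # text fragment in the first language
    target: Tuple[TaggedWord, ...]   # tagged text fragment in the second language


# Hypothetical toy library; the entries and their tags are invented.
LIBRARY = [
    BiPhrase(("the",), (TaggedWord("die", "X"),)),
    BiPhrase(("government",), (TaggedWord("regierungs", "NP"),)),
    BiPhrase(("government",), (TaggedWord("regierung", "N"),)),
    BiPhrase(("conference",), (TaggedWord("konferenz", "N"),)),
]


def retrieve(source_tokens: List[str], library: List[BiPhrase]):
    """Return (start, end, bi-phrase) for every bi-phrase whose source side
    matches a contiguous span of the source text."""
    matches = []
    for start in range(len(source_tokens)):
        for end in range(start + 1, len(source_tokens) + 1):
            span = tuple(source_tokens[start:end])
            for bi_phrase in library:
                if bi_phrase.source == span:
                    matches.append((start, end, bi_phrase))
    return matches


matches = retrieve(["the", "government", "conference"], LIBRARY)
for start, end, bi_phrase in matches:
    print(start, end, [(w.text, w.rpos) for w in bi_phrase.target])
```

Here the word "government" retrieves two candidate fragments, one tagged NP and one tagged N, so the later evaluation step has tag combinations to choose between.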
25 Claims
1. A machine translation method for translating source text from a first language to target text in a second language, comprising:
receiving the source text in the first language;
accessing a library of bi-phrases, each of the bi-phrases including a text fragment from the first language and a text fragment from the second language, at least some of the bi-phrases comprising words tagged with restricted part of speech tags, at least one of the restricted part of speech tags configured for identifying a word from the second language as being one which also forms a part of a known closed compound word;
retrieving text fragments in the second language from the library corresponding to text fragments in the source text;
generating at least one target hypothesis, each of the target hypotheses comprising text fragments selected from the retrieved text fragments in the second language; and
evaluating the target hypothesis based at least in part on combinations of restricted part of speech tags; and
based on the evaluation, outputting one of the at least one target hypothesis as the optimal hypothesis for forming the translation.
Dependent claims: 2-22.
23. A machine translation system for translating source text from a first language to target text in a second language, comprising:
memory which stores a library of bi-phrases, each of the bi-phrases including a text fragment from the first language and a text fragment from the second language, words of at least some of the bi-phrases being tagged with restricted part of speech tags, at least one of the restricted part of speech tags configured for identifying a word of a text fragment from the second language as being one which also forms a part of a known closed compound word; and
a processor which executes instructions stored in memory for
retrieving text fragments in the second language from the library which correspond to text fragments in the source text,
generating at least one target hypothesis, each of the target hypotheses comprising text fragments selected from the retrieved fragments in the second language,
evaluating each of the target hypotheses with a translation scoring function which scores the hypothesis according to a plurality of features, at least one of the features comprising a feature which favors hypotheses comprising consecutive text fragments with restricted part of speech tags which indicate that the consecutive text fragments are ordered for forming a closed compound word, and,
based on the evaluation, outputting a translation based on one of the target hypotheses.
Dependent claims: 24.
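The scoring function of claim 23 combines a plurality of features. One common way to do this in statistical machine translation is a weighted (log-linear) sum, and the sketch below assumes that form; the claim itself does not fix it. The function names, the `length` feature, the weights, and the example hypotheses are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Hypothesis = List[Tuple[str, str]]  # (target word, RPOS tag) pairs


def compound_order_feature(hyp: Hypothesis) -> float:
    """Count adjacent fragments whose tags are ordered so that they could
    merge into a closed compound (NP directly followed by N or NP)."""
    tags = [tag for _, tag in hyp]
    return float(sum(1 for a, b in zip(tags, tags[1:])
                     if a == "NP" and b in ("N", "NP")))


def length_feature(hyp: Hypothesis) -> float:
    """Placeholder for the other features a real system would use."""
    return float(len(hyp))


def score(hyp: Hypothesis,
          features: Dict[str, Callable[[Hypothesis], float]],
          weights: Dict[str, float]) -> float:
    """Weighted sum of feature values (assumed log-linear form)."""
    return sum(weights[name] * fn(hyp) for name, fn in features.items())


def best_hypothesis(hyps, features, weights):
    """Return the hypothesis with the highest score."""
    return max(hyps, key=lambda h: score(h, features, weights))


FEATURES = {"compound": compound_order_feature, "length": length_feature}
WEIGHTS = {"compound": 1.0, "length": -0.1}  # invented weights

h1 = [("regierungs", "NP"), ("konferenz", "N")]   # compound-forming order
h2 = [("konferenz", "N"), ("regierungs", "NP")]   # NP left without a head

best = best_hypothesis([h1, h2], FEATURES, WEIGHTS)
print(best)
```

With these invented weights, the hypothesis in which the NP-tagged fragment directly precedes the N-tagged fragment scores higher, which is the behaviour the claimed feature is meant to favor.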
25. A machine translation method for translating source text from a first language to target text in a second language, comprising:
receiving the source text in the first language;
accessing a library of bi-phrases, each of the bi-phrases including a text fragment from the first language and a text fragment from the second language, at least some of the bi-phrases being tagged with restricted part of speech tags, the restricted part of speech tags including an NP tag which identifies a text fragment from the second language as being one which also forms a part of a known closed noun compound word other than in a head position of the closed noun compound word and an N tag which identifies at least one of a text fragment which appears in a closed noun compound word in the head position and another noun;
retrieving text fragments from the second language from the library corresponding to text fragments in the source text;
generating at least one target hypothesis, each of said target hypotheses comprising text fragments selected from the second language; and
evaluating the target hypothesis based at least in part on combinations of restricted part of speech tags, the evaluating including at least one of:
a) counting at least one of i) occurrences of combinations of NP-N and NP-NP which favor formation of closed compound words and ii) occurrences of NP immediately followed by a restricted part of speech tag other than N or NP, which disfavor formation of closed compound words,
b) retrieving conditional probabilities of occurrence for subsequences of restricted part of speech tags in the target hypothesis and computing a combined probability based thereon; and
based on the evaluation, outputting one of the at least one target hypothesis as the optimal hypothesis for forming the translation.
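The two evaluation options a) and b) of claim 25 can be sketched as follows. This is an illustrative reading, not the patented implementation. The sketch assumes bigram subsequences for option b), a start symbol `<s>`, a floor probability for unseen tag pairs, and an invented probability table; an NP at the very end of a hypothesis is not counted as unfavorable because no tag follows it.

```python
import math
from typing import Dict, List, Tuple


def count_tag_pairs(tags: List[str]) -> Tuple[int, int]:
    """Option a): return (favorable, unfavorable) counts.
    Favorable: NP-N or NP-NP. Unfavorable: NP followed by any other tag."""
    favorable = unfavorable = 0
    for prev, nxt in zip(tags, tags[1:]):
        if prev == "NP":
            if nxt in ("N", "NP"):
                favorable += 1
            else:
                unfavorable += 1
    return favorable, unfavorable


# Hypothetical conditional probabilities P(tag | previous tag).
BIGRAM_PROB: Dict[Tuple[str, str], float] = {
    ("<s>", "X"): 0.5,
    ("X", "NP"): 0.25,
    ("NP", "N"): 0.5,
    ("NP", "NP"): 0.25,
    ("NP", "X"): 0.125,
}


def sequence_log_prob(tags: List[str],
                      table: Dict[Tuple[str, str], float],
                      floor: float = 1e-6) -> float:
    """Option b): combine the conditional probabilities of each tag given
    its predecessor by summing their logarithms."""
    log_prob = 0.0
    prev = "<s>"
    for tag in tags:
        log_prob += math.log(table.get((prev, tag), floor))
        prev = tag
    return log_prob


good = ["X", "NP", "N"]   # NP followed by its head noun
bad = ["X", "NP", "X"]    # NP stranded before a non-noun

print(count_tag_pairs(good), count_tag_pairs(bad))
print(sequence_log_prob(good, BIGRAM_PROB) > sequence_log_prob(bad, BIGRAM_PROB))
```

Either quantity could then be supplied as one feature to the overall evaluation of the hypothesis. Summing logarithms rather than multiplying raw probabilities is a design choice that avoids numerical underflow on long tag sequences.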
Specification