MACHINE TRANSLATION USING OVERLAPPING BIPHRASE ALIGNMENTS AND SAMPLING
Abstract
A system and method for machine translation are disclosed. Source sentences are received. For each source sentence, a target sentence comprising target words is generated. A plurality of translation neighbors of the target sentence is generated. Phrase alignments are computed between the source sentence and the translation neighbor. Translation neighbors are scored with a translation scoring model, based on the phrase alignment. Translation neighbors are ranked, based on the scores. In training the model, parameters of the model are updated based on an external ranking of the ranked translation neighbors. The generating of translation neighbors, scoring, ranking, and, in the case of training, updating the parameters, are iterated with one of the translation neighbors as the target sentence. In the case of decoding, one of the translation neighbors is output as a translation. The system and method may be at least partially implemented with a computer processor.
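The abstract does not say how translation neighbors are produced, only that each one keeps at least some of the target words. As a minimal sketch under that constraint, the following assumes three hypothetical edit operators (adjacent swap, single-word deletion, and substitution from a toy lexicon). The function name `generate_neighbors` and the `lexicon` argument are illustrative and not taken from the source.

```python
from typing import Dict, List


def generate_neighbors(target: List[str],
                       lexicon: Dict[str, List[str]]) -> List[List[str]]:
    """Return sentences that differ from `target` by one small edit.

    The three edit operators are assumptions made for illustration only.
    """
    neighbors: List[List[str]] = []

    # Operator 1: swap each pair of adjacent words.
    for i in range(len(target) - 1):
        neighbors.append(target[:i] + [target[i + 1], target[i]] + target[i + 2:])

    # Operators 2 and 3 are skipped for one-word sentences so that every
    # neighbor still contains at least one word of the original target.
    if len(target) > 1:
        # Operator 2: delete one word.
        for i in range(len(target)):
            neighbors.append(target[:i] + target[i + 1:])
        # Operator 3: replace one word with an alternative from the lexicon.
        for i, word in enumerate(target):
            for alt in lexicon.get(word, []):
                neighbors.append(target[:i] + [alt] + target[i + 1:])

    # Drop duplicates and the unchanged sentence, preserving order.
    seen = {tuple(target)}
    unique: List[List[str]] = []
    for candidate in neighbors:
        key = tuple(candidate)
        if key not in seen:
            seen.add(key)
            unique.append(candidate)
    return unique


target = ["the", "house", "blue"]
lexicon = {"house": ["home"]}  # hypothetical substitution table
neighbors = generate_neighbors(target, lexicon)
for sentence in neighbors:
    print(" ".join(sentence))
```

In the iterated procedure the abstract describes, one of these neighbors would become the target sentence for the next round.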
23 Claims
1. A method for training a translation scoring model for a statistical machine translation system, comprising:
receiving a set of source sentences, each of the source sentences comprising source words in a source language;
for each source sentence, generating a target sentence comprising target words in a target language which are translations of the source words;
generating a plurality of translation neighbors based on the target sentence, each translation neighbor comprising at least some of the target words of the target sentence;
for each of the plurality of translation neighbors, computing a phrase alignment between the source sentence and the translation neighbor;
scoring each translation neighbor with a translation scoring model, based on the computed phrase alignment between the source sentence and the translation neighbor;
ranking a plurality of the translation neighbors based on the translation model scores;
updating parameters of the model based on a comparison of the ranking based on the translation model scores with an external ranking of the plurality of translation neighbors; and
iterating, at least once more, the generating of translation neighbors, scoring, ranking, and updating parameters, wherein in the generating of the plurality of translation neighbors of the target sentence, the target sentence is a translation neighbor from a prior iteration.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 22.
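Claim 1 requires that parameters be updated by comparing the model's ranking with an external ranking, but it does not fix an update rule. The sketch below assumes a linear scoring model and a perceptron-style pairwise update, which is one common way to realize such a comparison and is not stated in the claim. The feature vectors, external scores, and learning rate are made-up illustration values.

```python
from typing import List, Sequence


def score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear model score: dot product of weights and neighbor features."""
    return sum(w * f for w, f in zip(weights, features))


def count_disagreements(weights: Sequence[float],
                        features: List[List[float]],
                        external: Sequence[float]) -> int:
    """Count pairs the external ranking orders i above j but the model does not."""
    n = len(features)
    return sum(
        1
        for i in range(n)
        for j in range(n)
        if external[i] > external[j]
        and score(weights, features[i]) <= score(weights, features[j])
    )


def pairwise_update(weights: Sequence[float],
                    features: List[List[float]],
                    external: Sequence[float],
                    lr: float = 0.1) -> List[float]:
    """One pass of a perceptron-style update over all neighbor pairs (assumed rule)."""
    w = list(weights)
    n = len(features)
    for i in range(n):
        for j in range(n):
            model_wrong = score(w, features[i]) <= score(w, features[j])
            if external[i] > external[j] and model_wrong:
                # Move the weights toward the externally preferred neighbor.
                w = [wk + lr * (fi - fj)
                     for wk, fi, fj in zip(w, features[i], features[j])]
    return w


# Hypothetical features for three neighbors and their external quality scores.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
external = [0.2, 0.9, 0.5]
initial = [1.0, 0.0]

before = count_disagreements(initial, features, external)
updated = pairwise_update(initial, features, external)
after = count_disagreements(updated, features, external)
print(before, after)
```

Repeating the pass, with fresh neighbors generated at each iteration as the claim describes, would continue to pull the model ranking toward the external one.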
17. A machine translation system comprising:
memory for storing a set of source sentences, each of the source sentences comprising source words in a source language;
memory storing instructions for:
for each source sentence, generating a target sentence comprising target words in a target language which are translations of the source words;
generating a plurality of translation neighbors of the target sentence, each translation neighbor comprising at least some of the target words of the target sentence;
for each of the plurality of translation neighbors, computing a phrase alignment between the source sentence and the translation neighbor;
scoring each translation neighbor with a translation scoring model, based on the computed phrase alignment between the source sentence and the translation neighbor;
ranking a plurality of the translation neighbors based on the translation model scores;
if the source sentences are being used for training the model, updating parameters of the model based on a comparison of the translation model score-based ranking with an external ranking of the plurality of translation neighbors;
iterating, at least once more, the generating of translation neighbors, scoring, ranking, and, in the case of training, updating the parameters, wherein in the generating of the plurality of translation neighbors of the target sentence, the target sentence is a translation neighbor from a prior iteration; and
if the source sentence is received for decoding, outputting a translation neighbor as a translation of the source sentence;
if the source sentence is received for training, storing the trained model generated in one of the iterations in memory; and
a processor in communication with the memory for executing the instructions.
Dependent claims: 18.
19. A method for machine translation comprising:
receiving text to be translated comprising at least one source sentence each of the source sentences comprising source words in a source language;
for each of the at least one source sentence, generating a target sentence comprising target words in a target language which are translations of the source words;
generating a plurality of translation neighbors of the target sentence, each translation neighbor comprising at least some of the target words of the target sentence;
for each of the plurality of translation neighbors, computing a phrase alignment between the source sentence and the translation neighbor;
scoring each translation neighbor with a translation scoring model, based on the computed phrase alignment between the source sentence and the translation neighbor;
ranking a plurality of the translation neighbors based on the translation model scores;
iterating, at least once more, the generating of translation neighbors, scoring, and ranking, wherein in the generating of the plurality of translation neighbors of the target sentence, the target sentence is a translation neighbor from a prior iteration; and
outputting one of the translation neighbors as a translation of the source sentence based on the ranking.
Dependent claims: 20, 21.
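The decoding loop of claim 19 can be illustrated with a greedy hill-climbing sketch: at each iteration the top-ranked neighbor becomes the new target sentence. This is a simplification. The title refers to sampling, which this sketch does not implement, and the toy bigram score below merely stands in for a model that would score the phrase alignment between the source sentence and each neighbor. All names and data are hypothetical.

```python
from typing import Callable, List

Sentence = List[str]


def adjacent_swaps(sentence: Sentence) -> List[Sentence]:
    """Toy neighbor generator: swap each pair of adjacent words."""
    return [
        sentence[:i] + [sentence[i + 1], sentence[i]] + sentence[i + 2:]
        for i in range(len(sentence) - 1)
    ]


def decode(initial: Sentence,
           neighbors_fn: Callable[[Sentence], List[Sentence]],
           score_fn: Callable[[Sentence], float],
           iterations: int = 5) -> Sentence:
    """Greedy search: repeatedly move to the best-scoring neighbor."""
    current = initial
    for _ in range(iterations):
        candidates = neighbors_fn(current)
        if not candidates:
            break
        # Rank the neighbors by model score, best first.
        ranked = sorted(candidates, key=score_fn, reverse=True)
        best = ranked[0]
        if score_fn(best) <= score_fn(current):
            break  # no neighbor improves on the current sentence
        current = best  # the chosen neighbor seeds the next iteration
    return current


# Stand-in scoring model: count bigrams from a hypothetical preferred set.
PREFERRED = {("the", "blue"), ("blue", "house"), ("house", "is"), ("is", "big")}


def toy_score(sentence: Sentence) -> float:
    return float(sum(1 for pair in zip(sentence, sentence[1:]) if pair in PREFERRED))


initial = ["the", "house", "blue", "is", "big"]
translation = decode(initial, adjacent_swaps, toy_score)
print(" ".join(translation))
```

The early stop when no neighbor scores higher is a design choice of this sketch. A sampling-based variant would instead draw the next target sentence at random, weighted by score, which lets the search move away from a local optimum.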
23. A machine translation system comprising:
memory for storing a set of source sentences, each of the source sentences comprising source words in a source language;
memory storing instructions for:
for each source sentence, generating a target sentence comprising target words in a target language which are translations of the source words;
in a first iteration, generating a plurality of translation neighbors of the target sentence, each translation neighbor comprising at least some of the target words of the target sentence;
for each of the plurality of translation neighbors, computing a phrase alignment between the source sentence and the translation neighbor;
scoring each translation neighbor with a translation scoring model, based on the computed phrase alignment between the source sentence and the translation neighbor;
ranking a plurality of the translation neighbors based on the translation model scores;
if the source sentences are being used for training the model, updating parameters of the model based on a comparison of the translation model score-based ranking with an external ranking of the plurality of translation neighbors;
iterating, at least once, the generating of translation neighbors, scoring, ranking, and, in the case of training, updating the parameters, wherein in the generating of the plurality of translation neighbors of the target sentence, the target sentence is a translation neighbor from a prior iteration; and
if the source sentence is received for decoding, outputting a translation neighbor as a translation of the source sentence;
if the source sentence is received for training, storing the trained model generated in one of the iterations in memory; and
a processor in communication with the memory for executing the instructions.
Specification