System and method for incrementally updating a reordering model for a statistical machine translation system
Abstract
A method for updating a reordering model of a statistical machine translation system includes, at a first time, receiving new training data for retraining an existing statistical machine translation system, the new training data including at least one sentence pair, each pair including a source sentence in a source language and a target sentence in a target language. Phrase pairs are extracted from the new training data and used to generate a new reordering file. A reordering model of the existing statistical machine translation system is updated, based on the new reordering file. The reordering model includes a reordering table. At a second time after the first time, new training data is received. The extracting of phrase pairs, the generating of the new reordering file, and the updating of the reordering model are reiterated, based on the new training data received at the second time.
19 Claims
1. A method for updating a reordering model of a statistical machine translation system comprising:
at a first time, receiving new training data for retraining an existing statistical machine translation system, the new training data comprising at least one sentence pair, each of the at least one sentence pair comprising a source sentence in a source language and a target sentence in a target language;
extracting phrase pairs from the new training data, each phrase pair including a source language phrase and a target language phrase;
generating a new reordering file from the extracted phrase pairs, the new reordering file including a set of the phrase pairs extracted from the new training data;
updating a reordering model of the existing statistical machine translation system based on the new reordering file, the reordering model including a reordering table, the reordering table comprising phrase pairs and a set of features, the set of features comprising, for each of a set of orientation types, at least one feature which is a function of a count of the orientation type for the respective phrase pair, each phrase pair in the reordering table occurring only once, and wherein the updating of the reordering model includes merging an existing reordering table with the new reordering file or merging the existing reordering table with a new reordering table generated from the new reordering file, the merging including updating feature scores for each of the orientation types for at least some of the phrase pairs based on the counts stored in the existing reordering table;
at a second time after the first time, receiving new training data for training the existing statistical machine translation system, the new training data comprising at least one sentence pair, the sentence pair comprising a source sentence in the source language and a target sentence in the target language; and
reiterating the extracting of phrase pairs, generating of the new reordering file and the updating the reordering model based on the new training data received at the second time, wherein at least one of the extracting phrase pairs, generating the new reordering file, and updating the reordering model is performed with a computer processor.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17
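The table-with-table merge recited in claim 1 can be illustrated with a minimal sketch. It assumes each reordering table maps a (source phrase, target phrase) pair to per-orientation counts, and that each feature is a smoothed relative frequency of its orientation type. The orientation names, the smoothing constant, and the function names `merge_counts` and `feature_scores` are hypothetical choices for illustration and are not taken from the claims.

```python
from collections import Counter

# Assumed orientation types; the claim only requires "a set of orientation types".
ORIENTATIONS = ("monotone", "swap", "discontinuous")


def merge_counts(existing, new):
    """Merge two tables mapping (source, target) -> Counter of orientation counts.

    The inputs are left unchanged. Because phrase pairs are dictionary keys,
    each pair occurs only once in the merged table.
    """
    merged = {pair: Counter(counts) for pair, counts in existing.items()}
    for pair, counts in new.items():
        merged.setdefault(pair, Counter()).update(counts)
    return merged


def feature_scores(counts, smoothing=0.5):
    """One feature per orientation type, computed as a function of its count.

    Assumption: a smoothed relative frequency; the claim does not fix the function.
    """
    total = sum(counts[o] for o in ORIENTATIONS) + smoothing * len(ORIENTATIONS)
    return tuple((counts[o] + smoothing) / total for o in ORIENTATIONS)
```

Because the existing table stores counts rather than only final scores, the features of a phrase pair seen in both tables can be recomputed from the summed counts without revisiting the earlier training data.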
18. A system for updating a reordering model of a statistical machine translation system comprising:
a phrase pair extraction component which, at each of a plurality of times, extracts phrase pairs from new training data, the new training data comprising at least one sentence pair, each of the at least one sentence pair comprising a source sentence in a source language and a target sentence in a target language, each phrase pair including a source language phrase and a target language phrase;
a reordering file generation component which, at each of the plurality of times, generates a new reordering file, the new reordering file including only phrase pairs extracted from the new training data and their associated orientation types;
an update component which, at each of the plurality of times, updates a reordering model of an existing statistical machine translation system based on the new reordering file, the reordering model including a reordering table, the reordering table comprising phrase pairs and a set of features, the set of features comprising, for each of a set of orientation types, at least one feature which is a function of a count of the orientation type for the respective phrase pair, each phrase pair in the reordering table occurring only once, and wherein the updating of the reordering model includes one of:
merging the new reordering file with an existing reordering file created in a prior iteration to generate a merged reordering file and creating the updated reordering table from the merged reordering file,
merging the new reordering file with an existing reordering table created in a prior iteration, the existing reordering table tracking occurrence counts of phrase pairs in the reordering table, to generate a new reordering table for the reordering model, and
merging an existing reordering table with a new reordering table generated from the new reordering file, the merging including updating feature scores for each of the orientation types for at least some of the phrase pairs based on counts stored in the existing reordering table; and
a processor which implements the phrase pair extraction component, reordering file generation component, and update component.
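The first alternative in claim 18, merging the new reordering file with an existing reordering file and then building the table from the merged file, might look like the sketch below. It assumes a line format of `source ||| target ||| orientation` with one line per extracted phrase-pair occurrence, and that both files are already sorted; neither assumption comes from the claim, and the function names are hypothetical.

```python
import heapq
from collections import Counter
from itertools import groupby


def parse(line):
    """Split an assumed 'source ||| target ||| orientation' line into its fields."""
    source, target, orientation = (field.strip() for field in line.split("|||"))
    return source, target, orientation


def merge_reordering_files(existing_lines, new_lines):
    """Lazily merge two sorted reordering files into one sorted stream of lines."""
    return heapq.merge(existing_lines, new_lines)


def table_from_file(lines):
    """Build a table with one entry per phrase pair and a count per orientation type."""
    table = {}
    for pair, group in groupby(map(parse, lines), key=lambda rec: rec[:2]):
        # setdefault keeps each pair unique even if its lines were not adjacent.
        table.setdefault(pair, Counter()).update(rec[2] for rec in group)
    return table
```

Merging at the file level keeps the table-building step unchanged, at the cost of storing and rescanning the full merged file on every update; the other two alternatives avoid that by keeping counts in the table itself.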
19. A method for updating a reordering model of a statistical machine translation system comprising:
at a first time, receiving new training data, the new training data comprising sentence pairs, each of the sentence pairs comprising a source sentence in a source language and a target sentence in a target language;
extracting phrase pairs from the new training data, each phrase pair including a source language phrase and a target language phrase;
generating a new reordering file from the extracted phrase pairs, the new reordering file including only phrase pairs extracted from the new training data and their associated orientation types;
updating a reordering model of the existing statistical machine translation system based on the new reordering file and an existing reordering table of the reordering model, the existing reordering table comprising phrase pairs and a set of features, the set of features comprising, for each of a set of orientation types, at least one feature which is a function of a count of the orientation type for the respective phrase pair, the updating including accumulating counts of the extracted phrase pairs and stored counts of corresponding phrase pairs in the existing reordering table, each phrase pair in the updated reordering table occurring only once; and
repeating the receiving new training data, extracting of phrase pairs, generating of the new reordering file, and the updating the reordering model at least once at a subsequent time, wherein at least one of the extracting phrase pairs, generating the new reordering file, and updating the reordering model is performed with a processor.
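The repeated accumulation in claim 19 can be pictured as a table object that persists across updates and absorbs each new reordering file as it arrives. This is only a sketch: the class name `IncrementalReorderingTable`, its methods, and the representation of a reordering file as an iterable of (source, target, orientation) records are assumptions made for illustration.

```python
from collections import Counter, defaultdict


class IncrementalReorderingTable:
    """Keeps per-orientation counts for each phrase pair across successive updates."""

    def __init__(self):
        # (source, target) -> Counter of orientation counts
        self.counts = defaultdict(Counter)

    def update(self, reordering_file):
        """Add the counts from one new reordering file to the stored counts."""
        for source, target, orientation in reordering_file:
            self.counts[(source, target)][orientation] += 1

    def rows(self):
        """Return one row per phrase pair, sorted by pair, with its stored counts."""
        return [(pair, dict(self.counts[pair])) for pair in sorted(self.counts)]


table = IncrementalReorderingTable()
# First time: records extracted from the first batch of new training data.
table.update([("la maison", "the house", "monotone")])
# Subsequent time: a later batch; counts for a known pair are accumulated.
table.update([
    ("la maison", "the house", "swap"),
    ("maison bleue", "blue house", "swap"),
])
```

Only the new batch is processed at each step, so the cost of an update depends on the size of the new training data rather than on everything seen so far.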
Specification