Semi-supervised training for statistical word alignment
First Claim
1. A method for aligning words in parallel segments, the method comprising:
calculating a first probability distribution, utilizing a processor and a memory, according to a model estimate of word alignments within a first corpus comprising word-level unaligned parallel segments, the model estimate comprising an N-best list of one or more sub-models;
modifying the model estimate according to the first probability distribution;
discriminatively re-ranking one or more sub-models associated with the modified model estimate according to word-level annotated parallel segments; and
calculating a second probability distribution of the word alignments within the first corpus according to the re-ranked sub-models associated with the modified model estimate;
wherein discriminatively re-ranking one or more sub-models within the modified model estimate according to manual alignments further comprises:
adding manual alignments to hypothesized alignments within the first corpus;
comparing the manual alignments to the hypothesized alignments; and
weighting the one or more sub-models according to the comparison; and
wherein the comparing of the manual alignments to the hypothesized alignments comprises:
comparing an updated weighting factor for each sub-model derived using the first corpus to randomly generated weighting factors; and
selecting one of the updated weighting factor and the randomly generated weighting factor that generates a least amount of error.
Abstract
A system and method for aligning words in parallel segments are provided. A first probability distribution of word alignments within a first corpus, which comprises word-level unaligned parallel segments, is calculated according to a model estimate. The model estimate is modified according to the first probability distribution. One or more sub-models associated with the modified model estimate are discriminatively re-ranked according to word-level annotated parallel segments. A second probability distribution of the word alignments within the first corpus is calculated according to the re-ranked sub-models associated with the modified model estimate.
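As a rough illustration of the first calculating step, the sketch below turns an N-best list of candidate alignments into a probability distribution. It assumes, beyond what the abstract states, that the model estimate combines sub-model scores as a weighted sum (a log-linear form) and normalizes over the N-best list. The function name `nbest_posterior` and the feature layout are hypothetical.

```python
import math


def nbest_posterior(nbest_features, weights):
    """Return one normalized probability per alignment in an N-best list.

    nbest_features: list of feature vectors, one per candidate alignment;
                    each entry is a sub-model log score (assumed layout).
    weights:        one weighting factor per sub-model.
    """
    # Weighted sum of sub-model scores for every candidate alignment.
    scores = [
        sum(w * h for w, h in zip(weights, feats))
        for feats in nbest_features
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

The resulting probabilities could then serve as the fractional counts used to modify the model estimate, although the abstract does not spell out that update.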
341 Citations
14 Claims
1. A method for aligning words in parallel segments, the method comprising:
calculating a first probability distribution, utilizing a processor and a memory, according to a model estimate of word alignments within a first corpus comprising word-level unaligned parallel segments, the model estimate comprising an N-best list of one or more sub-models;
modifying the model estimate according to the first probability distribution;
discriminatively re-ranking one or more sub-models associated with the modified model estimate according to word-level annotated parallel segments; and
calculating a second probability distribution of the word alignments within the first corpus according to the re-ranked sub-models associated with the modified model estimate;
wherein discriminatively re-ranking one or more sub-models within the modified model estimate according to manual alignments further comprises:
adding manual alignments to hypothesized alignments within the first corpus;
comparing the manual alignments to the hypothesized alignments; and
weighting the one or more sub-models according to the comparison; and
wherein the comparing of the manual alignments to the hypothesized alignments comprises:
comparing an updated weighting factor for each sub-model derived using the first corpus to randomly generated weighting factors; and
selecting one of the updated weighting factor and the randomly generated weighting factor that generates a least amount of error.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
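The final comparing and selecting steps of claim 1 can be pictured with the minimal sketch below. It is an illustration under stated assumptions, not the patented implementation: error is measured as 1 minus the F1 score between hypothesized and manual alignment links, random weighting factors are drawn uniformly from [-1, 1], and all function names and data layouts are hypothetical.

```python
import random


def alignment_error(hypothesis, gold):
    """Return 1 - F1 between two sets of (source_index, target_index) links."""
    if not hypothesis and not gold:
        return 0.0
    correct = len(hypothesis & gold)
    if correct == 0:
        return 1.0
    precision = correct / len(hypothesis)
    recall = correct / len(gold)
    return 1.0 - 2.0 * precision * recall / (precision + recall)


def best_alignment(candidates, weights):
    """Pick the links of the candidate with the highest weighted score.

    candidates: list of (links, features) pairs for one sentence pair.
    """
    def weighted(candidate):
        return sum(w * h for w, h in zip(weights, candidate[1]))
    return max(candidates, key=weighted)[0]


def corpus_error(corpus, weights):
    """Average error over a corpus of (candidates, manual_links) pairs."""
    errors = [
        alignment_error(best_alignment(cands, weights), gold)
        for cands, gold in corpus
    ]
    return sum(errors) / len(errors)


def select_weights(corpus, updated_weights, num_random=10, seed=0):
    """Keep whichever weight vector yields the least error.

    The updated weights are listed first, so they win any tie.
    """
    rng = random.Random(seed)
    pool = [list(updated_weights)]
    for _ in range(num_random):
        pool.append([rng.uniform(-1.0, 1.0) for _ in updated_weights])
    return min(pool, key=lambda w: corpus_error(corpus, w))
```

Comparing against random candidates in this way acts as a simple guard against the updated weights being stuck at a poor local optimum.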
9. A computer program embodied on a non-transitory computer readable medium having instructions for aligning words in parallel segments comprising:
calculating a first probability distribution of word alignments within a first corpus comprising unaligned parallel segments according to a model estimate, the model estimate comprising an N-best list of one or more sub-models;
modifying the model estimate according to the probability distribution;
discriminatively re-ranking one or more sub-models within the modified model estimate according to annotated parallel segments; and
calculating a second probability distribution of the word alignments within the first corpus according to the re-ranked modified model estimate;
wherein discriminatively re-ranking one or more sub-models within the modified model estimate according to manual alignments further comprises:
adding manual alignments to hypothesized alignments within the first corpus;
comparing the manual alignments to the hypothesized alignments; and
weighting the one or more sub-models according to the comparison;
wherein:
the weighting of the one or more sub-models according to the comparison is according to at least one weighting factor; and
the discriminative re-ranking of the one or more sub-models within the modified model estimate according to manual alignments further comprises refining at least one of the at least one weighting factors using a one-dimensional error minimization until there is no further error reduction; and
wherein the refining of the at least one weighting factor further comprises calculating a piecewise constant function that evaluates an error of the word alignments selected by a best word alignment equation keeping the at least one weighting factor for each of the one or more sub-models constant except for one of the at least one weighting factor for the sub-model being evaluated.
Dependent claims: 10, 11, 12, 13, 14
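The one-dimensional error minimization recited in claim 9 can be sketched as follows. This is a simplified illustration and not the claimed implementation. It assumes each candidate alignment carries a precomputed error against the manual alignments and that candidates are scored by a weighted sum of sub-model features. With every weighting factor but one held constant, each candidate's score is a line in the free factor, so the selected alignment (and hence the error) only changes where two lines cross. All names are hypothetical.

```python
def score(features, weights):
    return sum(w * h for w, h in zip(weights, features))


def corpus_error(corpus, weights):
    """Sum the error of the top-scoring candidate of every sentence.

    corpus: list of sentences; each is a list of (features, error) pairs.
    """
    total = 0.0
    for candidates in corpus:
        best = max(candidates, key=lambda c: score(c[0], weights))
        total += best[1]
    return total


def critical_points(corpus, weights, k):
    """Values of weights[k] at which two candidates of a sentence tie."""
    points = set()
    for candidates in corpus:
        # Each candidate's score is slope * weights[k] + intercept.
        lines = [
            (feats[k], score(feats, weights) - weights[k] * feats[k])
            for feats, _ in candidates
        ]
        for i in range(len(lines)):
            for j in range(i + 1, len(lines)):
                (a1, b1), (a2, b2) = lines[i], lines[j]
                if a1 != a2:
                    points.add((b2 - b1) / (a1 - a2))
    return sorted(points)


def line_search(corpus, weights, k):
    """Best value for weights[k], and its error, with other weights fixed."""
    points = critical_points(corpus, weights, k)
    best_value, best_error = weights[k], corpus_error(corpus, weights)
    if not points:
        return best_value, best_error
    # The error is constant between crossings: probe each interval once.
    probes = [points[0] - 1.0]
    probes += [(lo + hi) / 2.0 for lo, hi in zip(points, points[1:])]
    probes.append(points[-1] + 1.0)
    for value in probes:
        trial = list(weights)
        trial[k] = value
        err = corpus_error(corpus, trial)
        if err < best_error:
            best_value, best_error = value, err
    return best_value, best_error


def refine_weights(corpus, weights):
    """Cycle through the weights until no line search reduces the error."""
    weights = list(weights)
    error = corpus_error(corpus, weights)
    improved = True
    while improved:
        improved = False
        for k in range(len(weights)):
            value, err = line_search(corpus, weights, k)
            if err < error:
                weights[k], error = value, err
                improved = True
    return weights, error
```

Checking every pair of candidates is quadratic in the N-best size. A production system would more likely compute only the upper envelope of the lines, but the pairwise version keeps the piecewise constant structure easy to see.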
Specification