Method and apparatus for bilingual word alignment, method and apparatus for training bilingual word alignment model
First Claim
1. A method for bilingual word alignment by a processor executing instructions, comprising:
training a bilingual word alignment model using a word-aligned labeled bilingual corpus;
word-aligning a plurality of bilingual sentence pairs in an unlabeled bilingual corpus using said bilingual word alignment model;
determining whether the word alignment of each of said plurality of bilingual sentence pairs is correct, and when the word alignment is correct, adding the bilingual sentence pair into the labeled bilingual corpus and removing the bilingual sentence pair from the unlabeled bilingual corpus;
retraining the bilingual word alignment model using the expanded labeled bilingual corpus; and
re-word-aligning the remaining bilingual sentence pairs in the unlabeled bilingual corpus using the retrained bilingual word alignment model, said step of training a bilingual word alignment model comprising:
training a forward bilingual word alignment model using the word-aligned labeled bilingual corpus; and
training a backward bilingual word alignment model using the word-aligned labeled bilingual corpus, said step of word-aligning a plurality of bilingual sentence pairs in an unlabeled bilingual corpus comprising:
forward-word-aligning each of said plurality of bilingual sentence pairs using said forward bilingual word alignment model; and
backward-word-aligning each of said plurality of bilingual sentence pairs using said backward bilingual word alignment model, and said step of determining whether the word alignment of each of said plurality of bilingual sentence pairs is correct comprising:
calculating an intersection set between the forward-word-aligning result and the backward-word-aligning result of the bilingual sentence pair;
calculating a union set between the forward-word-aligning result and the backward-word-aligning result of the bilingual sentence pair; and
determining, when the ratio of an element number of said intersection set to an element number of said union set is greater than a predetermined threshold, that the word alignment of said bilingual sentence pair is correct.
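The acceptance test in the last three steps is the overlap ratio |F ∩ B| / |F ∪ B| between the forward and backward alignments of one sentence pair, compared strictly against the threshold. A minimal Python sketch, assuming each alignment is a set of (source index, target index) links with the backward result already mapped into that orientation; the function name `alignment_is_correct` and rejecting a pair whose union is empty are assumptions, not details from the claim:

```python
from typing import Set, Tuple

Link = Tuple[int, int]  # (source word index, target word index)


def alignment_is_correct(forward: Set[Link], backward: Set[Link], threshold: float) -> bool:
    """Accept when |forward & backward| / |forward | backward| > threshold."""
    union = forward | backward
    if not union:
        # Assumption: a pair with no links at all is never accepted.
        return False
    intersection = forward & backward
    return len(intersection) / len(union) > threshold


fwd = {(0, 0), (1, 2), (2, 1)}
bwd = {(0, 0), (1, 2), (3, 3)}
print(alignment_is_correct(fwd, bwd, 0.4))  # ratio 2/4 = 0.5 > 0.4 -> True
print(alignment_is_correct(fwd, bwd, 0.5))  # 0.5 is not > 0.5 -> False
```

Because the comparison is "greater than", a ratio exactly equal to the threshold is rejected, which is why the second call prints `False`.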
Abstract
The present invention provides a method and apparatus for bilingual word alignment, and a method and apparatus for training a bilingual word alignment model. The method for bilingual word alignment comprises: training a bilingual word alignment model using a word-aligned labeled bilingual corpus; word-aligning a plurality of bilingual sentence pairs in an unlabeled bilingual corpus using said bilingual word alignment model; determining whether the word alignment of each of said plurality of bilingual sentence pairs is correct, and if it is correct, adding the bilingual sentence pair into the labeled bilingual corpus and removing the bilingual sentence pair from the unlabeled bilingual corpus; retraining the bilingual word alignment model using the expanded labeled bilingual corpus; and re-word-aligning the remaining bilingual sentence pairs in the unlabeled bilingual corpus using the retrained bilingual word alignment model.
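The train / align / select / retrain / re-align cycle described above can be sketched in Python as follows. Everything concrete here is an assumption for illustration: the count-based "model" (one table shared by both decoders, whereas the claims train separate forward and backward models), the greedy forward and backward decoders, labeling an accepted pair with the intersection of its two alignments, and repeating the cycle until no new pair is accepted (the abstract itself describes one retrain and re-align step). `train`, `align_forward`, `align_backward`, and `self_train` are hypothetical names; a real system would plug in directional statistical aligners.

```python
from collections import Counter
from typing import List, Set, Tuple

Link = Tuple[int, int]                            # (source index, target index)
Pair = Tuple[List[str], List[str]]                # (source tokens, target tokens)
Labeled = Tuple[List[str], List[str], Set[Link]]  # sentence pair plus its alignment


def train(labeled: List[Labeled]) -> Counter:
    """Toy model (hypothetical): how often each (source word, target word) is linked."""
    counts: Counter = Counter()
    for src, tgt, links in labeled:
        for i, j in links:
            counts[(src[i], tgt[j])] += 1
    return counts


def align_forward(model: Counter, src: List[str], tgt: List[str]) -> Set[Link]:
    """Source-to-target: each source word links to its best-scoring target word."""
    if not src or not tgt:
        return set()
    links = set()
    for i, s in enumerate(src):
        j = max(range(len(tgt)), key=lambda k: model[(s, tgt[k])])
        if model[(s, tgt[j])] > 0:  # skip words the model has never seen linked
            links.add((i, j))
    return links


def align_backward(model: Counter, src: List[str], tgt: List[str]) -> Set[Link]:
    """Target-to-source: each target word links to its best-scoring source word."""
    if not src or not tgt:
        return set()
    links = set()
    for j, t in enumerate(tgt):
        i = max(range(len(src)), key=lambda k: model[(src[k], t)])
        if model[(src[i], t)] > 0:
            links.add((i, j))
    return links


def self_train(labeled: List[Labeled], unlabeled: List[Pair],
               threshold: float, max_rounds: int = 10):
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        model = train(labeled)                      # (re)train on the labeled corpus
        accepted, remaining = [], []
        for src, tgt in unlabeled:                  # (re-)align what is still unlabeled
            f = align_forward(model, src, tgt)
            b = align_backward(model, src, tgt)
            union = f | b
            if union and len(f & b) / len(union) > threshold:
                accepted.append((src, tgt, f & b))  # assumption: keep the intersection
            else:
                remaining.append((src, tgt))
        if not accepted:                            # nothing new was accepted: stop
            break
        labeled.extend(accepted)                    # expand the labeled corpus
        unlabeled = remaining                       # and shrink the unlabeled one
    return train(labeled), labeled, unlabeled


seed = [(["the", "house"], ["das", "haus"], {(0, 0), (1, 1)})]
raw = [(["the", "house"], ["das", "haus"]), (["a", "cat"], ["eine", "katze"])]
model, labeled_out, unlabeled_out = self_train(seed, raw, threshold=0.5)
```

With this seed, the first raw pair is accepted in the first round, while `a cat` / `eine katze` stays unlabeled because the toy model has never seen any of its words linked; the second round accepts nothing, so the loop stops.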
21 Citations
12 Claims
1. A method for bilingual word alignment by a processor executing instructions, comprising:
training a bilingual word alignment model using a word-aligned labeled bilingual corpus; word-aligning a plurality of bilingual sentence pairs in an unlabeled bilingual corpus using said bilingual word alignment model; determining whether the word alignment of each of said plurality of bilingual sentence pairs is correct, and when the word alignment is correct, adding the bilingual sentence pair into the labeled bilingual corpus and removing the bilingual sentence pair from the unlabeled bilingual corpus; retraining the bilingual word alignment model using the expanded labeled bilingual corpus; and re-word-aligning the remaining bilingual sentence pairs in the unlabeled bilingual corpus using the retrained bilingual word alignment model, said step of training a bilingual word alignment model comprising: training a forward bilingual word alignment model using the word-aligned labeled bilingual corpus; and training a backward bilingual word alignment model using the word-aligned labeled bilingual corpus, said step of word-aligning a plurality of bilingual sentence pairs in an unlabeled bilingual corpus comprising: forward-word-aligning each of said plurality of bilingual sentence pairs using said forward bilingual word alignment model; and backward-word-aligning each of said plurality of bilingual sentence pairs using said backward bilingual word alignment model, and said step of determining whether the word alignment of each of said plurality of bilingual sentence pairs is correct comprising: calculating an intersection set between the forward-word-aligning result and the backward-word-aligning result of the bilingual sentence pair; calculating a union set between the forward-word-aligning result and the backward-word-aligning result of the bilingual sentence pair; and determining, when the ratio of an element number of said intersection set to an element number of said union set is greater than a predetermined threshold, that the word alignment of said bilingual sentence pair is correct. (Dependent claims: 2, 3.)
4. A method for training a bilingual word alignment model by a processor executing instructions, comprising:
training an initial bilingual word alignment model using a word-aligned labeled bilingual corpus; word-aligning a plurality of bilingual sentence pairs in an unlabeled bilingual corpus using said initial bilingual word alignment model; determining whether the word alignment of each of said plurality of bilingual sentence pairs is correct, and when the word alignment is correct, adding the bilingual sentence pair into the labeled bilingual corpus and removing the bilingual sentence pair from the unlabeled bilingual corpus; and training a bilingual word alignment model using the expanded labeled bilingual corpus, said step of training an initial bilingual word alignment model comprising: training a forward initial bilingual word alignment model using the word-aligned labeled bilingual corpus; and training a backward initial bilingual word alignment model using the word-aligned labeled bilingual corpus, said step of word-aligning a plurality of bilingual sentence pairs in an unlabeled bilingual corpus comprising: forward-word-aligning each of said plurality of bilingual sentence pairs using said forward initial bilingual word alignment model; and backward-word-aligning each of said plurality of bilingual sentence pairs using said backward initial bilingual word alignment model, and said step of determining whether the word alignment of each of said plurality of bilingual sentence pairs is correct comprising: calculating an intersection set between the forward-word-aligning result and the backward-word-aligning result of the bilingual sentence pair; calculating a union set between the forward-word-aligning result and the backward-word-aligning result of the bilingual sentence pair; and determining, when the ratio of an element number of said intersection set to an element number of said union set is greater than a predetermined threshold, that the word alignment of said bilingual sentence pair is correct. (Dependent claims: 5, 6.)
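Claim 4 differs from claim 1 in that the expanded labeled corpus is used once to train the final model, with no re-alignment of the remaining pairs. A minimal Python sketch of that single expansion round follows. The trainer and aligner callables (`train_fwd`, `train_bwd`, `align_fwd`, `align_bwd`) are hypothetical placeholders for whatever directional models are used; labeling an accepted pair with the intersection of its two alignments and returning both a forward and a backward final model are assumptions, since the claim does not specify either. The toy `lexicon`, `lexicon_align`, and `diagonal_align` components (the last is a position-based stand-in for a backward model) exist only to make the sketch runnable.

```python
from typing import Any, Callable, List, Set, Tuple

Link = Tuple[int, int]                            # (source index, target index)
Pair = Tuple[List[str], List[str]]                # (source tokens, target tokens)
Labeled = Tuple[List[str], List[str], Set[Link]]
Trainer = Callable[[List[Labeled]], Any]
Aligner = Callable[[Any, List[str], List[str]], Set[Link]]


def train_with_expansion(train_fwd: Trainer, train_bwd: Trainer,
                         align_fwd: Aligner, align_bwd: Aligner,
                         labeled: List[Labeled], unlabeled: List[Pair],
                         threshold: float):
    """One expansion round: initial models, accept agreeing pairs, final models."""
    init_fwd, init_bwd = train_fwd(labeled), train_bwd(labeled)
    expanded: List[Labeled] = list(labeled)
    remaining: List[Pair] = []
    for src, tgt in unlabeled:
        f = align_fwd(init_fwd, src, tgt)
        b = align_bwd(init_bwd, src, tgt)
        union = f | b
        if union and len(f & b) / len(union) > threshold:
            expanded.append((src, tgt, f & b))  # assumption: label with the intersection
        else:
            remaining.append((src, tgt))
    # The final models are trained once on the expanded labeled corpus.
    return (train_fwd(expanded), train_bwd(expanded)), expanded, remaining


# Hypothetical toy components, only to make the sketch runnable.
def lexicon(corpus: List[Labeled]) -> Set[Tuple[str, str]]:
    """Set of word pairs that are linked anywhere in the corpus."""
    return {(src[i], tgt[j]) for src, tgt, links in corpus for i, j in links}


def lexicon_align(model, src: List[str], tgt: List[str]) -> Set[Link]:
    """Link every source/target word pair found in the lexicon."""
    return {(i, j) for i, s in enumerate(src) for j, t in enumerate(tgt) if (s, t) in model}


def diagonal_align(model, src: List[str], tgt: List[str]) -> Set[Link]:
    """Position-based stand-in: link word k to word k (ignores the model)."""
    return {(k, k) for k in range(min(len(src), len(tgt)))}


seed = [(["the", "house"], ["das", "haus"], {(0, 0), (1, 1)})]
raw = [(["the", "house"], ["das", "haus"]), (["house", "the"], ["das", "haus"])]
models, expanded, remaining = train_with_expansion(
    lexicon, lexicon, lexicon_align, diagonal_align, seed, raw, threshold=0.5)
```

Here the reordered pair `house the` / `das haus` is rejected because its lexicon-based and positional alignments share no links (ratio 0/4), so it stays in `remaining`; the other pair is accepted and joins the corpus used for the final training.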
7. An apparatus for bilingual word alignment, comprising:
a model training unit configured to train a bilingual word alignment model using a word-aligned labeled bilingual corpus; a word-aligning unit configured to word-align a plurality of bilingual sentence pairs in an unlabeled bilingual corpus using said bilingual word alignment model; a determining unit configured to determine whether the word alignment of each of said plurality of bilingual sentence pairs is correct, and when the word alignment is correct, to add the bilingual sentence pair into the labeled bilingual corpus and to remove the bilingual sentence pair from the unlabeled bilingual corpus; a model retraining unit configured to retrain the bilingual word alignment model using the labeled bilingual corpus expanded by said determining unit; and a re-word-aligning unit configured to re-word-align the remaining bilingual sentence pairs in the unlabeled bilingual corpus using the retrained bilingual word alignment model, wherein said model training unit trains a forward bilingual word alignment model using the word-aligned labeled bilingual corpus, and trains a backward bilingual word alignment model using the word-aligned labeled bilingual corpus, said word-aligning unit forward-word-aligns each of said plurality of bilingual sentence pairs using said forward bilingual word alignment model, and backward-word-aligns each of said plurality of bilingual sentence pairs using said backward bilingual word alignment model, said determining unit calculates an intersection set between the forward-word-aligning result and the backward-word-aligning result of the bilingual sentence pair, calculates a union set between the forward-word-aligning result and the backward-word-aligning result of the bilingual sentence pair, and determines, when the ratio of an element number of said intersection set to an element number of said union set is greater than a predetermined threshold, that the word alignment of said bilingual sentence pair is correct. (Dependent claims: 8, 9.)
10. An apparatus for training a bilingual word alignment model, comprising:
an initial model training unit configured to train an initial bilingual word alignment model using a word-aligned labeled bilingual corpus; a word-aligning unit configured to word-align a plurality of bilingual sentence pairs in an unlabeled bilingual corpus using said initial bilingual word alignment model; a determining unit configured to determine whether the word alignment of each of said plurality of bilingual sentence pairs is correct, and when the word alignment is correct, to add the bilingual sentence pair into the labeled bilingual corpus and to remove the bilingual sentence pair from the unlabeled bilingual corpus; and a model training unit configured to train a bilingual word alignment model using the labeled bilingual corpus expanded by said determining unit, wherein said initial model training unit trains a forward initial bilingual word alignment model using the word-aligned labeled bilingual corpus, and trains a backward initial bilingual word alignment model using the word-aligned labeled bilingual corpus; said word-aligning unit forward-word-aligns each of said plurality of bilingual sentence pairs using said forward initial bilingual word alignment model, and backward-word-aligns each of said plurality of bilingual sentence pairs using said backward initial bilingual word alignment model; said determining unit calculates an intersection set between the forward-word-aligning result and the backward-word-aligning result of the bilingual sentence pair, calculates a union set between the forward-word-aligning result and the backward-word-aligning result of the bilingual sentence pair, and determines that the word alignment of said bilingual sentence pair is correct when the ratio of an element number of said intersection set to an element number of said union set is greater than a predetermined threshold. (Dependent claims: 11, 12.)
Specification