METHOD AND APPARATUS FOR BILINGUAL WORD ALIGNMENT, METHOD AND APPARATUS FOR TRAINING BILINGUAL WORD ALIGNMENT MODEL
First Claim
1. A method for bilingual word alignment, comprising:
training a bilingual word alignment model using a word-aligned labeled bilingual corpus;
word-aligning a plurality of bilingual sentence pairs in an unlabeled bilingual corpus using said bilingual word alignment model;
determining whether the word alignment of each of said plurality of bilingual sentence pairs is correct, and if it is correct, adding the bilingual sentence pair into the labeled bilingual corpus and removing the bilingual sentence pair from the unlabeled bilingual corpus;
retraining the bilingual word alignment model using the expanded labeled bilingual corpus; and
re-word-aligning the remaining bilingual sentence pairs in the unlabeled bilingual corpus using the retrained bilingual word alignment model.
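The steps of claim 1 can be sketched as a single Python pass. The claim specifies neither the type of alignment model nor how correctness is decided, so both are stand-ins here and are assumptions, not the patented method: a toy model that counts gold word links (`train`, `align`) and a hypothetical acceptance test that requires every source word to be linked (`is_correct`). All function and type names are illustrative.

```python
from collections import Counter, defaultdict

# Assumed data layout: a sentence pair is (source tokens, target tokens) and an
# alignment is a frozenset of (source index, target index) links.
SentencePair = tuple[tuple[str, ...], tuple[str, ...]]
Alignment = frozenset[tuple[int, int]]
Model = dict[str, Counter]


def train(labeled: list[tuple[SentencePair, Alignment]]) -> Model:
    """Toy model (assumption): count how often each source word is linked to each target word."""
    model: Model = defaultdict(Counter)
    for (src, tgt), links in labeled:
        for i, j in links:
            model[src[i]][tgt[j]] += 1
    return model


def align(model: Model, pair: SentencePair) -> Alignment:
    """Link each source word to the target word it was most often linked to in training."""
    src, tgt = pair
    links = set()
    for i, word in enumerate(src):
        seen = model.get(word, Counter())
        best_j, best_n = None, 0
        for j, candidate in enumerate(tgt):
            if seen[candidate] > best_n:
                best_j, best_n = j, seen[candidate]
        if best_j is not None:
            links.add((i, best_j))
    return frozenset(links)


def is_correct(pair: SentencePair, links: Alignment) -> bool:
    """Hypothetical acceptance test: every source word received a link."""
    return len(links) == len(pair[0])


def bilingual_word_alignment(labeled, unlabeled):
    """One pass of the claimed steps; mutates both corpora in place."""
    model = train(labeled)                                   # train on the labeled corpus
    alignments = [align(model, pair) for pair in unlabeled]  # word-align unlabeled pairs
    remaining = []
    for pair, links in zip(unlabeled, alignments):
        if is_correct(pair, links):
            labeled.append((pair, links))                    # add to the labeled corpus
        else:
            remaining.append(pair)
    unlabeled[:] = remaining                                 # remove accepted pairs
    model = train(labeled)                                   # retrain on the expanded corpus
    return model, [(pair, align(model, pair)) for pair in unlabeled]  # re-align the rest
```

Because this toy model only learns links from aligned data, retraining on the pairs it accepted mainly reinforces counts it already had; the sketch shows the data flow between the two corpora, not alignment quality.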
Abstract
The present invention provides a method and apparatus for bilingual word alignment, and a method and apparatus for training a bilingual word alignment model. The method for bilingual word alignment comprises: training a bilingual word alignment model using a word-aligned labeled bilingual corpus; word-aligning a plurality of bilingual sentence pairs in an unlabeled bilingual corpus using said bilingual word alignment model; determining whether the word alignment of each of said plurality of bilingual sentence pairs is correct and, if it is correct, adding the bilingual sentence pair into the labeled bilingual corpus and removing it from the unlabeled bilingual corpus; retraining the bilingual word alignment model using the expanded labeled bilingual corpus; and re-word-aligning the remaining bilingual sentence pairs in the unlabeled bilingual corpus using the retrained bilingual word alignment model.
23 Claims
1. A method for bilingual word alignment, comprising:
training a bilingual word alignment model using a word-aligned labeled bilingual corpus;
word-aligning a plurality of bilingual sentence pairs in an unlabeled bilingual corpus using said bilingual word alignment model;
determining whether the word alignment of each of said plurality of bilingual sentence pairs is correct, and if it is correct, adding the bilingual sentence pair into the labeled bilingual corpus and removing the bilingual sentence pair from the unlabeled bilingual corpus;
retraining the bilingual word alignment model using the expanded labeled bilingual corpus; and
re-word-aligning the remaining bilingual sentence pairs in the unlabeled bilingual corpus using the retrained bilingual word alignment model.
7. A method for training a bilingual word alignment model, comprising:
training an initial bilingual word alignment model using a word-aligned labeled bilingual corpus;
word-aligning a plurality of bilingual sentence pairs in an unlabeled bilingual corpus using said initial bilingual word alignment model;
determining whether the word alignment of each of said plurality of bilingual sentence pairs is correct, and if it is correct, adding the bilingual sentence pair into the labeled bilingual corpus and removing the bilingual sentence pair from the unlabeled bilingual corpus; and
training a bilingual word alignment model using the expanded labeled bilingual corpus.
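Claim 7 ends with a trained model rather than with aligned sentences: the initial model is used only to harvest new training pairs. A minimal sketch of that control flow, assuming the model-specific parts are supplied as functions (`train_fn`, `align_fn` and `check_fn` are hypothetical parameter names, not taken from the patent), might look like this:

```python
from typing import Callable, TypeVar

P = TypeVar("P")  # sentence pair type (assumed opaque here)
A = TypeVar("A")  # word alignment type
M = TypeVar("M")  # model type


def train_bilingual_alignment_model(
    labeled: list[tuple[P, A]],
    unlabeled: list[P],
    train_fn: Callable[[list[tuple[P, A]]], M],
    align_fn: Callable[[M, P], A],
    check_fn: Callable[[P, A], bool],
) -> M:
    """Harvest pairs with an initial model, then train a model on the expanded corpus."""
    initial_model = train_fn(labeled)            # initial model from the labeled corpus
    kept: list[P] = []
    for pair in unlabeled:
        links = align_fn(initial_model, pair)    # word-align with the initial model
        if check_fn(pair, links):
            labeled.append((pair, links))        # add to the labeled corpus
        else:
            kept.append(pair)                    # stays in the unlabeled corpus
    unlabeled[:] = kept                          # remove accepted pairs
    return train_fn(labeled)                     # model trained on the expanded corpus
```

Keeping the acceptance test as a parameter lets the same loop run with any correctness check; claim 7 itself does not fix one.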
13. An apparatus for bilingual word alignment, comprising:
a model training unit configured to train a bilingual word alignment model using a word-aligned labeled bilingual corpus;
a word-aligning unit configured to word-align a plurality of bilingual sentence pairs in an unlabeled bilingual corpus using said bilingual word alignment model;
a determining unit configured to determine whether the word alignment of each of said plurality of bilingual sentence pairs is correct, and if it is correct, to add the bilingual sentence pair into the labeled bilingual corpus and to remove the bilingual sentence pair from the unlabeled bilingual corpus;
a model retraining unit configured to retrain the bilingual word alignment model using the labeled bilingual corpus expanded by said determining unit; and
a re-word-aligning unit configured to re-word-align the remaining bilingual sentence pairs in the unlabeled bilingual corpus using the retrained bilingual word alignment model.
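The apparatus of claim 13 recites five units. One possible software layout, assumed here for illustration, gives each unit its own callable so that, for example, the retraining unit could differ from the initial training unit. The class and field names are hypothetical, and the determining unit is written as a method because it updates both corpora.

```python
from dataclasses import dataclass
from typing import Any, Callable

Trainer = Callable[[list], Any]        # labeled corpus -> model
Aligner = Callable[[Any, Any], Any]    # (model, sentence pair) -> alignment
Checker = Callable[[Any, Any], bool]   # (sentence pair, alignment) -> accepted?


@dataclass
class AlignmentApparatus:
    """Hypothetical wiring of the units of claim 13; each field stands in for one unit."""
    model_training_unit: Trainer
    word_aligning_unit: Aligner
    correctness_check: Checker         # acceptance test used by the determining unit
    model_retraining_unit: Trainer
    re_word_aligning_unit: Aligner

    def determining_unit(self, labeled: list, unlabeled: list, alignments: list) -> None:
        """Move each pair whose alignment passes the check into the labeled corpus."""
        kept = []
        for pair, links in zip(unlabeled, alignments):
            if self.correctness_check(pair, links):
                labeled.append((pair, links))
            else:
                kept.append(pair)
        unlabeled[:] = kept

    def run(self, labeled: list, unlabeled: list) -> list:
        model = self.model_training_unit(labeled)
        alignments = [self.word_aligning_unit(model, p) for p in unlabeled]
        self.determining_unit(labeled, unlabeled, alignments)
        model = self.model_retraining_unit(labeled)   # retrain on the expanded corpus
        return [(p, self.re_word_aligning_unit(model, p)) for p in unlabeled]
```

Passing the same function as both `model_training_unit` and `model_retraining_unit` gives the single-trainer case.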
18. An apparatus for training a bilingual word alignment model, comprising:
an initial model training unit configured to train an initial bilingual word alignment model using a word-aligned labeled bilingual corpus;
a word-aligning unit configured to word-align a plurality of bilingual sentence pairs in an unlabeled bilingual corpus using said initial bilingual word alignment model;
a determining unit configured to determine whether the word alignment of each of said plurality of bilingual sentence pairs is correct, and if it is correct, to add the bilingual sentence pair into the labeled bilingual corpus and to remove the bilingual sentence pair from the unlabeled bilingual corpus; and
a model training unit configured to train a bilingual word alignment model using the labeled bilingual corpus expanded by said determining unit.
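The determining unit of claim 18 both adds each accepted pair to the labeled corpus and removes it from the unlabeled one. Assuming sentence pairs are hashable, one illustrative choice (not stated in the claims) is to store the labeled corpus as a dict from pair to alignment and the unlabeled corpus as a set, which makes both operations constant-time per pair and guarantees an accepted pair never remains in both corpora. The function and parameter names are hypothetical.

```python
from collections.abc import Hashable
from typing import Callable


def determining_unit(
    labeled: dict[Hashable, object],
    unlabeled: set[Hashable],
    alignments: dict[Hashable, object],
    is_correct: Callable[[Hashable, object], bool],
) -> set[Hashable]:
    """Move pairs whose alignment passes `is_correct` from `unlabeled` into `labeled`."""
    accepted = {pair for pair in unlabeled if is_correct(pair, alignments[pair])}
    for pair in accepted:
        labeled[pair] = alignments[pair]   # add the pair, with its alignment, to the labeled corpus
    unlabeled.difference_update(accepted)  # remove the same pairs from the unlabeled corpus
    return accepted
```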
Specification