Terminological adaptation of statistical machine translation system through automatic generation of phrasal contexts for bilingual terms
Abstract
A method for terminological adaptation includes receiving a vocabulary pair including source and target language terms. Each term is in a class which includes a set of sequences. Contextual phrase pairs are extracted from a bilingual training corpus, each including source and target phrases. The phrases each include a sequence of the same class as the respective source and target terms as well as some associated context. Templates are generated, based on the contextual phrase pairs. In each template the source and target sequences of a contextual phrase pair are replaced with respective placeholders, each denoting the respective class of the sequence. Candidate phrase pairs are generated from these templates. In each candidate phrase pair, the placeholders of one of the templates are replaced with respective terms of a vocabulary pair of the same class. Some candidate phrase pairs are incorporated into a phrase table of a machine translation system.
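The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the helper names (`generate_templates`, `generate_candidates`), the `<NOUN>` placeholder, the whitespace tokenization, and the toy English-French data are all assumptions made for illustration. Class membership is given here as an explicit list of aligned sequences, which is a simplification.

```python
PLACEHOLDER = "<NOUN>"  # hypothetical placeholder denoting the class

def replace_span(tokens, seq, new):
    """Replace the first occurrence of token list `seq` in `tokens` with `new`.
    Returns None if `seq` does not occur."""
    n = len(seq)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == seq:
            return tokens[:i] + new + tokens[i + n:]
    return None

def generate_templates(phrase_pairs, class_members):
    """Turn contextual phrase pairs into templates by replacing the
    class sequence on both sides with the placeholder."""
    templates = set()
    for src, tgt in phrase_pairs:
        for s_seq, t_seq in class_members:
            s = replace_span(src.split(), s_seq.split(), [PLACEHOLDER])
            t = replace_span(tgt.split(), t_seq.split(), [PLACEHOLDER])
            if s is None or t is None:
                continue  # sequence missing on one side: not a contextual pair
            if len(s) == 1 and len(t) == 1:
                continue  # no surrounding context, nothing to generalize
            templates.add((" ".join(s), " ".join(t)))
    return templates

def generate_candidates(templates, vocabulary_pairs):
    """Fill each template's placeholders with the terms of a vocabulary pair."""
    return {
        (s_tpl.replace(PLACEHOLDER, s_term), t_tpl.replace(PLACEHOLDER, t_term))
        for s_tpl, t_tpl in templates
        for s_term, t_term in vocabulary_pairs
    }

# Toy data (assumed): phrase pairs from a training corpus, known sequences
# of the class, and one new in-domain vocabulary pair of the same class.
phrase_pairs = [("this red house", "cette maison rouge"), ("a car", "une voiture")]
class_members = [("house", "maison"), ("car", "voiture")]
vocabulary_pairs = [("printer", "imprimante")]

templates = generate_templates(phrase_pairs, class_members)
candidates = generate_candidates(templates, vocabulary_pairs)

# Incorporate candidates into a phrase table, modeled here as a plain dict.
phrase_table = {}
for pair in candidates:
    phrase_table.setdefault(pair, {"generated": True})
```

In this sketch the template step keeps word order differences between the two languages intact: the placeholder ends up in final position on the English side and in medial position on the French side, so the new term inherits the context of the sequence it replaces.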
20 Claims
1. A method for terminological adaptation of a machine translation system comprising:
receiving a set of vocabulary pairs, each vocabulary pair including a source term in a source language and a target term in a target language, each term being in a class which includes a set of sequences;
extracting contextual phrase pairs from a bilingual training corpus, each of the contextual phrase pairs including a source phrase and a target phrase, the source phrase including a source sequence of a same class as the source term of one of the vocabulary pairs and associated source context, the target phrase including a target sequence of a same class as the target term of the one of the vocabulary pairs and associated target context;
generating templates based on the extracted contextual phrase pairs, each template replacing the source and target sequences of a contextual phrase pair with respective source and target placeholders denoting the class;
generating at least one candidate phrase pair from each template, each candidate phrase pair replacing the source and target placeholders of the template with respective source and target terms of a vocabulary pair of the same class; and
incorporating at least some of the candidate phrase pairs into a phrase table of a machine translation system,
wherein at least one of the extracting contextual phrase pairs, generating templates, and generating candidate phrase pairs is performed with a processor.
Dependent claims: 2-15.
16. A system for terminological adaptation of an associated machine translation system comprising:
an extraction component which extracts contextual phrase pairs from an associated bilingual training corpus, each of the contextual phrase pairs including a source phrase and a target phrase, the source phrase including a source sequence of a same class as a source term of an input vocabulary pair and respective associated context, the target phrase including a target sequence of a same class as a target term of the input vocabulary pair and respective associated context;
a template generator which generates templates based on the extracted contextual phrase pairs, each template replacing the source and target sequences of a contextual phrase pair with respective source and target placeholders denoting the respective class;
a phrase table entry generation component which generates at least one candidate phrase pair from each template, each candidate phrase pair replacing the source and target placeholders of a template with respective source and target terms of the input vocabulary pair, the phrase table entry generation component incorporating at least some of the candidate phrase pairs into a phrase table of a machine translation system; and
a processor which implements the extraction component, template generator, and phrase table entry generation component.
Dependent claims: 17, 18.
19. A method for terminological adaptation of a machine translation system to a target domain comprising:
extracting contextual phrase pairs from an out-of-domain bilingual training corpus of the form (αβγ, α′β′γ′), each of the contextual phrase pairs including a source phrase αβγ and a target phrase α′β′γ′, the source phrase including a source sequence β, which is of a same class T as a source term δ of an input vocabulary pair in a target domain, and respective associated context α, γ, the target phrase including a target sequence β′ of a same class T′ as a target term δ′ of the input vocabulary pair and respective associated context α′, γ′;
based on the extracted contextual phrase pairs, generating candidate phrase pairs of the form (αδγ, α′δ′γ′), where δ replaces β and δ′ replaces β′, respectively, in one of the extracted contextual phrase pairs;
filtering the extracted candidate phrase pairs; and
incorporating at least some of the remaining candidate phrase pairs into a phrase table of a machine translation system,
wherein at least one of the extracting contextual phrase pairs and generating candidate phrase pairs is performed with a processor.
Dependent claims: 20.
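The substitution recited in claim 19 can be sketched as follows. This is an illustrative sketch only: the `ContextualPair` type, the `substitute` and `filter_candidates` helpers, and the toy data are hypothetical. The claim does not state a filtering criterion at this point, so the filter shown here (dropping duplicates, pairs already in the phrase table, and overly long phrases) is an assumption chosen for the example.

```python
from typing import NamedTuple

class ContextualPair(NamedTuple):
    """An extracted pair: source alpha-beta-gamma, target alpha'-beta'-gamma'."""
    alpha: str
    beta: str
    gamma: str
    alpha_t: str
    beta_t: str
    gamma_t: str

def join(*parts):
    """Join non-empty parts with single spaces (contexts may be empty)."""
    return " ".join(p for p in parts if p)

def substitute(cp, delta, delta_t):
    """Build the candidate: delta replaces beta, delta' replaces beta'."""
    return (join(cp.alpha, delta, cp.gamma), join(cp.alpha_t, delta_t, cp.gamma_t))

def filter_candidates(candidates, phrase_table, max_len=7):
    """Assumed filter: drop duplicates, pairs already in the table,
    and phrases longer than max_len tokens on either side."""
    kept, seen = [], set()
    for cand in candidates:
        if cand in seen or cand in phrase_table:
            continue
        if any(len(side.split()) > max_len for side in cand):
            continue
        seen.add(cand)
        kept.append(cand)
    return kept

# Toy out-of-domain extractions (assumed) and one in-domain vocabulary pair.
extracted = [
    ContextualPair("this red", "house", "", "cette", "maison", "rouge"),
    ContextualPair("a", "car", "", "une", "voiture", ""),
]
delta, delta_t = "printer", "imprimante"

# Existing phrase table, modeled as a dict keyed by (source, target).
phrase_table = {("a printer", "une imprimante"): {"generated": False}}

candidates = [substitute(cp, delta, delta_t) for cp in extracted]
remaining = filter_candidates(candidates, phrase_table)
for pair in remaining:
    phrase_table[pair] = {"generated": True}
```

Here the second candidate is discarded because the phrase table already contains it, so only the first is incorporated.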
Specification