SYNTAX-BASED AUGMENTATION OF STATISTICAL MACHINE TRANSLATION PHRASE TABLES
Abstract
Machine translation phrase table augmentation embodiments are described that employ an automatic syntax-based scheme to produce additional phrase pairs and insert them into a phrase table. One general process implementing this augmentation involves inputting one or more syntactic transfer patterns, and for each pattern synthesizing phrases in a source language of the type associated with the pattern using a source language lexicon. Phrases, such as those not found in a monolingual corpus of the source language, are eliminated from the synthesized phrases. Each of the remaining synthesized phrases is then translated into the target language using the syntactic transfer pattern, a bilingual source-to-target language dictionary, and a morphological synthesizer. Those translated phrases not found in a monolingual corpus of the target language are then eliminated. Phrase pairs made up of a remaining translated phrase and its corresponding source language phrase are then added to the phrase table being augmented.
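The process summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `TransferPattern` class, the toy English-to-Spanish lexicon and dictionary, the representation of each monolingual corpus as a set of attested phrases, and the `inflect` stub standing in for the morphological synthesizer are all hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class TransferPattern:
    """Hypothetical syntactic transfer pattern.

    source_tags: part-of-speech sequence of the source phrase type.
    target_order: for each target position, index of the source word it translates.
    """
    source_tags: tuple
    target_order: tuple


def synthesize(pattern, lexicon):
    """Build every source phrase of the pattern's type from the lexicon."""
    slots = [lexicon[tag] for tag in pattern.source_tags]
    return [" ".join(words) for words in product(*slots)]


def inflect(words):
    """Stand-in for a morphological synthesizer (assumption: returns words unchanged)."""
    return words


def translate(phrase, pattern, dictionary):
    """Translate word by word, then reorder according to the transfer pattern."""
    source_words = phrase.split()
    target_words = [dictionary[source_words[i]] for i in pattern.target_order]
    return " ".join(inflect(target_words))


def augment(phrase_table, patterns, lexicon, dictionary, source_corpus, target_corpus):
    """Return a copy of phrase_table extended with synthesized phrase pairs."""
    augmented = set(phrase_table)
    for pattern in patterns:
        for source in synthesize(pattern, lexicon):
            if source not in source_corpus:   # drop unattested source phrases
                continue
            target = translate(source, pattern, dictionary)
            if target not in target_corpus:   # drop unattested translations
                continue
            augmented.add((source, target))
    return augmented


# Toy data (all assumed): English "ADJ NOUN" becomes Spanish "NOUN ADJ".
adj_noun = TransferPattern(source_tags=("ADJ", "NOUN"), target_order=(1, 0))
lexicon = {"ADJ": ["red", "green"], "NOUN": ["car", "house"]}
dictionary = {"red": "rojo", "green": "verde", "car": "coche", "house": "casa"}
source_corpus = {"red car", "green house"}
target_corpus = {"coche rojo"}

table = augment(set(), [adj_noun], lexicon, dictionary, source_corpus, target_corpus)
```

In this toy run, four source phrases are synthesized, two survive the source-corpus filter, and only the pair whose translation is attested in the target corpus is added. A real system would check corpus membership against n-gram counts rather than a literal set of phrases.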
31 Citations
20 Claims
1. A computer-implemented process for augmenting a machine translation phrase table with additional phrase pairs each pair of which associates a phrase in a source language with a phrase in a target language, comprising:
using a computer to perform the following process actions:
- inputting one or more syntactic transfer patterns, each of said patterns defining the syntax of a translation to the target language of a different phrase structure in the source language, wherein each source language phrase type represents a phrase having a particular syntactic structure that is different from the other source language phrase types;
- for each inputted syntactic transfer pattern,
  - synthesizing phrases in the source language of the type associated with the pattern under consideration using a lexicon of the source language,
  - eliminating synthesized phrases not found in a monolingual corpus of the source language,
  - for each remaining synthesized phrase, translating the synthesized phrase into the target language using the syntactic transfer pattern under consideration, a bilingual source-to-target language dictionary, and a morphological synthesizer to properly inflect the words of each translated phrase, and
  - for each translated phrase, adding a phrase pair comprising the translated phrase and its corresponding source language phrase to the current version of the phrase table to produce an augmented version of the phrase table.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A computer-implemented process for augmenting a machine translation phrase table with additional phrase pairs each pair of which associates a phrase in a source language with a phrase in a target language, comprising:
using a computer to perform the following process actions:
- inputting one or more syntactic transfer patterns, each of said patterns defining the syntax of a translation to the target language of a different phrase structure in the source language, wherein each source language phrase type represents a phrase having a particular syntactic structure that is different from the other source language phrase types;
- for each inputted syntactic transfer pattern,
  - synthesizing phrases in the source language of the type associated with the pattern under consideration using a lexicon of the source language,
  - eliminating synthesized phrases not found in a monolingual corpus of the source language,
  - for each remaining synthesized phrase, translating the synthesized phrase into the target language using the syntactic transfer pattern under consideration, a bilingual source-to-target language dictionary, and a morphological synthesizer to properly inflect the words of each translated phrase,
  - eliminating translated phrases not found in a monolingual corpus of the target language, and
  - for each remaining translated phrase, adding a phrase pair comprising the translated phrase and its corresponding source language phrase to the current version of the phrase table to produce an augmented version of the phrase table.
Dependent claims: 11, 12, 13, 14, 15
16. A computer-readable storage medium having computer-executable instructions stored thereon for augmenting a machine translation phrase table with additional phrase pairs each pair of which associates a phrase in a source language with a phrase in a target language, said computer-executable instructions comprising:
- inputting one or more syntactic transfer patterns, each of said patterns defining the syntax of a translation to the target language of a different phrase structure in the source language, wherein each source language phrase type represents a phrase having a particular syntactic structure that is different from the other source language phrase types;
- for each inputted syntactic transfer pattern,
  - synthesizing phrases in the source language of the type associated with the pattern under consideration using a lexicon of the source language,
  - eliminating synthesized phrases not found in a monolingual corpus of the source language,
  - for each remaining synthesized phrase, translating the synthesized phrase into the target language using the syntactic transfer pattern under consideration, a bilingual source-to-target language dictionary, and a morphological synthesizer to properly inflect the words of each translated phrase,
  - eliminating translated phrases not found in a monolingual corpus of the target language, and
  - for each remaining translated phrase, generating a phrase pair comprising the translated phrase and its corresponding source language phrase, eliminating phrase pairs that are already found in a current version of the phrase table, and adding the generated phrase pair to the current version of the phrase table.
Dependent claims: 17, 18, 19, 20
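The last two actions of claim 16, discarding generated pairs that the phrase table already contains and adding the rest, might look like the sketch below. The function name `add_new_pairs` and the representation of the phrase table as a list of `(source, target)` tuples are assumptions made for illustration; a production phrase table would also carry translation scores, which this sketch ignores.

```python
def add_new_pairs(phrase_table, generated_pairs):
    """Append generated (source, target) pairs not already in the table.

    Mutates phrase_table in place and returns the list of pairs actually added.
    """
    existing = set(phrase_table)   # fast membership test for current entries
    added = []
    for pair in generated_pairs:
        if pair in existing:       # eliminate pairs already in the table
            continue
        phrase_table.append(pair)
        existing.add(pair)         # also guards against duplicates in the input
        added.append(pair)
    return added


# Toy usage with assumed English-to-Spanish pairs.
current_table = [("red car", "coche rojo")]
newly_added = add_new_pairs(
    current_table,
    [("red car", "coche rojo"), ("green house", "casa verde")],
)
```

Only the pair not yet present is appended, so existing entries are left untouched.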
Specification