Lexical and phrasal feature domain adaptation in statistical machine translation
Abstract
A translation method is adapted to a domain of interest. The method includes receiving a source text string comprising a sequence of source words in a source language and generating a set of candidate translations of the source text string, each candidate translation comprising a sequence of target words in a target language. An optimal translation is identified from the set of candidate translations as a function of at least one domain-adapted feature computed based on bilingual probabilities and monolingual probabilities. Each bilingual probability is for a source text fragment and a target text fragment of the source text string and candidate translation respectively. The bilingual probabilities are estimated on an out-of-domain parallel corpus that includes source and target strings. The monolingual probabilities for text fragments of one of the source text string and candidate translation are estimated on an in-domain monolingual corpus.
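The selection step summarized in the abstract can be illustrated with a minimal sketch. It assumes the candidates are ranked by a weighted sum of feature values (claim 23 mentions weights for features of a scoring function, but the exact form of that function is not given here). The names `select_translation`, `features`, `weights`, and the toy feature `toy_in_domain` are hypothetical and only stand in for the domain-adapted features.

```python
from typing import Callable, Dict, List, Sequence

# A feature maps (source words, candidate target words) to a real value.
FeatureFn = Callable[[Sequence[str], Sequence[str]], float]


def select_translation(
    source: Sequence[str],
    candidates: List[List[str]],
    features: Dict[str, FeatureFn],
    weights: Dict[str, float],
) -> List[str]:
    """Return the highest-scoring candidate translation.

    Assumption: the score is a weighted sum of feature values.
    Raises ValueError if `candidates` is empty.
    """
    def score(candidate: List[str]) -> float:
        return sum(weights[name] * fn(source, candidate)
                   for name, fn in features.items())

    return max(candidates, key=score)


# Toy data (invented for illustration only).
source = ["la", "banque"]
candidates = [["the", "bank"], ["the", "bench"]]
in_domain_freq = {"the": 0.5, "bank": 0.25, "bench": 0.0}

# Hypothetical stand-in feature: sum of in-domain relative frequencies.
features = {
    "toy_in_domain": lambda src, cand: sum(in_domain_freq.get(w, 0.0) for w in cand)
}
weights = {"toy_in_domain": 1.0}

best = select_translation(source, candidates, features, weights)
print(best)
```

With these toy numbers the first candidate scores 0.75 and the second 0.5, so the first is selected. In a real system the feature set would include the domain-adapted features alongside the usual translation and language model features.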
233 Citations
24 Claims
1. A translation method adapted to a domain of interest comprising:
receiving a source text string comprising a sequence of source words in a source language;
generating a set of candidate translations of the source text string, each candidate translation comprising a sequence of target words in a target language; and
with a processor, identifying an optimal translation from the set of candidate translations as a function of at least one domain-adapted feature, the at least one domain-adapted feature being computed based on:
bilingual probabilities, each bilingual probability being for a source text fragment and a target text fragment of the source text string and candidate translation respectively, the bilingual probabilities being estimated on an out-of-domain parallel corpus comprising source and target strings; and
monolingual probabilities for text fragments of one of the source text string and candidate translation, the monolingual probabilities being estimated on an in-domain monolingual corpus,
wherein the domain-adapted feature comprises at least one of:
a) a forward domain-adapted lexical feature which is a function of
Dependent claims: 2-20
21. A translation system adapted to a domain of interest comprising:
memory which stores:
a bilingual probability for each of a set of biphrases estimated on an associated out-of-domain parallel corpus comprising source and target strings, each biphrase comprising a text fragment in the source language and a text fragment in a target language; and
a monolingual probability for each of a set of text fragments estimated on an associated in-domain monolingual corpus, each of the text fragments occurring in at least one of the biphrases in the set of biphrases;
memory which stores:
a candidate translation generator for generating a set of candidate translations of a source text string, the source string comprising a sequence of source words in a source language, each candidate translation comprising a sequence of target words in a target language;
a translation evaluation component for identifying an optimal translation from the set of candidate translations as a function of at least one domain-adapted feature, the at least one domain-adapted feature being computed based on:
the respective bilingual probabilities for the source and target text fragments of the source text string and candidate translation, and
the respective monolingual probabilities for text fragments of at least one of the source text string and candidate translation,
the domain adapted feature comprising at least one domain adapted lexical feature selected from:
Dependent claims: 22
23. A method for adapting a machine translation system for a domain of interest, comprising:
providing a biphrase library comprising a set of biphrases, each biphrase comprising a source text fragment in a source language and a target text fragment in a target language, the biphrases being associated with bilingual probabilities estimated on a parallel corpus of text strings in the source and target languages;
providing a first monolingual corpus for the domain of interest comprising text strings in one of the source language and target language;
computing a monolingual text fragment probability for each of a set of text fragments found in the biphrase library in the one of the source language and target language, estimated on the first monolingual corpus;
generating weights for features of a scoring function, at least one of the features being a domain-adapted feature that is to be computed based on a sum, over each aligned pair of a source fragment of a source text string and a target text fragment of a candidate translation, of product of:
a bilingual probability retrieved from the biphrase library, the bilingual probability being for the respective source and target text fragments; and
a monolingual probability for the respective text fragments of one of the source text string and the candidate translation.
Dependent claims: 24
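The domain-adapted feature recited in claim 23 (a sum, over aligned fragment pairs, of a bilingual probability times a monolingual probability) can be sketched as follows. This is a simplified illustration and not the patented implementation: the function name `domain_adapted_feature`, the dictionary-based tables, the `use_target_side` flag, and the choice to treat missing entries as probability 0.0 are all assumptions, and the numbers are invented.

```python
from typing import Dict, List, Tuple

# (source fragment, target fragment)
AlignedPair = Tuple[str, str]


def domain_adapted_feature(
    aligned_pairs: List[AlignedPair],
    bilingual_prob: Dict[AlignedPair, float],
    monolingual_prob: Dict[str, float],
    use_target_side: bool = True,
) -> float:
    """Sum over aligned pairs of bilingual probability * monolingual probability.

    bilingual_prob: estimated on the (out-of-domain) parallel corpus.
    monolingual_prob: estimated on the in-domain monolingual corpus, for
        fragments on one side only (target side if use_target_side is True).
    Assumption: entries missing from either table count as 0.0.
    """
    total = 0.0
    for src, tgt in aligned_pairs:
        p_bilingual = bilingual_prob.get((src, tgt), 0.0)
        fragment = tgt if use_target_side else src
        p_monolingual = monolingual_prob.get(fragment, 0.0)
        total += p_bilingual * p_monolingual
    return total


# Toy tables (invented for illustration only).
aligned_pairs = [("la", "the"), ("banque", "bank")]
bilingual_prob = {("la", "the"): 0.5, ("banque", "bank"): 0.25}
monolingual_prob = {"the": 0.5, "bank": 0.5}

value = domain_adapted_feature(aligned_pairs, bilingual_prob, monolingual_prob)
print(value)  # 0.5 * 0.5 + 0.25 * 0.5 = 0.375
```

Keeping the two tables separate mirrors the claim: the bilingual table can be built once from the parallel corpus, while the monolingual table can be swapped whenever a different in-domain corpus is available.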
Specification