Dynamic bi-phrases for statistical machine translation
First Claim
1. A method for phrase-based translation comprising:
- receiving an input of source text in a source language to be translated into target text in a target language;
- providing at least one dynamic bi-phrase rule to be used in generation of dynamic bi-phrases for translation of the source text;
- for a sentence of the source text:
- after receiving the source text, applying the at least one rule to the source text to generate a dynamic bi-phrase;
- associating a value of at least one dynamic feature with the at least one dynamic bi-phrase;
- retrieving static bi-phrases from a static bi-phrase table stored in memory which each include at least one word of the source text, each of the static bi-phrases being associated with a value of at least one static feature, each static bi-phrase in the bi-phrase table including a pair of phrases, each phrase comprising a sequence of at least one word, one of the phrases in the pair being from the source language and the other phrase being from the target language, the static bi-phrases having been automatically extracted from a training corpus of bisentences;
- retrieving any of the dynamic bi-phrases which each cover at least one word of the source text, the retrieved static bi-phrases and dynamic bi-phrases forming a set of active bi-phrases;
- generating translation hypotheses for at least a part of the source sentence using active bi-phrases from the set;
- scoring the translation hypotheses with a translation scoring model which takes into account the static feature values of static bi-phrases in the hypothesis and dynamic feature values of dynamic-bi-phrases in the hypothesis, wherein in scoring of a hypothesis, the scoring model considers static bi-phrases in the active set of bi-phrases in which the source phrase of the respective static bi-phrase covers at least one source word of the source sentence, wherein the static bi-phrase and a dynamic bi-phrase in the active set of bi-phrases both cover a same source word, such that each source word of the hypothesis is covered by exactly one of the retrieved bi-phrases in the active set of bi-phrases; and
- outputting a translation of the source text sentence based on the scoring of the hypotheses;
- wherein at least one of the generating at least one dynamic bi-phrase, associating at least one dynamic feature value of a dynamic feature with the at least one dynamic bi-phrase, retrieving static bi-phrases from a static bi-phrase table stored in memory, retrieving any of the dynamic bi-phrases which each cover at least one word of the source text, generating translation hypotheses for at least a part of the source sentence using active bi-phrases from the set, and scoring the translation hypotheses with a translation scoring model is implemented by a computer processor.
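The retrieval, hypothesis-generation, and scoring steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `BiPhrase` class, the feature names `log_p` and `unit_rule`, the weights, and the French-to-English example are all hypothetical, and the search is restricted to monotone (left-to-right) tilings with no language model, a simplification a real decoder would not make.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass
class BiPhrase:
    source: Tuple[str, ...]     # source-language phrase
    target: Tuple[str, ...]     # target-language phrase
    features: Dict[str, float]  # static or dynamic feature values
    dynamic: bool = False       # True if produced by a rule at translation time


def find_matches(sentence: List[str], biphrases: List[BiPhrase]):
    """Return (start, end, biphrase) for each occurrence of a source phrase."""
    matches = []
    for bp in biphrases:
        n = len(bp.source)
        for i in range(len(sentence) - n + 1):
            if tuple(sentence[i:i + n]) == bp.source:
                matches.append((i, i + n, bp))
    return matches


def hypotheses(sentence, matches, start=0) -> Iterator[List[BiPhrase]]:
    """Yield tilings in which each source word is covered by exactly one bi-phrase."""
    if start == len(sentence):
        yield []
        return
    for s, e, bp in matches:
        if s == start:
            for rest in hypotheses(sentence, matches, e):
                yield [bp] + rest


def score(hyp: List[BiPhrase], weights: Dict[str, float]) -> float:
    """Weighted sum of the feature values of the bi-phrases used in a hypothesis."""
    return sum(weights.get(k, 0.0) * v for bp in hyp for k, v in bp.features.items())


# Hypothetical static table and one hypothetical dynamic bi-phrase.
static = [
    BiPhrase(("il", "fait"), ("it", "is"), {"log_p": -0.2}),
    BiPhrase(("30",), ("30",), {"log_p": -0.1}),
    BiPhrase(("C",), ("C",), {"log_p": -0.1}),
]
dynamic = [BiPhrase(("30", "C"), ("86", "F"), {"unit_rule": 1.0}, dynamic=True)]

sentence = ["il", "fait", "30", "C"]
active = find_matches(sentence, static + dynamic)  # the set of active bi-phrases
weights = {"log_p": 1.0, "unit_rule": 0.5}         # assumed model weights

hyps = list(hypotheses(sentence, active))
best = max(hyps, key=lambda h: score(h, weights))
print(" ".join(w for bp in best for w in bp.target))
```

Here the static bi-phrases for `30` and `C` and the dynamic bi-phrase for `30 C` cover the same source words, so they compete: each hypothesis uses one or the other, never both, and the scoring model decides between them.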
Abstract
A system and a method for phrase-based translation are disclosed. The method includes receiving source language text to be translated into target language text. One or more dynamic bi-phrases are generated, based on the source text and the application of one or more rules, which may be based on user descriptions. A dynamic feature value is associated with each of the dynamic bi-phrases. For a sentence of the source text, static bi-phrases are retrieved from a bi-phrase table, each of the static bi-phrases being associated with one or more values of static features. Dynamic bi-phrases that each cover at least one word of the source text are also retrieved; together with the retrieved static bi-phrases, they form a set of active bi-phrases. Translation hypotheses are generated using active bi-phrases from the set and scored with a translation scoring model which takes into account the static and dynamic feature values of the bi-phrases used in the respective hypothesis. A translation, based on the hypothesis scores, is then output.
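The abstract does not fix the form of the translation scoring model. One form commonly used in phrase-based statistical machine translation is log-linear, and under that assumption the score of a hypothesis could be written as below. The symbols are illustrative and do not appear in the text above: h is a hypothesis, b ranges over the bi-phrases it uses, f_i are static features with weights λ_i, and g_j are dynamic features with weights μ_j, with a feature taken to be zero for bi-phrases to which it does not apply.

```latex
\[
\operatorname{score}(h) \;=\; \sum_{b \in h} \Bigl( \sum_{i} \lambda_i\, f_i(b) \;+\; \sum_{j} \mu_j\, g_j(b) \Bigr),
\qquad
\hat{h} \;=\; \operatorname*{arg\,max}_{h}\; \operatorname{score}(h)
\]
```

The translation that is output would then correspond to the highest-scoring hypothesis.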
71 Citations
25 Claims
1. A method for phrase-based translation (recited in full under First Claim above).
Dependent claims: 2-20
21. A system for phrase-based translation comprising:
- memory which receives an input of source text in a source language to be translated into target text in a target language;
- a static bi-phrase table stored in memory, each of the static bi-phrases in the table being associated with at least one static feature value, each static bi-phrase in the bi-phrase table including a pair of phrases, each phrase comprising a sequence of at least one word, one of the phrases in the pair being from the source language and the other phrase being from the target language, the static bi-phrases having been automatically extracted from a training corpus of bisentences;
- a dynamic bi-phrase generator which associates at least one dynamic feature value with at least one dynamic bi-phrase which has been selected for use in translation of the source text into the target language; and
- a translation scoring model which is input with hypotheses built from an active set of bi-phrases and scores the hypotheses, the active set including a static bi-phrase covering at least one word of the source text and a dynamic bi-phrase covering the same at least one word, whereby the static bi-phrase and dynamic bi-phrase both cover a same source word of the source text, the model taking into account the static feature values of static bi-phrases that each include at least one of the source words in each of the hypotheses and dynamic feature values of any dynamic bi-phrases in each respective hypothesis.
Dependent claims: 22, 23
24. A method for phrase-based translation comprising:
- providing a static bi-phrase table, each of the static bi-phrases being associated with a value of at least one static feature based on a frequency of the static bi-phrase in a training corpus of bisentences, each static bi-phrase in the bi-phrase table including a pair of phrases, each phrase comprising a sequence of at least one word, one of the phrases in the pair being from the source language and the other phrase being from the target language;
- receiving an input of source text in a source language to be translated into target text in a target language;
- after providing the static bi-phrase table, for a sentence of the source text:
- applying at least one rule which, when fired, generates at least one respective dynamic bi-phrase based on the source text in the sentence, the rule comprising a source phrase, target phrase pattern which is able to be instantiated by different bi-phrases and which is fired when an instance of the source phrase of the pattern is observed in the input source sentence;
- associating a respective value of at least one dynamic feature with the at least one dynamic bi-phrase;
- retrieving static bi-phrases from the static bi-phrase table which each cover at least one word of the source text;
- combining the retrieved static bi-phrases and any generated dynamic bi-phrases to form a set of active bi-phrases;
- when the active set includes a static bi-phrase covering at least one word of the source text and a dynamic bi-phrase covering the same at least one word whereby the static bi-phrase and dynamic bi-phrase both cover a same source word of the source text, generating translation hypotheses in the target language for at least a part of the source sentence using active bi-phrases drawn from the set of active bi-phrases, where each word in the hypothesis is covered by exactly one of the active bi-phrases;
- scoring the translation hypotheses with a translation scoring model which takes into account the static feature values of static bi-phrases in the hypothesis and dynamic feature values of dynamic-bi-phrases in the hypothesis; and
- outputting a translation of the source text sentence in the target language based on the scoring of the hypotheses;
- wherein at least one of the applying, associating, retrieving, combining, generating, and scoring is implemented by a computer processor.
Dependent claims: 25
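Claim 24 describes a rule as a source phrase, target phrase pattern that fires whenever an instance of its source side is observed and that can be instantiated by different bi-phrases. A minimal Python sketch of that idea follows, under stated assumptions: the `Rule` and `DynamicBiPhrase` types, the Celsius-to-Fahrenheit rule, and the indicator feature named after the rule are hypothetical illustrations, not taken from the claims.

```python
import re
from typing import Callable, Dict, List, NamedTuple


class DynamicBiPhrase(NamedTuple):
    source: str                 # matched instance of the source pattern
    target: str                 # target phrase built for that instance
    features: Dict[str, float]  # dynamic feature values


class Rule(NamedTuple):
    name: str
    source_pattern: re.Pattern                # source side of the pattern
    build_target: Callable[[re.Match], str]   # target side, computed per match


def apply_rules(sentence: str, rules: List[Rule]) -> List[DynamicBiPhrase]:
    """Fire each rule on every instance of its source pattern in the sentence."""
    generated = []
    for rule in rules:
        for m in rule.source_pattern.finditer(sentence):
            generated.append(
                DynamicBiPhrase(
                    source=m.group(0),
                    target=rule.build_target(m),
                    features={rule.name: 1.0},  # assumed indicator feature
                )
            )
    return generated


# Hypothetical rule: rewrite a Celsius temperature as Fahrenheit.
celsius_rule = Rule(
    name="celsius_to_fahrenheit",
    source_pattern=re.compile(r"(\d+) °C"),
    build_target=lambda m: f"{round(int(m.group(1)) * 9 / 5 + 32)} °F",
)

dyn = apply_rules("l'eau bout à 100 °C et gèle à 0 °C", [celsius_rule])
for d in dyn:
    print(d.source, "->", d.target)
```

The same rule yields a different bi-phrase for each matched instance, which is what distinguishes it from an entry in the static table: the pairs cannot be enumerated ahead of time and are created only after the source sentence is seen.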
Specification