Method and system for natural language translation
Abstract
The present invention is a system for translating text from a first source language into a second target language. The system assigns probabilities or scores to various target-language translations and then displays or makes otherwise available the highest scoring translations. The source text is first transduced into one or more intermediate structural representations. From these intermediate source structures a set of intermediate target-structure hypotheses is generated. These hypotheses are scored by two different models: a language model which assigns a probability or score to an intermediate target structure, and a translation model which assigns a probability or score to the event that an intermediate target structure is translated into an intermediate source structure. Scores from the translation model and language model are combined into a combined score for each intermediate target-structure hypothesis. Finally, a set of target-text hypotheses is produced by transducing the highest scoring target-structure hypotheses into portions of text in the target language. The system can either run in batch mode, in which case it translates source-language text into a target language without human assistance, or it can function as an aid to a human translator. When functioning as an aid to a human translator, the human may simply select from the various translation hypotheses provided by the system, or he may optionally provide hints or constraints on how to perform one or more of the stages of source transduction, hypothesis generation and target transduction.
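The scoring and ranking step described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the toy probability tables, the names `combined_score` and `rank_hypotheses`, the `FLOOR` constant, and the example sentences are all hypothetical. Scores are combined by adding log probabilities, which is equivalent to multiplying the language-model and translation-model probabilities.

```python
import math

# Hypothetical toy models; a real system would estimate these from training text.
# Language model: probability of each intermediate target structure.
LANGUAGE_MODEL = {
    "the house is small": 0.020,
    "the home is small": 0.010,
    "small the house is": 0.0001,
}

# Translation model: probability of the source given each target hypothesis.
TRANSLATION_MODEL = {
    ("la maison est petite", "the house is small"): 0.30,
    ("la maison est petite", "the home is small"): 0.25,
    ("la maison est petite", "small the house is"): 0.30,
}

FLOOR = 1e-12  # assumed floor for unseen events, avoids log(0)


def combined_score(source, target):
    """Return log P(target) + log P(source | target)."""
    lm = LANGUAGE_MODEL.get(target, FLOOR)
    tm = TRANSLATION_MODEL.get((source, target), FLOOR)
    return math.log(lm) + math.log(tm)


def rank_hypotheses(source, hypotheses, top_n=2):
    """Sort target hypotheses by combined score, best first, and keep top_n."""
    ranked = sorted(
        hypotheses,
        key=lambda target: combined_score(source, target),
        reverse=True,
    )
    return ranked[:top_n]


if __name__ == "__main__":
    print(rank_hypotheses("la maison est petite", list(LANGUAGE_MODEL)))
```

In this toy example the ungrammatical word order receives the same translation-model probability as the best hypothesis, and it is the language model that pushes it to the bottom of the ranking.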
34 Claims
1. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to provide method steps for translating source text from a first language to target text in a second language different from the first language, said method steps comprising:
measuring a value of the source text in the first language and storing the source text in a first memory buffer;
generating the target text in the second language based on a combination of a probability of occurrence of an intermediate structure of text associated with a target hypothesis selected from the second language using a target language model, and a probability of occurrence of the source text given the occurrence of said intermediate structure of text associated with said target hypothesis using a target-to-source translation model; and
performing at least one of a storing operation to save said target text in a second memory buffer and a presenting operation to make said target text available for at least one of a viewing and listening operation using an I/O device.
Dependent claims: 2
3. A computer program product, comprising:
a computer usable medium having computer readable program code means embodied in said medium for causing translation of source text from a first language to target text in a second language different from the first language, said computer readable program code means comprising,
a computer readable program code means for causing a computer to receive a source text in the first language;
a computer readable program code means for causing a computer to generate at least one target hypothesis, each of said target hypotheses comprising text selected from the second language;
a computer readable program code means for causing a computer to estimate, for each target hypothesis, a first probability of occurrence of said text associated with said target hypothesis using a target language model;
a computer readable program code means for causing a computer to estimate, for each target hypothesis, a second probability of occurrence of the source text given the occurrence of said text associated with said target hypothesis using a target-to-source translation model;
a computer readable program code means for causing a computer to combine, for each target hypothesis, said first and second probabilities to produce a target hypothesis match score; and
a computer readable program code means for causing a computer to perform at least one of a storing operation to save and a presenting operation to make available for at least one of a viewing and listening operation, at least one of said target hypotheses according to its associated match score.
Dependent claims: 4, 5, 6, 7, 8
9. A computer program product, comprising:
a computer usable medium having computer readable program code means embodied in said medium for causing translation of source text from a first language to target text in a second language different from the first language, said computer readable program code means comprising,
a computer readable program code means for causing a computer to receive the source text in the first language and storing the source text in a first memory buffer;
a computer readable program code means for causing a computer to receive one of zero or more user defined criteria pertaining to the source and target texts to thereby bound the target text;
a computer readable program code means for causing a computer to access the source text from said first buffer;
a computer readable program code means for causing a computer to transduce the source text into at least one intermediate source structure of text constrained by any of said user defined criteria;
a computer readable program code means for causing a computer to generate at least one target hypothesis, each of said target hypotheses comprising an intermediate target structure of text selected from the second language constrained by any of said user defined criteria;
a computer readable program code means for causing a computer to estimate a first score, said first score being proportional to a probability of occurrence of each intermediate target structure of text associated with said target hypotheses using a target structure language model;
a computer readable program code means for causing a computer to estimate a second score, said second score being proportional to a probability that said intermediate target structure of text associated with said target hypotheses will translate into said intermediate source structure of text using a target structure-to-source structure translation model;
a computer readable program code means for causing a computer to combine, for each target hypothesis, said first and second scores to produce a target hypothesis match score;
a computer readable program code means for causing a computer to transduce each of said intermediate target structures of text into at least one transformed target hypothesis of text in the second language constrained by any of said user defined criteria; and
a computer readable program code means for causing a computer to at least one of store said at least one transformed target hypothesis in a second memory buffer, and present said at least one transformed target hypothesis available for at least one of a viewing and listening operation according to its associated match score and said user defined criteria.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21
22. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to provide method steps for text-to-text language translation, said method steps comprising:
building a parametric translation model to generate a modeled translation probability, comprising the steps of,
storing a translation model source training text,
storing a translation model target training text, and
choosing a first specification of parameters for the translation model so that the modeled translation probability of the source and target training texts is a first unique local maximum value;
building a parametric language model to generate a modeled probability, comprising the steps of,
storing a language model training text, and
choosing a second specification of parameters for the language model so that the modeled probability of the given training text is a second unique local maximum value; and
performing text-to-text language translation using said parametric translation model and said parametric language model.
Dependent claims: 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33
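The two model-building steps of claim 22 can be illustrated with a short Python sketch. The claim does not name a particular model family, so the choices here are assumptions made only for illustration: a word-based translation table trained by expectation-maximization (in the style of a simple lexical alignment model, without a null word), and a unigram language model set by relative frequency. The function names, the training pairs, and the iteration count are hypothetical. Each EM iteration does not decrease the modeled probability of the training pairs, and relative-frequency counts maximize the unigram likelihood of the training text.

```python
from collections import Counter, defaultdict


def train_translation_model(pairs, iterations=10):
    """Estimate t(source_word | target_word) by EM.

    pairs: list of (source_words, target_words) sentence pairs.
    Returns a dict keyed by (source_word, target_word).
    """
    source_vocab = {s for src, _ in pairs for s in src}
    # Uniform starting point for every (source, target) word pair.
    t = defaultdict(lambda: 1.0 / len(source_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected co-occurrence counts
        total = defaultdict(float)  # expected counts per target word
        # E-step: share each source word among the target words of its pair.
        for src, tgt in pairs:
            for s in src:
                norm = sum(t[(s, w)] for w in tgt)
                for w in tgt:
                    frac = t[(s, w)] / norm
                    count[(s, w)] += frac
                    total[w] += frac
        # M-step: renormalize the expected counts into probabilities.
        for (s, w), c in count.items():
            t[(s, w)] = c / total[w]
    return dict(t)


def train_language_model(sentences):
    """Unigram model: relative frequency maximizes training likelihood."""
    counts = Counter(word for sentence in sentences for word in sentence)
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}


if __name__ == "__main__":
    pairs = [
        (["la", "maison"], ["the", "house"]),
        (["la", "fleur"], ["the", "flower"]),
    ]
    t = train_translation_model(pairs)
    lm = train_language_model([tgt for _, tgt in pairs])
    print(t[("maison", "house")] > t[("la", "house")])
    print(lm["the"])
```

With these two toy sentence pairs, "la" co-occurs with "the" in both pairs, so EM gradually shifts probability mass so that "house" explains "maison" rather than "la".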
34. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to provide method steps for translating a first text in a first language into a second text in a second language using a lexical model, said method steps comprising:
inputting the first text into the lexical model, wherein the lexical model comprises a parametric translation model for generating a first probability and a parametric language model for generating a second probability; and
determining, using the lexical model, the second text in the second language that yields a unique local maximum value of a product of the first probability of the parametric translation model and the second probability of the parametric language model.
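The determining step of claim 34 can be written compactly. The symbols are assumptions introduced here for illustration and do not appear in the claim: S is the first text, T ranges over candidate second texts, Pr(S | T) is the first probability from the parametric translation model, and Pr(T) is the second probability from the parametric language model.

```latex
\hat{T} \;=\; \arg\max_{T} \; \Pr(S \mid T)\,\Pr(T)
        \;=\; \arg\max_{T} \; \bigl[\, \log \Pr(S \mid T) + \log \Pr(T) \,\bigr]
```

Because the logarithm is monotonic, maximizing the sum of log probabilities selects the same T as maximizing the product. The claim speaks of a local maximum, so a search procedure satisfying it need not examine every candidate T.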
Specification