Method and apparatus for language translation
Abstract
Methods and systems for language translation are disclosed. The translator is based on finite state machines that can convert a pair of input symbol sequences into a pair of output symbol sequences. The translator includes a lexicon associating a finite state machine with a pair of head words with corresponding meanings in the source and target languages. The state machine for a source language head word w and a target language head word ν reads the dependent words of w to its left and right in a source sentence and proposes corresponding dependents to the left and right of ν in a target language sentence being constructed, taking account of the required word order for the target language. The state machines are used by a transduction search engine to generate a plurality of candidate translations via a recursive process in which a source language head word is first translated as described above, then the heads of each of its dependent phrases are translated in the same way, then their dependents, and so on. Only the state machines corresponding to the words in the source language string are activated and used by the search engine. The translator also includes a parameter table that provides costs for actions taken by each finite state machine in converting between the source language and the target language. The costs for machine transitions indicate how likely pairs of words are to co-occur in the source language, and how likely the corresponding pairs of words are to co-occur in the target language. The transduction search engine uses the parameter table to provide a total cost for each of the candidate translations. The total cost of a translation is the sum of the costs of all actions taken by every machine involved in the translation.
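The recursive process described in the abstract can be illustrated with a small Python sketch. It is not the patented implementation: it assumes the source phrase is already parsed into a dependency tree (the hypothetical `Node` class), that each head transducer is deterministic with transitions keyed by `(state, side)`, that every machine starts in state 0, and that dependents are read outward from the head, left ones first. The lexicon entries, the machine names (`M_car`, `M_red`, ...) and all costs are invented for illustration. A real search engine would explore alternative lexicon entries and transitions instead of following a single path.

```python
from dataclasses import dataclass, field

LEFT, RIGHT = "L", "R"

@dataclass
class Node:
    """Hypothetical source dependency tree: a head word with left/right dependents in sentence order."""
    word: str
    left: list = field(default_factory=list)
    right: list = field(default_factory=list)

# Hypothetical lexicon: source head word -> (target head word, head transducer name)
LEXICON = {"car": ("voiture", "M_car"), "red": ("rouge", "M_red"),
           "the": ("la", "M_the"), "very": ("très", "M_very")}

# Hypothetical head transducers: transitions map (state, source side) to
# (next state, target side, cost); stop costs list the states that may halt.
TRANSITIONS = {
    "M_car": {(0, LEFT): (1, RIGHT, 0.5), (1, LEFT): (2, LEFT, 0.25)},
    "M_red": {(0, LEFT): (1, LEFT, 0.25)},
}
STOP_COSTS = {"M_car": {0: 0.5, 2: 0.25}, "M_red": {0: 0.0, 1: 0.0},
              "M_the": {0: 0.0}, "M_very": {0: 0.0}}

def translate(node):
    """Translate the head word, then recursively each dependent phrase.

    Returns (target words, cost). Dependents are read outward from the head:
    left ones nearest-first, then right ones nearest-first.
    """
    target_head, machine = LEXICON[node.word]
    moves = TRANSITIONS.get(machine, {})
    state, cost = 0, 0.0
    out_left, out_right = [], []
    reads = [(LEFT, d) for d in reversed(node.left)] + [(RIGHT, d) for d in node.right]
    for src_side, dep in reads:
        if (state, src_side) not in moves:
            raise ValueError(f"{machine} cannot read a {src_side} dependent in state {state}")
        state, tgt_side, step_cost = moves[(state, src_side)]
        dep_words, dep_cost = translate(dep)  # recurse into the dependent phrase
        cost += step_cost + dep_cost
        if tgt_side == LEFT:
            out_left = dep_words + out_left    # place outward, left of the target head
        else:
            out_right = out_right + dep_words  # place outward, right of the target head
    if state not in STOP_COSTS[machine]:
        raise ValueError(f"{machine} cannot stop in state {state}")
    return out_left + [target_head] + out_right, cost + STOP_COSTS[machine][state]

tree = Node("car", left=[Node("the"), Node("red", left=[Node("very")])])
print(translate(tree))
```

Here `M_car` moves the adjective phrase to the right of *voiture* while keeping the determiner on its left, so the tree for "the very red car" yields `la voiture très rouge` with a total cost of 1.25.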
14 Claims
1. A translator for translating a phrase in a source language into a phrase in a target language, comprising:
a plurality of head transducers, each head transducer associated with a pair of head words with corresponding meanings in the source and target languages, each head transducer for converting left and right ordered source language sequences of dependents of the source language head word into left and right ordered target language sequences of dependents of the target language head word;
a bilingual lexicon that associates each transducer with the pair of head words;
a parameter table that provides costs for each action taken by each head transducer;
a transduction search engine that generates a plurality of candidate translations of the source language phrase using the head transducers and provides a total cost for each of the candidate translations, wherein the total cost of a translation is the sum of the cost for all actions taken by each transducer involved in the translation; and
a target string selector that selects a best translation from the plurality of candidate translations by searching for the translation that has the lowest cost.
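To make the cost accounting of claim 1 concrete, here is a toy Python sketch for the two-word phrase "red car", whose head has one left dependent. The lexicon, the `DEPENDENT_ACTION` and `PARAMS` tables, the transducer ids, and the idea of charging separate "start" and "stop" actions are all hypothetical, and the search simply enumerates every lexicon combination instead of exploring transducer states. What it does reflect from the claim is that a candidate's total cost is the sum of the costs of all actions taken by every transducer involved, and that the selector returns the lowest-cost candidate.

```python
from itertools import product

# Hypothetical bilingual lexicon: source head word -> [(target head word, transducer id)]
LEXICON = {
    "car": [("voiture", "T_car_voiture"), ("auto", "T_car_auto")],
    "red": [("rouge", "T_red_rouge")],
}

# Hypothetical action each "car" transducer takes on a left source dependent:
# "left->right" writes it after the target head, "left->left" before it.
DEPENDENT_ACTION = {"T_car_voiture": "left->right", "T_car_auto": "left->left"}

# Hypothetical parameter table: (transducer id, action) -> cost
PARAMS = {
    ("T_car_voiture", "start"): 1.0,
    ("T_car_voiture", "left->right"): 0.5,
    ("T_car_voiture", "stop"): 0.25,
    ("T_car_auto", "start"): 2.0,
    ("T_car_auto", "left->left"): 0.5,
    ("T_car_auto", "stop"): 0.25,
    ("T_red_rouge", "start"): 0.5,
    ("T_red_rouge", "stop"): 0.25,
}

def candidates(head, left_dependent):
    """Enumerate every candidate translation of a head word with one left dependent."""
    for (t_head, m_head), (t_dep, m_dep) in product(LEXICON[head], LEXICON[left_dependent]):
        action = DEPENDENT_ACTION[m_head]
        actions = [(m_head, "start"), (m_dep, "start"), (m_dep, "stop"),
                   (m_head, action), (m_head, "stop")]
        words = [t_head, t_dep] if action == "left->right" else [t_dep, t_head]
        # Total cost: the sum of the costs of all actions taken by every transducer.
        yield " ".join(words), sum(PARAMS[a] for a in actions)

def select_best(cands):
    """Target string selector: the candidate with the lowest total cost."""
    return min(cands, key=lambda c: c[1])

all_candidates = list(candidates("car", "red"))
print(all_candidates)
print(select_best(all_candidates))
```

Both candidates share the cost of translating `red` (0.75), so the choice is decided by the head transducers: the `T_car_voiture` path totals 2.5 against 3.5 for `rouge auto`.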
6. A computer-readable storage medium comprising encoded computer-readable program instructions for use in conjunction with a programmable computer, which instructions cause the computer to translate a phrase from a source language into a phrase in a target language utilizing a plurality of finite state transducers that convert a pair of left and right ordered source language sequences of dependents of a source language head word into a pair of left and right ordered target language sequences of dependents of a target language head word.
7. A translator for translating a string in a source language into at least one string in a target language, comprising:
at least one head transducer, the head transducer associated with a pair of head words with corresponding meanings in the source and target languages, the head transducer for converting left and right ordered source language sequences of dependents of the source language head word into left and right ordered target language sequences of dependents of the target language head word;
a bilingual lexicon that associates the transducer with the pair of head words;
a table containing a plurality of parameters selected from the group consisting of costs and constraints, wherein one parameter of the plurality is assigned to each action of the at least one head transducer;
a transduction search engine that generates a plurality of candidate translations of the source language phrase using the at least one head transducer and assigns a value to each candidate translation, wherein the value is a function of the parameters involved for all actions taken by the at least one transducer involved in the translation; and
a target language string selector that selects one translation from the plurality of candidate translations.
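Claim 7 generalizes the parameter table so that an entry can be either a cost or a constraint, and leaves the combining function open. One hypothetical reading is sketched in Python below: numeric entries are added, while a boolean entry is a hard constraint that either permits an action (`True`, at no cost) or rules out any candidate that uses it (`False`, infinite value). The action names, the table contents and `translation_value` are invented for illustration, and summation is only one admissible choice of function.

```python
import math

# Hypothetical parameter table: each action maps either to a numeric cost
# (float) or to a hard constraint (bool: True = permitted, False = forbidden).
TABLE = {
    "start:car->voiture": 1.0,
    "left->right": 0.5,      # move a left source dependent to the target right
    "left->left": False,     # constraint: this reordering is forbidden here
    "agreement-ok": True,    # satisfied constraint, adds no cost
    "stop": 0.25,
}

def translation_value(actions, table):
    """Sum the costs of all actions; any violated constraint makes the value infinite."""
    total = 0.0
    for action in actions:
        param = table[action]
        if isinstance(param, bool):  # test bool first: bool is a subclass of int
            if not param:
                return math.inf      # a violated constraint rules the candidate out
        else:
            total += param
    return total

# Hypothetical candidates with the actions each one takes.
CANDIDATES = {
    "voiture rouge": ["start:car->voiture", "left->right", "agreement-ok", "stop"],
    "rouge voiture": ["start:car->voiture", "left->left", "stop"],
}

best = min(CANDIDATES, key=lambda s: translation_value(CANDIDATES[s], TABLE))
print(best, translation_value(CANDIDATES[best], TABLE))
```

Picking the lowest value mirrors claim 1; claim 7 itself only requires the selector to choose one translation, so that selection rule is also an assumption here.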
11. A method for translating a phrase in a source language to a phrase in a target language, comprising the steps of:
(a) activating state machines associated with each word in the source language phrase, wherein the activated state machines are selected from a plurality of such state machines associated with a plurality of words defining a lexicon, each of the activated state machines for converting a pair of source language strings into a pair of target language strings, and further wherein each state machine is characterized by an initial state;
(b) generating a first plurality of transduction records, wherein a transduction record is generated for each word in the source language phrase and each transduction record is characterized by a state machine, a source word, a target string, two position indicators for locating the position of the source word in the source phrase, the initial state for the state machine and a cost;
(c) generating a transduction lattice by forming a data structure comprised of the transduction records of step (b);
(d) generating a plurality of extended transduction records, wherein an extended transduction record is formed when a transduction record within the transduction lattice consumes an adjacent transduction record in the transduction lattice by a state machine transition, and wherein the extended transduction record includes an extended target string, constructed by concatenating the target strings of the consumed and consuming records in the order indicated by the direction of the state machine transition, and a new cost that is the sum of the cost of the consumed record, the cost of the consuming record, a cost associated with the state machine transition of the consuming record, and a cost associated with a stop undertaken by the consumed state machine;
(e) adding the extended transduction record to the transduction lattice;
(f) repeating steps (d) and (e) wherein a transduction record consumes an adjacent transduction record until all records have been fully extended; and
(g) selecting the lowest cost transduction record spanning the entire source language phrase.
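Steps (a) to (g) amount to a bottom-up search over a lattice of partial translations. The Python sketch below is one possible reading, not the patent's implementation, and it rests on assumptions the claim does not state: every machine starts in state 0; a transition depends only on the consuming machine's state and on which side the adjacent record lies (a fuller model would also condition on the words involved); the lattice keeps only the lowest-cost record per (machine, span, state), a standard dynamic-programming choice; and step (g) also charges the final record's own stop cost. The lexicon, the machine names and all costs are invented.

```python
from collections import namedtuple

LEFT, RIGHT = "L", "R"
INITIAL = 0  # assumption: every machine starts in state 0

# Step (b): a transduction record holds a machine, its source word, the span
# [start, end) it covers in the source phrase, its state, a target string and a cost.
Record = namedtuple("Record", "machine word start end state target cost")

def translate(words, lexicon, transitions, stop_costs):
    """Return (cost, target words) of the best full-span record, or None."""
    lattice = {}  # (machine, start, end, state) -> lowest-cost Record seen so far

    def add(rec):
        key = (rec.machine, rec.start, rec.end, rec.state)
        if key not in lattice or rec.cost < lattice[key].cost:
            lattice[key] = rec
            return True
        return False

    # Steps (a)-(c): activate the machines for each source word and seed the lattice.
    for i, word in enumerate(words):
        for machine, target_word, cost in lexicon.get(word, []):
            add(Record(machine, word, i, i + 1, INITIAL, (target_word,), cost))

    # Steps (d)-(f): records consume adjacent records until nothing new is added.
    changed = True
    while changed:
        changed = False
        for head in list(lattice.values()):
            for dep in list(lattice.values()):
                dep_stop = stop_costs.get(dep.machine, {}).get(dep.state)
                if dep_stop is None:
                    continue  # the consumed machine must be able to stop here
                if dep.end == head.start:
                    src_side = LEFT
                elif head.end == dep.start:
                    src_side = RIGHT
                else:
                    continue  # not adjacent
                moves = transitions.get(head.machine, {}).get((head.state, src_side), [])
                for next_state, tgt_side, move_cost in moves:
                    if tgt_side == LEFT:
                        target = dep.target + head.target
                    else:
                        target = head.target + dep.target
                    cost = head.cost + dep.cost + move_cost + dep_stop
                    start, end = min(head.start, dep.start), max(head.end, dep.end)
                    if add(Record(head.machine, head.word, start, end, next_state, target, cost)):
                        changed = True

    # Step (g): lowest-cost record spanning the whole phrase, plus its own stop cost.
    finals = [
        (rec.cost + stop_costs[rec.machine][rec.state], rec.target)
        for rec in lattice.values()
        if rec.start == 0 and rec.end == len(words)
        and rec.state in stop_costs.get(rec.machine, {})
    ]
    return min(finals) if finals else None

# Hypothetical data: lexicon entries are (machine, target word, cost).
LEXICON = {
    "the": [("M_the", "la", 0.5)],
    "red": [("M_red", "rouge", 0.5)],
    "car": [("M_car", "voiture", 0.5), ("M_car2", "auto", 1.0)],
}
TRANSITIONS = {  # (state, source side) -> [(next state, target side, cost)]
    "M_car": {(0, LEFT): [(1, RIGHT, 0.5)], (1, LEFT): [(2, LEFT, 0.25)]},
    "M_car2": {(0, LEFT): [(1, LEFT, 0.5)], (1, LEFT): [(2, LEFT, 0.5)]},
}
STOP_COSTS = {"M_the": {0: 0.0}, "M_red": {0: 0.0}, "M_car": {2: 0.25}, "M_car2": {2: 0.25}}

print(translate(["the", "red", "car"], LEXICON, TRANSITIONS, STOP_COSTS))
```

For "the red car", `M_car` first consumes `red` and writes `rouge` to its right, then consumes `the` and writes `la` to its left, giving `(2.5, ('la', 'voiture', 'rouge'))`. The competing `M_car2` derivation `la rouge auto` also reaches the full span, but at the higher cost of 3.25, so it is not selected.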
14. A method for converting an input signal representative of words, characters or identifiers characterized by a first format, into an output signal representative of words, characters or identifiers characterized by a second format, comprising the steps of:
splitting the input signal into two input subsignals at a position identified by an element of the input signal;
generating two output subsignals representative of the two input subsignals according to actions of a finite state machine in which transitions specify a direction indicating a choice of one of the two output subsignals so as to produce desired ordering differences between the elements of the input and output signals; and
combining the two output subsignals.
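Claim 14 states the split, transduce and combine idea without reference to natural language. As a hypothetical illustration that does not come from the patent, the Python sketch below treats a one-operator infix expression as the input signal, splits it at the operator element, and runs a deterministic finite state machine whose transitions name the input subsignal to read and the output subsignal to write to. The function `convert`, the tables `TO_POSTFIX` and `TO_PREFIX`, and the convention that each subsignal is read and written outward from the split element are all assumptions.

```python
LEFT, RIGHT = "L", "R"

def convert(signal, split_at, transitions, finals, start=0):
    """Split `signal` at index `split_at`, transduce both halves, and recombine.

    `transitions` maps a state to (input side to read, output side to write,
    next state); the machine is deterministic in this sketch.
    """
    pivot = signal[split_at]
    # Two input subsignals, each ordered outward from the split element.
    inputs = {LEFT: list(reversed(signal[:split_at])), RIGHT: list(signal[split_at + 1:])}
    outputs = {LEFT: [], RIGHT: []}
    state = start
    while inputs[LEFT] or inputs[RIGHT]:
        if state not in transitions:
            raise ValueError(f"no transition from state {state}")
        in_side, out_side, state = transitions[state]
        if not inputs[in_side]:
            raise ValueError(f"input subsignal {in_side} is already empty")
        element = inputs[in_side].pop(0)
        if out_side == LEFT:
            outputs[LEFT].insert(0, element)  # grow outward, left of the split element
        else:
            outputs[RIGHT].append(element)    # grow outward, right of the split element
    if state not in finals:
        raise ValueError(f"machine halted in non-final state {state}")
    # Combine the two output subsignals around the split element.
    return outputs[LEFT] + [pivot] + outputs[RIGHT]

# Hypothetical machines: both read the operands of a one-operator infix expression.
TO_POSTFIX = {0: (RIGHT, LEFT, 1), 1: (LEFT, LEFT, 2)}   # a + b  ->  a b +
TO_PREFIX = {0: (LEFT, RIGHT, 1), 1: (RIGHT, RIGHT, 2)}  # a + b  ->  + a b

print(convert(["a", "+", "b"], 1, TO_POSTFIX, {2}))
print(convert(["a", "+", "b"], 1, TO_PREFIX, {2}))
```

The same input yields `["a", "b", "+"]` under `TO_POSTFIX` and `["+", "a", "b"]` under `TO_PREFIX`, so the ordering difference between input and output is carried entirely by the directions on the transitions.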
Specification