Machine translation using language order templates
Abstract
Many machine translation scenarios involve generating a language translation rule set from parallel training corpora (e.g., sentences in a first language paired with word-for-word translations into a second language). However, translating a source corpus in a source language into a target corpus in a target language involves at least two aspects: selecting elements of the target language that match the elements of the source corpus, and ordering those target elements according to the semantic organization of the source corpus and the grammatical rules of the target language. The translation rules derived from training may be made to generalize more broadly, while retaining contextual information, by formulating language order templates that specify orderings of small sets of target elements according to target element types. These language order templates may be represented with varying degrees of association with the alignment rules derived from training, in order to broaden the range of target elements to which the ordering rules and alignment rules may be applied.
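The order-template idea can be illustrated with a toy sketch. Here a template matches a tuple of element types (in source order) and supplies the permutation that puts the corresponding target elements into target order. The `OrderTemplate` class, the `order_elements` helper, the part-of-speech labels, and the fall-back-to-source-order behavior are hypothetical assumptions for illustration, not the representation described in the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderTemplate:
    """Hypothetical order template: element types in source order, plus the
    permutation (indices into the source-ordered elements) giving target order."""
    source_types: tuple[str, ...]
    target_order: tuple[int, ...]

    def matches(self, types):
        return tuple(types) == self.source_types

    def apply(self, elements):
        return [elements[i] for i in self.target_order]


def order_elements(elements, types, templates):
    """Reorder translated elements with the first matching template.
    Assumption: keep source order when no template matches."""
    for template in templates:
        if template.matches(types):
            return template.apply(elements)
    return list(elements)


# Toy example: English determiner-adjective-noun becomes Spanish determiner-noun-adjective.
templates = [OrderTemplate(("DET", "ADJ", "NOUN"), (0, 2, 1))]
translated = ["el", "rojo", "coche"]  # word-for-word translation of "the red car"
print(order_elements(translated, ["DET", "ADJ", "NOUN"], templates))
# ['el', 'coche', 'rojo']
```

Because the template is keyed on element types rather than on specific words, the same template would also reorder any other determiner-adjective-noun phrase, which is the kind of generalization the abstract describes.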
20 Claims
1. A method of generating a language translation rule set comprising at least one language order template using at least one source training corpus in a source language aligned with a parallel target training corpus in a target language on a computer having a processor, the method comprising:
executing on the processor instructions configured to: for respective parallel training corpora: parse the source corpus to identify element types for respective source training corpus elements, generate a parse tree mapping source training corpus elements to parallel target training corpus elements, generate at least one candidate treelet translation pair based on the parse tree, and generate at least one candidate order template based on the parse tree and the element types; add at least one treelet translation pair from the candidate treelet translation pairs to the language translation rule set; and add at least one order template from the candidate order templates to the language translation rule set. (Dependent claims: 2, 3, 4, 5, 6)
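A minimal sketch of the training steps in claim 1 follows, under strong simplifying assumptions: the source side arrives already parsed (one element type and one dependency head per word), alignments are one-to-one, treelets are limited to single nodes and head-dependent pairs, and candidates are kept by a simple frequency threshold. `ParsedPair`, `candidate_treelet_pairs`, `candidate_order_templates`, `build_rule_set`, and `min_count` are hypothetical names, not taken from the patent.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ParsedPair:
    """Hypothetical aligned training pair. The source side is assumed to be
    already parsed: one element type (tag) and one dependency head per word,
    with -1 marking the root. Alignment is assumed one-to-one."""
    source: list[str]
    target: list[str]
    tags: list[str]
    heads: list[int]
    alignment: dict[int, int]  # source index -> target index


def candidate_treelet_pairs(pair):
    """Yield single-node and head-dependent treelets with their aligned target
    words (source side in source order, target side in target order)."""
    for i, word in enumerate(pair.source):
        if i in pair.alignment:
            yield (word,), (pair.target[pair.alignment[i]],)
    for dep, head in enumerate(pair.heads):
        if head >= 0 and dep in pair.alignment and head in pair.alignment:
            src = tuple(pair.source[k] for k in sorted((head, dep)))
            tgt_idx = sorted((pair.alignment[head], pair.alignment[dep]))
            yield src, tuple(pair.target[k] for k in tgt_idx)


def candidate_order_templates(pair):
    """For each head with dependents, yield the element types of the head and
    its dependents in source order, plus the permutation into target order."""
    for head in range(len(pair.source)):
        nodes = sorted([head] + [d for d, h in enumerate(pair.heads) if h == head])
        if len(nodes) < 2 or any(n not in pair.alignment for n in nodes):
            continue
        types = tuple(pair.tags[n] for n in nodes)
        order = tuple(sorted(range(len(nodes)), key=lambda k: pair.alignment[nodes[k]]))
        yield types, order


def build_rule_set(corpus, min_count=1):
    """Count candidates across the corpus and keep those seen at least
    min_count times (this selection rule is an assumption)."""
    treelets, templates = Counter(), Counter()
    for pair in corpus:
        treelets.update(candidate_treelet_pairs(pair))
        templates.update(candidate_order_templates(pair))
    return {
        "treelet_pairs": {t for t, c in treelets.items() if c >= min_count},
        "order_templates": {t for t, c in templates.items() if c >= min_count},
    }


pair = ParsedPair(
    source=["the", "red", "car"],
    target=["el", "coche", "rojo"],
    tags=["DET", "ADJ", "NOUN"],
    heads=[2, 2, -1],
    alignment={0: 0, 1: 2, 2: 1},
)
rules = build_rule_set([pair])
print(rules["order_templates"])  # {(('DET', 'ADJ', 'NOUN'), (0, 2, 1))}
```

The treelet pairs record which words translate together, while the order templates record only element types and a permutation; keeping the two separate is what lets an ordering learned from one phrase apply to phrases with different words.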
7. A method of translating a source corpus in a source language into a target language using a language translation rule set comprising at least one treelet translation pair and at least one language order template on a computer having a processor, the method comprising:
executing on the processor instructions configured to: parse the source corpus to identify element types for respective source corpus elements; generate a parse tree mapping the source corpus elements; select at least one treelet translation pair mapping at least one source corpus element to at least one target corpus element; select language order templates corresponding to unordered source corpus elements; and generate a target corpus according to the parse tree, selected treelet translation pairs, and selected language order templates. (Dependent claims: 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19)
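The translation steps of claim 7 can likewise be sketched as a toy recursive decoder. The parse is supplied directly as tags and dependency heads, treelet translation pairs are reduced to a one-word `lexicon`, and `templates` maps element-type tuples (in source order) to target-order permutations. These structures, the `translate` function, and the rules of keeping source order when no template matches and passing unknown words through untranslated are hypothetical assumptions, not the patent's decoder.

```python
def translate(words, tags, heads, lexicon, templates):
    """Toy decoder. words/tags/heads stand in for the parse step (exactly one
    head must be -1, the root); lexicon stands in for one-word treelet
    translation pairs; templates maps element-type tuples in source order
    to target-order permutations."""
    children = {i: [] for i in range(len(words))}
    root = None
    for i, h in enumerate(heads):
        if h < 0:
            root = i
        else:
            children[h].append(i)

    def realize(node):
        # The head and its dependents are the unordered elements to be placed.
        units = sorted([node] + children[node])
        pieces = [
            [lexicon.get(words[u], words[u])] if u == node else realize(u)
            for u in units
        ]
        types = tuple(tags[u] for u in units)
        # Apply a matching order template; otherwise keep source order.
        order = templates.get(types, tuple(range(len(units))))
        return [w for k in order for w in pieces[k]]

    return realize(root)


lexicon = {"the": "el", "red": "rojo", "car": "coche", "runs": "corre"}
templates = {("DET", "ADJ", "NOUN"): (0, 2, 1)}
result = translate(
    ["the", "red", "car", "runs"],
    ["DET", "ADJ", "NOUN", "VERB"],
    [2, 2, 3, -1],
    lexicon,
    templates,
)
print(result)  # ['el', 'coche', 'rojo', 'corre']
```

Each subtree is realized as a contiguous unit before its parent orders it, so the noun phrase is reordered internally by the template while the clause-level (NOUN, VERB) unit, which has no template here, keeps source order. A real system would score competing treelet pairs and templates; the dictionary lookups are a deliberate simplification.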
20. A nonvolatile memory comprising processor-executable instructions that, when executed by a processor of a device, cause the device to translate a source corpus in a source language into a target language using a language translation rule set comprising at least one treelet translation pair and at least one language order template by:
parsing the source corpus to identify element types for respective source corpus elements; generating a parse tree mapping the source corpus elements; selecting at least one treelet translation pair mapping at least one source corpus element to at least one target corpus element; selecting language order templates corresponding to unordered source corpus elements; and generating a target corpus according to the parse tree, selected treelet translation pairs, and selected language order templates.
Specification