EXTRACTING TREELET TRANSLATION PAIRS
Abstract
In one embodiment of the present invention, a decoder receives a dependency tree as a source language input and accesses a set of statistical models that produce outputs combined in a log linear framework. The decoder also accesses a table of treelet translation pairs and returns a target dependency tree based on the source dependency tree, based on access to the table of treelet translation pairs, and based on the application of the statistical models.
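As a rough illustration of combining model outputs in a log linear framework, the Python sketch below scores each candidate by a weighted sum of log-domain model scores and keeps the best one. The model names (`channel`, `target_lm`, `order`), the probabilities, and the weights are hypothetical and are not taken from the patent.

```python
import math

def log_linear_score(feature_values, weights):
    """Weighted sum of log-domain model scores for one candidate."""
    return sum(weights[name] * value for name, value in feature_values.items())

# Hypothetical model outputs (as log probabilities) for two candidate target trees.
candidates = {
    "candidate_a": {
        "channel": math.log(0.5),
        "target_lm": math.log(0.25),
        "order": math.log(0.5),
    },
    "candidate_b": {
        "channel": math.log(0.25),
        "target_lm": math.log(0.5),
        "order": math.log(0.125),
    },
}

# Assumed weights; in practice these would be tuned, which is not shown here.
weights = {"channel": 1.0, "target_lm": 1.0, "order": 1.0}

# Pick the candidate with the highest combined score.
best = max(candidates, key=lambda name: log_linear_score(candidates[name], weights))
```

With equal weights, the combined score is simply the log of the product of the model probabilities, so the candidate with the larger product is selected.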
23 Claims
1. A method of identifying treelet translation pairs for use in a machine translation system that translates a source language input into a target language output, the method comprising:
- accessing a corpus of pairs of aligned, parallel syntactic dependency structures, each pair including a source language dependency structure having nodes that represent lexical items, the nodes being aligned with nodes representing lexical items in a target language dependency structure;
- enumerating individual source nodes and combinations of source nodes connected in the source language dependency structure as possible source treelets;
- identifying lexical items, and corresponding dependencies, in the target language dependency structure, that are aligned with the enumerated nodes and combinations of connected nodes, as possible target treelets corresponding to the possible source treelets;
- extracting well formed treelet translation pairs from the possible source treelets and possible target treelets; and
- storing the treelet translation pairs in a data store.
Dependent claims: 2, 3, 4
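The steps recited in claim 1 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation. Each dependency structure is assumed to be a dict mapping a node to its parent, and the alignment is assumed to be a set of (source, target) links. The claim does not define "well formed" in this excerpt, so the test used here (the aligned target nodes are connected, and no alignment link crosses the boundary of the pair) is an assumption. The function names and the `max_size` limit are also hypothetical.

```python
from itertools import combinations

def is_connected(nodes, parent):
    """A non-empty node set in a tree is connected iff exactly one
    of its nodes has a parent outside the set (or no parent)."""
    nodes = set(nodes)
    roots = [n for n in nodes if parent.get(n) not in nodes]
    return len(roots) == 1

def enumerate_treelets(parent, max_size):
    """Yield single nodes and connected node combinations up to max_size."""
    all_nodes = sorted(parent)
    for size in range(1, max_size + 1):
        for combo in combinations(all_nodes, size):
            if is_connected(combo, parent):
                yield frozenset(combo)

def extract_pairs(src_parent, tgt_parent, alignment, max_size=3):
    """Return (source treelet, target treelet) pairs that pass the
    assumed well-formedness test."""
    pairs = []
    for src in enumerate_treelets(src_parent, max_size):
        # Target nodes aligned with the enumerated source nodes.
        tgt = frozenset(t for s, t in alignment if s in src)
        if not tgt or not is_connected(tgt, tgt_parent):
            continue
        # Assumption: reject pairs with a link leaving the pair on either side.
        if any((s in src) != (t in tgt) for s, t in alignment):
            continue
        pairs.append((src, tgt))
    return pairs

# Toy aligned pair: "the old man" / "el hombre viejo" (node -> parent).
src_parent = {"man": None, "the": "man", "old": "man"}
tgt_parent = {"hombre": None, "el": "hombre", "viejo": "hombre"}
alignment = {("the", "el"), ("old", "viejo"), ("man", "hombre")}

# A plain list stands in for the data store.
pairs = extract_pairs(src_parent, tgt_parent, alignment)
```

In this toy example the set {the, old} is skipped because the two nodes are siblings without their shared parent, so they do not form a connected portion of the source structure.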
5. (canceled)
6. A system for identifying treelet translation pairs, from training data, for use in a machine translation system that translates a source language input into a target language output, the system comprising:
- a treelet pair extractor configured to access a corpus of pairs of aligned, parallel syntactic dependency structures, each pair including a source language dependency structure having nodes that represent lexical items, the nodes being aligned with nodes representing lexical items in a target language dependency structure; and
- the treelet pair extractor being further configured to enumerate sets of source nodes that are connected portions of the source language dependency structure as possible source treelets.
Dependent claims: 7, 9, 10, 11, 12
8. (canceled)
13. (canceled)
14. A computer readable medium storing computer readable instructions which, when executed by a computer cause the computer to perform a method of identifying treelet translation pairs for use in a machine translation system that translates a source language input into a target language output, the method comprising:
- accessing a corpus of pairs of aligned, parallel syntactic dependency structures, each pair including a source language dependency structure having nodes that represent lexical items, the nodes being aligned with nodes representing lexical items in a target language dependency structure;
- enumerating connected sets of source nodes in the source language dependency structure as possible source treelets and extracting well formed treelet translation pairs from the possible source treelets and aligned portions of a corresponding target language dependency structure.
Dependent claims: 15, 16, 17, 18, 19, 20
21. A data structure for use in a machine translation system, the data structure comprising:
- a plurality of treelet translation pairs each pair having a source language portion, comprising a connected portion of a source language syntactic dependency structure based on a source text fragment, and a target language portion, having lexical items aligned with lexical items in the source language portion and syntactic dependencies, the source portion including a plurality of child nodes from the source language syntactic dependency structure that depend from a common parent node.
Dependent claims: 22, 23
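One possible in-memory layout for such a treelet translation pair is sketched below in Python. The class names, field names, and index-based encoding are hypothetical choices made for illustration; the claim does not prescribe any particular representation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Treelet:
    words: tuple  # lexical items, one per node
    heads: tuple  # parent index of each node within the treelet, -1 for its root

@dataclass(frozen=True)
class TreeletPair:
    source: Treelet
    target: Treelet
    alignment: frozenset  # (source_index, target_index) links

def children_of(treelet, parent_index):
    """Indices of the nodes that depend directly on the given node."""
    return [i for i, h in enumerate(treelet.heads) if h == parent_index]

# Toy pair: "the old man" / "el hombre viejo".
pair = TreeletPair(
    source=Treelet(words=("the", "old", "man"), heads=(2, 2, -1)),
    target=Treelet(words=("el", "hombre", "viejo"), heads=(1, -1, 1)),
    alignment=frozenset({(0, 0), (1, 2), (2, 1)}),
)

# Two source children ("the", "old") depend from the common parent "man".
source_children = children_of(pair.source, 2)
```

Here the source portion contains two child nodes that share one parent, which matches the "plurality of child nodes ... that depend from a common parent node" wording of the claim.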
Specification