EXTRACTING TREELET TRANSLATION PAIRS

US 20090271177A1
Filed: 07/08/2009
Published: 10/29/2009
Est. Priority Date: 11/04/2004
Status: Active Grant

1 Assignment

0 Petitions

Abstract

In one embodiment of the present invention, a decoder receives a dependency tree as a source language input and accesses a set of statistical models whose outputs are combined in a log-linear framework. The decoder also accesses a table of treelet translation pairs and returns a target dependency tree based on the source dependency tree, the table of treelet translation pairs, and the application of the statistical models.
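
A log-linear framework scores each candidate output as a weighted sum of the logarithms of the individual model outputs. The minimal Python sketch below illustrates only that scoring rule, under stated assumptions: the two models (`tm`, `lm`), their probabilities, and the placeholder candidates `a` and `b` are hypothetical, and a string stands in for a target dependency tree. It is not the decoder described in this patent.

```python
import math
from typing import Callable, Dict, List

# Hypothetical model outputs: each maps a candidate (a string stand-in for a
# target dependency tree) to a probability. The numbers are illustrative only.
MODELS: Dict[str, Callable[[str], float]] = {
    "tm": lambda cand: {"a": 0.5, "b": 0.25}[cand],
    "lm": lambda cand: {"a": 0.1, "b": 0.4}[cand],
}


def log_linear_score(cand: str, models: Dict[str, Callable[[str], float]],
                     weights: Dict[str, float]) -> float:
    """score(cand) = sum over models i of weight_i * log p_i(cand)."""
    return sum(weights[name] * math.log(model(cand)) for name, model in models.items())


def best_candidate(cands: List[str], models: Dict[str, Callable[[str], float]],
                   weights: Dict[str, float]) -> str:
    """Return the candidate with the highest combined log-linear score."""
    return max(cands, key=lambda c: log_linear_score(c, models, weights))


if __name__ == "__main__":
    print(best_candidate(["a", "b"], MODELS, {"tm": 1.0, "lm": 1.0}))  # prints b
    print(best_candidate(["a", "b"], MODELS, {"tm": 3.0, "lm": 1.0}))  # prints a
```

Raising the weight on `tm` changes the preferred candidate, which is exactly the trade-off the weights in a log-linear combination control.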

52 Citations

23 Claims

1. A method of identifying treelet translation pairs for use in a machine translation system that translates a source language input into a target language output, the method comprising:
- accessing a corpus of pairs of aligned, parallel syntactic dependency structures, each pair including a source language dependency structure having nodes that represent lexical items, the nodes being aligned with nodes representing lexical items in a target language dependency structure;
  
  enumerating individual source nodes and combinations of source nodes connected in the source language dependency structure as possible source treelets;
  
  identifying lexical items, and corresponding dependencies, in the target language dependency structure, that are aligned with the enumerated nodes and combinations of connected nodes, as possible target treelets corresponding to the possible source treelets;
  
  extracting well formed treelet translation pairs from the possible source treelets and possible target treelets; and
  
  storing the treelet translation pairs in a data store.

2. The method of claim 1, wherein each child node of a parent node is considered to be connected with other child nodes of the parent node.

3. The method of claim 1 wherein the source language dependency structures each represent a source language text fragment, and wherein enumerating comprises:
- enumerating connected portions of the source language dependency structure regardless of whether the connected portions represent discontiguous sets of words in the source language text fragment.

4. The method of claim 1 wherein the source language dependency structure comprises a source language dependency tree, and wherein enumerating comprises:
- enumerating connected sets of nodes that represent a non-linear branch in the source language dependency tree.
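
The steps of claim 1, together with the sibling connectivity of claim 2, can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the helper names (`adjacency`, `is_connected`, `extract_pairs`), the `max_size` limit on treelet size, and the toy sentence pair and its alignment are hypothetical; a plain dictionary of counts stands in for the data store; and target-side dependencies are omitted for brevity. Well-formedness is tested in the spirit of claim 20: no target item in a candidate target treelet may be aligned to a source item outside the source treelet.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

Link = Tuple[int, int]  # (source node index, target node index)
PairKey = Tuple[Tuple[str, ...], Tuple[str, ...]]


def adjacency(parents: List[int]) -> Dict[int, Set[int]]:
    """Parent and child are adjacent; per claim 2, so are children of one parent."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(len(parents))}
    children: Dict[int, List[int]] = defaultdict(list)
    for child, parent in enumerate(parents):
        if parent >= 0:
            adj[child].add(parent)
            adj[parent].add(child)
            children[parent].append(child)
    for kids in children.values():
        for kid in kids:
            adj[kid].update(k for k in kids if k != kid)
    return adj


def is_connected(nodes: Set[int], adj: Dict[int, Set[int]]) -> bool:
    """Depth-first search that only walks through nodes in `nodes`."""
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for nb in adj[stack.pop()] & nodes:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return seen == nodes


def extract_pairs(src_words: List[str], src_parents: List[int], tgt_words: List[str],
                  alignment: Set[Link], max_size: int = 3) -> Dict[PairKey, int]:
    adj = adjacency(src_parents)
    table: Dict[PairKey, int] = defaultdict(int)  # in-memory stand-in for the data store
    for size in range(1, max_size + 1):
        for combo in combinations(range(len(src_words)), size):
            src = set(combo)
            if not is_connected(src, adj):
                continue  # not a possible source treelet
            tgt = {t for s, t in alignment if s in src}  # possible target treelet
            if not tgt:
                continue
            # By construction every link from `src` lands in `tgt`; check the reverse.
            if any(s not in src for s, t in alignment if t in tgt):
                continue
            key = (tuple(src_words[i] for i in sorted(src)),
                   tuple(tgt_words[j] for j in sorted(tgt)))
            table[key] += 1
    return dict(table)


# Toy example: "I do not sleep" -> "je ne dors pas" (hand-built tree and links).
SRC = ["I", "do", "not", "sleep"]
SRC_PARENTS = [3, 3, 3, -1]               # "sleep" heads the other three words
TGT = ["je", "ne", "dors", "pas"]
LINKS = {(0, 0), (2, 1), (2, 3), (3, 2)}  # "do" is unaligned

if __name__ == "__main__":
    for (s, t), n in sorted(extract_pairs(SRC, SRC_PARENTS, TGT, LINKS).items()):
        print(" ".join(s), "->", " ".join(t), n)
```

Enumerating every combination and then testing connectivity is simple but exponential in sentence length, so it is only adequate for toy inputs like this one.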

5. (canceled)

6. A system for identifying treelet translation pairs, from training data, for use in a machine translation system that translates a source language input into a target language output, the system comprising:
- a treelet pair extractor configured to access a corpus of pairs of aligned, parallel syntactic dependency structures, each pair including a source language dependency structure having nodes that represent lexical items, the nodes being aligned with nodes representing lexical items in a target language dependency structure; and
  
  the treelet pair extractor being further configured to enumerate sets of source nodes that are connected portions of the source language dependency structure as possible source treelets.

7. The system of claim 6 wherein each child node of a parent node is considered to be connected with other child nodes of the parent node, and extract well formed treelet translation pairs from the possible source treelets and corresponding aligned portions of the target language dependency structure.

9. The system of claim 6 and further comprising:
- a data store storing the extracted treelet translation pairs.

10. The system of claim 6 wherein the source language dependency structures each represent a source language text fragment.

11. The system of claim 10 wherein the treelet translation pair extractor is configured to enumerate connected sets of source nodes in the source language dependency structure regardless of whether they represent discontiguous words in the source language text fragment.

12. The system of claim 6 wherein the source language dependency structure comprises a source language dependency tree, and wherein the treelet pair extractor is configured to enumerate connected sets of nodes that represent a non-linear branch in the source language dependency tree.
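
Claims 11 and 12 concern treelets that cover discontiguous words or a non-linear branch. The Python sketch below grows connected node sets one neighbor at a time (parent, child, or sibling, following claim 7) and flags both properties. It assumes node indices equal word positions, a hypothetical size limit, and a hand-built tree; reading "non-linear branch" as "anything other than a single head-to-dependent chain" is this sketch's assumption, not a definition taken from the patent.

```python
from typing import FrozenSet, List, Set


def neighbors(node: int, parents: List[int]) -> Set[int]:
    """Parent, children, and (as in claim 7) siblings of `node`."""
    head = parents[node]
    out = {i for i, p in enumerate(parents) if p == node}  # children
    if head >= 0:
        out.add(head)
        out |= {i for i, p in enumerate(parents) if p == head and i != node}  # siblings
    return out


def enumerate_treelets(parents: List[int], max_size: int) -> Set[FrozenSet[int]]:
    """Grow connected node sets one neighbor at a time, breadth-first."""
    layer = {frozenset([i]) for i in range(len(parents))}
    found = set(layer)
    while layer:
        nxt = set()
        for treelet in layer:
            if len(treelet) == max_size:
                continue
            for node in treelet:
                for nb in neighbors(node, parents) - treelet:
                    grown = treelet | {nb}
                    if grown not in found:
                        found.add(grown)
                        nxt.add(grown)
        layer = nxt
    return found


def is_discontiguous(treelet: FrozenSet[int]) -> bool:
    """Node indices double as word positions; any gap means discontiguous words."""
    return max(treelet) - min(treelet) + 1 != len(treelet)


def is_nonlinear(treelet: FrozenSet[int], parents: List[int]) -> bool:
    """Assumed reading of 'non-linear branch': not a single head-to-dependent chain."""
    tops = [n for n in treelet if parents[n] not in treelet]
    if len(tops) > 1:
        return True  # siblings joined without their shared parent
    child_counts = [sum(1 for m in treelet if parents[m] == n) for n in treelet]
    return max(child_counts) > 1  # some node keeps two or more dependents


# Toy tree for "she turned the light off", word positions 0-4.
WORDS = ["she", "turned", "the", "light", "off"]
PARENTS = [1, -1, 3, 1, 1]

if __name__ == "__main__":
    for t in sorted(enumerate_treelets(PARENTS, 3), key=sorted):
        flags = [f for f, on in (("discontiguous", is_discontiguous(t)),
                                 ("non-linear", is_nonlinear(t, PARENTS))) if on]
        print([WORDS[i] for i in sorted(t)], flags)
```

For example, {turned, off} is discontiguous but linear, while {she, off} is both discontiguous and non-linear, because the two siblings are joined without their shared head.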

8. (canceled)

13. (canceled)

14. A computer readable medium storing computer readable instructions which, when executed by a computer cause the computer to perform a method of identifying treelet translation pairs for use in a machine translation system that translates a source language input into a target language output, the method comprising:
- accessing a corpus of pairs of aligned, parallel syntactic dependency structures, each pair including a source language dependency structure having nodes that represent lexical items, the nodes being aligned with nodes representing lexical items in a target language dependency structure;
  
  enumerating connected sets of source nodes in the source language dependency structure as possible source treelets and extracting well formed treelet translation pairs from the possible source treelets and aligned portions of a corresponding target language dependency structure.

15. The computer readable medium of claim 14 wherein each child node of a parent node is considered to be connected to other child nodes of the parent node.

16. The computer readable medium of claim 14 wherein extracting comprises:
- identifying lexical items, and corresponding dependencies, in the target language dependency structure, that are aligned with the enumerated connected sets of nodes, as possible target treelets corresponding to the possible source treelets;
  
  extracting the well formed treelet translation pairs based on the possible source treelets and the possible target treelets.

17. The computer readable medium of claim 14 wherein the method further comprises:
- storing the treelet translation pairs in a data store.

18. The computer readable medium of claim 16 wherein the source language dependency structures each represent a source language text fragment, and wherein enumerating comprises:
- enumerating connected sets of source nodes in the source language dependency structure that may represent discontiguous words in the source language text fragment.

19. The computer readable medium of claim 14 wherein the source language dependency structure comprises a source language dependency tree, and wherein enumerating comprises:
- enumerating connected sets of nodes that represent a non-linear branch in the source language dependency tree.

20. The computer readable medium of claim 16 wherein extracting well formed treelet translation pairs comprises:
- extracting as a well formed treelet translation pair the possible source treelet and corresponding possible target treelet only if the lexical items in the possible source treelet are only aligned with lexical items in the possible target treelet and the lexical items in the possible target treelet are only aligned with lexical items in the possible source treelet.
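
Claims 16 and 20 can be illustrated with a short Python sketch: the possible target treelet is the set of target nodes aligned with the source treelet plus the target dependencies among them, and a pair is kept only when every alignment link touching either side connects the two treelets. The function names, the toy target tree for "je ne dors pas" (source "I do not sleep", source indices 0 to 3), and the rule that empty treelets are rejected are assumptions made for this example.

```python
from typing import List, Set, Tuple

Link = Tuple[int, int]  # (source node, target node)


def project(src_nodes: Set[int], links: Set[Link],
            tgt_parents: List[int]) -> Tuple[Set[int], Set[Tuple[int, int]]]:
    """Possible target treelet: aligned target nodes plus the (dependent, head)
    edges of the target tree whose endpoints both fall inside that node set."""
    tgt_nodes = {t for s, t in links if s in src_nodes}
    deps = {(t, tgt_parents[t]) for t in tgt_nodes if tgt_parents[t] in tgt_nodes}
    return tgt_nodes, deps


def is_well_formed(src_nodes: Set[int], tgt_nodes: Set[int], links: Set[Link]) -> bool:
    """Claim-20 style test: a link is inside the source treelet exactly when
    it is inside the target treelet, so no alignment crosses the boundary."""
    if not src_nodes or not tgt_nodes:
        return False  # assumption: pairs with an empty side are not extracted
    return all((s in src_nodes) == (t in tgt_nodes) for s, t in links)


# Target "je ne dors pas" is headed by "dors" (index 2).
TGT_PARENTS = [2, 2, -1, 2]
LINKS = {(0, 0), (2, 1), (2, 3), (3, 2)}  # I-je, not-ne, not-pas, sleep-dors

if __name__ == "__main__":
    nodes, deps = project({2, 3}, LINKS, TGT_PARENTS)  # source treelet "not sleep"
    print(sorted(nodes), sorted(deps), is_well_formed({2, 3}, nodes, LINKS))
    # prints [1, 2, 3] [(1, 2), (3, 2)] True
```

Adding a link from "do" (index 1) to "ne" would make the single-node treelet "not" ill-formed, since "ne" would then also be aligned outside it.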

21. A data structure for use in a machine translation system, the data structure comprising:
- a plurality of treelet translation pairs each pair having a source language portion, comprising a connected portion of a source language syntactic dependency structure based on a source text fragment, and a target language portion, having lexical items aligned with lexical items in the source language portion and syntactic dependencies, the source portion including a plurality of child nodes from the source language syntactic dependency structure that depend from a common parent node.

22. The data structure of claim 21 wherein the connected portion of the source language syntactic dependency structure represents a non-linear branch in the source language syntactic dependency structure.

23. The data structure of claim 21 wherein the connected portion of the source language syntactic dependency structure comprises connected source nodes in the source language syntactic dependency structure that may represent discontiguous words in the source text fragment.
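
One possible in-memory shape for the data structure of claim 21 is sketched below in Python. The class and field names are hypothetical, the alignment between "she turned the light off" and "éteint ... lumière" is made up for illustration, and keying the table by source words is just one plausible lookup scheme. The two methods check the properties recited in claims 21 and 23.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SourceNode:
    word: str
    position: int        # word position in the source text fragment
    head: Optional[int]  # position of this word's head in the full source tree


@dataclass(frozen=True)
class TargetNode:
    word: str
    head: Optional[int]  # index of this node's head within the target portion


@dataclass(frozen=True)
class TreeletPair:
    source: Tuple[SourceNode, ...]
    target: Tuple[TargetNode, ...]
    links: Tuple[Tuple[int, int], ...]  # (source index, target index) within this pair

    def has_common_parent_children(self) -> bool:
        """True if two or more source nodes depend on the same parent (claim 21)."""
        heads = Counter(n.head for n in self.source if n.head is not None)
        return any(count >= 2 for count in heads.values())

    def is_discontiguous(self) -> bool:
        """True if the source nodes cover non-adjacent words (claim 23)."""
        pos = sorted(n.position for n in self.source)
        return pos[-1] - pos[0] + 1 != len(pos)


def build_table(pairs: List[TreeletPair]) -> Dict[Tuple[str, ...], List[TreeletPair]]:
    """Index pairs by their source words in sentence order (hypothetical scheme)."""
    table: Dict[Tuple[str, ...], List[TreeletPair]] = {}
    for pair in pairs:
        key = tuple(n.word for n in sorted(pair.source, key=lambda n: n.position))
        table.setdefault(key, []).append(pair)
    return table


# "turned ... light ... off" (positions 1, 3, 4) -> "éteint ... lumière".
PAIR = TreeletPair(
    source=(SourceNode("turned", 1, None), SourceNode("light", 3, 1),
            SourceNode("off", 4, 1)),
    target=(TargetNode("éteint", None), TargetNode("lumière", 0)),
    links=((0, 0), (2, 0), (1, 1)),
)
TABLE = build_table([PAIR])
```

Here "light" and "off" both depend on "turned", so the pair satisfies the common-parent requirement of claim 21, and the gap at position 2 makes it discontiguous in the sense of claim 23.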

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Corporation
Inventors
Cherry, Colin A., Menezes, Arul A., Quirk, Christopher B.

Granted Patent

US 8,082,143 B2
US Class Current

704/2
CPC Class Codes

G06F 40/40 Processing or translation o...

G06F 40/44 Statistical methods, e.g. p...