Statistical machine translation processing

US 20090106015A1
Filed: 10/23/2007
Published: 04/23/2009
Est. Priority Date: 10/23/2007
Status: Active Grant

Abstract

A method of statistical machine translation (SMT) is provided. The method comprises generating reordering knowledge based on the syntax of a source language (SL) and a number of alignment matrices that map sample SL sentences to sample target language (TL) sentences. The method further comprises receiving an SL word string and parsing the SL word string into a parse tree that represents the syntactic properties of the SL word string. The nodes on the parse tree are reordered based on the generated reordering knowledge in order to provide reordered word strings. The method further comprises translating a number of reordered word strings to create a number of TL word strings, and identifying a statistically preferred TL word string as a preferred translation of the SL word string.
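
The flow described in the abstract (parse, reorder nodes, translate each reordering, pick the preferred result) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the `Node` class, the hand-built parse tree, the word-for-word `LEXICON`, and the bigram-counting `score` function are toy stand-ins for a real parser, translation model, and statistical scoring. For brevity the sketch enumerates every child permutation rather than applying learned reordering knowledge.

```python
from dataclasses import dataclass, field
from itertools import permutations, product
from typing import List, Optional


@dataclass
class Node:
    label: str
    word: Optional[str] = None            # set only on leaf nodes
    children: List["Node"] = field(default_factory=list)


def reorderings(node: Node) -> List[List[str]]:
    """All word sequences obtainable by permuting the children of every node."""
    if not node.children:
        return [[node.word]]
    out = []
    for order in permutations(node.children):
        # Combine each child's own reorderings in the chosen child order.
        for parts in product(*(reorderings(child) for child in order)):
            out.append([w for part in parts for w in part])
    return out


# Hand-built parse of "he ate apples"; a real system would call a parser here.
tree = Node("S", children=[
    Node("NP", word="he"),
    Node("VP", children=[Node("V", word="ate"), Node("NP", word="apples")]),
])

# Toy stand-ins: a word-for-word lexicon with placeholder target tokens, and
# a set of target bigrams that a target-language model is assumed to favour.
LEXICON = {"he": "HE", "ate": "ATE", "apples": "APPLES"}
GOOD_BIGRAMS = {("HE", "APPLES"), ("APPLES", "ATE")}


def translate(words: List[str]) -> List[str]:
    return [LEXICON[w] for w in words]


def score(target: List[str]) -> int:
    # Count adjacent target-word pairs that appear in the favoured set.
    return sum(pair in GOOD_BIGRAMS for pair in zip(target, target[1:]))


candidates = [translate(ws) for ws in reorderings(tree)]
best = max(candidates, key=score)
print(len(candidates), " ".join(best))
```

Exhaustive permutation grows factorially with the number of children per node, which is one reason a practical system would score and prune reorderings instead of enumerating them all.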

20 Claims

1. Instructions on a computer-usable medium wherein the instructions when executed cause a computer system to perform a method of statistical machine translation (SMT), said method comprising:
- receiving a word string in a first natural language;
  
  parsing said word string into a parse tree comprising a plurality of child nodes;
  
  reordering said plurality of child nodes to provide a plurality of reordered word strings;
  
  evaluating each of said plurality of reordered word strings using a reordering knowledge, wherein said reordering knowledge is based on a syntax of said first natural language; and
  
  translating a plurality of preferred reordered word strings from said plurality of reordered word strings to a second natural language based on said evaluating; and
  
  selecting a statistically preferred translation of said word string from among translations of said plurality of preferred reordered word strings.
- Dependent Claims (2, 3, 4, 5, 6, 7)
- - 2. The method of claim 1, further comprising:
    - accessing training data comprising sentences in said first natural language paired with sentences in said second natural language;
      
      utilizing an alignment model to match words and phrases in said first natural language to words and phrases in said second natural language;
      
      generating training samples based on results of said alignment model, said training samples identifying syntactic differences between said first natural language and said second natural language; and
      
      creating said reordering knowledge based on said training samples.
  - 3. The method of claim 2, further comprising:
    - generating a probabilistic model to represent said reordering knowledge; and
      
      utilizing said probabilistic model to estimate a probabilistic distribution of said training samples.
  - 4. The method of claim 1, further comprising:
    - identifying a group of child nodes from said plurality of child nodes associated with a same parent node;
      
      calculating an inversion probability corresponding to said group of child nodes using said reordering knowledge; and
      
      inverting child nodes from said group of child nodes based on said inversion probability.
  - 5. The method of claim 1, further comprising:
    - generating reordering probabilities by calculating probabilities of reordering words from said plurality of child nodes into reordered phrases; and
      
      identifying said plurality of preferred reordered word strings based on said reordering probabilities.
  - 6. The method of claim 5, further comprising:
    - ranking said plurality of reordered word strings based on said reordering probabilities; and
      
      identifying said plurality of preferred reordered word strings from among said ranked reordered word strings.
  - 7. The method of claim 5, further comprising:
    - scoring said plurality of preferred reordered word strings based on said reordering probabilities; and
      
      identifying said statistically preferred translation of said word string based on said scoring.
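
Claims 4 to 6 describe inverting sibling nodes according to an inversion probability and ranking the resulting word strings by reordering probability. The following Python sketch shows one way this could look for binary parse trees. It is illustrative only: the tuple-based tree encoding, the `INVERT_PROB` table and its values, the 0.5 default for unseen label pairs, and the `n_best` function are all hypothetical and are not specified by the claims.

```python
from typing import Dict, List, Tuple

# Hypothetical reordering knowledge: P(invert | labels of the two children).
INVERT_PROB: Dict[Tuple[str, str], float] = {
    ("V", "NP"): 0.8,
    ("NP", "VP"): 0.1,
}


def n_best(tree, n: int) -> List[Tuple[float, List[str]]]:
    """Return the n most probable (probability, words) reorderings of `tree`.

    A tree is either (label, word) for a leaf or (label, left, right) for a
    binary node.
    """
    if len(tree) == 2:
        return [(1.0, [tree[1]])]
    _, left, right = tree
    # Unseen label pairs fall back to 0.5 (an assumption of this sketch).
    p_inv = INVERT_PROB.get((left[0], right[0]), 0.5)
    results = []
    for p_l, w_l in n_best(left, n):
        for p_r, w_r in n_best(right, n):
            results.append((p_l * p_r * (1 - p_inv), w_l + w_r))  # straight
            results.append((p_l * p_r * p_inv, w_r + w_l))        # inverted
    # Rank by reordering probability and keep only the preferred strings.
    results.sort(key=lambda item: item[0], reverse=True)
    return results[:n]


tree = ("S", ("NP", "he"), ("VP", ("V", "ate"), ("NP", "apples")))
top = n_best(tree, 2)
for prob, words in top:
    print(round(prob, 2), " ".join(words))
```

With the toy probabilities above, the two highest-ranked strings are "he apples ate" (about 0.72) and "he ate apples" (about 0.18); these would be the preferred reordered word strings handed on for translation.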

8. A statistical machine translation (SMT) system comprising:
- a parsing module configured to receive a word string in a first natural language and parse said word string into a parse tree comprising a plurality of child nodes;
  
  a preprocessing module coupled with said parsing module, said preprocessing module configured to access said plurality of child nodes and reorder words from said word string based on a syntax of said first natural language to provide a plurality of reordered word strings; and
  
  a decoding module coupled with said preprocessing module, said decoding module configured to access said plurality of reordered word strings, identify a statistically preferred reordered word string based on reordering probabilities associated with said plurality of reordered word strings, and generate a target word string based on a word sequence of said statistically preferred reordered word string.
- Dependent Claims (9, 10, 11, 12, 13, 14)
- - 9. The system of claim 8 wherein said preprocessing module is further configured to access training data comprising sentences in said first natural language paired with sentences in a second natural language, utilize said training data to match words and phrases in said first natural language to words and phrases in said second natural language, generate training samples identifying syntactic differences between said first natural language and said second natural language, and utilize said training samples to reorder said words from said word string.
  - 10. The system of claim 9 wherein said preprocessing module is further configured to identify a group of child nodes from said plurality of child nodes associated with a same parent node, calculate an inversion probability corresponding to said group of child nodes using said training samples, and invert child nodes from said group of child nodes based on said inversion probability.
  - 11. The system of claim 9, wherein said preprocessing module is further configured to calculate probabilities of reordering words from said plurality of child nodes into reordered phrases based on said training samples, generate said reordering probabilities based on said calculated probabilities, and forward said reordering probabilities to said decoding module.
  - 12. The system of claim 8, wherein said preprocessing module is further configured to rank said plurality of reordered word strings based on said reordering probabilities.
  - 13. The system of claim 8, wherein said preprocessing module is further configured to identify reordered word strings from said plurality of reordered word strings associated with reordering probabilities above a threshold.
  - 14. The system of claim 8, wherein said decoding module is further configured to translate said plurality of reordered word strings into corresponding target word strings and select said target word string from among said corresponding target word strings based on said reordering probabilities and other translation factors.
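
Claims 13 and 14 mention keeping reordered word strings whose reordering probability is above a threshold, and selecting a target word string based on reordering probabilities together with other translation factors. The claims do not say how those factors are combined. The Python sketch below assumes a weighted sum of log-probabilities, a common choice in SMT decoders; the function names, the `"lm"` feature, and all numeric values are hypothetical.

```python
import math


def filter_by_threshold(scored, threshold):
    """Keep (probability, words) pairs whose probability exceeds `threshold`."""
    return [(p, ws) for p, ws in scored if p > threshold]


def select_translation(candidates, weights):
    """Pick the candidate with the highest weighted sum of log-scores.

    Each candidate is a dict with a "target" string plus probability-valued
    features. "reorder" is the reordering probability; the other features
    stand in for the "other translation factors" the claim leaves open.
    """
    def total(c):
        return sum(w * math.log(c[name]) for name, w in weights.items())
    return max(candidates, key=total)["target"]


scored = [
    (0.72, ["he", "apples", "ate"]),
    (0.18, ["he", "ate", "apples"]),
    (0.08, ["apples", "ate", "he"]),
    (0.02, ["ate", "apples", "he"]),
]
kept = filter_by_threshold(scored, 0.1)

# Hypothetical decoder output for the two surviving reorderings.
candidates = [
    {"target": "T1", "reorder": 0.72, "lm": 0.01},
    {"target": "T2", "reorder": 0.18, "lm": 0.2},
]
chosen = select_translation(candidates, {"reorder": 1.0, "lm": 1.0})
print(len(kept), chosen)
```

In this toy case the candidate with the lower reordering probability wins because its other factor is much stronger, which illustrates why the selection in claim 14 is not made on reordering probability alone.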

15. A language reordering system for use in statistical machine translation (SMT), said language reordering system comprising:
- a training database for storing training data comprising sentences in a first natural language paired with sentences in a second natural language;
  
  an alignment model configured to match words and phrases in said first natural language to words and phrases in said second natural language, said alignment model utilizing said training data to generate training samples identifying syntactic differences between said first natural language and said second natural language; and
  
  a preprocessing module coupled with said training database and said alignment model, said preprocessing module configured to create a body of reordering knowledge based on said training samples.
- Dependent Claims (16, 17, 18, 19, 20)
- - 16. The language reordering system of claim 15 wherein said preprocessing module is further configured to receive a word string in said first natural language and utilize said reordering knowledge to reorder words from said word string into reordered word strings.
  - 17. The language reordering system of claim 16 wherein said preprocessing module is further configured to identify a group of child nodes corresponding to a parent node of said word string, calculate an inversion probability corresponding to said group of child nodes, and invert child nodes from said group of child nodes based on said inversion probability.
  - 18. The language reordering system of claim 16 wherein said preprocessing module is further configured to generate reordering probabilities by calculating probabilities of reordering words from said word string into said reordered word strings.
  - 19. The language reordering system of claim 18 wherein said preprocessing module is further configured to identify at least one statistically preferred reordered word string from said reordered word strings based on said reordering probabilities.
  - 20. The language reordering system of claim 18 wherein said preprocessing module is further configured to rank said reordered word strings based on said reordering probabilities and identify a statistically preferred group of reordered word strings from among said ranked reordered word strings.
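
Claim 15 describes deriving training samples from aligned sentence pairs and building reordering knowledge from them. As a rough Python sketch under stated assumptions, a word alignment can be used to label each pair of sibling spans as kept in order or inverted on the target side, and those labels can then be counted. The `orientation` and `estimate_inversion_probs` functions, the span encoding, and the relative-frequency estimate are hypothetical simplifications; the claims refer only to an alignment model and, in claim 3, to a probabilistic model, without fixing either.

```python
from collections import Counter


def orientation(alignment, left_span, right_span):
    """Classify a pair of sibling spans as 'straight', 'inverted' or None.

    `alignment` is a set of (source_index, target_index) links; spans are
    half-open (start, end) ranges over source indices.
    """
    def targets(span):
        return [t for s, t in alignment if span[0] <= s < span[1]]

    left, right = targets(left_span), targets(right_span)
    if not left or not right:
        return None                      # unaligned side: no evidence
    if max(left) < min(right):
        return "straight"
    if max(right) < min(left):
        return "inverted"
    return None                          # interleaved target positions


def estimate_inversion_probs(samples):
    """Relative-frequency estimate of P(inverted | label pair)."""
    totals, inverted = Counter(), Counter()
    for labels, orient in samples:
        totals[labels] += 1
        inverted[labels] += orient == "inverted"
    return {labels: inverted[labels] / totals[labels] for labels in totals}


# Source "he ate apples" (indices 0, 1, 2) aligned to target positions 0, 2, 1.
alignment = {(0, 0), (1, 2), (2, 1)}
samples = [
    (("V", "NP"), orientation(alignment, (1, 2), (2, 3))),   # children of VP
    (("NP", "VP"), orientation(alignment, (0, 1), (1, 3))),  # children of S
]
probs = estimate_inversion_probs(samples)
print(samples, probs)
```

A single sentence pair gives probabilities of exactly 0 or 1; with a realistic training database the counts would accumulate over many aligned sentences, and pairs labeled `None` would be filtered out before estimation.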

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Zhou, Ming; Zhang, Dongdong; Li, Chi-Ho; Li, Mu

Granted Patent: US 8,046,211 B2
US Class Current: 704/2
CPC Class Codes: G06F 40/44 Statistical methods, e.g. p...
