Sequence classification for machine translation

US 20080162111A1
Filed: 12/28/2006
Published: 07/03/2008
Est. Priority Date: 12/28/2006
Status: Active Grant

4 Assignments

0 Petitions

Abstract

Classification of sequences, such as the translation of natural language sentences, is carried out using an independence assumption. The independence assumption is an assumption that the probability of a correct translation of a source sentence word into a particular target sentence word is independent of the translation of other words in the sentence. Although this assumption is not a correct one, a high level of word translation accuracy is nonetheless achieved. In particular, discriminative training is used to develop models for each target vocabulary word based on a set of features of the corresponding source word in training sentences, with at least one of those features relating to the context of the source word. Each model comprises a weight vector for the corresponding target vocabulary word. The weights comprising the vectors are associated with respective ones of the features; each weight is a measure of the extent to which the presence of that feature for the source word makes it more probable that the target word in question is the correct one.
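Here is a toy example of the independence assumption from the abstract. Suppose each target word is chosen from its own distribution, and a sentence is scored as the product of those per-word probabilities. Then the best-scoring sentence is simply the most probable word at each position, so no search over whole sentences is needed. The sketch below checks this by comparing an exhaustive search against a per-position argmax.

It makes several assumptions, and it is an illustration, not the patented implementation:

- It assumes one target word per source position.
- The distributions in `DISTS` are hypothetical.
- The helper names `best_sequence_joint` and `best_sequence_independent` are hypothetical.

```python
import itertools
import math

def best_sequence_joint(dists):
    """Exhaustive search: score every target sequence as the product of its
    per-position probabilities (the independence assumption) and keep the best."""
    best, best_p = None, -1.0
    for seq in itertools.product(*dists):  # iterating a dict yields its keys
        p = math.prod(d[w] for d, w in zip(dists, seq))
        if p > best_p:
            best, best_p = list(seq), p
    return best, best_p

def best_sequence_independent(dists):
    """Independent decisions: the most probable target word at each position."""
    return [max(d, key=d.get) for d in dists]

# Hypothetical per-position distributions P(target word | source sentence, position)
# for a three-word source sentence.
DISTS = [
    {"la": 0.7, "le": 0.3},
    {"maison": 0.6, "chien": 0.4},
    {"bleue": 0.9, "bleu": 0.1},
]

joint, p = best_sequence_joint(DISTS)
print(joint, round(p, 3))               # ['la', 'maison', 'bleue'] 0.378
print(best_sequence_independent(DISTS))  # ['la', 'maison', 'bleue']
```

Each position is decided alone, so nothing in this scoring ties the choice of "la" to "bleue" (gender agreement, for example). That is the sense in which the assumption is not strictly correct, even though per-word accuracy can still be high, as the abstract reports.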

9 Claims

1. A method of classifying source symbol sequences into target symbol sequences, the method comprising classifying each symbol of the source sequence independently of the other symbols of the source sequence, the classifying being based on symbol models, each of at least ones of the symbol models being a function of training sequence context information, and classifying the target symbol sequence based on the independently classified source sequence symbols.
- 2. The method of claim 1 wherein each symbol model is associated with a respective symbol of a target vocabulary and was generated as a function of contextual information about symbols in a plurality of training sequences.
- 3. The method of claim 2 wherein each symbol model is a respective set of weights each associated with respective ones of a plurality of feature definitions, at least one of the feature definitions defining a relationship between a given symbol in a given source sequence and one or more of the other symbols in the source sequence.
- 4. The method of claim 3 wherein said classifying each symbol of the source sequence independently of the other symbols of the source sequence comprises generating for at least ones of the target vocabulary symbols a respective probability, said probability for a given target vocabulary symbol being a function of the associated symbol model's weights and being a further function of which of the feature definitions are met by the source symbol being classified.
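Claims 1 to 4 can be illustrated with a small sketch. Each target symbol has its own model, here a dictionary of weights keyed by feature name. The features met by a source word include its own identity and, as claim 3 requires, at least one relation to neighboring words. Each source word is classified on its own, and the target sequence is read off from those decisions.

The claims do not specify the following, so all of them are assumptions made for illustration:

- the ±1-word context window;
- the softmax used to turn summed weights into probabilities;
- the function names;
- the hand-set weights in `MODELS`.

```python
import math

def features(sentence, i):
    """Return the feature definitions met by sentence[i].

    Assumption: a +/-1 word window; the claims only require that at least one
    feature relate the symbol to other symbols of the same source sequence.
    """
    prev_w = sentence[i - 1] if i > 0 else "<s>"
    next_w = sentence[i + 1] if i + 1 < len(sentence) else "</s>"
    return {f"w={sentence[i]}", f"prev={prev_w}", f"next={next_w}"}

def symbol_probabilities(models, feats):
    """Probability of each target symbol, computed (by assumption) as a softmax
    over the sum of that symbol's weights for the features that are met."""
    scores = {t: sum(w.get(f, 0.0) for f in feats) for t, w in models.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {t: math.exp(s - top) for t, s in scores.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

def classify_sequence(models, sentence):
    """Classify each source symbol on its own, then read off the target sequence."""
    target = []
    for i in range(len(sentence)):
        probs = symbol_probabilities(models, features(sentence, i))
        target.append(max(probs, key=probs.get))
    return target

# Hypothetical hand-set symbol models: one weight vector per target symbol.
MODELS = {
    "la":     {"w=the": 3.0},
    "fleuve": {"w=river": 3.0},
    "banque": {"w=bank": 2.0},
    "rive":   {"w=bank": 1.5, "prev=river": 1.5},
}

print(classify_sequence(MODELS, ["the", "bank"]))    # ['la', 'banque']
print(classify_sequence(MODELS, ["river", "bank"]))  # ['fleuve', 'rive']
```

The two calls differ only in the left context of "bank". The hypothetical `prev=river` weight is what moves the second decision from "banque" to "rive".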

5. A method of translating words in a source natural language sentence into corresponding words in a target natural language sentence, the method comprising for a particular source sentence word, determining a probability for each one of a plurality of target vocabulary words, the probability being the probability that said each one target vocabulary word is the correct translation of said particular source sentence word, said probability being a function of a set of feature values and being a further function of a set of weights associated with said each one target vocabulary word, the feature values indicating which of a plurality of feature definitions are met by said particular source sentence word, at least one of the features being contextual information about said particular source sentence word, the weights each being associated with a respective one of the features, and selecting a particular one of the target vocabulary words as being the correct translation of the source sentence as a function of the probabilities thus determined.
- 6. The invention of claim 5 wherein the selected target vocabulary word is the target vocabulary word having the highest of said probabilities.
- 7. The invention of claim 5 wherein each of said weights is a measure of the probability that a word in the source sentence translates to said each one of said target vocabulary words when the source sentence word has the feature in question.
- 8. The invention of claim 5 wherein said probability is a function of the sum of the weights associated with feature definitions that are met by said particular source sentence word.
- 9. The invention of claim 5 wherein the weights associated with said each one of said target vocabulary words are the result of discriminative training based on a) training sentences in the source language, b) the corresponding sentences in the target language, and c) alignment information indicating which words in each source language training sentence correspond to which words in the corresponding target language sentence.
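Claims 5 to 9 add how the weights are obtained and used:

- a probability per target vocabulary word, computed from the weights of the features the source word meets (claims 5 and 8);
- selection of the most probable word (claim 6);
- discriminative training on source sentences, target sentences, and word alignments (claim 9).

The sketch below assumes a particular learner that the claims do not name: multinomial logistic regression fitted by plain stochastic gradient descent. The following are likewise hypothetical:

- the toy corpus;
- the alignment format as `(source_index, target_index)` pairs;
- the epoch count and learning rate;
- all function names.

```python
import math
from collections import defaultdict

def features(sentence, i):
    """Feature definitions met by sentence[i]: the word itself plus a +/-1
    word context window (the window size is an assumption)."""
    prev_w = sentence[i - 1] if i > 0 else "<s>"
    next_w = sentence[i + 1] if i + 1 < len(sentence) else "</s>"
    return [f"w={sentence[i]}", f"prev={prev_w}", f"next={next_w}"]

def probabilities(weights, vocab, feats):
    """Softmax over each target word's summed weights on the met features."""
    scores = {t: sum(weights[t].get(f, 0.0) for f in feats) for t in vocab}
    top = max(scores.values())
    exps = {t: math.exp(s - top) for t, s in scores.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

def training_examples(corpus):
    """One (features, target word) example per alignment link."""
    for src, tgt, alignment in corpus:
        for i, j in alignment:
            yield features(src, i), tgt[j]

def train(corpus, epochs=50, lr=0.5):
    """Multinomial logistic regression fitted by plain SGD (an assumed learner)."""
    examples = list(training_examples(corpus))
    vocab = sorted({t for _, t in examples})
    weights = {t: defaultdict(float) for t in vocab}  # one weight vector per target word
    for _ in range(epochs):
        for feats, gold in examples:
            probs = probabilities(weights, vocab, feats)
            for t in vocab:
                # Log-likelihood gradient for each met feature: 1[t == gold] - P(t)
                grad = (1.0 if t == gold else 0.0) - probs[t]
                for f in feats:
                    weights[t][f] += lr * grad
    return weights, vocab

def translate(weights, vocab, sentence):
    """Pick the highest-probability target word for each source word."""
    out = []
    for i in range(len(sentence)):
        probs = probabilities(weights, vocab, features(sentence, i))
        out.append(max(probs, key=probs.get))
    return out

# Hypothetical toy corpus: (source words, target words, (src_idx, tgt_idx) links).
CORPUS = [
    (["the", "bank"], ["la", "banque"], [(0, 0), (1, 1)]),
    (["river", "bank"], ["la", "rive", "du", "fleuve"], [(0, 3), (1, 1)]),
    (["the", "river"], ["le", "fleuve"], [(0, 0), (1, 1)]),
]

weights, vocab = train(CORPUS)
print(translate(weights, vocab, ["the", "bank"]))    # ['la', 'banque']
print(translate(weights, vocab, ["river", "bank"]))  # ['fleuve', 'rive']
print(translate(weights, vocab, ["the", "river"]))   # ['le', 'fleuve']
```

Only aligned words become training examples, so the unaligned "la" and "du" in the second pair are ignored. The context features let the model translate "the" differently before "bank" and before "river". The output follows source word order; reordering the words into a fluent target sentence is outside this sketch.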

Current Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee
AT&T Corporation (AT&T, Inc.)
Inventors
Haffner, Patrick; Bangalore, Srinivas; Kanthak, Stephan

Granted Patent

US 7,783,473 B2
US Class Current

704/2
CPC Class Codes

G06F 40/44 Statistical methods, e.g. p...