Sequence classification for machine translation
Abstract
Classification of sequences, such as the translation of natural language sentences, is carried out using an independence assumption: the probability that a source sentence word is correctly translated into a particular target sentence word is assumed to be independent of how the other words in the sentence are translated. Although this assumption does not strictly hold, a high level of word translation accuracy is nonetheless achieved. In particular, discriminative training is used to develop a model for each target vocabulary word based on a set of features of the corresponding source word in training sentences, with at least one of those features relating to the context of the source word. Each model comprises a weight vector for the corresponding target vocabulary word. The weights in each vector are associated with respective features; each weight measures the extent to which the presence of that feature for the source word makes it more probable that the target word in question is the correct translation.
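The abstract leaves the exact model form and training criterion open. One common way to make it concrete, assumed here rather than taken from the patent, is a log-linear (maximum-entropy) model trained by maximizing conditional log-likelihood. The symbols below (source sentence s, binary feature vector f(s, i), weight vector w_t for each target vocabulary word t in V, correct target t_i^*) are introduced for illustration only.

```latex
% Independence assumption: the target sentence probability factorizes over source positions
\[ P(t_1, \ldots, t_n \mid \mathbf{s}) \approx \prod_{i=1}^{n} P(t_i \mid \mathbf{s}, i) \]

% Assumed log-linear form for one position: binary features f(s, i) in {0,1}^K,
% one weight vector w_t in R^K per target vocabulary word t in V
\[ P(t \mid \mathbf{s}, i) = \frac{\exp\big(\mathbf{w}_t \cdot \mathbf{f}(\mathbf{s}, i)\big)}{\sum_{t' \in V} \exp\big(\mathbf{w}_{t'} \cdot \mathbf{f}(\mathbf{s}, i)\big)} \]

% Gradient of the conditional log-likelihood for a training pair whose correct target is t_i^*
\[ \frac{\partial \log P(t_i^* \mid \mathbf{s}, i)}{\partial w_{t,k}} = \big(\mathbb{1}[t = t_i^*] - P(t \mid \mathbf{s}, i)\big)\, f_k(\mathbf{s}, i) \]
```

Under this form, a positive weight w_{t,k} raises the probability of target word t whenever feature k fires, which matches the abstract's reading of each weight; the gradient only touches weights of features that are active for the training word.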
9 Claims
1. A method of classifying source symbol sequences into target symbol sequences, the method comprising
classifying each symbol of the source sequence independently of the other symbols of the source sequence, the classifying being based on symbol models, each of at least ones of the symbol models being a function of training sequence context information, and classifying the target symbol sequence based on the independently classified source sequence symbols.
5. A method of translating words in a source natural language sentence into corresponding words in a target natural language sentence, the method comprising
for a particular source sentence word, determining a probability for each one of a plurality of target vocabulary words, the probability being the probability that said each one target vocabulary word is the correct translation of said particular source sentence word, said probability being a function of a set of feature values and being a further function of a set of weights associated with said each one target vocabulary word, the feature values indicating which of a plurality of feature definitions are met by said particular source sentence word, at least one of the features being contextual information about said particular source sentence word, the weights each being associated with a respective one of the features, and selecting a particular one of the target vocabulary words as being the correct translation of the source sentence as a function of the probabilities thus determined.
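Claims 1 and 5 together describe classifying each source word on its own, using features that include context, and selecting the target vocabulary word with the highest probability. The following is a minimal Python sketch under stated assumptions: the feature templates (`w=`, `prev=`, `next=`), the softmax form of the probability, the function names and the toy French weights are all hypothetical, not the patented implementation.

```python
import math

def context_features(sentence, i):
    """Binary features that fire for the i-th source word (hypothetical feature set).

    Besides the word itself, the previous and next words supply the
    contextual information the claims require.
    """
    prev_word = sentence[i - 1] if i > 0 else "<s>"
    next_word = sentence[i + 1] if i + 1 < len(sentence) else "</s>"
    return {f"w={sentence[i]}", f"prev={prev_word}", f"next={next_word}"}

def target_probabilities(models, feats):
    """P(t | features) for every target word t, assuming a softmax over w_t . f."""
    # With binary features, w_t . f is the sum of weights of the active features.
    scores = {t: sum(w.get(f, 0.0) for f in feats) for t, w in models.items()}
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {t: math.exp(s - top) for t, s in scores.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

def translate(sentence, models):
    """Classify each source word independently and return the target word sequence.

    No decision looks at another position's output; context enters only
    through the source-side features.
    """
    output = []
    for i in range(len(sentence)):
        probs = target_probabilities(models, context_features(sentence, i))
        output.append(max(probs, key=probs.get))
    return output

# Hand-set toy weights standing in for discriminatively trained models.
models = {
    "banque": {"w=bank": 1.0, "next=account": 2.0},
    "rive": {"w=bank": 1.0, "prev=river": 2.0},
    "compte": {"w=account": 1.0},
    "fleuve": {"w=river": 1.0},
}
print(translate(["bank", "account"], models))  # ['banque', 'compte']
print(translate(["river", "bank"], models))    # ['fleuve', 'rive']
```

The context feature is what separates the two senses of "bank": `next=account` favors "banque" and `prev=river` favors "rive". Because the softmax is monotone in w_t · f, the same choice could be made from raw scores; the probabilities matter only when they are used downstream. The output is a sequence of word-level labels, so target-language word order and agreement are not handled here.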
Specification