Discriminative training of models for sequence classification

US 20080162117A1
Filed: 12/28/2006
Published: 07/03/2008
Est. Priority Date: 12/28/2006
Status: Abandoned Application

Abstract

Classification of sequences, such as the translation of natural language sentences, is carried out using an independence assumption. The independence assumption is an assumption that the probability of a correct translation of a source sentence word into a particular target sentence word is independent of the translation of other words in the sentence. Although this assumption is not a correct one, a high level of word translation accuracy is nonetheless achieved. In particular, discriminative training is used to develop models for each target vocabulary word based on a set of features of the corresponding source word in training sentences, with at least one of those features relating to the context of the source word. Each model comprises a weight vector for the corresponding target vocabulary word. The weights comprising the vectors are associated with respective ones of the features; each weight is a measure of the extent to which the presence of that feature for the source word makes it more probable that the target word in question is the correct one.
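
The idea in the abstract can be illustrated with a minimal sketch: one weight vector per target vocabulary word, trained discriminatively, with each source word translated independently of the others. The abstract does not name a training algorithm, so the multiclass perceptron update used here is an assumption, as are the function names (`features`, `score`, `train`, `translate`), the three feature templates, and the toy English-German data.

```python
from collections import defaultdict

def features(words, i):
    """Binary features of the source word at position i (hypothetical templates).
    The prev/next features capture the context of the source word."""
    prev_w = words[i - 1] if i > 0 else "<s>"
    next_w = words[i + 1] if i + 1 < len(words) else "</s>"
    return {f"word={words[i]}", f"prev={prev_w}", f"next={next_w}"}

def score(weights, target, feats):
    """Sum of the target word's weights over the features that are present."""
    return sum(weights[target][f] for f in feats)

def train(examples, targets, epochs=10):
    """examples: list of (source_words, position, aligned_target_word).
    Perceptron updates are an assumed stand-in for the training procedure."""
    weights = {t: defaultdict(float) for t in targets}
    for _ in range(epochs):
        for words, i, gold in examples:
            feats = features(words, i)
            pred = max(targets, key=lambda t: score(weights, t, feats))
            if pred != gold:
                for f in feats:
                    weights[gold][f] += 1.0  # reward the correct target word
                    weights[pred][f] -= 1.0  # penalize the wrongly chosen one
    return weights

def translate(weights, targets, words):
    """Independence assumption: each position is classified on its own."""
    return [
        max(targets, key=lambda t: score(weights, t, features(words, i)))
        for i in range(len(words))
    ]

# Toy word-aligned data: "bank" translates differently depending on context.
TARGETS = ["Bank", "Fluss", "Konto", "Ufer"]
EXAMPLES = [
    (["river", "bank"], 0, "Fluss"),
    (["river", "bank"], 1, "Ufer"),
    (["bank", "account"], 0, "Bank"),
    (["bank", "account"], 1, "Konto"),
]
WEIGHTS = train(EXAMPLES, TARGETS)
```

In this sketch, `translate(WEIGHTS, TARGETS, ["river", "bank"])` picks a target word for each position without looking at the choices made for other positions; only the contextual features of the source word carry information about its neighbours.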

6 Claims

1. A method comprising performing discriminative training to develop models of target language vocabulary words, said training being based on training sentences in a source language, corresponding sentences in the target language, and alignment information indicating which words in each source language training sentence correspond to which words in the corresponding target language sentence, the method comprising generating a set of feature values associated with words in the source language sentences and corresponding words in the target language sentences, the feature values indicating whether the associated source word meets respective feature definitions, at least one of the feature definitions being a contextual property of the associated source word, and developing said models based on said feature values.
  - 2. The method of claim 1 wherein said training is further based on alignment information indicating which words in each source language training sentence correspond to which words in the corresponding target language sentence.
  - 3. The method of claim 1 wherein the model of each target vocabulary word is a set of weights each associated with a respective one of the feature definitions, each weight being a measure of the probability that a word in a source language sentence translates to that target vocabulary word when the source language sentence word has the feature in question.
  - 4. The method of claim 3 wherein said training is further based on alignment information indicating which words in each source language training sentence correspond to which words in the corresponding target language sentence.
  - 5. A model developed using the method of claim 1.
  - 6. A model developed using the method of claim 4.
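
The feature-generation step recited in claim 1 can be sketched as follows, under stated assumptions. The feature definitions, their names, and the `(source_index, target_index)` alignment format are all hypothetical; the claims require only that each feature value indicate whether the source word meets a feature definition and that at least one definition be contextual.

```python
# Hypothetical feature definitions: name -> predicate over (source_words, index).
FEATURE_DEFINITIONS = {
    "is_bank": lambda w, i: w[i] == "bank",
    # Contextual definitions: they look at the neighbours of the source word.
    "prev_is_river": lambda w, i: i > 0 and w[i - 1] == "river",
    "next_is_account": lambda w, i: i + 1 < len(w) and w[i + 1] == "account",
}

def generate_feature_values(source, target, alignment, definitions):
    """Return one (source_word, target_word, feature_values) row per aligned pair.

    alignment is assumed to be a list of (source_index, target_index) pairs.
    Each feature value is 1 if the source word meets the definition, else 0.
    """
    rows = []
    for src_idx, tgt_idx in alignment:
        values = {
            name: int(bool(pred(source, src_idx)))
            for name, pred in definitions.items()
        }
        rows.append((source[src_idx], target[tgt_idx], values))
    return rows

ROWS = generate_feature_values(
    ["river", "bank"], ["Fluss", "Ufer"], [(0, 0), (1, 1)], FEATURE_DEFINITIONS
)
```

Rows of this kind pair each aligned target word with the feature values of its source word, which is the input a discriminative trainer would need to develop a per-target-word model.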

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Haffner, Patrick; Bangalore, Srinivas; Kanthak, Stephan

Application Number: US11/646,983
Publication Number: US 20080162117A1
US Class Current: 704/10
CPC Class Codes: G06F 40/44 Statistical methods, e.g. p...
