Spoken language understanding that incorporates prior knowledge into boosting

US 7,152,029 B2
Filed: 05/31/2002
Issued: 12/19/2006
Est. Priority Date: 07/18/2001
Status: Active Grant



Abstract

A system for understanding entries, such as speech, develops a classifier by employing prior knowledge with which a given corpus of training entries is enlarged threefold. A rule is created for each of the labels employed in the classifier, and the created rules are applied to the given corpus to create a corpus of attachments by appending a weight of ηp(x), or 1−ηp(x), to labels of entries that meet, or fail to meet, respectively, conditions of the labels' rules, and to also create a corpus of non-attachments by appending a weight of 1−ηp(x), or ηp(x), to labels of entries that meet, or fail to meet, respectively, conditions of the labels' rules.


8 Claims

1. A method for generating an enlarged corpus of training entries for a particular application, given a set of k labels and an initial corpus of training m entries, where each of said entries includes at least a data portion, comprising the steps of:
  - for each label l of said k labels, creating an associated rule that specifies one or more conditions that said data portion of an applied entry x must meet in order for said rule to reach a conclusion that said label l forms an attachment to said entry x, and with a weight ηp(x,l), where η is a positive number representing a measure of confidence in said rule, and p(x,l) is a probability measure, between 0 and 1, inclusively, that the rule assigns to the said conclusion;
  - creating an augmented corpus of m training entries, where each entry i in said augmented corpus is created from data portion of entry i in said initial corpus of training entries, i=1,2, . . . m, with each label l of said k labels forming an attachment to said entry i with weight ηp(x_i,l) when conditions of said rule for label l are met, and a weight 1−ηp(x_i,l) where said conditions of said rule for label l are not met; or forming a non-attachment to said entry i with weight 1−ηp(x_i,l) when conditions of said rule for label l are met, and a weight ηp(x_i,l) where said conditions of said rule for label l are not met; and
  - combining said augmented corpus of m training entries with said initial corpus of training m entries to form said enlarged corpus having 2m training entries.
2. The method of claim 1 where said rule is created based on prior knowledge of said particular application.
3. The method of claim 1 where said one or more conditions constitute a logical association of data elements that are expected in entries to a classifier constructed through interaction with said enlarged corpus of training entries.
4. The method of claim 3 where said particular application involves recognizing spoken speech, and said data elements are words.
5. The method of claim 1 where the conjunctive or is inclusive, thus forming two corpa that augment said initial corpus via said step of combining.
6. The method of claim 1 where each entry x in said initial training corpus includes, in addition to said data portion, an indication of which of said k labels are attached to said entry x.
7. The method of claim 1 further comprising a step of attaching any number of said k labels to each of said m entries of said initial training corpus.
8. The method of claim 1 further comprising the step of creating a classifier from said enlarged corpus of training sequences.
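
The weighting scheme recited in claim 1 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the names `Rule`, `WeightedEntry`, and `augment`, the keyword-matching conditions, the constant probability 0.9, and the confidence η = 0.5 are all hypothetical and chosen only so the weights stay between 0 and 1. The sketch builds both the attachment and the non-attachment corpus (the inclusive reading in claim 5), so that combining either one with the initial corpus gives 2m entries and combining both gives 3m.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Rule:
    """Hypothetical container for the rule associated with one label."""
    condition: Callable[[str], bool]  # conditions the data portion must meet
    p: Callable[[str], float]         # probability measure p(x, l) in [0, 1]
    eta: float                        # positive confidence in the rule


@dataclass
class WeightedEntry:
    data: str
    weights: Dict[str, float]  # label -> weight
    kind: str                  # "initial", "attachment", or "non_attachment"


def augment(
    corpus: List[str], rules: Dict[str, Rule]
) -> Tuple[List[WeightedEntry], List[WeightedEntry]]:
    """Build the attachment and non-attachment corpora, one entry per input."""
    attachments, non_attachments = [], []
    for x in corpus:
        att, non = {}, {}
        for label, rule in rules.items():
            w = rule.eta * rule.p(x)  # eta * p(x, l)
            met = rule.condition(x)
            # Attachment: eta*p when the rule's conditions are met, else 1 - eta*p.
            att[label] = w if met else 1.0 - w
            # Non-attachment: the two weights are swapped.
            non[label] = 1.0 - w if met else w
        attachments.append(WeightedEntry(x, att, "attachment"))
        non_attachments.append(WeightedEntry(x, non, "non_attachment"))
    return attachments, non_attachments


# Assumed toy data: m = 2 utterances and k = 2 labels.
utterances = ["i have a question about my bill", "please connect me to an operator"]
rules = {
    "billing": Rule(lambda x: "bill" in x.split(), lambda x: 0.9, eta=0.5),
    "operator": Rule(lambda x: "operator" in x.split(), lambda x: 0.9, eta=0.5),
}

initial = [WeightedEntry(x, {}, "initial") for x in utterances]
attachments, non_attachments = augment(utterances, rules)

enlarged_2m = initial + attachments                    # 2m entries (claim 1)
enlarged_3m = initial + attachments + non_attachments  # 3m entries (claim 5)
```

In this toy run the "billing" rule fires on the first utterance, so its attachment weight is 0.5 × 0.9 = 0.45 and its non-attachment weight is 0.55; the "operator" rule does not fire there, so the two weights are reversed. With η greater than 1 the quantity 1−ηp(x,l) can become negative, which the sketch does not guard against.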


Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: AT&T Corporation (AT&T, Inc.)
Inventors: Alshawi, Hiyan; DiFabbrizio, Giuseppe; Gupta, Narendra K.; Rahim, Mazin G.; Schapire, Robert E.; Singer, Yoram
Primary Examiner(s): Smits, Talivaldis Ivars
Assistant Examiner(s): Ng, Eunice
Application Number: US10/160,461
Publication Number: US 20040204940A1
Time in Patent Office: 1,663 Days
Field of Search: None
US Class Current: 704/1
CPC Class Codes: G10L 15/063 Training
