Spoken language understanding that incorporates prior knowledge into boosting
First Claim
1. A method for generating an enlarged corpus of training entries for a particular application, given a set of k labels and an initial corpus of training m entries, where each of said entries includes at least a data portion, comprising the steps of:
- for each label l of said k labels, creating an associated rule that specifies one or more conditions that said data portion of an applied entry x must meet in order for said rule to reach a conclusion that said label l forms an attachment to said entry x, and with a weight ηp(x,l), where η is a positive number representing a measure of confidence in said rule, and p(x,l) is a probability measure, between 0 and 1, inclusively, that the rule assigns to the said conclusion;
- creating an augmented corpus of m training entries, where each entry i in said augmented corpus is created from data portion of entry i in said initial corpus of training entries, i=1,2, ..., m, with each label l of said k labels
  - forming an attachment to said entry i weight ηp(x_i,l) when conditions of said rule for label l are met, and a weight 1−ηp(x_i,l) where said conditions of said rule for label l are not met; or
  - forming a non-attachment to said entry i weight 1−ηp(x_i,l) when conditions of said rule for label l are met, and a weight ηp(x_i,l) where said conditions of said rule for label l are not met; and
- combining said augmented corpus of m training entries with said initial corpus of training m entries to form said enlarged corpus having 2m training entries.
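The steps of claim 1 can be illustrated with a minimal Python sketch, assuming text entries and the "attachment" alternative of the claim. The names `Rule`, `Entry`, `WeightedEntry`, `attachment_weight`, and `enlarge_corpus` are hypothetical and do not appear in the patent; the rule conditions and probabilities are supplied by the caller, and how a classifier later consumes the weights is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Union


@dataclass
class Rule:
    """Hypothetical rule for one label l."""
    condition: Callable[[str], bool]  # True when the rule's conditions are met by x
    prob: Callable[[str], float]      # p(x, l), assumed to lie in [0, 1]


@dataclass
class Entry:
    """Entry of the initial corpus: a data portion plus its labels (assumed layout)."""
    data: str
    labels: Set[str]


@dataclass
class WeightedEntry:
    """Entry of the augmented corpus: every label is attached with a weight."""
    data: str
    weights: Dict[str, float]


def attachment_weight(rule: Rule, x: str, eta: float) -> float:
    """eta * p(x, l) when the rule's conditions are met, 1 - eta * p(x, l) otherwise."""
    w = eta * rule.prob(x)
    return w if rule.condition(x) else 1.0 - w


def enlarge_corpus(
    initial: List[Entry], rules: Dict[str, Rule], eta: float
) -> List[Union[Entry, WeightedEntry]]:
    """Build the augmented corpus (m entries) and append it to the initial corpus (m entries)."""
    augmented = [
        WeightedEntry(
            data=e.data,
            weights={l: attachment_weight(r, e.data, eta) for l, r in rules.items()},
        )
        for e in initial
    ]
    return list(initial) + augmented  # 2m entries in total
```

For example, with a single label whose rule fires on the word "bill" and assigns p = 0.8, and with η = 0.5, an entry containing "bill" receives weight 0.4 for that label, while an entry without it receives 0.6.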
Abstract
A system for understanding entries, such as speech, develops a classifier by employing prior knowledge with which a given corpus of training entries is enlarged threefold. A rule is created for each of the labels employed in the classifier, and the created rules are applied to the given corpus to create a corpus of attachments by appending a weight of ηp(x), or 1−ηp(x), to labels of entries that meet, or fail to meet, respectively, conditions of the labels' rules, and to also create a corpus of non-attachments by appending a weight of 1−ηp(x), or ηp(x), to labels of entries that meet, or fail to meet, respectively, conditions of the labels' rules.
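The weighting described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: it assumes that the threefold enlargement is the given corpus plus one corpus of attachments plus one corpus of non-attachments, and the names `LabelRule` and `build_weighted_corpora` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical rule representation: (condition on the entry, probability p(x)).
LabelRule = Tuple[Callable[[str], bool], Callable[[str], float]]
Weighted = Tuple[str, Dict[str, float]]  # (entry, label -> weight)


def build_weighted_corpora(
    corpus: List[str], rules: Dict[str, LabelRule], eta: float
) -> Tuple[List[Weighted], List[Weighted]]:
    """Return (attachments, non_attachments), each with one weighted copy per entry."""
    attachments: List[Weighted] = []
    non_attachments: List[Weighted] = []
    for x in corpus:
        attach: Dict[str, float] = {}
        non_attach: Dict[str, float] = {}
        for label, (condition, p) in rules.items():
            w = eta * p(x)
            met = condition(x)
            # Attachment: eta*p(x) if the rule's conditions are met, else 1 - eta*p(x).
            attach[label] = w if met else 1.0 - w
            # Non-attachment: the complementary weight in each case.
            non_attach[label] = 1.0 - w if met else w
        attachments.append((x, attach))
        non_attachments.append((x, non_attach))
    return attachments, non_attachments
```

Under this reading, the attachment and non-attachment weights for a given entry and label always sum to 1, and the three corpora together hold 3m entries for a given corpus of m entries.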
8 Claims
Specification