Spoken language understanding that incorporates prior knowledge into boosting
Abstract
A system for understanding entries, such as speech, develops a classifier by employing prior knowledge with which a given corpus of training entries is enlarged threefold. The prior knowledge is embodied in a rule, combined from separate rules created for each label outputted by the classifier, each of which includes a weight measure p(x). A first set of created entries for increasing the corpus of training entries is created by attaching all labels to each entry of the original corpus of training entries, with a weight ηp(x), or η(1−p(x)), in association with each label that meets, or fails to meet, the condition specified for the label, η being a preselected positive number. The second set is created by not attaching any of the labels to each entry of the original corpus of training entries, with a weight of η(1−p(x)), or ηp(x), in association with each label that meets, or fails to meet, the condition specified for the label.
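The weighting described in the abstract can be restated compactly as follows. This is only a reading of the abstract's wording; the names w_attached and w_not attached, and the symbol ℓ for a label, are notation introduced here for illustration and do not appear in the source.

```latex
w_{\text{attached}}(x,\ell) =
\begin{cases}
\eta\,p(x) & \text{if } x \text{ meets the condition specified for } \ell \\
\eta\,\bigl(1-p(x)\bigr) & \text{otherwise}
\end{cases}
\qquad
w_{\text{not attached}}(x,\ell) =
\begin{cases}
\eta\,\bigl(1-p(x)\bigr) & \text{if } x \text{ meets the condition specified for } \ell \\
\eta\,p(x) & \text{otherwise}
\end{cases}
```

Under this reading, an original corpus of m entries plus the two created sets of m entries each would give 3m entries in total, which matches the "enlarged threefold" wording.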
25 Claims
1. A method for generating an enlarged corpus of training entries for a particular application, given a set of k labels and an initial corpus of m training entries, where each of said entries includes at least a data portion, comprising the steps of:
for each label l of said k labels, creating an associated rule that specifies one or more conditions that said data portion of an applied entry x must meet in order for said rule to reach a conclusion that said label l attaches to said entry x, and also specifies a confidence measure p(x,l), associated with said conclusion, which measure is a number between 0 and 1;
creating an augmented corpus of m training entries, where each entry i in said augmented corpus is created from data portion of entry i in said initial corpus of training entries, i=1,2, . . . m, with each of said k labels attached to said data portion of said entry i, or not attached to said data portion of said entry i, based on whether a preselected variable Z is either a +1 or a 0, respectively, and with a confidence measure associated with each of said labels being U(x,l)=[Zηp(x,l)+(1−Z)η(1−p(x,l))] when said data portion of said entry i meets said conditions of said rule for label l, η being a preselected positive number, and being 1−U(x,l) when said data portion of said entry i fails to meet said conditions of said rule for label l; and
combining said augmented corpus of m training entries with said initial corpus of m training entries to form said enlarged corpus having 2m training entries.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
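The steps of claim 1 can be sketched in a few lines of Python. This is a minimal illustration that follows the claim wording literally, not the patentee's implementation. The `Rule` and `Entry` classes, the keyword-based conditions, the constant per-rule value of p, and the confidence of 1.0 given to the initial entries are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    # True when the data portion meets the rule's conditions (hypothetical form).
    condition: Callable[[str], bool]
    # Confidence measure p(x, l) in [0, 1]; held constant per rule here (assumption).
    p: float


@dataclass
class Entry:
    data: str
    attached: Dict[str, bool]     # label -> attached or not attached
    confidence: Dict[str, float]  # label -> confidence measure


def u(z: int, eta: float, p: float) -> float:
    """U(x,l) = Z*eta*p(x,l) + (1-Z)*eta*(1-p(x,l)), with Z equal to 1 or 0."""
    return z * eta * p + (1 - z) * eta * (1 - p)


def augment(initial: List[Entry], rules: Dict[str, Rule], z: int, eta: float) -> List[Entry]:
    """Create one augmented entry from the data portion of each initial entry."""
    augmented = []
    for entry in initial:
        attached, confidence = {}, {}
        for label, rule in rules.items():
            value = u(z, eta, rule.p)
            # Z = 1 attaches every label; Z = 0 attaches none.
            attached[label] = (z == 1)
            # U(x,l) when the rule's conditions are met, 1 - U(x,l) otherwise.
            confidence[label] = value if rule.condition(entry.data) else 1 - value
        augmented.append(Entry(entry.data, attached, confidence))
    return augmented


# Hypothetical rules and a two-entry initial corpus (m = 2).
rules = {
    "billing": Rule(lambda x: "bill" in x, p=0.9),
    "operator": Rule(lambda x: "person" in x, p=0.8),
}
initial = [
    Entry("question about my bill",
          {"billing": True, "operator": False},
          {"billing": 1.0, "operator": 1.0}),  # confidence 1.0 is an assumption
    Entry("talk to a person",
          {"billing": False, "operator": True},
          {"billing": 1.0, "operator": 1.0}),
]

augmented = augment(initial, rules, z=1, eta=1.0)
enlarged = initial + augmented  # 2m entries
```

With η = 1 the values produced here coincide with the weights given in the abstract (ηp(x) and η(1−p(x))). For other values of η the literal "1−U(x,l)" branch of the claim and the abstract's wording differ, so the sketch follows the claim.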
9. A method of understanding presented data comprising the steps of:
normalizing said data to reduce variations in said presented data, to develop normalized data;
assigning portions of said normalized data to be instances of objects from a set of preselected objects when said portions of said normalized data meet predetermined conditions, thereby forming entity-extracted data; and
classifying said entity-extracted data by determining whether any of a predetermined set of labels should be attached to said entity-extracted data.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21
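The three steps of claim 9 (normalizing, entity extraction, classification) can be illustrated with a small Python pipeline. Everything concrete here is hypothetical: the normalization choices, the `<AMOUNT>` and `<DATE>` object tags, the regular expressions, and the keyword lookup that stands in for a trained classifier are assumptions made for the sketch and are not taken from the patent.

```python
import re
from typing import Dict, List, Set, Tuple


def normalize(text: str) -> str:
    """Reduce surface variation: lowercase, drop stray punctuation, collapse spaces."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9$.\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical "preselected objects", each with a predetermined condition (a regex).
MONTHS = ("january|february|march|april|may|june|july|august|"
          "september|october|november|december")
ENTITY_PATTERNS: List[Tuple[str, "re.Pattern[str]"]] = [
    ("<AMOUNT>", re.compile(r"\$\d+(?:\.\d{2})?")),
    ("<DATE>", re.compile(rf"\b(?:{MONTHS}) \d{{1,2}}(?:st|nd|rd|th)?\b")),
]


def extract_entities(text: str) -> str:
    """Replace each portion that meets a condition with its object tag."""
    for tag, pattern in ENTITY_PATTERNS:
        text = pattern.sub(tag, text)
    return text


# Stand-in classifier: a label attaches when any of its trigger tokens is present.
LABEL_TRIGGERS: Dict[str, Set[str]] = {
    "billing": {"charged", "bill", "<AMOUNT>"},
    "cancel": {"cancel"},
}


def classify(text: str) -> Set[str]:
    """Decide, for each label in the predetermined set, whether it attaches."""
    tokens = set(text.split())
    return {label for label, triggers in LABEL_TRIGGERS.items() if tokens & triggers}


normalized = normalize("I was charged $42.50 on  March 3rd!!")
extracted = extract_entities(normalized)
labels = classify(extracted)
```

Running entity extraction before classification lets the classifier see one token per object type rather than every distinct amount or date, which is the kind of variation reduction the claim's ordering suggests.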
22. A method for understanding applied data, relative to a particular application, by classifying said applied data with a classifier developed from an enhanced corpus of training entries, the improvement comprising:
developing said enhanced corpus of training entries by creating from said provided corpus of training entries a set of auxiliary training entries that are developed with aid of a rule that is based on prior knowledge of said particular application, said set of auxiliary training entries being combined with said provided corpus of training entries to form said enhanced corpus of training entries.
Dependent claims: 23, 24, 25
Specification