Spoken language understanding that incorporates prior knowledge into boosting

  • US 20040204940A1
  • Filed: 05/31/2002
  • Published: 10/14/2004
  • Est. Priority Date: 07/18/2001
  • Status: Active Grant
First Claim

1. A method for generating an enlarged corpus of training entries for a particular application, given a set of k labels and an initial corpus of m training entries, where each of said entries includes at least a data portion, comprising the steps of:

  • for each label l of said k labels, creating an associated rule that specifies one or more conditions that said data portion of an applied entry x must meet in order for said rule to reach a conclusion that said label l attaches to said entry x, and also specifies a confidence measure p(x,l), associated with said conclusion, which measure is a number between 0 and 1;

    creating an augmented corpus of m training entries, where each entry i in said augmented corpus is created from data portion of entry i in said initial corpus of training entries, i=1,2, ..., m, with each of said k labels attached to said data portion of said entry i, or not attached to said data portion of said entry i, based on whether a preselected variable Z is either a +1 or a 0, respectively, and with a confidence measure associated with each of said labels being U(x,l)=[Zη p(x,l)+(1−Z)η (1−p(x,l))] when said data portion of said entry i meets said conditions of said rule for label l, η being a preselected positive number, and being 1−U(x,l) when said data portion of said entry i fails to meet said conditions of said rule for label l; and

    combining said augmented corpus of m training entries with said initial corpus of m training entries to form said enlarged corpus having 2m training entries.
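
The steps of claim 1 can be illustrated with a minimal Python sketch. It is one possible reading of the claim language, not the patented implementation. The names Rule, Entry, label_confidence, augment and enlarge are hypothetical, as is the choice to give initial entries a confidence of 1.0. In particular, the sketch assumes that "1−U(x,l)" for an entry that fails a rule means the complement of the bracketed expression evaluated with that rule's p(x,l).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """Hypothetical rule for one label l."""
    matches: Callable[[str], bool]      # conditions on the data portion
    confidence: Callable[[str], float]  # p(x, l), a number between 0 and 1


@dataclass
class Entry:
    """Hypothetical training entry: data portion plus per-label information."""
    data: str
    attached: Dict[str, bool]     # label -> attached or not
    confidence: Dict[str, float]  # label -> confidence measure


def label_confidence(rule: Rule, x: str, z: int, eta: float) -> float:
    """U(x,l) = Z*eta*p + (1-Z)*eta*(1-p) if the rule's conditions are met.

    Assumption: when the conditions are not met, return 1 minus that value.
    """
    p = rule.confidence(x)
    u = z * eta * p + (1 - z) * eta * (1.0 - p)
    return u if rule.matches(x) else 1.0 - u


def augment(initial: List[Entry], rules: Dict[str, Rule],
            z: int, eta: float) -> List[Entry]:
    """Build the augmented corpus: one new entry per initial entry."""
    if z not in (0, 1):
        raise ValueError("Z must be +1 or 0")
    if eta <= 0:
        raise ValueError("eta must be positive")
    augmented = []
    for entry in initial:
        augmented.append(Entry(
            data=entry.data,
            # Every label is attached when Z = +1 and not attached when Z = 0.
            attached={label: z == 1 for label in rules},
            confidence={label: label_confidence(rule, entry.data, z, eta)
                        for label, rule in rules.items()},
        ))
    return augmented


def enlarge(initial: List[Entry], rules: Dict[str, Rule],
            z: int, eta: float) -> List[Entry]:
    """Combine the initial and augmented corpora into 2m entries."""
    return list(initial) + augment(initial, rules, z, eta)
```

Calling enlarge on an initial corpus of m entries returns a list of 2m entries: the first m are the originals, and the last m carry the rule-derived confidence for each of the k labels.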

  • 7 Assignments