Spoken language understanding that incorporates prior knowledge into boosting

  • US 7,152,029 B2
  • Filed: 05/31/2002
  • Issued: 12/19/2006
  • Est. Priority Date: 07/18/2001
  • Status: Active Grant
First Claim

1. A method for generating an enlarged corpus of training entries for a particular application, given a set of k labels and an initial corpus of training m entries, where each of said entries includes at least a data portion, comprising the steps of:

  • for each label l of said k labels, creating an associated rule that specifies one or more conditions that said data portion of an applied entry x must meet in order for said rule to reach a conclusion that said label l forms an attachment to said entry x, and with a weight ηp(x,l), where η is a positive number representing a measure of confidence in said rule, and p(x,l) is a probability measure, between 0 and 1, inclusively, that the rule assigns to the said conclusion;

  • creating an augmented corpus of m training entries, where each entry i in said augmented corpus is created from data portion of entry i in said initial corpus of training entries, i=1,2, . . . m, with each label l of said k labels

    forming an attachment to said entry i weight ηp(x_i,l) when conditions of said rule for label l are met, and a weight 1−ηp(x_i,l) where said conditions of said rule for label l are not met;

    or forming a non-attachment to said entry i weight 1−ηp(x_i,l) when conditions of said rule for label l are met, and a weight ηp(x_i,l) where said conditions of said rule for label l are not met; and

  • combining said augmented corpus of m training entries with said initial corpus of training m entries to form said enlarged corpus having 2m training entries.
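
The steps of the claim can be illustrated with a minimal Python sketch. It is a literal reading of the claim language, not the patentee's implementation. The names `Rule`, `label_weights`, and `enlarge_corpus` are hypothetical, as are the example utterances, labels, and the value of η. Two simplifying assumptions are made: p(x,l) is held constant per rule, and each augmented entry records both the attachment weight and the non-attachment weight for every label rather than choosing one of the two alternatives.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Rule:
    """Hypothetical stand-in for the per-label rule described in the claim."""
    condition: Callable[[str], bool]  # conditions the data portion must meet
    p: float                          # p(x, l); assumed constant per rule here


def label_weights(x: str, rule: Rule, eta: float) -> Tuple[float, float]:
    """Return (attachment weight, non-attachment weight) for one entry and label."""
    w = eta * rule.p
    if rule.condition(x):
        # Conditions met: attachment gets eta*p, non-attachment gets 1 - eta*p.
        return w, 1.0 - w
    # Conditions not met: the two weights swap, as the claim states.
    return 1.0 - w, w


def enlarge_corpus(
    initial: List[Tuple[str, Set[str]]],
    rules: Dict[str, Rule],
    eta: float,
) -> list:
    """Build m rule-weighted entries and append them to the m initial entries."""
    augmented = [
        (x, {label: label_weights(x, rule, eta) for label, rule in rules.items()})
        for x, _labels in initial
    ]
    return list(initial) + augmented  # 2m entries in total


# Hypothetical two-label application (k = 2) with m = 2 initial entries.
rules = {
    "billing": Rule(lambda x: "bill" in x.lower(), p=0.9),
    "operator": Rule(lambda x: "operator" in x.lower(), p=0.8),
}
initial = [
    ("I have a question about my bill", {"billing"}),
    ("get me an operator", {"operator"}),
]
enlarged = enlarge_corpus(initial, rules, eta=0.5)
print(len(enlarged))  # 4, i.e. 2m
```

Keeping both weights per label is a design choice of this sketch: it tabulates all four cases named in the claim (attachment or non-attachment, conditions met or not met) without committing to how a downstream learner would consume them.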
