Method and apparatus for finding the best splits in a decision tree for a language model for a speech recognizer

  • US 5,263,117 A
  • Filed: 10/26/1989
  • Issued: 11/16/1993
  • Est. Priority Date: 10/26/1989
  • Status: Expired due to Term
First Claim

1. A method of automatic speech recognition comprising the steps of:

  • converting an utterance into an utterance signal representing the utterance, said utterance comprising a series of at least a predictor word and a predicted word, said utterance signal comprising at least one predictor word signal representing the predictor word;

    providing a set of M predictor feature signals, each predictor feature signal having a predictor feature value Xm, where M is an integer greater than or equal to three and m is an integer greater than zero and less than or equal to M, each predictor feature signal in the set representing a different word;

    generating a decision set which contains a subset of the M predictor feature signals representing the words;

    comparing the predictor word signal with the predictor feature signals in the decision set;

    outputting a first category feature signal representing a first predicted word if the predictor word signal is a member of the decision set, said first category feature signal being one of N category feature signals, each category feature signal representing a different word and having a category feature value Yn, where N is an integer greater than or equal to three, and n is an integer greater than zero and less than or equal to N; and

    outputting a second category feature signal, different from the first category feature signal and representing a second predicted word different from the first predicted word if the predictor word signal is not a member of the decision set;

    characterized in that the contents of the decision set are generated by the steps of:

    providing a training text comprising a set of observed events, each event having a predictor feature X representing a predictor word and a category feature Y representing a predicted word, said predictor feature having one of M different possible values Xm, each Xm representing a different predictor word, said category feature having one of N possible values Yn, each Yn representing a different predicted word;

    (a) measuring the predictor feature value Xm and the category feature value Yn of each event in the set of events;

    (b) estimating, from the measured predictor feature values and the measured category feature values, the probability P(Xm, Yn) of occurrence of an event having a category feature value Yn and a predictor feature value Xm, for each Yn and each Xm;

    (c) selecting a starting set SXopt (t) of predictor feature values Xm, where t has an initial value;

    (d) calculating, from the estimated probabilities P(Xm, Yn), the conditional probability P(SXopt (t)|Yn) that the predictor feature has a value in the set SXopt (t) when the category feature has a value Yn, for each Yn;

    (e) defining a number of pairs of sets SYj (t) and S̅Y̅j (t) of category feature values Yn, where j is an integer greater than zero and less than or equal to (N-1), each set SYj (t) containing only those category feature values Yn having the j lowest values of P(SXopt (t)|Yn), each set S̅Y̅j (t) containing only those category feature values Yn having the (N-j) highest values of P(SXopt (t)|Yn);

    (f) finding a pair of sets SYopt (t) and S̅Y̅opt (t) from among the pairs of sets SYj (t) and S̅Y̅j (t) such that the pair of sets SYopt (t) and S̅Y̅opt (t) have the lowest uncertainty in the value of the predictor feature;

    (g) calculating, from the estimated probabilities P(Xm, Yn), the conditional probability P(SYopt (t)|Xm) that the category feature has a value in the set SYopt (t) when the predictor feature has a value Xm, for each Xm;

    (h) defining a number of pairs of sets SXi (t+1) and S̅X̅i (t+1) of predictor feature values Xm, where i is an integer greater than zero and less than or equal to (M-1), each set SXi (t+1) containing only those predictor feature values Xm having the i lowest values of P(SYopt (t)|Xm), each set S̅X̅i (t+1) containing only those predictor feature values Xm having the (M-i) highest values of P(SYopt (t)|Xm);

    (i) finding a pair of sets SXopt (t+1) and S̅X̅opt (t+1) from among the pairs of sets SXi (t+1) and S̅X̅i (t+1) such that the pair of sets SXopt (t+1) and S̅X̅opt (t+1) have the lowest uncertainty in the value of the category feature; and

    (l) setting the decision set equal to the set SXopt (t+1).
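
The training steps recited above, from estimating P(Xm, Yn) through setting the decision set, can be sketched in Python as follows. This is an illustrative reading of the claim and not the patentee's implementation. The function names (`estimate_joint`, `best_split`, `decision_set`, `predict`) are hypothetical, the probabilities are assumed to be estimated by relative frequency, and "uncertainty" is assumed here to mean the conditional entropy of a set-membership indicator, a measure the claim itself does not specify.

```python
import math
from collections import Counter


def estimate_joint(events):
    """Steps (a)-(b): relative-frequency estimate of P(x, y) from (x, y) events."""
    counts = Counter(events)
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}


def binary_cond_entropy(cond, weight, split):
    """Entropy (bits) of the target-set indicator given which side of `split` a value is on.

    cond[v] = P(target set | v), weight[v] = P(v). Assumed uncertainty measure.
    """
    h = 0.0
    for side in (split, set(weight) - split):
        w = sum(weight[v] for v in side)
        if w == 0:
            continue
        p = sum(weight[v] * cond[v] for v in side) / w
        for q in (p, 1 - p):
            if q > 0:
                h -= w * q * math.log2(q)
    return h


def best_split(joint, target, axis):
    """Split the variable at index `axis` of each (x, y) pair.

    axis=1 covers steps (d)-(f): split Y values given a set of X values.
    axis=0 covers steps (g)-(i): split X values given a set of Y values.
    """
    other = 1 - axis
    weight, hit = Counter(), Counter()
    for pair, p in joint.items():
        weight[pair[axis]] += p
        if pair[other] in target:
            hit[pair[axis]] += p
    cond = {v: hit[v] / weight[v] for v in weight}
    # Order values by conditional probability; ties broken by name for determinism.
    ordered = sorted(weight, key=lambda v: (cond[v], str(v)))
    # Candidate k holds the k lowest values; its complement holds the rest.
    candidates = [set(ordered[:k]) for k in range(1, len(ordered))]
    return min(candidates, key=lambda s: binary_cond_entropy(cond, weight, s))


def decision_set(events, start):
    """One pass of the claimed procedure, from a starting set SXopt(t) to SXopt(t+1)."""
    joint = estimate_joint(events)
    sy_opt = best_split(joint, set(start), axis=1)
    return best_split(joint, sy_opt, axis=0)


def predict(word, decision, first_word, second_word):
    """Recognition-time test: output depends on membership in the decision set."""
    return first_word if word in decision else second_word
```

With a toy training text such as `[("a", "u")] * 4 + [("b", "u")] * 4 + [("c", "w")] * 4 + [("c", "v")] * 2` and the starting set `{"a"}`, `sorted(decision_set(events, {"a"}))` gives `['a', 'b']`: the predictor words that are always followed by "u" end up together. The sketch performs a single pass, as recited in this claim, and does not iterate further.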
