Method and apparatus for finding the best splits in a decision tree for a language model for a speech recognizer

US 5,263,117 A
Filed: 10/26/1989
Issued: 11/16/1993
Est. Priority Date: 10/26/1989
Status: Expired due to Term

2 Assignments

0 Petitions

Abstract

A method and apparatus for finding the best or near-best binary classification of a set of observed events according to a predictor feature X, so as to minimize the uncertainty in the value of a category feature Y. Each feature has three or more possible values. First, the predictor feature value and the category feature value of each event are measured. The predictor feature values are then split arbitrarily into two sets. From these two sets of predictor feature values, an optimum pair of sets of category feature values is found that has the lowest uncertainty in the value of the predictor feature. From the two optimum sets of category feature values, an optimum pair of sets of predictor feature values is found that has the lowest uncertainty in the value of the category feature. An event is then classified according to whether its predictor feature value is a member of the set of optimal predictor feature values.
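Writing P(X_m, Y_n) for the estimated joint probability of a predictor value and a category value, as the claims below do, the quantities this procedure compares can be stated explicitly. The abstract and claims speak only of "uncertainty"; measuring it as conditional entropy, with B_X and B_Y as hypothetical binary indicators of whether an event's predictor or category value falls in the current set SX or SY, is an assumption of this sketch rather than something the text specifies.

```latex
\begin{aligned}
P(SX \mid Y_n) &= \frac{\sum_{X_m \in SX} P(X_m, Y_n)}{\sum_{m=1}^{M} P(X_m, Y_n)},
\qquad
P(SY \mid X_m) = \frac{\sum_{Y_n \in SY} P(X_m, Y_n)}{\sum_{n=1}^{N} P(X_m, Y_n)} \\[4pt]
H(X \mid B_Y) &= -\sum_{b \in \{0,1\}} \sum_{m=1}^{M} P(X_m, B_Y{=}b) \log_2 \frac{P(X_m, B_Y{=}b)}{P(B_Y{=}b)},
\qquad B_Y = \mathbf{1}[Y \in SY] \\[4pt]
H(Y \mid B_X) &= -\sum_{b \in \{0,1\}} \sum_{n=1}^{N} P(Y_n, B_X{=}b) \log_2 \frac{P(Y_n, B_X{=}b)}{P(B_X{=}b)},
\qquad B_X = \mathbf{1}[X \in SX]
\end{aligned}
```

Under this reading, the search over category splits minimizes H(X | B_Y) and the search over predictor splits minimizes H(Y | B_X). Because each candidate split is a prefix of the values sorted by P(SX | Y_n) or P(SY | X_m), a step examines only N-1 (or M-1) pairs of sets instead of all 2^(N-1) - 1 ways of dividing N values into two non-empty sets.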

39 Citations

2 Claims

1. A method of automatic speech recognition comprising the steps of:
- converting an utterance into an utterance signal representing the utterance, said utterance comprising a series of at least a predictor word and a predicted word, said utterance signal comprising at least one predictor word signal representing the predictor word;
  
  providing a set of M predictor feature signals, each predictor feature signal having a predictor feature value X_m, where M is an integer greater than or equal to three and m is an integer greater than zero and less than or equal to M, each predictor feature signal in the set representing a different word;
  
  generating a decision set which contains a subset of the M predictor feature signals representing the words;
  
  comparing the predictor word signal with the predictor feature signals in the decision set;
  
  outputting a first category feature signal representing a first predicted word if the predictor word signal is a member of the decision set, said first category feature signal being one of N category feature signals, each category feature signal representing a different word and having a category feature value Y_n, where N is an integer greater than or equal to three, and n is an integer greater than zero and less than or equal to N; and
  
  outputting a second category feature signal, different from the first category feature signal and representing a second predicted word different from the first predicted word if the predictor word signal is not a member of the decision set;
  
  characterized in that the contents of the decision set are generated by the steps of:
  
  providing a training text comprising a set of observed events, each event having a predictor feature X representing a predictor word and a category feature Y representing a predicted word, said predictor feature having one of M different possible values X_m, each X_m representing a different predictor word, said category feature having one of N possible values Y_n, each Y_n representing a different predicted word;
  
  (a) measuring the predictor feature value X_m and the category feature value Y_n of each event in the set of events;
  
  (b) estimating, from the measured predictor feature values and the measured category feature values, the probability P(X_m, Y_n) of occurrence of an event having a category feature value Y_n and a predictor feature value X_m, for each Y_n and each X_m ;
  
  (c) selecting a starting set SX_opt (t) of predictor feature values X_m, where t has an initial value;
  
  (d) calculating, from the estimated probabilities P(X_m, Y_n), the conditional probability P(SX_opt (t)|Y_n) that the predictor feature has a value in the set SX_opt (t) when the category feature has a value Y_n, for each Y_n ;
  
  (e) defining a number of pairs of sets SY_j (t) and $\overline{SY}_j(t)$ of category feature values Y_n, where j is an integer greater than zero and less than or equal to (N-1), each set SY_j (t) containing only those category feature values Y_n having the j lowest values of P(SX_opt (t)|Y_n), each set $\overline{SY}_j(t)$ containing only those category feature values Y_n having the (N-j) highest values of P(SX_opt (t)|Y_n);
  
  (f) finding a pair of sets SY_opt (t) and $\overline{SY}_{opt}(t)$ from among the pairs of sets SY_j (t) and $\overline{SY}_j(t)$ such that the pair of sets SY_opt (t) and $\overline{SY}_{opt}(t)$ have the lowest uncertainty in the value of the predictor feature;
  
  (g) calculating, from the estimated probabilities P(X_m, Y_n), the conditional probability P(SY_opt (t)|X_m) that the category feature has a value in the set SY_opt (t) when the predictor feature has a value X_m, for each X_m ;
  
  (h) defining a number of pairs of sets SX_i (t+1) and $\overline{SX}_i(t+1)$ of predictor feature values X_m, where i is an integer greater than zero and less than or equal to (M-1), each set SX_i (t+1) containing only those predictor feature values X_m having the i lowest values of P(SY_opt (t)|X_m), each set $\overline{SX}_i(t+1)$ containing only those predictor feature values X_m having the (M-i) highest values of P(SY_opt (t)|X_m);
  
  (i) finding a pair of sets SX_opt (t+1) and $\overline{SX}_{opt}(t+1)$ from among the pairs of sets SX_i (t+1) and $\overline{SX}_i(t+1)$ such that the pair of sets SX_opt (t+1) and $\overline{SX}_{opt}(t+1)$ have the lowest uncertainty in the value of the category feature; and
  
  (l) setting the decision set equal to the set SX_opt (t+1).

2. An automatic speech recognition system comprising:
- means for converting an utterance into an utterance signal representing the utterance, said utterance comprising a series of at least a predictor word and a predicted word, said utterance signal comprising at least one predictor word signal representing the predictor word;
  
  means for storing a set of M predictor feature signals, each predictor feature signal having a predictor feature value X_m, where M is an integer greater than or equal to three and m is an integer greater than zero and less than or equal to M, each predictor feature signal in the set representing a different word;
  
  means for generating a decision set which contains a subset of the M predictor feature signals representing the words;
  
  means for comparing the predictor word signal with the predictor feature signals in the decision set;
  
  means for outputting a first category feature signal representing a first predicted word if the predictor word signal is a member of the decision set, said first category feature signal being one of N category feature signals, each category feature signal representing a different word and having a category feature value Y_n, where N is an integer greater than or equal to three, and n is an integer greater than zero and less than or equal to N; and
  
  means for outputting a second category feature signal, different from the first category feature signal and representing a second predicted word different from the first predicted word if the predictor word signal is not a member of the decision set;
  
  characterized in that the means for generating the decision set comprises:
  
  means for storing a training text comprising a set of observed events, each event having a predictor feature X representing a predictor word and a category feature Y representing a predicted word, said predictor feature having one of M different possible values X_m, each X_m representing a different predictor word, said category feature having one of N possible values Y_n, each Y_n representing a different predicted word;
  
  (a) means for measuring the predictor feature value X_m and the category feature value Y_n of each event in the set of events;
  
  (b) means for estimating, from the measured predictor feature values and the measured category feature values, the probability P(X_m, Y_n) of occurrence of an event having a category feature value Y_n and a predictor feature value X_m, for each Y_n and each X_m ;
  
  (c) means for selecting a starting set SX_opt (t) of predictor feature values X_m, where t has an initial value;
  
  (d) means for calculating, from the estimated probabilities P(X_m, Y_n), the conditional probability P(SX_opt (t)|Y_n) that the predictor feature has a value in the set SX_opt (t) when the category feature has a value Y_n, for each Y_n ;
  
  (e) means for defining a number of pairs of sets SY_j (t) and $\overline{SY}_j(t)$ of category feature values Y_n, where j is an integer greater than zero and less than or equal to (N-1), each set SY_j (t) containing only those category feature values Y_n having the j lowest values of P(SX_opt (t)|Y_n), each set $\overline{SY}_j(t)$ containing only those category feature values Y_n having the (N-j) highest values of P(SX_opt (t)|Y_n);
  
  (f) means for finding a pair of sets SY_opt (t) and $\overline{SY}_{opt}(t)$ from among the pairs of sets SY_j (t) and $\overline{SY}_j(t)$ such that the pair of sets SY_opt (t) and $\overline{SY}_{opt}(t)$ have the lowest uncertainty in the value of the predictor feature;
  
  (g) means for calculating, from the estimated probabilities P(X_m, Y_n), the conditional probability P(SY_opt (t)|X_m) that the category feature has a value in the set SY_opt (t) when the predictor feature has a value X_m, for each X_m ;
  
  (h) means for defining a number of pairs of sets SX_i (t+1) and $\overline{SX}_i(t+1)$ of predictor feature values X_m, where i is an integer greater than zero and less than or equal to (M-1), each set SX_i (t+1) containing only those predictor feature values X_m having the i lowest values of P(SY_opt (t)|X_m), each set $\overline{SX}_i(t+1)$ containing only those predictor feature values X_m having the (M-i) highest values of P(SY_opt (t)|X_m);
  
  (i) means for finding a pair of sets SX_opt (t+1) and $\overline{SX}_{opt}(t+1)$ from among the pairs of sets SX_i (t+1) and $\overline{SX}_i(t+1)$ such that the pair of sets SX_opt (t+1) and $\overline{SX}_{opt}(t+1)$ have the lowest uncertainty in the value of the category feature; and
  
  (l) means for outputting the set SX_opt (t+1) as the decision set.
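As a rough illustration, the decision-set parts of the claims can be sketched in Python: estimating P(X_m, Y_n) in step (b), one pass of steps (c) through (i), and the comparing and outputting steps that use the resulting decision set. The sketch assumes that each training event is a (predictor word, predicted word) pair, that probabilities are relative frequencies, that "uncertainty" is conditional entropy in bits, and that the first and second predicted words are the most frequent predicted words on each side of the decision set. The claims fix none of these choices, and every function name is hypothetical. Sets of feature values are held as sets of vocabulary indices.

```python
import math
from collections import Counter


def estimate_joint(events, predictors, categories):
    """Step (b): relative-frequency estimate of P(X_m, Y_n); rows index X_m, columns Y_n."""
    counts = Counter(events)
    total = len(events)
    return [[counts[(x, y)] / total for y in categories] for x in predictors]


def conditional_membership(joint, col_set):
    """Steps (d) and (g): P(column value in col_set | row value) for every row value."""
    probs = []
    for row in joint:
        mass = sum(row)
        # A value never seen in training gets 0.0 instead of a division by zero.
        probs.append(sum(row[c] for c in col_set) / mass if mass > 0 else 0.0)
    return probs


def split_entropy(joint, low_rows):
    """Entropy (bits) of the column variable given the side of the row split an event is on."""
    h = 0.0
    for side in (True, False):
        col_mass = [sum(row[c] for r, row in enumerate(joint) if (r in low_rows) == side)
                    for c in range(len(joint[0]))]
        side_mass = sum(col_mass)
        for p in col_mass:
            if p > 0:
                h -= p * math.log2(p / side_mass)
    return h


def best_prefix_split(joint, scores):
    """Steps (e)-(f) or (h)-(i): sort row values by score, try the K-1 prefix splits,
    and return the 'k lowest' set of the split with the lowest conditional entropy."""
    order = sorted(range(len(joint)), key=lambda r: scores[r])
    candidates = [set(order[:k]) for k in range(1, len(order))]
    return min(candidates, key=lambda low: split_entropy(joint, low))


def refine_decision_set(joint, sx_start):
    """One pass from SX_opt(t) to (SY_opt(t), SX_opt(t+1)); sets hold vocabulary indices."""
    joint_t = [list(col) for col in zip(*joint)]  # rows index Y_n, columns index X_m
    sy_opt = best_prefix_split(joint_t, conditional_membership(joint_t, sx_start))
    sx_next = best_prefix_split(joint, conditional_membership(joint, sy_opt))
    return sy_opt, sx_next


def build_word_predictor(events, decision_set):
    """Comparing/outputting steps: return the first predicted word for predictor words in
    the decision set and the second one otherwise (most frequent on each side: an assumption)."""
    inside, outside = Counter(), Counter()
    for predictor, predicted in events:
        (inside if predictor in decision_set else outside)[predicted] += 1
    if not inside:
        raise ValueError("no training events fall inside the decision set")
    first_word = inside.most_common(1)[0][0]
    # The claims require the second predicted word to differ from the first.
    others = [w for w, _ in outside.most_common() if w != first_word]
    if not others:
        raise ValueError("no predicted word distinct from the first one")
    second_word = others[0]
    return lambda word: first_word if word in decision_set else second_word


# Toy training events: (predictor word, predicted word) pairs.
predictors = ["the", "a", "on", "to"]
categories = ["cat", "dog", "the", "a"]
events = ([("the", "cat")] * 3 + [("a", "cat")] * 2 + [("the", "dog")] * 2
          + [("a", "dog")] + [("on", "the")] * 3 + [("to", "the")] * 2
          + [("on", "a")] + [("to", "a")] * 2)
joint = estimate_joint(events, predictors, categories)
sy_opt, sx_next = refine_decision_set(joint, {0, 2})  # arbitrary start: {"the", "on"}
decision_set = {predictors[i] for i in sx_next}
predict = build_word_predictor(events, decision_set)
print(sorted(decision_set), predict("a"), predict("to"))  # ['a', 'the'] cat the
```

In this toy run one pass yields the determiners "the" and "a" as the decision set, even though the arbitrary starting set paired a determiner with a preposition. The claims describe a single pass from t to t+1; calling refine_decision_set again with sx_next as the new starting set is one natural way to iterate, but that loop is not part of the claimed steps.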

Current Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee
International Business Machines Corporation
Inventors
Nahamoo, David; Nadas, Arthur J.
Primary Examiner(s)
Fleming, Michael R.
Assistant Examiner(s)
Doerrler, Michelle

Application Number
US07/427,420
Time in Patent Office
1,482 Days
Field of Search
381/41-46, 364/513.5
US Class Current
704/200
CPC Class Codes
G10L 15/197 Probabilistic grammars, e.g...
