Speech recognition system using Markov models having independent label output sets

US 5,031,217 A
Filed: 09/21/1989
Issued: 07/09/1991
Est. Priority Date: 09/30/1988
Status: Expired due to Fees

Abstract

A speech recognition system measures the values of at least two classes of features of an utterance: (1) a first class whose value is related to the frequency spectrum of the utterance, and (2) a second class whose value is related to the variation with time of the "first class" value of the utterance. Word baseforms are constructed from Markov model baseform units. Each output-producing transition of a baseform unit produces outputs from both classes. However, for each output-producing transition, the probabilities of producing outputs from the first class are independent of the probabilities of producing outputs from the second class.
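
The independence property described in the abstract can be illustrated with a minimal sketch. The label names, probability values, and the alphabet size of 128 are assumptions chosen for illustration and do not come from the patent. If the two output classes are independent at a transition, the probability of a label pair is the product of the two per-class probabilities, and each transition needs two small tables rather than one joint table.

```python
# Hypothetical output tables for ONE output-producing transition.
# Label names and probability values are invented for illustration.
spectrum_probs = {"S1": 0.75, "S2": 0.25}   # first class: spectrum labels
variation_probs = {"V1": 0.5, "V2": 0.5}    # second class: spectrum-variation labels


def joint_output_prob(spectrum_label, variation_label):
    """Probability of emitting the label pair, treating the two classes as independent."""
    return spectrum_probs[spectrum_label] * variation_probs[variation_label]


def table_sizes(n_first, n_second):
    """Entries stored per transition: (single joint table, two independent tables)."""
    return n_first * n_second, n_first + n_second


print(joint_output_prob("S1", "V2"))  # 0.375
print(table_sizes(128, 128))          # (16384, 256), assuming 128 labels per class
```

Under these assumed sizes, the independent tables hold far fewer entries per transition, which is one plausible reason to model the two classes separately.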

8 Claims

1. A speech recognition system comprising:
- a means for generating spectrum data from input speech in every predetermined time interval;
  
  a means for quantizing said spectrum data by using a predetermined spectrum prototype set for recognition, each spectrum prototype having an identifier, and for generating a recognition spectrum prototype identifier corresponding to each of said spectrum data;
  
  a means for generating spectrum variation data from said input speech in each said time interval;
  
  a means for quantizing said spectrum variation data by using a predetermined spectrum variation prototype set for recognition, each spectrum variation prototype having an identifier, and for generating a recognition spectrum variation prototype identifier corresponding to each of said spectrum variation data;
  
  a means for storing a plurality of probabilistic models corresponding to speech of said time interval, and identified by model identifiers relating to the spectrum data and model identifiers relating to the spectrum variation data, each of which models has one or more states, transitions from said states, probabilities of said transitions, output probabilities for outputting each of said recognition spectrum prototype identifiers at each of said states or said transitions, and output probabilities for outputting each of said recognition spectrum variation prototype identifiers at each of said states or said transitions;
  
  a means for estimating, for each of a plurality of words, each word represented by a series of probabilistic models from the storage means, a likelihood that a series of spectrum prototype identifiers and a series of spectrum variation prototype identifiers generated from an utterance of the word will be the same as the spectrum prototype identifiers and spectrum variation prototype identifiers generated from the input speech; and
  
  a means for outputting the word having the highest likelihood.
- 2. A speech recognition system according to claim 1, wherein each of said probabilistic models has one state, a transition from said state to the same state while outputting one of said recognition spectrum prototype identifiers and one of said recognition spectrum variation prototype identifiers, a transition from said state to a next state while outputting one of said recognition spectrum prototype identifiers and one of said recognition spectrum variation prototype identifiers, and a transition from said state to the next state without outputting said identifiers.
- 3. A speech recognition system according to claim 2, wherein said unit to be recognized is a word.
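
The front end recited in claim 1 (spectrum data, spectrum variation data, and quantization of each against its own prototype set) can be sketched as follows. This is a minimal illustration, not the patented implementation: the squared Euclidean distance, the use of a plain frame-to-frame difference as the "spectrum variation", and all prototype names and values are assumptions, since the claim does not fix any of them.

```python
def nearest_prototype(vector, prototypes):
    """Return the identifier of the prototype closest to `vector`.

    Squared Euclidean distance is an assumed distance measure.
    """
    return min(
        prototypes,
        key=lambda ident: sum((v - p) ** 2 for v, p in zip(vector, prototypes[ident])),
    )


def spectrum_variation(frames):
    """Assumed variation measure: difference from the previous frame (zero for the first frame)."""
    deltas = [[0.0] * len(frames[0])]
    for prev, cur in zip(frames, frames[1:]):
        deltas.append([c - p for c, p in zip(cur, prev)])
    return deltas


def label_utterance(frames, spectrum_protos, variation_protos):
    """Quantize each frame twice, yielding one identifier series per prototype set."""
    deltas = spectrum_variation(frames)
    spectrum_ids = [nearest_prototype(f, spectrum_protos) for f in frames]
    variation_ids = [nearest_prototype(d, variation_protos) for d in deltas]
    return spectrum_ids, variation_ids


# Hypothetical two-dimensional spectra and prototype sets.
frames = [[0.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
spectrum_protos = {"S_low": [0.0, 0.0], "S_high": [1.0, 1.0]}
variation_protos = {"V_flat": [0.0, 0.0], "V_rise": [1.0, 1.0]}

print(label_utterance(frames, spectrum_protos, variation_protos))
# (['S_low', 'S_high', 'S_high'], ['V_flat', 'V_rise', 'V_flat'])
```

Each time interval thus yields a pair of identifiers, one from each prototype set, which is the input the probabilistic models are scored against.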

4. A speech recognition system comprising:
- a means for generating first feature data from input speech in every predetermined time interval;
  
  a means for quantizing said first feature data by using a predetermined first feature prototype set for recognition, each first feature prototype having an identifier, and for generating a recognition first feature prototype identifier corresponding to each of said first feature data;
  
  a means for generating second feature data having a small correlation with said first feature from said input speech in each said time interval;
  
  a means for quantizing said second feature data by using a predetermined second feature prototype set for recognition, each second feature prototype having an identifier, and for generating a recognition second feature prototype identifier corresponding to each of said second feature data;
  
  a means for storing a plurality of probabilistic models corresponding to speech of said time interval, and identified by model identifiers relating to said first feature and model identifiers relating to said second feature, each of which models has one or more states, transitions from said states, probabilities of said transitions, output probabilities for outputting each of said recognition first feature prototype identifiers at each of said states or said transitions, and output probabilities for outputting each of said recognition second feature prototype identifiers at each of said states or said transitions;
  
  a means for estimating, for each of a plurality of words, each word represented by a series of probabilistic models from the storage means, a likelihood that a series of first feature prototype identifiers and a series of second feature prototype identifiers generated from an utterance of the word will be the same as the first feature prototype identifiers and the second feature prototype identifiers generated from the input speech; and
  
  a means for outputting the word having the highest likelihood.

5. A speech recognition system comprising:
- means for generating a first alphabet of labels from a speech input, each label representing a sound of a selected time duration;
  
  means for generating a second alphabet of labels from a speech input, each label representing a sound of a selected time duration, the labels of the first alphabet having a small correlation to the labels of the second alphabet;
  
  means for forming a first probabilistic model for a first word, and for forming a second probabilistic model for a second word, each model comprising (a) at least first and second states, (b) at least one transition extending from the first state back to the first state, or from the first state to the second state, (c) a transition probability for each transition, (d) at least one output probability that an output label belonging to the first alphabet of labels will be produced at the transition, and (e) at least one output probability that an output label belonging to the second alphabet of labels will be produced at the transition;
  
  means for representing an utterance to be recognized as a first sequence of labels from the first alphabet and as a second sequence of labels from the second alphabet;
  
  means for determining, from the probabilistic model for each word, the probability that utterance of the word will produce the first and second sequences of labels; and
  
  means for identifying the utterance to be recognized as the word with the highest probability of producing the first and second sequences of labels.
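
The scoring and selection steps of claim 5 can be sketched with a forward computation over a two-state model whose transitions each carry a transition probability and two output tables, one per label alphabet. The model layout, the word names, the labels, and every probability value below are hypothetical, and the forward algorithm is a standard technique assumed here rather than something the claim names.

```python
def sequence_probability(model, labels1, labels2):
    """Probability that `model` produces both label sequences, ending in its last state.

    Each transition is (source, destination, transition probability,
    first-alphabet output table, second-alphabet output table).
    """
    alpha = [0.0] * model["n_states"]
    alpha[0] = 1.0  # start in the first state
    for l1, l2 in zip(labels1, labels2):
        nxt = [0.0] * model["n_states"]
        for src, dst, p_trans, out1, out2 in model["transitions"]:
            # The two output probabilities are multiplied as independent factors.
            nxt[dst] += alpha[src] * p_trans * out1.get(l1, 0.0) * out2.get(l2, 0.0)
        alpha = nxt
    return alpha[-1]


def recognize(models, labels1, labels2):
    """Return the word whose model gives the highest probability."""
    return max(models, key=lambda w: sequence_probability(models[w], labels1, labels2))


# Hypothetical two-word vocabulary.
models = {
    "word_a": {
        "n_states": 2,
        "transitions": [
            (0, 0, 0.5, {"A": 1.0}, {"X": 1.0}),  # self-loop on the first state
            (0, 1, 0.5, {"B": 1.0}, {"Y": 1.0}),  # first state to second state
        ],
    },
    "word_b": {
        "n_states": 2,
        "transitions": [
            (0, 0, 0.5, {"B": 1.0}, {"Y": 1.0}),
            (0, 1, 0.5, {"A": 1.0}, {"X": 1.0}),
        ],
    },
}

first_sequence = ["A", "A", "B"]
second_sequence = ["X", "X", "Y"]
print(sequence_probability(models["word_a"], first_sequence, second_sequence))  # 0.125
print(recognize(models, first_sequence, second_sequence))                       # word_a
```

A practical system would likely work with log probabilities to avoid underflow on long utterances; that refinement is omitted to keep the sketch short.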

6. An apparatus for modeling words, said apparatus comprising:
- means for measuring the values of at least first and second features of an utterance of a first word, said utterance occurring over a series of successive time intervals of equal duration Δt, said means measuring the first and second feature values of the utterance during each time interval to produce a series of feature vector signals representing the first and second feature values, said first feature value having a small correlation to the second feature;
  
  means for storing a set of first label prototype signals LP1,i, where i is a positive integer, each first label prototype signal having at least a first parameter value;
  
  means for storing a set of second label prototype signals LP2,j, where j is a positive integer, each second label prototype signal having at least a second parameter value;
  
  means for storing a finite set of probabilistic model signals Mi,j, each probabilistic model signal representing a probabilistic model of a component sound;
  
  means for comparing the first and second feature values, of each feature vector signal in the series of feature vector signals produced by the measuring means as a result of the utterance of the first word, to the parameter values of the first and second label prototype signals, respectively, to determine, for each feature vector signal, the closest pair of associated label prototype signals LP1,i and LP2,j, respectively;
  
  means for forming a baseform of the first word from the series of feature vector signals by substituting, for each feature vector signal, the closest pair of associated label prototype signals LP1,i and LP2,j to produce a baseform series of pairs of label prototype signals; and
  
  means for forming a probabilistic model of the first word from the baseform series of pairs of label prototype signals by substituting, for each pair of label prototype signals LP1,i and LP2,j, an associated probabilistic model signal Mi,j from the storage means, to produce a series of probabilistic model signals.
- 7. An apparatus as claimed in claim 6, characterized in that:
  - each probabilistic model signal Mi,j represents a probabilistic model comprising (a) at least first and second states, (b) at least one transition T₁ extending from the first state back to the first state, or from the first state to the second state, and (c) at least one output probability P(LP₁,i | T₁) that a first label prototype signal LP₁,i will be produced at the transition T₁; and
  - there is at least one label prototype signal LP₁,1 such that the value of the probability P(LP₁,1 | T₁) for models M₁,j is the same for all values of j.
- 8. An apparatus as claimed in claim 6, characterized in that the value of the second feature at a time interval is a function of the variation in the value of the first feature at the time interval.
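
The baseform construction of claim 6 and the shared probabilities of claim 7 can be sketched together. Everything concrete here is an assumption made for illustration: the features are scalars, closeness is absolute difference, the prototype values and output tables are invented, and sharing one first-label table across all models with the same index i is only one way to satisfy the condition in claim 7.

```python
def closest(value, prototypes):
    """Index of the label prototype whose parameter value is nearest to `value`."""
    return min(prototypes, key=lambda k: abs(prototypes[k] - value))


def build_baseform(feature_vectors, lp1, lp2):
    """Replace each (first, second) feature pair by its closest prototype index pair (i, j)."""
    return [(closest(f1, lp1), closest(f2, lp2)) for f1, f2 in feature_vectors]


def build_word_model(baseform, models):
    """Replace each index pair (i, j) by the stored model for that pair."""
    return [models[pair] for pair in baseform]


# Hypothetical prototype parameter values, indexed by i and j.
lp1 = {1: 0.0, 2: 10.0}
lp2 = {1: -1.0, 2: 1.0}

# Hypothetical output tables: one per first-label index and one per second-label index.
first_tables = {1: {"LP1,1": 0.9, "LP1,2": 0.1}, 2: {"LP1,1": 0.2, "LP1,2": 0.8}}
second_tables = {1: {"LP2,1": 0.7, "LP2,2": 0.3}, 2: {"LP2,1": 0.4, "LP2,2": 0.6}}

# Model store: every model with the same i reuses the same first-label table.
models = {
    (i, j): {"name": f"M{i},{j}", "first": first_tables[i], "second": second_tables[j]}
    for i in lp1
    for j in lp2
}

feature_vectors = [(1.0, -0.8), (9.0, 0.9), (8.5, -1.2)]
baseform = build_baseform(feature_vectors, lp1, lp2)
word_model = build_word_model(baseform, models)

print(baseform)                         # [(1, 1), (2, 2), (2, 1)]
print([m["name"] for m in word_model])  # ['M1,1', 'M2,2', 'M2,1']
```

With this layout the store holds one table per index rather than one per pair, so the number of output tables grows with the sum of the two prototype set sizes instead of their product.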

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Nishimura, Masafumi
Primary Examiner(s): NOT, DEFINED
Assistant Examiner(s): Merecki, John A.
Application Number: US07/411,297
Time in Patent Office: 656 Days
Field of Search: 381/41-46, 364/513.5
US Class Current: 704/256.4
CPC Class Codes: G10L 15/14 using statistical models, e...
