Speech recognizer having a speech coder for an acoustic match based on context-dependent speech-transition acoustic models
Abstract
A speech coding apparatus compares the closeness of the feature value of a feature vector signal of an utterance to the parameter values of prototype vector signals to obtain prototype match scores for the feature vector signal and each prototype vector signal. The speech coding apparatus stores a plurality of speech transition models representing speech transitions. At least one speech transition is represented by a plurality of different models. Each speech transition model has a plurality of model outputs, each comprising a prototype match score for a prototype vector signal. Each model output has an output probability. A model match score for a first feature vector signal and each speech transition model comprises the output probability for at least one prototype match score for the first feature vector signal and a prototype vector signal. A speech transition match score for the first feature vector signal and each speech transition comprises the best model match score for the first feature vector signal and all speech transition models representing the speech transition. The identification value of each speech transition and the speech transition match score for the first feature vector signal and each speech transition are output as a coded utterance representation signal of the first feature vector signal.
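The scoring chain in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The following are assumptions made only for illustration: closeness is negative Euclidean distance, each model match score is the output probability the model assigns to the single closest prototype, and the names `prototype_match_scores`, `model_match_score`, and `code_feature_vector`, along with the toy prototypes and models, are hypothetical.

```python
from math import dist

def prototype_match_scores(feature, prototypes):
    """Closeness of one feature vector to every prototype (higher is closer).

    Assumption: closeness is the negative Euclidean distance.
    """
    return {pid: -dist(feature, params) for pid, params in prototypes.items()}

def model_match_score(proto_scores, output_probs):
    """Output probability the model assigns to the closest prototype.

    Assumption: only the single best-matching prototype is used.
    """
    best_proto = max(proto_scores, key=proto_scores.get)
    return output_probs.get(best_proto, 0.0)

def code_feature_vector(feature, prototypes, models):
    """Return {transition_id: best model match score} for one feature vector."""
    proto_scores = prototype_match_scores(feature, prototypes)
    coded = {}
    for transition_id, output_probs in models:
        score = model_match_score(proto_scores, output_probs)
        # Keep the best score over all models that represent this transition.
        coded[transition_id] = max(score, coded.get(transition_id, 0.0))
    return coded

# Toy data (hypothetical): two prototypes, and transition T1 has two models.
prototypes = {"P1": (0.0, 0.0), "P2": (1.0, 1.0)}
models = [
    ("T1", {"P1": 0.9, "P2": 0.1}),
    ("T1", {"P1": 0.2, "P2": 0.8}),
    ("T2", {"P1": 0.5, "P2": 0.5}),
]
coded = code_feature_vector((0.9, 1.1), prototypes, models)
print(coded)
```

The returned dictionary plays the role of the coded utterance representation for one feature vector: each speech transition's identification value paired with its speech transition match score.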
249 Citations
31 Claims
-
1. A speech coding apparatus comprising:
-
means for measuring the value of at least one feature of an utterance over each of a series of successive time intervals to produce a series of feature vector signals representing the feature values;
means for storing a plurality of prototype vector signals, each prototype vector signal having at least one parameter value;
means for comparing the closeness of the feature value of a first feature vector signal to the parameter values of the prototype vector signals to obtain prototype match scores for the first feature vector signal and each prototype vector signal;
means for storing a plurality of speech transition models, each speech transition model representing a speech transition from a vocabulary of speech transitions, each speech transition having an identification value, at least one speech transition being represented by a plurality of different speech transition models, each speech transition model having a plurality of speech transition model outputs, each speech transition model output comprising a prototype match score for a prototype vector signal, each speech transition model having an output probability for each model output;
means for generating a model match score for the first feature vector signal and each speech transition model, each model match score comprising the output probability for at least one prototype match score for the first feature vector signal and a prototype vector signal;
means for generating a speech transition match score for the first feature vector signal and each speech transition, each speech transition match score comprising the best model match score for the first feature vector signal and all speech transition models representing the speech transition; and
means for outputting the identification value of each speech transition and the speech transition match score for the first feature vector signal and each speech transition as a coded utterance representation signal of the first feature vector signal.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
-
-
9. A speech coding method comprising:
-
measuring the value of at least one feature of an utterance over each of a series of successive time intervals to produce a series of feature vector signals representing the feature values;
storing a plurality of prototype vector signals, each prototype vector signal having at least one parameter value;
comparing the closeness of the feature value of a first feature vector signal to the parameter values of the prototype vector signals to obtain prototype match scores for the first feature vector signal and each prototype vector signal;
storing a plurality of speech transition models, each speech transition model representing a speech transition from a vocabulary of speech transitions, each speech transition having an identification value, at least one speech transition being represented by a plurality of different speech transition models, each speech transition model having a plurality of speech transition model outputs, each speech transition model output comprising a prototype match score for a prototype vector signal, each speech transition model having an output probability for each speech transition model output;
generating a model match score for the first feature vector signal and each speech transition model, each model match score comprising the output probability for at least one prototype match score for the first feature vector signal and a prototype vector signal;
generating a speech transition match score for the first feature vector signal and each speech transition, each speech transition match score comprising the best model match score for the first feature vector signal and all speech transition models representing the speech transition; and
outputting the identification value of each speech transition and the speech transition match score for the first feature vector signal and each speech transition as a coded utterance representation signal of the first feature vector signal.
- View Dependent Claims (10, 11, 12, 13, 14)
-
-
15. A speech recognition apparatus comprising:
-
means for measuring the value of at least one feature of an utterance over each of a series of successive time intervals to produce a series of feature vector signals representing the feature values;
means for storing a plurality of prototype vector signals, each prototype vector signal having at least one parameter value;
means for comparing the closeness of the feature value of each feature vector signal to the parameter values of the prototype vector signals to obtain prototype match scores for each feature vector signal and each prototype vector signal;
means for storing a plurality of speech transition models, each speech transition model representing a speech transition from a vocabulary of speech transitions, each speech transition having an identification value, at least one speech transition being represented by a plurality of different speech transition models, each speech transition model having a plurality of speech transition model outputs, each speech transition model output comprising a prototype match score for a prototype vector signal, each speech transition model having an output probability for each model output;
means for generating a model match score for each feature vector signal and each speech transition model, the model match score for a feature vector signal comprising the output probability for at least one prototype match score for the feature vector signal and a prototype vector signal;
means for generating a speech transition match score for each feature vector signal and each speech transition, the speech transition match score for a feature vector signal comprising the best model match score for the feature vector signal and all speech transition models representing the speech transition;
means for storing a plurality of speech unit models, each speech unit model representing a speech unit comprising two or more speech transitions, each speech unit model comprising two or more speech transition models, each speech unit having an identification value;
means for generating a speech unit match score for each feature vector signal and each speech unit, the speech unit match score for a feature vector signal comprising the best speech transition match score for the feature vector signal and all speech transitions in the speech unit;
means for outputting the identification value of each speech unit and the speech unit match score of a feature vector signal and each speech unit as a coded utterance representation signal of the feature vector signal;
means for storing probabilistic models for a plurality of words, each word model comprising at least one speech unit model, each word model having a starting state, an ending state, and a plurality of paths through the speech unit models from the starting state at least part of the way to the ending state;
means for generating a word match score for the series of feature vector signals and each of a plurality of words, each word match score comprising a combination of the speech unit match scores for the series of feature vector signals and the speech units along at least one path through the series of speech unit models in the model of the word;
means for identifying one or more best candidate words having the best word match scores; and
means for outputting at least one best candidate word.
- View Dependent Claims (16, 17, 18, 19, 20, 21, 22, 23, 24, 25)
-
-
26. A speech recognition method comprising:
-
measuring the value of at least one feature of an utterance over each of a series of successive time intervals to produce a series of feature vector signals representing the feature values;
storing a plurality of prototype vector signals, each prototype vector signal having at least one parameter value;
comparing the closeness of the feature value of each feature vector signal to the parameter values of the prototype vector signals to obtain prototype match scores for each feature vector signal and each prototype vector signal;
storing a plurality of speech transition models, each speech transition model representing a speech transition from a vocabulary of speech transitions, each speech transition having an identification value, at least one speech transition being represented by a plurality of different speech transition models, each speech transition model having a plurality of speech transition model outputs, each speech transition model output comprising a prototype match score for a prototype vector signal, each speech transition model having an output probability for each speech transition model output;
generating a model match score for each feature vector signal and each speech transition model, the model match score for a feature vector signal comprising the output probability for at least one prototype match score for the feature vector signal and a prototype vector signal;
generating a speech transition match score for each feature vector signal and each speech transition, the speech transition match score for a feature vector signal comprising the best model match score for the feature vector signal and all speech transition models representing the speech transition;
storing a plurality of speech unit models, each speech unit model representing a speech unit comprising two or more speech transitions, each speech unit model comprising two or more speech transition models, each speech unit having an identification value;
generating a speech unit match score for each feature vector signal and each speech unit, the speech unit match score for a feature vector signal comprising the best speech transition match score for the feature vector signal and all speech transitions in the speech unit;
outputting the identification value of each speech unit and the speech unit match score of a feature vector signal and each speech unit as a coded utterance representation signal of the feature vector signal;
storing probabilistic models for a plurality of words, each word model comprising at least one speech unit model, each word model having a starting state, an ending state, and a plurality of paths through the speech unit models from the starting state at least part of the way to the ending state;
generating a word match score for the series of feature vector signals and each of a plurality of words, each word match score comprising a combination of the speech unit match scores for the series of feature vector signals and the speech units along at least one path through the series of speech unit models in the model of the word;
identifying one or more best candidate words having the best word match scores; and
outputting at least one best candidate word.
- View Dependent Claims (27, 28, 29, 30)
-
-
31. A speech coding apparatus comprising:
-
means for measuring the value of at least one feature of an utterance over each of a series of successive time intervals to produce a series of feature vector signals representing the feature values;
means for storing a plurality of prototype vector signals, each prototype vector signal having at least one parameter value;
means for comparing the closeness of the feature value of a first feature vector signal to the parameter values of the prototype vector signals to obtain prototype match scores for the first feature vector signal and each prototype vector signal;
means for storing a plurality of speech transition models, each speech transition model representing a speech transition from a vocabulary of speech transitions, each speech transition having an identification value, at least one speech transition being represented by a plurality of different speech transition models, each speech transition model having a plurality of speech transition model outputs, each speech transition model output comprising a prototype match score for a prototype vector signal, each speech transition model having an output probability for each speech transition model output;
means for generating a model match score for the first feature vector signal and each speech transition model, each model match score comprising the output probability for at least one prototype match score for the first feature vector signal and a prototype vector signal;
means for storing a plurality of speech unit models, each speech unit model representing a speech unit comprising two or more speech transitions, each speech unit model comprising two or more speech transition models, each speech unit having an identification value;
means for generating a speech unit match score for the first feature vector signal and each speech unit, each speech unit match score comprising the best model match score for the first feature vector signal and all speech transition models representing speech transitions in the speech unit; and
means for outputting the identification value of each speech unit and the speech unit match score for the first feature vector signal and each speech unit as a coded utterance representation signal of the first feature vector signal.
-
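The speech unit and word scoring steps recited in the recognition claims above can be sketched as follows. This is a simplified illustration, not the claimed apparatus. Assumptions: each word model is a plain left-to-right sequence of speech units, every unit consumes at least one feature vector, and the "combination" of unit scores along a path is a sum of logarithms maximized by dynamic programming. The names `unit_match_scores` and `word_match_score` and all toy data are hypothetical.

```python
from math import log, inf

def unit_match_scores(transition_scores, units):
    """Speech unit score = best transition match score among the unit's transitions."""
    return {uid: max(transition_scores[t] for t in transitions)
            for uid, transitions in units.items()}

def word_match_score(frames, word_units):
    """Best log-score over left-to-right alignments of frames to the word's units.

    frames: list of {unit_id: unit match score}, one dict per feature vector.
    word_units: ordered unit ids for the word (assumed linear word model).
    """
    n_units = len(word_units)
    best = [-inf] * n_units  # best[j]: best score with the latest frame in unit j
    for i, frame in enumerate(frames):
        new = [-inf] * n_units
        for j, uid in enumerate(word_units):
            if i == 0:
                prev = 0.0 if j == 0 else -inf  # paths start in the first unit
            else:
                # Either stay in unit j or advance from unit j - 1.
                prev = max(best[j], best[j - 1] if j > 0 else -inf)
            if prev > -inf and frame[uid] > 0:
                new[j] = prev + log(frame[uid])
        best = new
    return best[-1]  # paths must end in the word's last unit

# Toy data (hypothetical): two units, two feature vectors, two candidate words.
units = {"U1": ["T1", "T2"], "U2": ["T3"]}
transition_scores_per_frame = [
    {"T1": 0.8, "T2": 0.5, "T3": 0.1},
    {"T1": 0.2, "T2": 0.3, "T3": 0.9},
]
frames = [unit_match_scores(f, units) for f in transition_scores_per_frame]
words = {"word_a": ["U1", "U2"], "word_b": ["U2", "U1"]}
word_scores = {w: word_match_score(frames, u) for w, u in words.items()}
best_word = max(word_scores, key=word_scores.get)
print(best_word, word_scores)
```

Taking the maximum over a unit's transitions mirrors the "best speech transition match score" language of the claims. The log-sum alignment is only one possible combination; the claims do not restrict the combination to this form.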
Specification