Speech recognition apparatus having a speech coder outputting acoustic prototype ranks
Abstract
A speech coding and speech recognition apparatus. The value of at least one feature of an utterance is measured over each of a series of successive time intervals to produce a series of feature vector signals. The closeness of the feature value of each feature vector signal to the parameter value of each of a set of prototype vector signals is determined to obtain prototype match scores for each feature vector signal and each prototype vector signal. For each feature vector signal, first-rank and second-rank scores are associated with the prototype vector signals having the best and second best prototype match scores, respectively. For each feature vector signal, at least the identification value and the rank score of the first-ranked and second-ranked prototype vector signals are output as a coded utterance representation signal of the feature vector signal, to produce a series of coded utterance representation signals. For each of a plurality of speech units, a probabilistic model has a plurality of model outputs, and output probabilities for each model output. Each model output comprises the identification value of a prototype vector and a rank score. For each speech unit, a match score comprises an estimate of the probability that the probabilistic model of the speech unit would output a series of model outputs matching a reference series comprising the identification value and rank score of at least one prototype vector from each coded utterance representation signal in the series of coded utterance representation signals.
27 Claims
1. A speech coding apparatus comprising:
means for measuring the value of at least one feature of an utterance over each of a series of successive time intervals to produce a series of feature vector signals representing the feature values;
means for storing a plurality of prototype vector signals, each prototype vector signal having at least one parameter value and having a unique identification value;
means for comparing the closeness of the feature value of a first feature vector signal to the parameter values of the prototype vector signals to obtain prototype match scores for the first feature vector signal and each prototype vector signal;
ranking means for associating a first-rank score with the prototype vector signal having the best prototype match score, and for associating a second-rank score with the prototype vector signal having the second best prototype match score; and
means for outputting at least the identification value and the rank score of the prototype vector signal having the first-rank score, and the identification value and the rank score of the prototype vector signal having the second-rank score, as a coded utterance representation signal of the first feature vector signal.
Dependent claims: 2, 3, 4, 5, 6
7. A speech coding method comprising:
measuring the value of at least one feature of an utterance over each of a series of successive time intervals to produce a series of feature vector signals representing the feature values;
storing a plurality of prototype vector signals, each prototype vector signal having at least one parameter value and having a unique identification value;
comparing the closeness of the feature value of a first feature vector signal to the parameter values of the prototype vector signals to obtain prototype match scores for the first feature vector signal and each prototype vector signal;
ranking the prototype vector signal having the best prototype match score with a first-rank score, and ranking the prototype vector signal having the second best prototype match score with a second-rank score; and
outputting at least the identification value and the rank score of the prototype vector signal having the first-rank score, and the identification value and the rank score of the prototype vector signal having the second-rank score, as a coded utterance representation signal of the first feature vector signal.
Dependent claims: 8, 9, 10
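The coding steps recited in claims 1 and 7 can be illustrated with a minimal Python sketch. It is not the patented implementation. The Euclidean distance used as the closeness measure, the dictionary used as the prototype store, and the names `match_score`, `code_feature_vector`, and `code_utterance` are all assumptions made for illustration; the claims do not fix any of them.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical prototype store: unique identification value -> parameter values.
Prototypes = Dict[str, Sequence[float]]


def match_score(feature: Sequence[float], prototype: Sequence[float]) -> float:
    """Closeness of a feature vector to a prototype.

    Assumption: Euclidean distance, so a smaller score is a better match.
    """
    return math.dist(feature, prototype)


def code_feature_vector(
    feature: Sequence[float], prototypes: Prototypes, n_ranks: int = 2
) -> List[Tuple[str, int]]:
    """Return (identification value, rank score) pairs for the best matches.

    Rank 1 goes to the best-matching prototype, rank 2 to the second best.
    """
    ordered = sorted(
        prototypes.items(), key=lambda item: match_score(feature, item[1])
    )
    return [
        (proto_id, rank)
        for rank, (proto_id, _) in enumerate(ordered[:n_ranks], start=1)
    ]


def code_utterance(
    features: Sequence[Sequence[float]], prototypes: Prototypes
) -> List[List[Tuple[str, int]]]:
    """Code every feature vector of an utterance, one frame per time interval."""
    return [code_feature_vector(f, prototypes) for f in features]
```

Each element of the list returned by `code_utterance` plays the role of one coded utterance representation signal: it carries the identification value and rank score of the first-ranked and second-ranked prototypes rather than the identity of the single closest prototype alone.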
11. A speech recognition apparatus comprising:
means for measuring the value of at least one feature of an utterance over each of a series of successive time intervals to produce a series of feature vector signals representing the feature values;
means for storing a plurality of prototype vector signals, each prototype vector signal having at least one parameter value and having a unique identification value;
means for comparing the closeness of the feature value of each feature vector signal to the parameter values of the prototype vector signals to obtain prototype match scores for each feature vector signal and each prototype vector signal;
ranking means for associating, for each feature vector signal, a first-rank score with the prototype vector signal having the best prototype match score, and a second-rank score with the prototype vector signal having the second best prototype match score;
means for outputting, for each feature vector signal, at least the identification value and the rank score of the prototype vector signal having the first-rank score, and the identification value and the rank score of the prototype vector signal having the second-rank score, as a coded utterance representation signal of the feature vector signal, to produce a series of coded utterance representation signals;
means for storing probabilistic models for a plurality of speech units, at least a first model for a first speech unit having (a) at least two states, (b) at least one transition extending from a state to the same or another state, (c) a transition probability for each transition, (d) a plurality of model outputs for at least one prototype vector at a transition, each model output comprising the identification value of the prototype vector and a rank score, and (e) output probabilities at a transition for each model output;
means for generating a match score for each of a plurality of speech units, each match score comprising an estimate of the probability that the probabilistic model of the speech unit would output a series of model outputs matching a reference series comprising the identification value and rank score of at least one prototype vector from each coded utterance representation signal in the series of coded utterance representation signals;
means for identifying one or more best candidate speech units having the best match scores; and
means for outputting at least one speech subunit of one or more of the best candidate speech units.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23
24. A speech recognition method comprising:
measuring the value of at least one feature of an utterance over each of a series of successive time intervals to produce a series of feature vector signals representing the feature values;
storing a plurality of prototype vector signals, each prototype vector signal having at least one parameter value and having a unique identification value;
comparing the closeness of the feature value of each feature vector signal to the parameter values of the prototype vector signals to obtain prototype match scores for each feature vector signal and each prototype vector signal;
ranking, for each feature vector signal, the prototype vector signal having the best prototype match score with a first-rank score, and the prototype vector signal having the second best prototype match score with a second-rank score;
outputting, for each feature vector signal, at least the identification value and the rank score of the prototype vector signal having the first-rank score, and the identification value and the rank score of the prototype vector signal having the second-rank score, as a coded utterance representation signal of the feature vector signal, to produce a series of coded utterance representation signals;
storing probabilistic models for a plurality of speech units, at least a first model for a first speech unit having (a) at least two states, (b) at least one transition extending from a state to the same or another state, (c) a transition probability for each transition, (d) a plurality of model outputs for at least one prototype vector at a transition, each model output comprising the identification value of the prototype vector and a rank score, (e) output probabilities at a transition for each model output;
generating a match score for each of a plurality of speech units, each match score comprising an estimate of the probability that the probabilistic model of the speech unit would output a series of model outputs matching a reference series comprising the identification value and rank score of at least one prototype vector from each coded utterance representation signal in the series of coded utterance representation signals;
identifying one or more best candidate speech units having the best match scores; and
outputting at least one speech subunit of one or more of the best candidate speech units.
Dependent claims: 25, 26, 27
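The match-score step recited in claims 11 and 24 can be sketched as a forward-probability computation over a model whose outputs are (identification value, rank score) pairs attached to transitions. The sketch below is illustrative only and rests on several assumptions not stated in the claims: the score of a frame at a transition is taken as the product of the output probabilities of every pair in that frame, pairs with no stored probability receive a small floor value `FLOOR`, state 0 is the start state, and the last state is the end state. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Output = Tuple[str, int]   # (prototype identification value, rank score)
Frame = List[Output]       # one coded utterance representation signal

FLOOR = 1e-6  # assumed probability for model outputs with no stored value


@dataclass
class Transition:
    src: int                           # state the transition leaves
    dst: int                           # state it enters (may equal src)
    prob: float                        # transition probability
    output_probs: Dict[Output, float]  # output probability per model output


@dataclass
class SpeechUnitModel:
    name: str
    n_states: int
    transitions: List[Transition]


def emission(t: Transition, frame: Frame) -> float:
    """Assumed frame score: product of the output probabilities of its pairs."""
    p = 1.0
    for output in frame:
        p *= t.output_probs.get(output, FLOOR)
    return p


def match_score(model: SpeechUnitModel, frames: List[Frame]) -> float:
    """Forward probability that the model outputs the reference series."""
    alpha = [0.0] * model.n_states
    alpha[0] = 1.0  # assumed start state
    for frame in frames:
        nxt = [0.0] * model.n_states
        for t in model.transitions:
            nxt[t.dst] += alpha[t.src] * t.prob * emission(t, frame)
        alpha = nxt
    return alpha[-1]  # assumed end state


def best_candidates(
    models: List[SpeechUnitModel], frames: List[Frame], n: int = 1
) -> List[str]:
    """Names of the n speech units with the best match scores."""
    ranked = sorted(models, key=lambda m: match_score(m, frames), reverse=True)
    return [m.name for m in ranked[:n]]
```

Treating the pairs in a frame as independent is a simplification chosen to keep the sketch short; a model could instead condition on only one pair per frame, which the claim language ("at least one prototype vector from each coded utterance representation signal") also permits.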
Specification