Speech coding apparatus having acoustic prototype vectors generated by tying to elementary models and clustering around reference vectors
First Claim
1. A speech coding apparatus comprising:
means for storing a model of a training script, said training script model comprising a series of elementary models from a finite set of elementary models, each elementary model in the training script having a phonetic context comprising one or more preceding or following models in the training script;
means for measuring the value of at least one feature of a training utterance of the training script over each of a series of successive time intervals for producing a series of training feature vector signals representing feature values;
means for identifying a first set of training feature vector signals corresponding to a first elementary model in the training script model;
means for storing at least a first reference vector signal and a second reference vector signal, each reference vector signal having at least one parameter value, the first reference vector signal comprising the arithmetic mean of the training feature vector signals corresponding to the first elementary model in a first phonetic context of preceding and following phonetic models, the second reference vector signal comprising the arithmetic mean of the training feature vector signals corresponding to the first elementary model in a second phonetic context of preceding and following phonetic models, different from the first context;
means for comparing the feature values of each training feature vector signal in the first set to said at least one parameter value of the first reference vector signal to obtain a first closeness score for each training feature vector signal and the first reference vector signal;
means for comparing the feature values of each training feature vector signal in the first set to said at least one parameter value of the second reference vector signal to obtain a second closeness score for each training feature vector signal and the second reference vector signal;
means for comparing, for each training feature vector signal in the first set, the first closeness score for the training feature vector signal with the second closeness score for the training feature vector signal to obtain a reference match score for each training feature vector signal and the first and second reference vector signals;
means for storing a first subset of the training feature vector signals in the first set having reference match scores greater than a threshold Q, and for storing a second subset of the training feature vector signals in the first set having reference match scores less than the threshold Q; and
means for generating one or more partition values for a first prototype vector signal from the first subset of training feature vector signals, and for generating one or more additional partition values for the first prototype vector signal from the second subset of training feature vector signals.
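The comparison and splitting elements of the claim can be illustrated with a minimal Python sketch. The claim does not fix how a closeness score or a reference match score is computed, so the sketch assumes a negative squared Euclidean distance for closeness and the difference of the two closeness scores as the reference match score. The function names, the default Q of 0.0, and the choice to place vectors scoring exactly Q in the second subset are all hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def closeness(vector: Vector, reference: Vector) -> float:
    """Assumed closeness score: negative squared Euclidean distance.

    Higher values mean the training vector lies closer to the reference.
    """
    return -sum((v - r) ** 2 for v, r in zip(vector, reference))


def split_by_reference_match(
    vectors: List[Vector],
    first_reference: Vector,
    second_reference: Vector,
    q: float = 0.0,
) -> Tuple[List[Vector], List[Vector]]:
    """Split one elementary model's training vectors around threshold Q."""
    first_subset: List[Vector] = []
    second_subset: List[Vector] = []
    for vec in vectors:
        # Assumed reference match score: first closeness minus second closeness.
        score = closeness(vec, first_reference) - closeness(vec, second_reference)
        if score > q:
            first_subset.append(vec)
        else:
            # Scores equal to Q are not addressed by the claim; putting them
            # here is an assumption of this sketch.
            second_subset.append(vec)
    return first_subset, second_subset
```

With references (0.0, 0.0) and (1.0, 1.0), vectors near the origin fall in the first subset and vectors near (1.0, 1.0) fall in the second.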
Abstract
A speech coding apparatus in which measured acoustic feature vectors are each represented by the best matched prototype vector. The prototype vectors are generated by storing a model of a training script comprising a series of elementary models. The value of at least one feature of a training utterance of the training script is measured over each of a series of successive time intervals to produce a series of training feature vectors. A first set of training feature vectors corresponding to a first elementary model in the training script is identified. The feature value of each training feature vector signal in the first set is compared to the parameter value of a first reference vector signal to obtain a first closeness score, and is compared to the parameter value of a second reference vector to obtain a second closeness score for each training feature vector. For each training feature vector in the first set, the first closeness score is compared with the second closeness score to obtain a reference match score. A first subset contains those training feature vectors in the first set having reference match scores better than a threshold Q, and a second subset contains those having reference match scores less than the threshold Q. One or more partition values are generated for a first prototype vector from the first subset of training feature vectors, and one or more additional partition values are generated for the first prototype vector from the second subset of training feature vectors.
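The coding step named in the first sentence of the abstract, representing each measured feature vector by its best matched prototype, might be sketched as follows. This is a simplified illustration, not the patented matching procedure: it assumes each prototype is a single point and that "best matched" means smallest squared Euclidean distance. The prototype labels and the `encode` function are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(feature_vectors: List[Vector], prototypes: Dict[str, Vector]) -> List[str]:
    """Replace each feature vector with the label of its nearest prototype."""
    labels: List[str] = []
    for vec in feature_vectors:
        # Assumed matching rule: the prototype at the smallest distance wins.
        best_label = min(
            prototypes,
            key=lambda name: squared_distance(vec, prototypes[name]),
        )
        labels.append(best_label)
    return labels
```

The output is one prototype label per time interval, which is the coded form of the utterance under these assumptions.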
17 Claims
1. A speech coding apparatus, as recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A speech coding method comprising:
storing a model of a training script, said training script model comprising a series of elementary models from a finite set of elementary models, each elementary model in the training script having a phonetic context comprising one or more preceding or following models in the training script;
measuring the value of at least one feature of a training utterance of the training script over each of a series of successive time intervals for producing a series of training feature vector signals representing the feature values;
identifying a first set of training feature vector signals corresponding to a first elementary model in the training script model;
storing at least a first reference vector signal and a second reference vector signal, each reference vector signal having at least one parameter value, the first reference vector signal comprising the arithmetic mean of the training feature vector signals corresponding to the first elementary model in a first phonetic context of preceding and following phonetic models, the second reference vector signal comprising the arithmetic mean of the training feature vector signals corresponding to the first elementary model in a second phonetic context of preceding and following phonetic models, different from the first context;
comparing the feature values of each training feature vector signal in the first set to said at least one parameter value of the first reference vector signal to obtain a first closeness score for each training feature vector signal and the first reference vector signal;
comparing the feature values of each training feature vector signal in the first set to said at least one parameter value of the second reference vector signal to obtain a second closeness score for each training feature vector signal and the second reference vector signal;
comparing, for each training feature vector signal in the first set, the first closeness score for the training feature vector signal with the second closeness score for the training feature vector signal to obtain a reference match score for each training feature vector signal and the first and second reference vector signals;
storing a first subset of the training feature vector signals in the first set having reference match scores greater than a threshold Q, and storing a second subset of the training feature vector signals in the first set having reference match scores less than the threshold Q; and
generating one or more partition values for a first prototype vector signal from the first subset of training feature vector signals, and for generating one or more additional partition values for the first prototype vector signal from the second subset of training feature vector signals.
Dependent claims: 11, 12, 13, 14, 15, 16, 17
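Two steps of the method lend themselves to a short sketch: forming each reference vector as the arithmetic mean of the training vectors seen in one phonetic context, and deriving partition values from a subset of training vectors. The claim specifies the arithmetic mean for reference vectors but does not say what a partition value is, so the sketch assumes a per-dimension mean and variance plus a count. The context keys (pairs of preceding and following phone names) and both function names are hypothetical.

```python
from collections import defaultdict
from statistics import fmean, pvariance
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple

Vector = Sequence[float]


def reference_vectors(
    labelled: Iterable[Tuple[Hashable, Vector]],
) -> Dict[Hashable, Tuple[float, ...]]:
    """Arithmetic mean of one elementary model's vectors, per phonetic context.

    `labelled` yields (context, vector) pairs; the context key is assumed to
    identify the preceding and following models.
    """
    groups: Dict[Hashable, List[Vector]] = defaultdict(list)
    for context, vec in labelled:
        groups[context].append(vec)
    return {
        context: tuple(fmean(column) for column in zip(*vecs))
        for context, vecs in groups.items()
    }


def partition_values(subset: List[Vector]) -> Dict[str, object]:
    """Assumed partition values: per-dimension mean and population variance."""
    columns = list(zip(*subset))
    return {
        "mean": tuple(fmean(column) for column in columns),
        "variance": tuple(pvariance(column) for column in columns),
        "count": len(subset),
    }
```

Under these assumptions, a prototype vector would be described by one set of partition values from the first subset and a further set from the second subset.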
Specification