Method and apparatus for determining articulatory parameters from speech data
Abstract
A system and method for determining, from continuous speech, the instantaneous values of a set of articulatory parameters. The continuous speech data is a sequence of spectral profiles obtained by spectrally sampling continuous speech. The spectral samples are presented in sequence to a plurality of class transforms, each establishing a respective speech phoneme class that includes a plurality of speech phonemes having similar spectral and articulatory characteristics. Each class transform converts a speech segment that belongs to its class and is contained in a spectral sample into a predetermined set of articulatory parameter values. A class-discriminating transform operates in parallel with the class transforms to produce a set of probability values, each indicating the probability that the spectral sample being transformed represents a phoneme in a respective speech phoneme class. An array of multipliers adjusts the predetermined values of the sets produced by the class transforms by multiplying the values of each set by the probability value produced for that set by the class-discriminating transform. The adjusted articulatory parameter value sets are combined by adding corresponding elements to produce a set of adjusted articulatory parameter values indicative of an articulatory tract configuration appropriate for producing the sampled speech.
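The multiplier array and element-wise addition described in the abstract amount to a probability-weighted sum of the class outputs. The sketch below assumes the class transforms have already produced one N-value parameter set per class and that the class-discriminating transform has already produced one probability per class; the function name and all numbers are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def combine_class_outputs(
    class_param_sets: Sequence[Sequence[float]],
    class_probs: Sequence[float],
) -> List[float]:
    """Scale each class's articulatory parameter set by that class's
    probability (the multiplier array), then add corresponding elements
    (the summation stage) to obtain a single parameter set."""
    if len(class_param_sets) != len(class_probs):
        raise ValueError("need exactly one probability per class")
    n_params = len(class_param_sets[0])
    combined = [0.0] * n_params
    for params, prob in zip(class_param_sets, class_probs):
        if len(params) != n_params:
            raise ValueError("every class must output the same N parameters")
        for i, value in enumerate(params):
            combined[i] += prob * value
    return combined


# Hypothetical example: three classes, N = 4 articulatory parameters.
class_outputs = [
    [1.0, 0.0, 2.0, 4.0],  # output of class transform A
    [3.0, 1.0, 0.0, 0.0],  # output of class transform B
    [0.0, 2.0, 2.0, 8.0],  # output of class transform C
]
probs = [0.5, 0.25, 0.25]  # assumed output of the class-discriminating transform
print(combine_class_outputs(class_outputs, probs))  # [1.25, 0.75, 1.5, 4.0]
```

When one class dominates (a probability near 1), the result approaches that class's parameter set; in ambiguous transitions the output blends neighbouring classes.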
38 Claims
1. A method of determining the values of a series of N articulatory parameters from speech data, comprising the steps of:
creating a plurality of speech phoneme classes, each of said speech phoneme classes including a plurality of speech phonemes sharing similar spectral and articulatory characteristics;
providing a digital speech data signal representative of speech;
selecting data segments of said speech data signal at predetermined sampling intervals according to predefined changes in energy levels in said speech data signal;
transforming said selected data segments into spectral data segments;
converting each of said spectral data segments into said speech phoneme classes so as to generate a weight for the probability that said segment corresponds to phonemes within each of said classes;
converting each of said spectral data segments into a plurality of articulatory parameters for each of said speech phoneme classes so as to generate a series of N parameter values representative of articulatory characteristics in each speech phoneme class; and
combining the weight for the probability that spectral data segments correspond to a given speech phoneme class with the output parameter values from each speech phoneme class so as to form a single series of N parameter values for selected data segments.
9. The method of claim 2 wherein the step of converting each of said log segments into a plurality of speech phoneme classes comprises the step of multiplying spectral samples in each log segment by a class distinction matrix in the form of a linear transformation matrix having Q columns by R rows, Q being a predetermined number of spectral ranges for sampling purposes and R being a number of spectral classes used, and each element representing a weighting factor for the probability that a given spectral component falls within a given one of said speech phoneme classes, said multiplying producing a raw class vector.
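Claim 9 describes an ordinary matrix-vector product: an R-row by Q-column matrix applied to a length-Q log spectral segment yields a length-R raw class vector. The sketch below uses a hypothetical helper name and made-up weights (Q = 4, R = 2); the same product, with a P-row by S-column class matrix, would also produce the class parameter vectors of claim 10.

```python
from typing import List, Sequence


def apply_linear_transform(
    matrix: Sequence[Sequence[float]],
    spectrum: Sequence[float],
) -> List[float]:
    """Multiply an R-row by Q-column matrix by a length-Q spectral vector,
    returning a length-R vector (one weighted sum per matrix row)."""
    q = len(spectrum)
    result = []
    for row in matrix:
        if len(row) != q:
            raise ValueError("matrix column count must equal the number of spectral ranges")
        result.append(sum(w * x for w, x in zip(row, spectrum)))
    return result


# Hypothetical numbers: Q = 4 spectral ranges, R = 2 phoneme classes.
class_distinction_matrix = [
    [1.0, 1.0, 0.0, 0.0],  # class 1 responds to the two lower ranges
    [0.0, 0.0, 1.0, 1.0],  # class 2 responds to the two upper ranges
]
log_segment = [3.0, 2.0, 0.5, 0.5]
raw_class_vector = apply_linear_transform(class_distinction_matrix, log_segment)
print(raw_class_vector)  # [5.0, 1.0]
```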
10. The method of claim 9 wherein the step of converting each of said log segments into a plurality of articulation parameters, comprises the step of multiplying spectral samples in each log segment by a plurality of class matrixes, each class matrix being in the form of a linear transformation matrix having S columns by P rows, S being a number of predetermined spectral ranges for sampling purposes and P being equal to the number of articulatory parameters used, and each element represents a weighting factor proportional to the probability that a given spectral component represents a given one of said parameters in the class, said multiplying producing a plurality of class parameter vectors.
11. The method of claim 10 wherein the step of combining comprises the steps of:
normalizing the raw class vector;
multiplying log segments by each of said normalized raw class vector elements separately before multiplying by a class matrix corresponding to said normalized vector element so as to produce a weighted segment input for each class matrix; and
adding all of said parameter vectors to form a single output parameter vector.
12. The method of claim 10 wherein the step of combining further comprises the steps of:
normalizing the raw class vector;
multiplying each of said class parameter vectors by a single element of said normalized raw class vector elements corresponding to a class matrix the parameter vector originates from to produce a plurality of weighted parameter vectors; and
adding all of said weighted parameter vectors to form a single output parameter vector.
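Claims 11 and 12 differ only in where the class weight is applied: before the class matrix (claim 11) or after it (claim 12). Because a matrix product is linear, C(w·s) = w·(C s), so both orderings give the same output vector up to floating-point rounding. The claims do not say how the raw class vector is normalized; the sketch below assumes clipping negatives to zero and scaling the result to sum to one. All names and numbers are hypothetical.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def matvec(matrix: Matrix, vec: Sequence[float]) -> List[float]:
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def normalize(raw: Sequence[float]) -> List[float]:
    """ASSUMED normalization: clip negatives to zero, then scale to sum to 1."""
    clipped = [max(v, 0.0) for v in raw]
    total = sum(clipped)
    if total == 0.0:
        return [1.0 / len(clipped)] * len(clipped)  # no evidence: equal weights
    return [v / total for v in clipped]


def combine_weight_inputs(weights, class_matrices, segment) -> List[float]:
    """Claim 11 ordering: scale the segment by each weight, then apply the class matrix."""
    outputs = [matvec(m, [w * x for x in segment]) for w, m in zip(weights, class_matrices)]
    return [sum(col) for col in zip(*outputs)]


def combine_weight_outputs(weights, class_matrices, segment) -> List[float]:
    """Claim 12 ordering: apply each class matrix, then scale its parameter vector."""
    outputs = [[w * p for p in matvec(m, segment)] for w, m in zip(weights, class_matrices)]
    return [sum(col) for col in zip(*outputs)]


# Hypothetical example: S = 3 spectral ranges, P = 2 parameters, 2 classes.
class_matrices = [
    [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]],
    [[0.0, 4.0, 0.0], [2.0, 0.0, 2.0]],
]
segment = [1.0, 2.0, 3.0]
weights = normalize([3.0, 1.0])  # [0.75, 0.25]
a = combine_weight_inputs(weights, class_matrices, segment)
b = combine_weight_outputs(weights, class_matrices, segment)
print(a == b, a)  # True [7.25, 3.5]
```

In practice the claim 12 ordering needs only one multiplication per parameter per class, whereas the claim 11 ordering scales every spectral sample for every class.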
13. The method of claim 2 wherein the step of boosting high frequency comprises the step of applying a relationship:
$$Y_n = X_n - \alpha X_{n-1}$$
where $Y_n$ is an output signal, $X_n$ is an input signal, and $\alpha$ is typically between 0.5 and 0.7.
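The relationship in claim 13 is a first-order pre-emphasis filter. A minimal sketch follows; the claim does not say how the first sample is handled, so X₋₁ = 0 is assumed, and the default α = 0.6 is simply one value inside the stated 0.5 to 0.7 range.

```python
from typing import List, Sequence


def pre_emphasize(samples: Sequence[float], alpha: float = 0.6) -> List[float]:
    """Compute Y[n] = X[n] - alpha * X[n-1], taking X[-1] = 0 (an assumption)."""
    out = []
    prev = 0.0
    for x in samples:
        out.append(x - alpha * prev)
        prev = x
    return out


# A constant (low-frequency) input is attenuated after the first sample,
# while an alternating (highest-frequency) input is amplified.
print(pre_emphasize([1.0, 1.0, 1.0, 1.0], alpha=0.5))    # [1.0, 0.5, 0.5, 0.5]
print(pre_emphasize([1.0, -1.0, 1.0, -1.0], alpha=0.5))  # [1.0, -1.5, 1.5, -1.5]
```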
14. The method of claim 2 wherein the step of transforming comprises the step of applying a function defined by
$$W_n = 0.5 - 0.49\cos\left(\frac{\pi}{16}\,n\right), \qquad n = 0, \ldots, 31$$
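Over 32 samples, π/16 equals 2π/32, so the claim 14 function traces one full cosine period: it resembles a periodic Hann window but uses 0.49 instead of 0.5, which keeps the end weights at 0.01 rather than zero. The sketch below (helper names hypothetical) builds the 32-point window and applies it sample by sample to a data segment.

```python
import math
from typing import List, Sequence


def claim14_window() -> List[float]:
    """W[n] = 0.5 - 0.49 * cos((pi / 16) * n) for n = 0 .. 31."""
    return [0.5 - 0.49 * math.cos((math.pi / 16) * n) for n in range(32)]


def apply_window(segment: Sequence[float], window: Sequence[float]) -> List[float]:
    """Multiply each sample by the corresponding window weight."""
    if len(segment) != len(window):
        raise ValueError("segment and window lengths differ")
    return [s * w for s, w in zip(segment, window)]


w = claim14_window()
print(round(w[0], 6), round(w[16], 6))  # 0.01 0.99
```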
15. The method of claim 2 wherein said step of transforming comprises the step of transforming data samples according to a relationship defined by:
##EQU6##
where $Z_k$ represents an output signal and $Y_n$ represents an input signal.
16. The method of claim 1 wherein the step of monitoring the energy level of said digital speech signal further comprises the step of tracking pitch variations in said digital speech signal.
17. The method of claim 1 wherein said step of selecting comprises the step of transferring a predetermined number, D, of digital samples at a time.
18. The method of claim 17 wherein D=32.
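Claims 1, 17 and 18 call for transferring D = 32 samples at a time and selecting segments according to predefined changes in energy level, but they do not define the energy measure or the change criterion. The sketch below is purely hypothetical on that point: it uses the sum of squared samples as block energy and keeps a block whenever its energy differs from that of the last kept block by more than a fixed threshold.

```python
from typing import List, Optional, Sequence

D = 32  # samples per transferred segment (claim 18)


def block_energy(block: Sequence[float]) -> float:
    """Sum of squared sample values (an assumed energy measure)."""
    return sum(x * x for x in block)


def select_segments(samples: Sequence[float], threshold: float) -> List[int]:
    """Return start indices of D-sample blocks whose energy differs from the
    last selected block's energy by more than `threshold` (HYPOTHETICAL
    criterion; the claims do not specify one). The first block is always kept."""
    selected: List[int] = []
    last_energy: Optional[float] = None
    for start in range(0, len(samples) - D + 1, D):
        energy = block_energy(samples[start:start + D])
        if last_energy is None or abs(energy - last_energy) > threshold:
            selected.append(start)
            last_energy = energy
    return selected


# Hypothetical signal: two quiet blocks followed by two loud blocks.
signal = [0.1] * (2 * D) + [1.0] * (2 * D)
print(select_segments(signal, threshold=5.0))  # [0, 64]
```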
19. The method of claim 1 further comprising the steps of:
generating an image representative of a mid-sagittal view of a human articulatory tract;
associating said articulatory parameters with corresponding anatomical points on said image; and
altering said image according to variations in said articulatory parameter values.
20. An apparatus for determining the status of a plurality of articulatory parameters from speech data, comprising:
sampling means for sampling speech data at a predetermined sampling rate and for providing speech data sample segments of predetermined length at predetermined sampling intervals based upon changes in energy in said speech data;
a transformation processor connected in series with said sampling means for receiving said speech data sample segments and transforming them from time varying amplitude data into spectral data segments;
first mapping means connected to said transformation processor for associating spectral data in each of said spectral data segments with one or more of a plurality of predefined speech phoneme classes so as to generate a weight for the probability that said segments correspond to spectra within each of said classes;
second mapping means connected in series with said transformation processor and in parallel with said first mapping means for transforming spectral data in each of said spectral data segments into a plurality of articulatory parameters for each of said plurality of classes so as to generate a series of N articulatory parameter values representative of parameters in each class to which spectra represented by said segments correspond; and
combination means connected to said first and second mapping means for combining said weight for the probability of a given class with the series of N articulatory parameters so as to generate a single weighted N parameter output.
38. A system for determining values of articulatory parameters that are representative of articulation tract configuration during the production of speech, comprising:
a speech converter for generating a series of speech spectral samples representative of continuous speech;
a plurality of spectral transform means connected in parallel to said speech converter, each of said spectral transform means for establishing a respective speech phoneme class including a plurality of speech phonemes having corresponding spectral and articulatory characteristics and for converting a speech spectrum in its established class into a predetermined set of articulatory parameter values;
a class distinction transform means connected to said speech converter for producing a set of probability values, each probability value of said set representing the probability that a respective speech phoneme class has a speech phoneme represented by said speech spectral sample;
an arrayed combinatory modality connected to said plurality of spectral transform means and to said class distinction transform means for combining each of said articulatory parameter value sets with a respective probability value to produce a plurality of adjusted articulatory parameter value sets; and
a single combinatory modality for combining said plurality of adjusted articulatory parameter value sets into a set of adjusted articulatory parameter values representative of an articulatory tract configuration.
Specification