Apparatus and method for determining articulatory-operation speech parameters
First Claim
1. An apparatus for determining, from an input speech signal, parameters defining the articulatory operation of a speech production system that generated the input speech signal, the apparatus comprising:
dividing means operable for dividing the input speech signal into a succession of frames;
a feature analyzer operable for determining acoustic features of the input speech signal during each frame;
a segmenter operable for defining a succession of segments as a number of frames by (i) comparing signals representative of the acoustic features for a current frame with signals representative of the acoustic features of previous frames in a current segment;
(ii) including the current frame in the current segment if said signals differ by less than a threshold value; and
(iii) beginning a new segment with the current frame if said signals differ by more than said threshold value; and
determining means operable for determining, for each segment, said articulatory speech production parameters using the acoustic features of the input speech signal during the segment and using stored reference data which relates said acoustic features to said articulatory speech production parameters.
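The segmentation recited in elements (i)–(iii) can be sketched as a simple frame-grouping loop. This is an illustrative reading only: the Euclidean distance against the mean feature vector of the current segment is an assumption, since the claim does not fix the comparison measure, the feature type, or the threshold value.

```python
import numpy as np

def segment_frames(features, threshold):
    """Group consecutive frames into segments per steps (i)-(iii).

    A frame joins the current segment if its feature vector differs
    from the mean of the segment's frames by less than `threshold`
    (Euclidean distance here -- an assumed measure); otherwise it
    begins a new segment.
    """
    segments = []
    current = [features[0]]          # current segment's frames
    for frame in features[1:]:
        mean = np.mean(current, axis=0)
        if np.linalg.norm(frame - mean) < threshold:
            current.append(frame)    # step (ii): extend the segment
        else:
            segments.append(current) # step (iii): start a new segment
            current = [frame]
    segments.append(current)
    return segments
```

With a toy one-dimensional feature track such as `[0.0, 0.1, 5.0, 5.1]` and a threshold of 1.0, the loop yields two segments of two frames each.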
Abstract
In an apparatus for extracting information from an input speech signal, a preprocessor, a buffer, a segmenter, an acoustic classifier and a feature extractor are provided. The preprocessor generates formant related information for consecutive time frames of the input speech signal. This formant related information is fed into the buffer, which can store signals representative of a plurality of frames. The segmenter monitors the signals representative of the incoming frames and identifies segments in the input speech signal during which variations in the formant related information remain within prespecified limits. The acoustic classifier then determines classification information for each segment identified by the segmenter, based on acoustic classes found in training data. The feature estimator then determines, for each segment, the information required, based on the input speech signal during that segment, training data and the classification information determined by the acoustic classifier.
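The processing chain described in the abstract can be outlined as a small pipeline object. The class name, callable signatures, and placeholder components below are illustrative assumptions; the patent does not prescribe any particular implementation.

```python
from collections import deque

class SpeechPipeline:
    """Outline of the abstract's chain: preprocessor -> buffer ->
    segmenter -> acoustic classifier -> feature estimator."""

    def __init__(self, preprocess, segment, classify, estimate, buffer_len=64):
        self.preprocess = preprocess   # frame -> formant-related features
        self.buffer = deque(maxlen=buffer_len)  # holds a plurality of frames
        self.segment = segment         # buffered frames -> list of segments
        self.classify = classify       # segment -> acoustic class
        self.estimate = estimate       # (segment, class) -> extracted info

    def run(self, frames):
        for frame in frames:
            self.buffer.append(self.preprocess(frame))
        results = []
        for seg in self.segment(list(self.buffer)):
            cls = self.classify(seg)
            results.append(self.estimate(seg, cls))
        return results
```

The buffer is modelled as a bounded deque because the abstract only requires storage for "a plurality of frames"; each downstream stage is injected as a callable so the sketch stays agnostic about the actual feature mathematics.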
62 Claims
1. An apparatus for determining, from an input speech signal, parameters defining the articulatory operation of a speech production system that generated the input speech signal, the apparatus comprising:
dividing means operable for dividing the input speech signal into a succession of frames;
a feature analyzer operable for determining acoustic features of the input speech signal during each frame;
a segmenter operable for defining a succession of segments as a number of frames by (i) comparing signals representative of the acoustic features for a current frame with signals representative of the acoustic features of previous frames in a current segment;
(ii) including the current frame in the current segment if said signals differ by less than a threshold value; and
(iii) beginning a new segment with the current frame if said signals differ by more than said threshold value; and
determining means operable for determining, for each segment, said articulatory speech production parameters using the acoustic features of the input speech signal during the segment and using stored reference data which relates said acoustic features to said articulatory speech production parameters.
(Dependent claims: 2–27, 34, 62)
28. A speech processing method comprising the steps of:
providing a set of training speech signals having known phonetic boundaries and/or known boundaries between segments in the training speech signals, acoustic features of the training speech signals being substantially the same in each segment;
dividing said training speech signals into a succession of frames;
determining, for each frame, acoustic features of the training speech signal during the frame;
defining a succession of segments within the training speech signals, and thus defining the location of trial boundaries between segments, by (i) comparing signals representative of the acoustic features for a current frame with signals representative of the acoustic features of previous frames in the current segment;
(ii) including the current frame in the current segment if said signals differ by less than a threshold value;
(iii) beginning a new segment with the current frame if said signals differ by more than said threshold value; and
(iv) performing steps (i) to (iii) for a plurality of different threshold values;
comparing the trial boundaries between segments defined in said defining step for the different threshold values with the known boundaries of the training speech signals;
storing the threshold value which gives a good correlation between the known boundaries and the trial boundaries defined in said defining step;
using the stored threshold value to segment subsequent input speech; and
determining articulatory speech production parameters of the segmented speech.
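The threshold-selection procedure of this method can be sketched as a grid search over candidate thresholds, scoring each trial segmentation against the known boundaries. The Euclidean distance, the running-mean comparison, and the boundary-match count used as the score are all illustrative assumptions; the claim only requires "a good correlation" between trial and known boundaries.

```python
import numpy as np

def _segment(features, threshold):
    # steps (i)-(iii): grow the current segment while the new frame
    # stays within `threshold` of the segment's mean feature vector
    segments, current = [], [features[0]]
    for frame in features[1:]:
        if np.linalg.norm(frame - np.mean(current, axis=0)) < threshold:
            current.append(frame)
        else:
            segments.append(current)
            current = [frame]
    segments.append(current)
    return segments

def pick_threshold(features, known_boundaries, candidates):
    """Step (iv): segment the training speech with each candidate
    threshold and keep the one whose trial boundaries (frame indices
    at which a new segment begins) best match the known boundaries."""
    best_t, best_score = None, -1
    for t in candidates:
        idx, trial = 0, []
        for seg in _segment(features, t)[:-1]:
            idx += len(seg)
            trial.append(idx)   # a trial boundary after each segment
        score = len(set(trial) & set(known_boundaries))
        if score > best_score:
            best_t, best_score = t, score
    return best_t
```

A very large threshold collapses everything into one segment and matches no known boundary, so the search settles on a threshold fine enough to reproduce the annotated boundaries.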
29. A speech processing method comprising the steps of:
dividing a set of training speech signals representative of utterances from a plurality of different users into a succession of frames for each utterance;
determining acoustic features of the training speech signal during each frame;
defining a succession of segments within each utterance of training speech by (i) comparing signals representative of the acoustic features for a current frame with signals representative of the acoustic features of previous frames in a current segment;
(ii) including the current frame in the current segment if said signals differ by less than a threshold value; and
(iii) beginning a new segment with the current frame if said signals differ by more than said threshold value;
identifying boundaries between acoustic classes representative of acoustic characteristics identified in the set of training speech signals by using segments in all utterances of the training speech;
determining reference data modelling each acoustic class by using segments in all utterances of the training speech;
storing said reference data; and
using the stored reference data to determine articulatory speech production parameters of subsequently input speech.
(Dependent claims: 30–33)
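The class-modelling steps of this method can be sketched by pooling segment-level feature vectors from all training utterances and clustering them into acoustic classes, with the cluster centroids serving as the stored reference data. K-means is an illustrative choice of clustering method; the claim does not name one.

```python
import numpy as np

def train_acoustic_classes(segment_means, n_classes, n_iter=20, seed=0):
    """Cluster segment-level feature vectors (pooled from all training
    utterances) into acoustic classes with a plain k-means loop, and
    return the class centroids as the stored reference data."""
    rng = np.random.default_rng(seed)
    X = np.asarray(segment_means, dtype=float)
    # initialise one centroid per class from distinct training segments
    centroids = X[rng.choice(len(X), n_classes, replace=False)]
    for _ in range(n_iter):
        # assign each segment to its nearest acoustic class
        d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        # re-estimate each class model from its assigned segments
        for k in range(n_classes):
            if np.any(labels == k):
                centroids[k] = X[labels == k].mean(axis=0)
    return centroids, labels
```

The boundaries between acoustic classes fall implicitly where the nearest-centroid assignment changes, and the centroid array is what would be stored as reference data for processing subsequently input speech.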
35. A method of determining, from an input speech signal, parameters defining the articulatory operation of a speech production system that generated the input speech signal, the method comprising the steps of:
dividing the input speech signal into a succession of frames;
determining acoustic features of the input speech signal during each frame;
defining a succession of segments as a number of frames by (i) comparing signals representative of the acoustic features for a current frame with signals representative of the acoustic features of previous frames in a current segment;
(ii) including the current frame in the current segment if said signals differ by less than a threshold value; and
(iii) beginning a new segment with the current frame if said signals differ by more than said threshold value; and
determining, for each segment, said articulatory speech production parameters using the acoustic features of the input speech signal during the segment and using stored reference data which relates said acoustic features to said articulatory speech production parameters.
(Dependent claims: 36–61)
Specification