Voice synthesis utilizing multi-level filter excitation
Abstract
A speech analysis and synthesis system in which pitch information for excitation is transmitted during voiced segments of speech, and pulse excitation or noise excitation is transmitted during unvoiced segments, along with linear predictive coding (LPC) parameters. The decision whether to transmit noise excitation or pulse excitation is made by comparing, for each frame, the variance of the residual with the square of the mean amplitude of the rectified residual. If the result of this comparison is greater than a threshold value, pulse excitation is used; otherwise, noise excitation is used. The pulse excitation comprises a subset of samples of the LPC residual, selected according to the relative amplitudes and spacing of the local maxima in the LPC residual.
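The unvoiced-frame decision and the pulse selection described in the abstract can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patented implementation: the threshold of 2.0, the 50% relative-amplitude cutoff, the 8-sample minimum spacing and the function names are all hypothetical, and the greedy peak-picking rule is only one plausible reading of "relative amplitudes and spacing of the local maxima". The threshold choice rests on one known fact: for Gaussian noise the variance divided by the squared mean absolute value tends to π/2 ≈ 1.57, while a residual dominated by a few isolated pulses gives a much larger ratio.

```python
import random


def residual_ratio(residual):
    """Variance of the residual divided by the squared mean of the rectified residual."""
    n = len(residual)
    mean = sum(residual) / n
    variance = sum((x - mean) ** 2 for x in residual) / n
    mean_abs = sum(abs(x) for x in residual) / n
    return variance / mean_abs ** 2 if mean_abs > 0.0 else 0.0


def choose_unvoiced_excitation(residual, threshold=2.0):
    """Return "pulse" if the ratio exceeds the (hypothetical) threshold, else "noise"."""
    return "pulse" if residual_ratio(residual) > threshold else "noise"


def select_pulses(residual, rel_amplitude=0.5, min_spacing=8):
    """Pick (index, value) pairs at local maxima of |residual| (hypothetical rule).

    A local maximum is kept if its magnitude is at least rel_amplitude times the
    largest local maximum and it lies at least min_spacing samples away from every
    stronger pulse already kept.
    """
    mag = [abs(x) for x in residual]
    n = len(mag)
    peaks = [i for i in range(n)
             if mag[i] > 0.0
             and (i == 0 or mag[i] >= mag[i - 1])
             and (i == n - 1 or mag[i] >= mag[i + 1])]
    if not peaks:
        return []
    largest = max(mag[i] for i in peaks)
    # Strongest candidates first, so weaker neighbours are the ones suppressed.
    candidates = sorted((i for i in peaks if mag[i] >= rel_amplitude * largest),
                        key=lambda i: -mag[i])
    chosen = []
    for i in candidates:
        if all(abs(i - j) >= min_spacing for j in chosen):
            chosen.append(i)
    return sorted((i, residual[i]) for i in chosen)  # ordered by sample index


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]
    sparse = [0.0] * 400
    for pos in (10, 90, 170, 250, 330):
        sparse[pos] = 1.0
    print(choose_unvoiced_excitation(noise), choose_unvoiced_excitation(sparse))
    print(select_pulses(sparse))
```

For a residual of N samples containing k equal, isolated pulses, the ratio works out to N/k - 1, so even a moderately sparse residual clears the illustrative threshold by a wide margin.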
24 Claims
1. A processing system for the analysis and synthesis of human speech comprising:
means for storing a plurality of speech frames each having a predetermined number of evenly spaced samples of instantaneous amplitudes of said speech;
means for calculating a set of speech parameter signals defining a vocal tract for each speech frame;
means for designating a first subset of said plurality of speech frames as voiced and a second subset of said plurality of speech frames as unvoiced;
means for generating pitch type excitation information for each frame of said first subset of said plurality of speech frames;
means for producing a plurality of other types of excitation information for each frame of said second subset of said plurality of speech frames;
means responsive to said designating means designating each frame of said first subset of said plurality of speech frames for combining said pitch type excitation information and said set of said speech parameter signals;
said combining means further comprises means responsive to said designating means designating each frame of said second subset of said plurality of speech frames for selecting one of said other types of excitation information and means for combining the selected one of said other types of excitation information with the set of said speech parameter signals; and
means for communicating said combined excitation information including said pitch type excitation information and the set of said speech parameter signals for each frame of said first subset of said plurality of speech frames and said combined excitation information including the selected one of said other types of excitation information and the set of said speech parameter signals for each frame of said second subset of said plurality of speech frames.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 17.
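Claim 1 recites storing frames of evenly spaced samples and calculating, for each frame, speech parameter signals that define the vocal tract; the abstract identifies these as LPC parameters and refers to the LPC residual. A conventional way to obtain both is the autocorrelation method with the Levinson-Durbin recursion, sketched below in Python. This is an assumption for illustration: the document does not state which LPC method, window, frame length or predictor order is used, and the function names are hypothetical. The coefficients follow the predictor convention x̂[n] = Σ a_k x[n-k], so the residual is e[n] = x[n] - Σ a_k x[n-k].

```python
def split_frames(samples, frame_len):
    """Consecutive, non-overlapping frames of frame_len samples; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def autocorrelation(frame, max_lag):
    """r[k] = sum_n x[n] x[n-k] for k = 0..max_lag (no windowing, for brevity)."""
    n = len(frame)
    return [sum(frame[i] * frame[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations for predictor coefficients a_1..a_p.

    Returns (coeffs, err), where err is the final prediction-error energy.
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predicted frame: leave remaining coefficients at zero
        # Reflection coefficient for stage i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err


def lpc_residual(frame, coeffs):
    """Inverse-filter the frame: e[n] = x[n] - sum_k a_k x[n-k], with x[m] = 0 for m < 0."""
    return [x - sum(a * frame[n - k] for k, a in enumerate(coeffs, start=1) if n >= k)
            for n, x in enumerate(frame)]
```

For example, an 8 kHz signal cut into 20 ms frames would use `split_frames(samples, 160)`, followed by `levinson_durbin(autocorrelation(frame, 10), 10)` for a 10th-order predictor; those values are common in LPC coders but are not taken from this document.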
11. A processing system for the analysis and synthesis of human speech comprising:
means for storing a plurality of speech frames each having a predetermined number of evenly spaced samples of instantaneous amplitudes of said speech;
means for calculating a set of speech parameter signals defining a vocal tract for each speech frame;
means for detecting speech resulting from a fundamental frequency and a noise-like source for each speech frame;
means for forming pitch excitation information for each frame upon the frame containing said fundamental frequency;
means for forming excitation information to indicate that noise excitation information is to be used to synthesize each of said frames upon speech of the frame resulting from said noise-like source in the human larynx;
means for forming excitation information from another excitation source upon an absence of said fundamental frequency and said noise-like source; and
means for combining the formed excitation information and the set of parameter signals of each frame for communication.
Dependent claims: 18, 19, 20.
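Claim 11 distinguishes three cases per frame: a fundamental frequency is present (pitch excitation), the speech is noise-like (noise excitation), or neither (excitation from another source). One way to realize that decision is sketched below in Python, assuming a normalized-autocorrelation pitch detector and using the abstract's residual-ratio test for the noise-like case. Treating the "other excitation source" as the abstract's pulse excitation is an interpretation, and every threshold, lag range and function name here is hypothetical. The default lag range of 20 to 160 samples corresponds to fundamentals of 400 Hz down to 50 Hz at an assumed 8 kHz sampling rate.

```python
import math


def normalized_autocorr(frame, lag):
    """Correlation between the frame and itself delayed by `lag`, over the overlap only."""
    idx = range(lag, len(frame))
    num = sum(frame[i] * frame[i - lag] for i in idx)
    e1 = sum(frame[i] ** 2 for i in idx)
    e2 = sum(frame[i - lag] ** 2 for i in idx)
    if e1 == 0.0 or e2 == 0.0:
        return 0.0
    return num / math.sqrt(e1 * e2)


def estimate_pitch(frame, min_lag, max_lag):
    """Return (lag, strength) of the highest positive correlation peak, or (None, 0.0)."""
    best_lag, best_val = None, 0.0
    for lag in range(min_lag, min(max_lag, len(frame) - 1) + 1):
        val = normalized_autocorr(frame, lag)
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag, best_val


def residual_ratio(residual):
    """Variance of the residual over the squared mean of the rectified residual."""
    n = len(residual)
    mean = sum(residual) / n
    variance = sum((x - mean) ** 2 for x in residual) / n
    mean_abs = sum(abs(x) for x in residual) / n
    return variance / mean_abs ** 2 if mean_abs > 0.0 else 0.0


def classify_frame(frame, residual, min_lag=20, max_lag=160,
                   voicing_threshold=0.6, noise_threshold=2.0):
    """Return ("pitch", lag), ("noise", None) or ("pulse", None); thresholds are hypothetical."""
    lag, strength = estimate_pitch(frame, min_lag, max_lag)
    if lag is not None and strength >= voicing_threshold:
        return "pitch", lag            # fundamental frequency detected
    if residual_ratio(residual) <= noise_threshold:
        return "noise", None           # noise-like source
    return "pulse", None               # neither: fall back to pulse excitation
```

A practical pitch detector would also guard against picking a multiple of the true period; this sketch simply keeps the strongest peak in the search range.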
21. A method for analyzing and synthesizing human speech with a system comprising a quantizer for converting the speech into frames of digital samples and a digital signal processor responsive to a plurality of program instructions to analyze and synthesize the speech, said method comprising the steps of:
storing a plurality of speech frames each having a predetermined number of evenly spaced samples of instantaneous amplitudes of said speech;
calculating a set of speech parameter signals defining a vocal tract for each speech frame;
designating a first subset of said plurality of speech frames as voiced and a second subset of said plurality of speech frames as unvoiced;
generating pitch type excitation information for each frame of said first subset of said plurality of speech frames;
producing a plurality of other types of excitation information for each frame of said second subset of said plurality of speech frames;
combining said pitch type excitation information and said set of speech parameter signals for each frame of said first subset of said plurality of speech frames designated as voiced;
selecting one of said other types of excitation for each frame of said second subset of said plurality of speech frames;
combining the selected one of said other types of excitation information with the set of said speech parameters for each frame of said second subset of said plurality of speech frames; and
communicating said combined excitation information including said pitch-type excitation information and the set of said speech parameter signals for each frame of said first subset of said plurality of speech frames and said combined excitation information including the selected one of said other types of excitation information and the set of said speech parameter signals for each frame of said second subset of said plurality of speech frames.
Dependent claims: 22, 23, 24.
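The claims end with communicating each frame's parameter signals together with exactly one excitation type, and do not detail the synthesizer. As a hedged illustration of one plausible receiver, the Python sketch below packs a frame into a hypothetical `FrameRecord` and drives an all-pole LPC synthesis filter, y[n] = e[n] + Σ a_k y[n-k], with an impulse train, Gaussian noise or the transmitted pulses. The record layout, the field names and the per-frame restart of the pitch impulse train are simplifying assumptions, not the format or method described in the patent.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class FrameRecord:
    """Hypothetical per-frame record: vocal-tract parameters plus one excitation type."""
    coeffs: List[float]              # predictor coefficients a_1..a_p
    kind: str                        # "pitch", "noise" or "pulse"
    gain: float = 1.0                # impulse height (pitch) or noise standard deviation
    period: Optional[int] = None     # pitch period in samples, used when kind == "pitch"
    pulses: List[Tuple[int, float]] = field(default_factory=list)  # (index, amplitude) pairs


def make_excitation(rec, n, rng):
    """Build n excitation samples for one frame."""
    if rec.kind == "pitch":
        # Simplification: the impulse train restarts at sample 0 of every frame.
        return [rec.gain if i % rec.period == 0 else 0.0 for i in range(n)]
    if rec.kind == "noise":
        return [rng.gauss(0.0, rec.gain) for _ in range(n)]
    if rec.kind == "pulse":
        e = [0.0] * n
        for index, amplitude in rec.pulses:
            e[index] = amplitude
        return e
    raise ValueError(f"unknown excitation kind: {rec.kind!r}")


def lpc_synthesize(excitation, coeffs, memory):
    """All-pole filter y[n] = e[n] + sum_k a_k y[n-k]; memory[k-1] holds y[n-k]."""
    history = list(memory)
    out = []
    for e in excitation:
        y = e + sum(a * h for a, h in zip(coeffs, history))
        out.append(y)
        if history:
            history = [y] + history[:-1]  # shift in the newest output sample
    return out, history


def synthesize_frames(records, frame_len, seed=0):
    """Concatenate synthesized frames, carrying filter memory across frame boundaries."""
    rng = random.Random(seed)
    out, memory = [], []
    for rec in records:
        if len(memory) != len(rec.coeffs):
            memory = [0.0] * len(rec.coeffs)  # simplification: reset if the order changes
        y, memory = lpc_synthesize(make_excitation(rec, frame_len, rng), rec.coeffs, memory)
        out.extend(y)
    return out
```

If the excitation is the exact inverse-filter residual e[n] = x[n] - Σ a_k x[n-k] of a single frame and the filter starts from zero memory, the output reproduces x[n] up to rounding, which is a convenient check that analysis and synthesis use the same sign convention.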
Specification