Expressivity of voice synthesis by emphasizing source signal features
Abstract
Voice synthesis with improved expressivity is obtained in a voice synthesiser of source-filter type by making use of a library of source sound categories in the source module. Each source sound category corresponds to a particular morphological category and is derived from analysis of real vocal sounds, by inverse filtering so as to subtract the effect of the vocal tract. The library may be parametrical, that is, the stored data corresponds not to the inverse-filtered sounds themselves but to synthesis coefficients for resynthesising the inverse-filtered sounds using any suitable re-synthesis technique, such as the phase vocoder technique. The coefficients are derived by Short Time Fourier Transform (STFT) analysis.
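The derivation of one library entry described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. It assumes that inverse filtering is approximated by linear prediction (LPC, autocorrelation method), that the example recordings are already time-aligned so their inverse-filtered signals can be averaged sample by sample, and that the stored coefficients are plain STFT frames. All function names, the filter order, and the frame and hop sizes are hypothetical choices made for illustration.

```python
import numpy as np

def lpc_coefficients(signal, order):
    """Estimate the inverse filter A(z) by the autocorrelation (LPC) method."""
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(signal[: n - k], signal[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:], lightly regularised for stability.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)
    a = np.linalg.solve(R, r[1:])
    # A(z) = 1 - sum_k a_k z^-k
    return np.concatenate(([1.0], -a))

def inverse_filter(signal, order=10):
    """Remove the estimated vocal-tract response, leaving a source estimate."""
    a = lpc_coefficients(signal, order)
    return np.convolve(np.asarray(signal, dtype=float), a)[: len(signal)]

def stft(signal, frame_len=256, hop=64):
    """Short Time Fourier Transform with a Hann window; one row per frame."""
    window = np.hanning(frame_len)
    starts = range(0, len(signal) - frame_len + 1, hop)
    return np.array([np.fft.rfft(window * signal[s : s + frame_len]) for s in starts])

def build_library_entry(examples, order=10, frame_len=256, hop=64):
    """Average the inverse-filtered examples of one category, then analyse by STFT."""
    n = min(len(x) for x in examples)  # truncate to a common length (assumption)
    residuals = [inverse_filter(np.asarray(x[:n], dtype=float), order) for x in examples]
    mean_source = np.mean(residuals, axis=0)
    return stft(mean_source, frame_len, hop)
```

In this sketch the returned array of complex STFT frames plays the role of the stored resynthesis coefficients for one morphological category. A real system would need to align the examples before averaging.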
10 Claims
1. Voice synthesiser apparatus comprising:
a source module adapted to output, during use, a source signal;
a filter module arranged to receive said source signal as an input and to apply thereto a filter characteristic modelling the response of the vocal tract;
characterised in that the source module comprises a library of stored representations of source sound categories each corresponding to a respective morphological category, and that the source signal output by the source module corresponds to a stored representation of a selected source sound category;
wherein the source module comprises a resynthesis device adapted to output said source signal and that the stored representations in said library are in the form of resynthesis coefficients enabling said source sound categories to be regenerated by the resynthesis device;
wherein the stored representations in said library are derived by inverse filtering real vocal sounds so as to subtract the articulatory effects imposed by the vocal tract, and stored representations corresponding to a particular morphological category are derived by averaging signals that are produced by inverse filtering a plurality of examples of vocal sounds embodying the morphological category.
Dependent claims: 2, 3, 4, 5.
6. A method of voice synthesis comprising the steps of:
providing a source module;
causing said source module to generate a source signal corresponding to a particular morphological category of sound;
providing a filter module having a filter characteristic modelling the response of the vocal tract;
inputting the source signal to the filter module;
characterised in that the step of providing a source module comprises providing a source module comprising a library of stored representations of source sound categories each corresponding to a respective morphological category, and that the source signal output by the source module corresponds to a stored representation of a selected source sound category;
wherein the source module outputs a source signal by retrieval from the library of a stored representation in the form of resynthesis coefficients representing the corresponding morphological category, input of the retrieved resynthesis coefficients to a resynthesis device, and output of the signal generated by the resynthesis device as the source signal;
wherein the stored representations in said library are derived by inverse filtering real vocal sounds so as to subtract the articulatory effects imposed by the vocal tract, and stored representations corresponding to a particular morphological category are derived by averaging signals that are produced by inverse filtering a plurality of examples of vocal sounds embodying the morphological category.
Dependent claims: 7, 8, 9, 10.
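The synthesis steps recited in the method claim (retrieve coefficients from the library, regenerate the source signal, pass it through the vocal-tract filter) can be sketched as follows. This is an illustration under stated assumptions, not the claimed apparatus. The stored coefficients are assumed to be Hann-windowed STFT frames. The resynthesis device is replaced here by a plain overlap-add inverse STFT instead of a full phase vocoder. The vocal-tract filter is assumed to be an all-pole filter with denominator coefficients `a`. The library layout, the category name, and every function name are hypothetical.

```python
import numpy as np

def resynthesise(coeffs, frame_len=256, hop=64):
    """Overlap-add inverse STFT: regenerate a source signal from stored frames."""
    window = np.hanning(frame_len)
    n_frames = coeffs.shape[0]
    out = np.zeros(hop * (n_frames - 1) + frame_len)
    norm = np.zeros_like(out)
    for i, frame in enumerate(coeffs):
        start = i * hop
        out[start : start + frame_len] += window * np.fft.irfft(frame, n=frame_len)
        norm[start : start + frame_len] += window ** 2
    # Divide by the accumulated squared window to undo the analysis weighting.
    return out / np.maximum(norm, 1e-8)

def vocal_tract_filter(source, a):
    """All-pole filter: y[n] = x[n] - sum_{k>=1} a[k] * y[n-k], with a[0] == 1."""
    y = np.zeros(len(source))
    for n in range(len(source)):
        acc = source[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y[n] = acc
    return y

def synthesise(library, category, tract_coeffs):
    """Retrieve a category's coefficients, resynthesise the source, then filter it."""
    coeffs = library[category]          # retrieval from the library
    source = resynthesise(coeffs)       # resynthesis device output
    return vocal_tract_filter(source, tract_coeffs)  # filter module
```

A call such as `synthesise(library, "breathy", [1.0, -0.5])` would return the filtered waveform for the hypothetical category `"breathy"`, given a dictionary that maps category names to arrays of STFT frames.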
Specification