Method, device and system for using statistical information to reduce computation and memory requirements of a neural network based speech synthesis system
First Claim
1. A method for providing, in response to linguistic information that includes a sequence of segment descriptions each of which includes a phonetic segment type and duration, efficient generation of a refined parametric representation of speech for providing synthetic speech, comprising the steps of:
A) using a data selection module to retrieve representative parameter vectors for each segment description according to at least the phonetic segment type and phonetic segment types included in adjacent segment descriptions;
B) interpolating between the representative parameter vectors according to the segment descriptions to provide interpolated statistical parameters;
C) converting the interpolated statistical parameters and linguistic information to neural network input parameters;
D) utilizing a neural network with a post-processor to convert the neural network input parameters into neural network output parameters that correspond to a parametric representation of speech and converting the neural network output parameters to a refined parametric representation of speech, wherein the refined parametric representation of speech can be used to provide synthetic speech.
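Steps A and B above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the `Segment` class, the `TABLE` lookup keyed by (left phone, phone, right phone) with a context-independent fallback, durations counted in frames, and the choice to place each representative vector at the midpoint of its segment and interpolate linearly between midpoints.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    phone: str     # phonetic segment type
    duration: int  # duration in frames (the unit is an assumption)


# Hypothetical table of representative parameter vectors.
# Keys are (left phone, phone, right phone); a None neighbour marks a
# context-independent fallback entry.
TABLE = {
    ("k", "ae", "t"): [1.0, 0.0],
    (None, "k", None): [0.0, 2.0],
    (None, "ae", None): [0.8, 0.2],
    (None, "t", None): [0.0, 4.0],
}


def select_vectors(segments):
    """Step A: one vector per segment, chosen by the phone and its neighbours."""
    vectors = []
    for i, seg in enumerate(segments):
        left = segments[i - 1].phone if i > 0 else None
        right = segments[i + 1].phone if i + 1 < len(segments) else None
        fallback = TABLE[(None, seg.phone, None)]
        vectors.append(TABLE.get((left, seg.phone, right), fallback))
    return vectors


def interpolate(segments, vectors):
    """Step B: per-frame linear interpolation between segment midpoints."""
    centers, start = [], 0
    for seg in segments:
        centers.append(start + seg.duration / 2)
        start += seg.duration
    frames, j = [], 0
    for t in range(start):
        x = t + 0.5  # centre of frame t
        while j + 1 < len(centers) and x >= centers[j + 1]:
            j += 1
        if x <= centers[0] or j + 1 == len(centers):
            # Before the first midpoint or after the last one: hold the vector.
            frames.append(list(vectors[j]))
        else:
            w = (x - centers[j]) / (centers[j + 1] - centers[j])
            frames.append([(1 - w) * a + w * b
                           for a, b in zip(vectors[j], vectors[j + 1])])
    return frames


segments = [Segment("k", 2), Segment("ae", 4), Segment("t", 2)]
vectors = select_vectors(segments)
frames = interpolate(segments, vectors)
```

In this example only the middle segment has a context-dependent entry in the table, so the first and last segments fall back to their context-independent vectors. The segment durations determine both the number of frames and how quickly one vector blends into the next.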
Abstract
A method (400), device and system (300) provide, in response to linguistic information, efficient generation of a parametric representation of speech using a neural network. The method provides, in response to linguistic information, efficient generation of a refined parametric representation of speech, and comprises the steps of: A) using a data selection module to retrieve representative parameter vectors for each segment description according to the phonetic segment type and the phonetic segment types included in adjacent segment descriptions; B) interpolating between the representative parameter vectors according to the segment descriptions and duration to provide interpolated statistical parameters; C) converting the interpolated statistical parameters and linguistic information to neural network input parameters; D) utilizing a statistically enhanced neural network/neural network with post-processor to provide neural network output parameters that correspond to a parametric representation of speech; and converting the neural network output parameters to a refined parametric representation of speech.
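Steps C and D of the abstract can be sketched in the same spirit. This is not the patented network: the phone inventory `PHONES`, the one-hot encoding, the single tanh hidden layer, the untrained weights `W_HIDDEN` and `W_OUT`, and the post-processor that rescales outputs into the ranges `LO` to `HI` are all hypothetical choices, made only to show how data moves from the pre-processing step through the network to the refined parameters.

```python
import math

PHONES = ["k", "ae", "t"]  # hypothetical phone inventory


def preprocess(stat_frame, phone):
    """Step C: one-hot phone identity concatenated with the statistical parameters."""
    one_hot = [1.0 if p == phone else 0.0 for p in PHONES]
    return one_hot + list(stat_frame)


def forward(x, w_hidden, w_out):
    """Step D, network part: one tanh hidden layer, linear output, no biases."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]


def postprocess(y, lo, hi):
    """Step D, post-processor part: map [-1, 1] to [lo, hi] and clamp to that range."""
    refined = []
    for v, lo_k, hi_k in zip(y, lo, hi):
        scaled = lo_k + (v + 1.0) * 0.5 * (hi_k - lo_k)
        refined.append(min(hi_k, max(lo_k, scaled)))
    return refined


# Untrained, hand-picked weights purely for illustration.
W_HIDDEN = [[0.1] * 5, [-0.1] * 5]
W_OUT = [[1.0, 0.0], [0.0, 1.0]]
LO, HI = [50.0, 0.0], [400.0, 1.0]  # hypothetical output ranges

x = preprocess([0.8, 0.2], "ae")  # one interpolated frame plus its phone label
refined = postprocess(forward(x, W_HIDDEN, W_OUT), LO, HI)
```

Feeding the interpolated statistical parameters to the network alongside the linguistic features means the network only has to model the deviation from an already reasonable estimate, which is consistent with the title's stated aim of reducing computation and memory requirements.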
90 Claims
1. A method for providing, in response to linguistic information that includes a sequence of segment descriptions each of which includes a phonetic segment type and duration, efficient generation of a refined parametric representation of speech for providing synthetic speech, comprising the steps of:
A) using a data selection module to retrieve representative parameter vectors for each segment description according to at least the phonetic segment type and phonetic segment types included in adjacent segment descriptions;
B) interpolating between the representative parameter vectors according to the segment descriptions to provide interpolated statistical parameters;
C) converting the interpolated statistical parameters and linguistic information to neural network input parameters;
D) utilizing a neural network with a post-processor to convert the neural network input parameters into neural network output parameters that correspond to a parametric representation of speech and converting the neural network output parameters to a refined parametric representation of speech, wherein the refined parametric representation of speech can be used to provide synthetic speech.
Dependent claims: 2-30.
31. A device for providing, in response to linguistic information that includes a sequence of segment descriptions each of which includes a phonetic segment type and a duration, efficient generation of a parametric representation of speech for providing synthetic speech, comprising:
A) a data selection module, coupled to receive the sequence of segment descriptions, that retrieves representative parameter vectors for each segment description according to at least the phonetic segment type and phonetic segment types included in adjacent segment descriptions;
B) an interpolation module, coupled to receive the sequence of segment descriptions and the representative parameter vectors, that interpolates between the representative parameter vectors according to the segment descriptions to provide interpolated statistical parameters;
C) a pre-processor, coupled to receive linguistic information and the interpolated statistical parameters that generates neural network input parameters;
D) a neural network with post-processor, coupled to receive neural network input parameters, that converts the neural network input parameters to neural network output parameters corresponding to a parametric representation of speech and converts the neural network output parameters to a refined parametric representation of speech, wherein the refined parametric representation of speech can be used to provide synthetic speech.
Dependent claims: 32-60.
61. A text-to-speech system/speech synthesis system/dialog system having a device for providing, in response to linguistic information that includes a sequence of segment descriptions each of which includes a phonetic segment type and a duration, efficient generation of a parametric representation of speech for providing synthetic speech, the device comprising:
A) a data selection module, coupled to receive the sequence of segment descriptions, that retrieves representative parameter vectors for each segment description according to at least the phonetic segment type and phonetic segment types included in adjacent segment descriptions;
B) an interpolation module, coupled to receive the sequence of segment descriptions and the representative parameter vectors, that interpolates between the representative parameter vectors according to the segment descriptions to provide interpolated statistical parameters;
C) a pre-processor, coupled to receive linguistic information and the interpolated statistical parameters that generates neural network input parameters;
D) a neural network with a post-processor, coupled to receive neural network input parameters, that converts the neural network input parameters to neural network output parameters that correspond to a parametric representation of speech; and
where selected, including a post-processor, coupled to receive the neural network output parameters that converts the neural network output parameters to a refined parametric representation of speech, wherein the refined parametric representation of speech can be used to provide synthetic speech.
Dependent claims: 62-90.
Specification