DEEP NETWORKS FOR UNIT SELECTION SPEECH SYNTHESIS
First Claim
1. A method comprising:
receiving target acoustic features output from a neural network that has been trained to predict acoustic features given linguistic features;
determining a distance between the target acoustic features and acoustic features of a stored acoustic sample;
selecting the acoustic sample to be used in speech synthesis based at least on the determined distance; and
synthesizing speech based on the selected acoustic sample.
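The distance and selection steps of claim 1 can be illustrated with a minimal Python sketch. It assumes that each stored acoustic sample carries a fixed-length feature vector and uses Euclidean distance, although the claim only requires "a distance". The names `AcousticSample`, `distance`, and `select_sample` are hypothetical. Selection here is a plain nearest-neighbour choice; the claim ("based at least on the determined distance") leaves room for other criteria as well.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class AcousticSample:
    """Hypothetical record for one stored unit in the sample inventory."""
    unit_id: str
    features: List[float]                                 # acoustic feature vector (assumed fixed length)
    waveform: List[float] = field(default_factory=list)  # audio samples for this unit


def distance(target: Sequence[float], candidate: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors (an assumed choice of metric)."""
    return math.dist(target, candidate)


def select_sample(target: Sequence[float], inventory: List[AcousticSample]) -> AcousticSample:
    """Return the stored sample whose features lie closest to the network's target features."""
    if not inventory:
        raise ValueError("inventory is empty")
    return min(inventory, key=lambda s: distance(target, s.features))


inventory = [
    AcousticSample("a", [0.0, 0.0]),
    AcousticSample("b", [1.0, 1.0]),
]
target = [0.9, 1.2]  # stands in for features output by the trained network
print(select_sample(target, inventory).unit_id)  # b
```

The sketch keeps only the distance the claim names. A fuller system could combine this distance with other scores before choosing a sample.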
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting stored acoustic samples for speech synthesis. The methods, systems, and apparatus include actions of receiving target acoustic features output from a neural network that has been trained to predict acoustic features given linguistic features. Additional actions include determining a distance between the target acoustic features and acoustic features of a stored acoustic sample. Further actions include selecting the acoustic sample to be used in speech synthesis based at least on the determined distance and synthesizing speech based on the selected acoustic sample.
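The sequence of actions in the abstract can be traced end to end in a small Python sketch. Everything concrete here is an assumption for illustration: the "network" is a single hypothetical dense layer with tanh standing in for a trained model, the inventory is a dictionary from unit ids to (feature vector, waveform) pairs, the distance is Euclidean, and "synthesizing" is plain concatenation of the selected waveforms.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical inventory format: unit id -> (acoustic feature vector, waveform samples)
Inventory = Dict[str, Tuple[List[float], List[float]]]


def predict_acoustic(linguistic: Sequence[float],
                     weights: List[List[float]],
                     bias: List[float]) -> List[float]:
    """Stand-in for the trained network: one dense layer with tanh activation.
    A real model would be trained to map linguistic to acoustic features."""
    return [math.tanh(sum(w * x for w, x in zip(row, linguistic)) + b)
            for row, b in zip(weights, bias)]


def synthesize(linguistic_frames: List[List[float]],
               weights: List[List[float]],
               bias: List[float],
               inventory: Inventory) -> Tuple[List[str], List[float]]:
    """For each linguistic feature vector: predict target acoustic features,
    pick the stored sample at the smallest Euclidean distance, and append
    its waveform to the output."""
    chosen: List[str] = []
    audio: List[float] = []
    for ling in linguistic_frames:
        target = predict_acoustic(ling, weights, bias)
        best = min(inventory, key=lambda uid: math.dist(target, inventory[uid][0]))
        chosen.append(best)
        audio.extend(inventory[best][1])  # naive concatenation, no join smoothing
    return chosen, audio


weights = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical parameters
bias = [0.0, 0.0]
inventory: Inventory = {
    "u1": ([1.0, -1.0], [0.1, 0.2]),
    "u2": ([-1.0, 1.0], [0.3]),
}
units, audio = synthesize([[2.0, -2.0], [-2.0, 2.0]], weights, bias, inventory)
print(units, audio)  # ['u1', 'u2'] [0.1, 0.2, 0.3]
```

Concatenating raw samples is the simplest reading of "synthesizing speech based on the selected acoustic sample"; practical systems usually smooth the joins between units.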
20 Claims
1. A method comprising:
receiving target acoustic features output from a neural network that has been trained to predict acoustic features given linguistic features;
determining a distance between the target acoustic features and acoustic features of a stored acoustic sample;
selecting the acoustic sample to be used in speech synthesis based at least on the determined distance; and
synthesizing speech based on the selected acoustic sample.

9. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
receiving target acoustic features output from a neural network that has been trained to predict acoustic features given linguistic features;
determining a distance between the target acoustic features and acoustic features of a stored acoustic sample;
selecting the acoustic sample to be used in speech synthesis based at least on the determined distance; and
synthesizing speech based on the selected acoustic sample.

15. A computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
receiving target acoustic features output from a neural network that has been trained to predict acoustic features given linguistic features;
determining a distance between the target acoustic features and acoustic features of a stored acoustic sample;
selecting the acoustic sample to be used in speech synthesis based at least on the determined distance; and
synthesizing speech based on the selected acoustic sample.

Specification