Multilingual prosody generation
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for multilingual prosody generation. In some implementations, data indicating a set of linguistic features corresponding to a text is obtained. Data indicating the linguistic features and data indicating the language of the text are provided as input to a neural network that has been trained to provide output indicating prosody information for multiple languages. The neural network can be a neural network having been trained using speech in multiple languages. Output indicating prosody information for the linguistic features is received from the neural network. Audio data representing the text is generated using the output of the neural network.
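The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The language inventory, the one-hot language encoding, the network shape, the choice of pitch and duration as the prosody outputs, and the sine-tone renderer are all assumptions made for illustration, and the random weights merely stand in for parameters that would be learned from speech in multiple languages.

```python
import numpy as np

LANGUAGES = ["en", "es", "fr"]  # hypothetical language inventory


def language_one_hot(lang: str) -> np.ndarray:
    """Encode the language of the text as a one-hot vector (assumed encoding)."""
    vec = np.zeros(len(LANGUAGES))
    vec[LANGUAGES.index(lang)] = 1.0
    return vec


class ProsodyNet:
    """Tiny feedforward stand-in for a trained multilingual prosody model."""

    def __init__(self, n_features: int, n_hidden: int = 16, seed: int = 0):
        rng = np.random.default_rng(seed)
        n_in = n_features + len(LANGUAGES)
        # Random weights stand in for parameters learned from multilingual speech.
        self.w1 = rng.normal(scale=0.1, size=(n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(scale=0.1, size=(n_hidden, 2))
        self.b2 = np.zeros(2)

    def predict(self, features: np.ndarray, lang: str) -> np.ndarray:
        """Return one (f0_hz, duration_s) row per linguistic unit."""
        # The same language vector is appended to every unit's feature row.
        lang_vec = np.tile(language_one_hot(lang), (features.shape[0], 1))
        x = np.concatenate([features, lang_vec], axis=1)
        h = np.tanh(x @ self.w1 + self.b1)
        raw = h @ self.w2 + self.b2
        f0 = 120.0 + 40.0 * np.tanh(raw[:, 0])   # squashed into 80..160 Hz
        dur = 0.10 + 0.05 * np.tanh(raw[:, 1])   # squashed into 0.05..0.15 s
        return np.stack([f0, dur], axis=1)


def synthesize(prosody: np.ndarray, sample_rate: int = 16000) -> np.ndarray:
    """Render each unit as a sine tone at its predicted pitch and duration."""
    chunks = []
    for f0, dur in prosody:
        t = np.arange(int(round(dur * sample_rate))) / sample_rate
        chunks.append(np.sin(2 * np.pi * f0 * t))
    return np.concatenate(chunks)


# Two linguistic units with three made-up feature values each.
features = np.array([[1.0, 0.0, 0.3], [0.0, 1.0, 0.7]])
net = ProsodyNet(n_features=3)
prosody = net.predict(features, "es")   # linguistic features + language in
audio = synthesize(prosody)             # prosody out, then audio data
```

Because the language indicator is part of the network input, a single model can return different prosody for the same linguistic features depending on the language supplied, which is the property the abstract relies on.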
20 Claims
1. A method performed by data processing apparatus, the method comprising:
obtaining data indicating a set of linguistic features corresponding to a text;
providing (i) data indicating the linguistic features and (ii) data indicating the language of the text as input to a neural network that has been trained to provide output indicating prosody information for multiple languages, the neural network having been trained using speech in multiple languages;
receiving, from the neural network, output indicating prosody information for the linguistic features; and
generating audio data representing the text using the output of the neural network.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
obtaining data indicating a set of linguistic features corresponding to a text;
providing (i) data indicating the linguistic features and (ii) data indicating the language of the text as input to a neural network that has been trained to provide output indicating prosody information for multiple languages, the neural network having been trained using speech in multiple languages;
receiving, from the neural network, output indicating prosody information for the linguistic features; and
generating audio data representing the text using the output of the neural network.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A computer-readable storage device storing a computer program, the program comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
obtaining data indicating a set of linguistic features corresponding to a text;
providing (i) data indicating the linguistic features and (ii) data indicating the language of the text as input to a neural network that has been trained to provide output indicating prosody information for multiple languages, the neural network having been trained using speech in multiple languages;
receiving, from the neural network, output indicating prosody information for the linguistic features; and
generating audio data representing the text using the output of the neural network.
Dependent claims: 18, 19, 20
Specification