Method, apparatus and system for generating acoustic parameters in a text-to-speech system using a neural network
Abstract
The present invention provides a method, device and system to generate acoustic parameters in a text-to-speech system utilizing a neural network to generate a representation of a trajectory in an acoustic parameter space across a phonetic segment.
27 Claims
1. A method for generating a series of acoustic descriptions in a text-to-speech system based upon a linguistic description of text comprising the steps of:
a) generating an information vector for each segment description in the linguistic description, wherein the information vector includes a description of a sequence of segments surrounding a described segment;
b) using a neural network to generate a representation of a trajectory of acoustic parameters, said trajectory being associated with the described segment; and
c) generating the series of acoustic descriptions by computing points on the trajectory at identified instants, for each of a set of time periods making up the segment, the trajectory consists of each acoustic parameter in the space of acoustic parameters being equal to a polynomial function of time, wherein the polynomial functions are cubic functions, wherein the number of time periods making up the segment is two.
Dependent claims: 2 through 15.
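Step c) of claim 1 can be pictured with a short sketch. The claim only states that each acoustic parameter follows a cubic function of time within each of two time periods, and that points are computed at identified instants. Everything else below is an assumption made for illustration: the coefficient layout `coeffs[period][parameter]`, time normalized to [0, 1) within each period, regularly spaced instants, and the names `eval_cubic` and `acoustic_descriptions`. In the claimed method the coefficients would come from the neural network of step b); here they are simply passed in.

```python
from typing import List, Sequence

# Assumed layout (not from the patent): one tuple (c0, c1, c2, c3) per
# acoustic parameter, per time period of the segment.
Cubic = Sequence[float]


def eval_cubic(c: Cubic, t: float) -> float:
    """Evaluate c0 + c1*t + c2*t^2 + c3*t^3 with Horner's rule."""
    c0, c1, c2, c3 = c
    return c0 + t * (c1 + t * (c2 + t * c3))


def acoustic_descriptions(
    coeffs: Sequence[Sequence[Cubic]],
    period_durations: Sequence[float],
    frame_step: float,
) -> List[List[float]]:
    """Sample the piecewise-cubic trajectory at regularly spaced instants.

    coeffs[p][k] holds the cubic for acoustic parameter k in time period p.
    Returns one frame (a list of parameter values) per sampled instant.
    """
    frames: List[List[float]] = []
    for period_coeffs, duration in zip(coeffs, period_durations):
        n_frames = int(round(duration / frame_step))
        for i in range(n_frames):
            # Assumption: time is normalized to [0, 1) inside each period.
            t = (i * frame_step) / duration
            frames.append([eval_cubic(c, t) for c in period_coeffs])
    return frames


if __name__ == "__main__":
    # Two time periods per segment, as in the claim; two made-up parameters.
    coeffs = [
        [(1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0)],  # first period
        [(0.0, 0.0, 0.0, 1.0), (2.0, 0.0, 0.0, 0.0)],  # second period
    ]
    for frame in acoustic_descriptions(coeffs, [20.0, 40.0], 10.0):
        print(frame)
```

Durations and the frame step are given in the same unit (milliseconds in the usage lines). A real system would likely choose the sampling instants and any continuity constraints between the two periods differently; the claim text quoted here does not specify them.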
16. A device for generating a series of acoustic descriptions in a text-to-speech system based upon a linguistic description of text comprising:
a) a linguistic information preprocessor, operably coupled to receive the linguistic description, to generate an information vector for each segment description in the linguistic description, wherein the information vector includes a description of a sequence of segments surrounding a described segment;
b) a neural network, operably coupled to the linguistic information processor, to generate a representation of a trajectory in a space of acoustic parameters, said trajectory being associated with the described segment; and
c) a trajectory computation unit, operably coupled to the neural network, to generate the series of acoustic descriptions by computing points on the trajectory at identified instants, for each of a set of time periods making up the segment, the trajectory consists of each acoustic parameter in the space of acoustic parameters being equal to a polynomial function of time, wherein the polynomial functions are cubic functions, wherein the number of time periods making up the segment is two.
Dependent claims: 17 through 26.
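The information vector produced by the linguistic information preprocessor in element a) can be illustrated in a similarly hedged way. The claim requires only that the vector describe a sequence of segments surrounding the described segment. The one-hot encoding, the tiny phone inventory `PHONES`, the silence padding, and the `context` window size below are all hypothetical choices, not details taken from the patent.

```python
from typing import List

# Hypothetical, deliberately tiny phone inventory for illustration only.
PHONES = ["sil", "k", "ae", "t"]
PAD = "sil"  # assumed label for positions outside the utterance


def information_vector(segments: List[str], index: int, context: int = 2) -> List[int]:
    """Concatenate one-hot codes for the described segment and its neighbors.

    The window covers `context` segments on each side of segments[index];
    positions outside the utterance are padded with PAD.
    """
    vec: List[int] = []
    for pos in range(index - context, index + context + 1):
        label = segments[pos] if 0 <= pos < len(segments) else PAD
        one_hot = [0] * len(PHONES)
        one_hot[PHONES.index(label)] = 1
        vec.extend(one_hot)
    return vec


if __name__ == "__main__":
    utterance = ["k", "ae", "t"]
    # Vector for the middle segment with one neighbor on each side.
    print(information_vector(utterance, 1, context=1))
```

Each such vector would then be the input to the neural network of element b). A practical preprocessor would presumably carry richer linguistic features than phone identity, but the windowing idea is the same.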
27. A text-to-speech synthesizer to generate a series of acoustic descriptions in a text-to-speech system based upon a linguistic description of text comprising:
a) a linguistic information preprocessor, operably coupled to receive the linguistic description, to generate an information vector for each segment description in the linguistic description, wherein the information vector includes a description of a sequence of segments surrounding a described segment;
b) a neural network, operably coupled to the linguistic information processor, to generate a representation of a trajectory in a space of acoustic parameters, said trajectory being associated with the described segment; and
c) a trajectory computation unit, operably coupled to the neural network, to generate the series of descriptions by computing points on the trajectory at identified instants, for each of a set of time periods making up the segment, the trajectory consists of each acoustic parameter in the space of acoustic parameters being equal to a polynomial function of time, wherein the polynomial functions are cubic functions, wherein the number of time periods making up the segment is two.
Specification