Training of text-to-speech systems
Abstract
Building a data-driven text-to-speech system involves collecting a database of natural speech from which to train models or select segments for concatenation. Typically, the speech in that database is produced by a single speaker. In this invention, the database instead includes speech from a multiplicity of speakers.
18 Claims
1. A method of constructing a model for use in a text-to-speech synthesis system, said method comprising the steps of:
obtaining a set of features and a first corresponding observation value from a first training speaker;
obtaining said set of features and a second corresponding observation value from a second training speaker; and
pooling said first and second corresponding observation values to obtain the model.
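Claim 1 can be illustrated with a minimal Python sketch. Purely for illustration, it assumes that a "set of features" is a (phone, stressed) tuple, that the "observation value" is a segment duration in milliseconds, and that "pooling" means averaging every speaker's observations for the same feature set. The claim fixes none of these choices, and `pool_speakers` and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical feature set: (phone label, stressed?). The observation value
# is assumed to be a segment duration in milliseconds.
FeatureSet = Tuple[str, bool]
Observations = List[Tuple[FeatureSet, float]]


def pool_speakers(*speakers: Observations) -> Dict[FeatureSet, float]:
    """Pool observation values from every training speaker and build a
    simple model: the mean pooled observation for each feature set."""
    pooled: Dict[FeatureSet, List[float]] = defaultdict(list)
    for speaker_data in speakers:
        for features, value in speaker_data:
            pooled[features].append(value)
    return {features: mean(values) for features, values in pooled.items()}


# The same feature sets, observed from two different training speakers.
speaker_a = [(("ae", True), 120.0), (("t", False), 60.0)]
speaker_b = [(("ae", True), 140.0), (("t", False), 50.0)]

model = pool_speakers(speaker_a, speaker_b)
print(model[("ae", True)])  # 130.0
print(model[("t", False)])  # 55.0
```

Because `pool_speakers` accepts any number of speaker lists, the same sketch also covers the plurality of additional speakers recited in claim 2.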
2. A method of constructing a model for use in a text-to-speech synthesis system, said method comprising the steps of:
obtaining a set of features and a corresponding observation value from a first training speaker;
repeating said step of obtaining a set of features and a corresponding observation value for each of a plurality of additional speakers; and
pooling said corresponding observation values, from said first speaker and said additional speakers, to obtain the model.
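The "model" obtained by pooling need not be a lookup table. As an alternative illustration of claim 2, the following Python sketch repeats the obtaining step for each of several speakers and fits one least-squares line to the pooled observations. It assumes, hypothetically, a single numeric feature (syllable position in the phrase) and an observation value of F0 in Hz; `fit_pooled_line` and the data are illustrative only and are not taken from the patent.

```python
from typing import Dict, List, Tuple

# Hypothetical observation: (syllable position in phrase, F0 in Hz).
Observation = Tuple[float, float]


def fit_pooled_line(speakers: Dict[str, List[Observation]]) -> Tuple[float, float]:
    """Pool every speaker's observations, then fit a single ordinary
    least-squares line y = slope * x + intercept to the pooled data."""
    xs: List[float] = []
    ys: List[float] = []
    # Repeat the "obtain features and observation values" step per speaker.
    for observations in speakers.values():
        for x, y in observations:
            xs.append(x)
            ys.append(y)
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


pitch_by_speaker = {
    "spk1": [(0.0, 200.0), (1.0, 190.0), (2.0, 180.0)],
    "spk2": [(0.0, 120.0), (1.0, 110.0), (2.0, 100.0)],
    "spk3": [(0.0, 160.0), (1.0, 150.0), (2.0, 140.0)],
}
print(fit_pooled_line(pitch_by_speaker))  # (-10.0, 160.0)
```

Each speaker contributes the same downward slope, so the pooled fit recovers it, while the intercept averages over speakers whose pitch levels differ.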
3. A method for enrolling training data for a text-to-speech synthesis system, said method comprising the steps of:
collecting speech data from at least two speakers;
ascertaining at least one characteristic relating to the speech data of each speaker; and
creating a target range of speech data via transforming the at least one characteristic relating to the speech data of each speaker.
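One possible reading of claim 3 can be sketched in Python. It assumes that the ascertained "characteristic" is each speaker's fundamental-frequency (F0) range and that the "target range" is a shared F0 interval onto which every speaker's values are linearly mapped. The claim specifies neither the characteristic nor the transform; `map_to_target_range` and the sample values are hypothetical.

```python
from typing import Dict, List, Tuple


def map_to_target_range(
    speakers: Dict[str, List[float]], target: Tuple[float, float]
) -> Dict[str, List[float]]:
    """Ascertain each speaker's F0 range (the assumed characteristic) and
    linearly transform that speaker's values onto a shared target range."""
    lo_t, hi_t = target
    result: Dict[str, List[float]] = {}
    for name, f0 in speakers.items():
        lo_s, hi_s = min(f0), max(f0)  # ascertained characteristic
        if hi_s == lo_s:
            raise ValueError(f"speaker {name!r} has a zero-width F0 range")
        scale = (hi_t - lo_t) / (hi_s - lo_s)
        result[name] = [lo_t + (v - lo_s) * scale for v in f0]
    return result


f0_by_speaker = {
    "low_voice": [90.0, 110.0, 130.0],
    "high_voice": [180.0, 220.0, 260.0],
}
print(map_to_target_range(f0_by_speaker, (100.0, 200.0)))
# {'low_voice': [100.0, 150.0, 200.0], 'high_voice': [100.0, 150.0, 200.0]}
```

A linear min-max mapping is chosen here only because it is the simplest transform that places every speaker's data in one common range; a log-domain or mean/variance mapping would be equally consistent with the claim wording.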
9. An apparatus for constructing a model for use in a text-to-speech synthesis system, said apparatus comprising:
an obtaining arrangement which obtains a set of features and a first corresponding observation value from a first training speaker;
said obtaining arrangement being adapted to obtain said set of features and a second corresponding observation value from a second training speaker; and
a pooling arrangement which pools said first and second corresponding observation values to obtain the model.
10. An apparatus for constructing a model for use in a text-to-speech synthesis system, said apparatus comprising:
an obtaining arrangement which obtains a set of features and a corresponding observation value from a first training speaker;
said obtaining arrangement being adapted to further obtain a set of features and a corresponding observation value for each of a plurality of additional speakers; and
a pooling arrangement which pools said corresponding observation values, from said first speaker and said additional speakers, to obtain the model.
11. An apparatus for enrolling training data for a text-to-speech synthesis system, said apparatus comprising:
a collector arrangement which collects speech data from at least two speakers;
an ascertaining arrangement which ascertains at least one characteristic relating to the speech data of each speaker; and
a target range creator which creates a target range of speech data via transforming the at least one characteristic relating to the speech data of each speaker.
17. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for constructing a model for use in a text-to-speech synthesis system, said method comprising the steps of:
obtaining a set of features and a first corresponding observation value from a first training speaker;
obtaining said set of features and a second corresponding observation value from a second training speaker; and
pooling said first and second corresponding observation values to obtain the model.
18. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for enrolling training data for a text-to-speech synthesis system, said method comprising the steps of:
collecting speech data from at least two speakers;
ascertaining at least one characteristic relating to the speech data of each speaker; and
creating a target range of speech data via transforming the at least one characteristic relating to the speech data of each speaker.
Specification