Text selection and recording by feedback and adaptation for development of personalized text-to-speech systems
Abstract
A new speaker provides speech from which comparison snippets are extracted. The comparison snippets are compared with initial snippets stored in a recorded snippet database that is associated with a concatenative synthesizer. The comparison of the snippets to the initial snippets produces required sound units. A greedy selection algorithm is performed with the required sound units for identifying the smallest subset of the input text that contains all of the text for the new speaker to read. The new speaker then reads the optimally selected text and sound units are extracted from the human speech such that the recorded snippet database is modified and the speech synthesized adopts the voice quality and characteristics of the new speaker.
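The greedy selection step described in the abstract can be illustrated with a minimal sketch. It assumes each candidate sentence has already been mapped to the set of sound-unit labels it contains; the names `greedy_select`, `candidates`, and the unit labels are hypothetical and do not come from the patent. Greedy selection of this kind generally yields a small covering subset, though not a provably minimal one.

```python
def greedy_select(candidates, required_units):
    """Greedily pick sentences until the required sound units are covered.

    candidates: dict mapping a sentence (str) to the set of unit labels it
                contains (hypothetical representation, assumed for this sketch).
    required_units: iterable of unit labels that must be covered.
    Returns (chosen sentences in pick order, units that could not be covered).
    """
    remaining = set(required_units)
    chosen = []
    while remaining and candidates:
        # Pick the sentence covering the most still-missing units.
        best = max(candidates, key=lambda s: len(candidates[s] & remaining))
        gained = candidates[best] & remaining
        if not gained:
            break  # no candidate sentence covers any of the remaining units
        chosen.append(best)
        remaining -= gained
    return chosen, remaining


# Illustrative data only: sentence ids and diphone-like labels are made up.
candidates = {
    "s1": {"a-b", "b-c"},
    "s2": {"b-c", "c-d", "d-e"},
    "s3": {"a-b"},
    "s4": {"e-f"},
}
required = {"a-b", "b-c", "c-d", "d-e"}
chosen, uncovered = greedy_select(candidates, required)
print(chosen, uncovered)
```

In this toy input, two of the four sentences are enough to cover every required unit, so the new speaker would only be asked to read those two.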
17 Claims
1. A voice adaptation system for use with a text-to-speech synthesizer, comprising:
a recorded snippet database having initial snippets;
a comparison snippets set based on speech from a new speaker, wherein the comparison snippets are used to provide a comparison with current snippets;
a comparison module for performing the comparison by comparing the acoustic proximity between each one of said initial snippets and each one of said comparison snippets; and
new speaker text for adapting the voice quality of the text-to-speech synthesizer, the new speaker text based on the comparison.
Dependent claims: 2, 3, 4, 5

6. A voice adaptation system for use with a text-to-speech synthesizer, comprising:
a recorded snippet database having initial snippets;
a comparison snippet set based on speech from a new speaker;
required sound units for forming new speaker text;
wherein the required sound units are generated from a comparison of the snippet set with the recorded snippet;
a comparison module for performing the comparison by comparing the acoustic proximity between each one of said initial snippets and each one of said comparison snippets; and
text for adapting the recorded snippet database so that synthesized speech has a voice quality of the new speaker, the text provided by an optimal selection algorithm for selecting a limited amount of text representative of the required sound units.
Dependent claims: 7, 8, 9, 10

11. A method for adapting the voice quality of a text-to-speech synthesizer having a recorded snippet database, comprising:
obtaining a comparison snippets set based on speech from a new speaker;
retrieving initial snippets from the recorded snippet database;
providing required sound units for generating text;
a comparison module for determining the required sound units by comparing the acoustic proximity of each one of said initial snippets and each one of said comparison snippets; and
generating text for the new speaker to read, the text is a smallest subset that contains the required sound units.
Dependent claims: 12, 13, 14, 15
obtaining new speech from the new speaker, the new speech based on the text;
extracting new snippets from the new speech; and
modifying the recorded snippet database with the new snippets.

15. The method of claim 14 wherein the initial snippets are based on text optimally selected to represent sound units.

16. A method of constructing a speech synthesizer comprising the steps of:
comparing the acoustic proximity between each one of a set of initial snippets and each one of a set of comparison snippets to generate a corpus labeled recorded speech;
obtaining the corpus labeled recorded speech containing a plurality of allophones in a plurality of contexts;
performing a greedy selection on said corpus to extract a portion of said plurality of allophones based on contextual information;
using said portion of said plurality of allophones to generate synthesis model components of a speech synthesizer.
Dependent claims: 17
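The acoustic-proximity comparison recited in the claims above (each initial snippet compared with each comparison snippet) could be sketched as follows. This is an illustration under stated assumptions, not the patented method: snippets are represented as fixed-length feature vectors, proximity is plain Euclidean distance, and a unit is flagged as needing a new recording when no comparison snippet lies within a threshold. The feature representation, the threshold rule, and the names `pairwise_distances` and `units_needing_recording` are all hypothetical.

```python
import math


def pairwise_distances(initial, comparison):
    """Euclidean distance between every initial and every comparison snippet.

    Both arguments map a snippet name to a feature vector (tuple of floats).
    The feature vectors are an assumption made for this sketch.
    """
    return {
        (i_name, c_name): math.dist(i_vec, c_vec)
        for i_name, i_vec in initial.items()
        for c_name, c_vec in comparison.items()
    }


def units_needing_recording(initial, comparison, threshold):
    """Flag initial snippets with no comparison snippet within `threshold`.

    The threshold rule is hypothetical; the source does not say how proximity
    is turned into the set of required sound units.
    """
    dist = pairwise_distances(initial, comparison)
    required = set()
    for i_name in initial:
        nearest = min((dist[(i_name, c)] for c in comparison), default=math.inf)
        if nearest > threshold:
            required.add(i_name)
    return required


# Illustrative data only: labels and feature values are made up.
initial = {"a-b": (0.0, 0.0), "b-c": (1.0, 1.0), "c-d": (5.0, 5.0)}
comparison = {"x1": (0.1, 0.0), "x2": (1.0, 1.2)}
distances = pairwise_distances(initial, comparison)
required = units_needing_recording(initial, comparison, threshold=0.5)
print(required)
```

With three initial and two comparison snippets, the sketch evaluates all six pairs, mirroring the "each one of ... and each one of ..." wording of the claims.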
Specification