Text selection and recording by feedback and adaptation for development of personalized text-to-speech systems
Abstract
A new speaker provides speech from which comparison snippets are extracted. The comparison snippets are compared with initial snippets stored in a recorded snippet database associated with a concatenative synthesizer. This comparison identifies the required sound units. A greedy selection algorithm then uses the required sound units to identify the smallest subset of the input text that contains all of them, and this subset becomes the text for the new speaker to read. The new speaker reads the selected text, and sound units are extracted from the recorded speech to modify the recorded snippet database, so that the synthesized speech adopts the voice quality and characteristics of the new speaker.
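The greedy text-selection step can be illustrated with a minimal sketch. It treats the problem as a set cover: each candidate sentence is mapped to the set of sound units it contains, and the sentence covering the most still-missing units is picked repeatedly. The function name `greedy_select`, the dictionary layout, and the unit labels are assumptions made for illustration and do not come from the patent. A greedy pass of this kind yields a small covering subset but is not guaranteed to find the true minimum.

```python
def greedy_select(candidates, required):
    """Pick sentences until every required sound unit is covered.

    candidates: dict mapping a sentence (str) to the set of sound
                units it contains (hypothetical representation).
    required:   iterable of sound units that must be covered.
    Returns (chosen sentences in pick order, units left uncovered).
    """
    remaining = set(required)
    chosen = []
    while remaining:
        # Sentence that covers the most units still missing.
        best = max(candidates, key=lambda s: len(candidates[s] & remaining))
        gain = candidates[best] & remaining
        if not gain:
            break  # no candidate covers any of the remaining units
        chosen.append(best)
        remaining -= gain
    return chosen, remaining


# Illustrative data only: unit labels are made up.
candidates = {
    "the cat": {"k-ae", "ae-t"},
    "cap": {"k-ae"},
    "dog at": {"d-ao", "ao-g", "ae-t"},
}
required = {"k-ae", "ae-t", "d-ao", "ao-g"}
chosen, uncovered = greedy_select(candidates, required)
```

Returning the uncovered units alongside the chosen sentences makes it visible when the input text cannot supply every required unit, rather than failing silently.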
17 Claims
1. A voice adaptation system for use with a text-to-speech synthesizer, comprising:
a recorded snippet database having initial snippets;
a comparison snippets set based on speech from a new speaker;
wherein the comparison snippets are used to provide a comparison with current snippets, the comparison is based on evaluating voice quality; and
new speaker text for adapting the voice quality of the text-to-speech synthesizer, the new speaker text based on the comparison.
Dependent claims: 2, 3, 4, 5

6. A voice adaptation system for use with a text-to-speech synthesizer, comprising:
a recorded snippet database having initial snippets;
a comparison snippet set based on speech from a new speaker;
required sound units for forming new speaker text;
wherein the required sound units are generated from a comparison of the snippet set with the recorded snippet; and
text for adapting the recorded snippet database so that synthesized speech has a voice quality of the new speaker, the text provided by an optimal selection algorithm for selecting a limited amount of text representative of the required sound units.
Dependent claims: 7, 8, 9, 10

11. A method for adapting the voice quality of a text-to-speech synthesizer having a recorded snippet database, comprising:
obtaining a comparison snippets set based on speech from a new speaker;
retrieving initial snippets from the recorded snippet database;
providing required sound units for generating text;
wherein the required sound units are based on a comparison of the initial snippets to the comparison snippet set; and
generating text for the new speaker to read, the text is a smallest subset that contains the required sound units.
Dependent claims: 12, 13, 14, 15, 17

16. A method of constructing a speech synthesizer comprising the steps of:
obtaining a corpus of labeled recorded speech containing a plurality of allophones in a plurality of contexts;
performing greedy selection on said corpus to extract a portion of said plurality of allophones based on contextual information;
using said portion of said plurality of allophones to generate synthesis model components of a speech synthesizer.
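
The comparison step recited in claims 6 and 11, in which required sound units are derived by comparing the new speaker's snippets with the initial snippets, can be sketched as follows. The claims do not say how voice quality is measured, so everything here is an assumption for illustration: each snippet is represented as a numeric feature vector keyed by its unit label, Euclidean distance stands in for the voice-quality comparison, and a unit is treated as required when the new speaker's version differs from the stored one by more than `threshold` or was not observed at all. The name `required_units` is hypothetical.

```python
import math


def required_units(initial, comparison, threshold):
    """Return the unit labels the new speaker still needs to record.

    initial:    dict mapping unit label -> feature vector of the stored
                snippet (hypothetical representation).
    comparison: dict mapping unit label -> feature vector extracted
                from the new speaker's speech.
    threshold:  distance above which the stored snippet is considered
                a poor match for the new speaker (assumed criterion).
    """
    required = set()
    for unit, stored in initial.items():
        observed = comparison.get(unit)
        # Assumption: units never heard from the new speaker are required.
        if observed is None or math.dist(stored, observed) > threshold:
            required.add(unit)
    return required


# Illustrative feature vectors only.
initial = {"a": (0.0, 0.0), "b": (1.0, 1.0), "c": (2.0, 2.0)}
comparison = {"a": (0.0, 0.1), "b": (4.0, 5.0)}
needed = required_units(initial, comparison, threshold=1.0)
```

The resulting set is the kind of input a text-selection step would consume when choosing what the new speaker should read.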
Specification