Back-end database reorganization for application-specific concatenative text-to-speech systems
Abstract
The present invention relates to computer-generated text-to-speech conversion, in particular to a method and system for updating a Concatenative Text-To-Speech (CTTS) system with a speech database from a base version to a new version. The invention performs an application-specific reorganization of a synthesizer's speech database by means of certain decision tree modifications. This reorganization makes certain synthesis units available to the new application that, in the prior art, could only be obtained through a new recording session. It thus allows application-specific synthesizers with improved output speech quality to be created for arbitrary domains and applications at very low cost.
20 Claims
1. A method for updating a Concatenative Text-To-Speech (CTTS) system with a speech database from a base version to a new version of a given target application, comprising:
identifying segments of recorded speech, comprising segments of natural speech;
dissecting the recorded speech into a plurality of synthesis units, wherein speech is synthesized by the CTTS system by a concatenation and modification of the synthesis units using a base speech database that comprises a base plurality of context classes derived from the base text;
determining a new text corpus subset not completely covered by the base speech database, wherein the new text corpus is associated with a target application;
creating new context classes for the target application based upon context classes derived from the base text; and
automatically adapting the base speech database for the target application using the new context classes.
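The step of determining a new text corpus subset that the base speech database does not completely cover can be pictured as a set comparison. The Python sketch below is illustrative only and rests on assumptions not stated in the claim: the front end has already converted each sentence to a phone sequence, a context is identified by a (left, center, right) phone triple padded with a hypothetical silence symbol `sil`, and the function names are ours. A sentence belongs to the uncovered subset if at least one of its contexts never occurs in the recorded base text.

```python
# Hypothetical context representation: (left phone, center phone, right phone).
Context = tuple[str, str, str]


def contexts(phones: list[str]) -> list[Context]:
    """Triphone contexts of a phone sequence, padded with a silence symbol."""
    padded = ["sil", *phones, "sil"]
    return [
        (padded[i - 1], padded[i], padded[i + 1]) for i in range(1, len(padded) - 1)
    ]


def base_inventory(base_text: dict[str, list[str]]) -> set[Context]:
    """All contexts that occur in the recorded base text."""
    return {ctx for phones in base_text.values() for ctx in contexts(phones)}


def uncovered_subset(
    corpus: dict[str, list[str]], base: set[Context]
) -> dict[str, set[Context]]:
    """Map each sentence id to the contexts the base database lacks.

    Sentences whose contexts are all covered are left out.
    """
    result = {}
    for sid, phones in corpus.items():
        missing = {ctx for ctx in contexts(phones) if ctx not in base}
        if missing:
            result[sid] = missing
    return result


base = base_inventory({"b1": ["t", "a"], "b2": ["a", "t", "a"]})
new_corpus = {"n1": ["t", "a"], "n2": ["t", "o"]}
# Only "n2" is reported; its missing contexts are ('sil', 't', 'o') and ('t', 'o', 'sil').
print(uncovered_subset(new_corpus, base))
```

In this toy corpus, "n1" reuses contexts already recorded in the base text, while "n2" needs contexts that only a reorganized database could serve.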
19. A method for adapting a concatenative text to speech database for a new application comprising:
identifying a decision tree including synthesis units of a concatenative text to speech system, wherein speech is generated at runtime for a first application based on traversing the identified decision tree, wherein the synthesis units are dissected units obtained from previously recorded speech based upon a recording of base text;
determining a target application that includes a new text corpus subset not completely covered by the identified decision tree; and
re-indexing the decision tree to generate a new decision tree for the target application that completely covers the new text corpus subset, wherein the re-indexing is generated automatically using at least one newly generated context class, and wherein the new decision tree is generated without human intervention and without requiring a new recording of speech for the new text corpus subset.
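Claim 19 rests on two operations: runtime traversal of a decision tree whose leaves are context classes, and re-indexing the already-recorded synthesis units into a modified tree. The Python sketch below illustrates both under assumptions of our own: a context is a hypothetical (left, center, right) phone triple, each internal node asks a yes/no question about it, and each leaf holds the ids of recorded segments. `Leaf`, `Node`, `route`, `reindex` and `covers` are illustrative names, not taken from the patent. Here a base class ("left phone is a vowel", which would hold segments 1 and 2) is split by one newly generated question, and re-indexing simply sends every existing segment through the new tree, so no new recording is needed.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Union

# Hypothetical context representation: (left phone, center phone, right phone).
Context = tuple[str, str, str]
VOWELS = {"a", "e", "i", "o", "u"}


@dataclass
class Leaf:
    """A context class: ids of recorded speech segments that share a context."""
    name: str
    segment_ids: list[int] = field(default_factory=list)


@dataclass
class Node:
    """Internal node: a yes/no question about the phonetic context."""
    test: Callable[[Context], bool]
    yes: "Tree"
    no: "Tree"


Tree = Union[Node, Leaf]


def route(tree: Tree, ctx: Context) -> Leaf:
    """Runtime traversal: follow the answers from the root down to a context class."""
    while isinstance(tree, Node):
        tree = tree.yes if tree.test(ctx) else tree.no
    return tree


def leaves(tree: Tree) -> list[Leaf]:
    """All context classes, yes-branch first."""
    if isinstance(tree, Leaf):
        return [tree]
    return leaves(tree.yes) + leaves(tree.no)


def reindex(tree: Tree, segments: dict[int, Context]) -> Tree:
    """Re-assign every already-recorded segment to the leaf its context reaches."""
    for leaf in leaves(tree):
        leaf.segment_ids.clear()
    for seg_id, ctx in sorted(segments.items()):
        route(tree, ctx).segment_ids.append(seg_id)
    return tree


def covers(tree: Tree, needed: Iterable[Context]) -> bool:
    """True if every needed context reaches a context class with at least one segment."""
    return all(route(tree, ctx).segment_ids for ctx in needed)


# Recorded segments of the phone "t" from the base text, keyed by segment id.
segments = {1: ("a", "t", "a"), 2: ("o", "t", "a"), 3: ("s", "t", "a")}

# New tree: the base vowel class is split by a newly generated question.
new_tree = Node(
    test=lambda c: c[0] in VOWELS,
    yes=Node(test=lambda c: c[0] == "o", yes=Leaf("t/o_"), no=Leaf("t/V_")),
    no=Leaf("t/C_"),
)
reindex(new_tree, segments)
print({leaf.name: leaf.segment_ids for leaf in leaves(new_tree)})
# {'t/o_': [2], 't/V_': [1], 't/C_': [3]}
print(covers(new_tree, [("o", "t", "a"), ("e", "t", "i")]))  # True
```

The design choice worth noting is that re-indexing touches only the index (which leaf lists which segment ids); the audio itself is never re-recorded or modified.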
20. A method for updating a Concatenative Text-To-Speech (CTTS) system with a speech database from a base version to a new version of a given target application, wherein the CTTS system uses segments of natural speech stored in its original form which is obtained by recording a base text, wherein recorded speech is dissected into a plurality of synthesis units, wherein speech is synthesized by a concatenation and modification of said synthesis units, and wherein the base speech database comprises a base plurality of context classes derived from and thus matching said base text, said method being characterized by the steps of:
collecting CTTS-quality data during runtime of said CTTS system;
checking a predetermined CTTS-update condition; and
performing a speech database update procedure according to the following steps without human intervention when said predetermined CTTS-update condition is met:
specifying a new text corpus subset not completely covered by the base speech database for a target application;
collecting acoustic context classes from the base speech database that are present in said target application;
removing acoustic context classes with speech segments that remain unused when the CTTS system is used for synthesizing new text of said target application, wherein said removal of unused acoustic context classes is implemented by pruning subtrees of a tree-like data structure;
creating new context classes from the removed context classes by splitting and merging subtrees; and
re-indexing the base speech database to reflect the newly created context classes and context classes of the base speech database not included in the removal step.
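The pruning, merging and re-indexing steps of claim 20 can be sketched on the same kind of tree. In this hedged Python illustration, usage counts obtained by routing the target application's contexts through the tree stand in for the unspecified CTTS-quality data, and a hypothetical threshold on the fraction of unused context classes (`min_unused`) stands in for the predetermined update condition. Where a node's two children are both leaves and one of them is unused, the subtree is pruned and both classes are merged into a new context class; the index is then rebuilt from the remaining leaves. Only the merging half of "splitting and merging" is shown, and all names are illustrative.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Iterable, Union

Context = tuple[str, str, str]  # hypothetical (left, center, right) phone triple
VOWELS = {"a", "e", "i", "o", "u"}


@dataclass
class Leaf:
    name: str  # acoustic context class
    segment_ids: list[int] = field(default_factory=list)


@dataclass
class Node:
    test: Callable[[Context], bool]  # yes/no question about the context
    yes: "Tree"
    no: "Tree"


Tree = Union[Node, Leaf]


def route(tree: Tree, ctx: Context) -> Leaf:
    while isinstance(tree, Node):
        tree = tree.yes if tree.test(ctx) else tree.no
    return tree


def index(tree: Tree) -> dict[str, list[int]]:
    """Re-indexing: map every remaining context class to its segment ids."""
    if isinstance(tree, Leaf):
        return {tree.name: tree.segment_ids}
    return {**index(tree.yes), **index(tree.no)}


def usage(tree: Tree, target: Iterable[Context]) -> Counter:
    """How often synthesizing the target text reaches each context class."""
    return Counter(route(tree, ctx).name for ctx in target)


def update_condition_met(tree: Tree, use: Counter, min_unused: float = 0.25) -> bool:
    """Hypothetical trigger: enough context classes are unused by the target text."""
    names = list(index(tree))
    return sum(use[n] == 0 for n in names) / len(names) >= min_unused


def prune_and_merge(tree: Tree, use: Counter) -> tuple[Tree, int]:
    """Bottom-up; returns the new subtree and its total usage count."""
    if isinstance(tree, Leaf):
        return tree, use[tree.name]
    yes, n_yes = prune_and_merge(tree.yes, use)
    no, n_no = prune_and_merge(tree.no, use)
    if isinstance(yes, Leaf) and isinstance(no, Leaf) and min(n_yes, n_no) == 0:
        # Prune the subtree and merge both classes into one new context class.
        merged = Leaf(f"{yes.name}+{no.name}", yes.segment_ids + no.segment_ids)
        return merged, n_yes + n_no
    return Node(tree.test, yes, no), n_yes + n_no


base_tree = Node(
    test=lambda c: c[0] in VOWELS,
    yes=Node(lambda c: c[0] == "o", Leaf("t/o_", [2]), Leaf("t/V_", [1])),
    no=Leaf("t/C_", [3]),
)
target = [("a", "t", "a"), ("e", "t", "i"), ("s", "t", "a")]
use = usage(base_tree, target)  # "t/o_" is never reached by the target text
if update_condition_met(base_tree, use):
    new_tree, _ = prune_and_merge(base_tree, use)
    print(index(new_tree))  # {'t/o_+t/V_': [2, 1], 't/C_': [3]}
```

In this toy example, segment 2, recorded after /o/ and otherwise unused by the target text, becomes a candidate for the target application's vowel contexts after the merge, without any new recording.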
Specification