Back-end database reorganization for application-specific concatenative text-to-speech systems
First Claim
1. A method for use in a Concatenative Text-To-Speech (CTTS) system that comprises a speech segment database comprising a plurality of speech segments organized in accordance with a context hierarchy, the method comprising:
- evaluating the context hierarchy by using new text and at least one processor, wherein the context hierarchy comprises a plurality of contexts determined using a base text, wherein each of the plurality of speech segments is associated with at least one of the plurality of contexts in the context hierarchy, and wherein the new text is different from the base text, wherein at least a portion of the new text has no corresponding acoustic data used during the evaluation;
updating the context hierarchy, based on results of evaluating the context hierarchy by using the new text, by merging two contexts in the context hierarchy to form a new coarser context and/or by splitting a context in the context hierarchy to form at least two new refined contexts; and
reorganizing the plurality of speech segments in accordance with the updated context hierarchy.
8 Assignments
0 Petitions
Accused Products
Abstract
The present invention relates to computer-generated text-to-speech conversion. It relates in particular to a method and system for updating a Concatenative Text-To-Speech (CTTS) system with a speech database from a base version to a new version. The present invention performs an application-specific re-organization of a synthesizer'"'"'s speech database by means of certain decision tree modifications. By that reorganization, certain synthesis units are made available for the new application, which are not available in prior art without a new speech session. This allows the creation of application-specific synthesizers with improved output speech quality for arbitrary domains and applications at very low cost.
13 Citations
25 Claims
-
1. A method for use in a Concatenative Text-To-Speech (CTTS) system that comprises a speech segment database comprising a plurality of speech segments organized in accordance with a context hierarchy, the method comprising:
-
evaluating the context hierarchy by using new text and at least one processor, wherein the context hierarchy comprises a plurality of contexts determined using a base text, wherein each of the plurality of speech segments is associated with at least one of the plurality of contexts in the context hierarchy, and wherein the new text is different from the base text, wherein at least a portion of the new text has no corresponding acoustic data used during the evaluation; updating the context hierarchy, based on results of evaluating the context hierarchy by using the new text, by merging two contexts in the context hierarchy to form a new coarser context and/or by splitting a context in the context hierarchy to form at least two new refined contexts; and reorganizing the plurality of speech segments in accordance with the updated context hierarchy. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
-
-
10. A recording medium storing processor-executable instructions that, when executed by at least one processor, cause the at least one processor to perform a method for use in a Concatenative Text-To-Speech (CTTS) system that comprises a speech segment database comprising a plurality of speech segments organized in accordance with a context hierarchy, the method comprising:
-
evaluating the context hierarchy by using new text, wherein the context hierarchy comprises a plurality of contexts determined using a base text, wherein each of the plurality of speech segments is associated with at least one of the plurality of contexts in the context hierarchy, and wherein the new text is different from the base text, wherein at least a portion of the new text has no corresponding acoustic data used during the evaluation; updating the context hierarchy, based on results of evaluating the context hierarchy by using the new text, by merging two contexts in the context hierarchy to form a new coarser context and/or by splitting a context in the context hierarchy to form at least two new refined contexts; and reorganizing the plurality of speech segments in accordance with the updated context hierarchy. - View Dependent Claims (11, 12, 13, 14, 15, 16)
-
-
17. A Concatenative Text-To-Speech (CTTS) system comprising:
-
at least one memory that stores a speech segment database comprising a plurality of speech segments organized in accordance with a context hierarchy; and at least one processor, coupled to the at least one memory, that; evaluates the context hierarchy by using new text, wherein the context hierarchy comprises a plurality of contexts determined using a base text, wherein each of the plurality of speech segments is associated with at least one of the plurality of contexts in the context hierarchy, and wherein the new text is different from the base text, wherein at least a portion of the new text has no corresponding acoustic data used during the evaluation; updates the context hierarchy, based on results of evaluating the context hierarchy by using the new text, by merging two contexts in the context hierarchy to form a new coarser context and/or by splitting a context in the context hierarchy to form at least two new refined contexts; and reorganizes the plurality of speech segments in accordance with the updated context hierarchy. - View Dependent Claims (18, 19, 20, 21, 22, 23, 24, 25)
-
Specification