Back-end database reorganization for application-specific concatenative text-to-speech systems

US 20060287861A1
Filed: 05/02/2006
Published: 12/21/2006
Est. Priority Date: 06/21/2005
Status: Active Grant

Abstract

The present invention relates to computer-generated text-to-speech conversion. It relates in particular to a method and system for updating a Concatenative Text-To-Speech (CTTS) system with a speech database from a base version to a new version. The present invention performs an application-specific re-organization of a synthesizer's speech database by means of certain decision tree modifications. By that reorganization, certain synthesis units are made available for the new application, which are not available in prior art without a new speech session. This allows the creation of application-specific synthesizers with improved output speech quality for arbitrary domains and applications at very low cost.
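
The kind of decision tree modification the abstract refers to can be illustrated with a minimal sketch. It is not the patented implementation: the `Node` class, the context labels, and the unit ids are all hypothetical, and the sketch assumes only that leaves hold synthesis units and that leaves whose context never occurs in the target application can be dropped.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a hypothetical context decision tree."""
    context: str                                   # label of the context class
    units: list = field(default_factory=list)      # synthesis unit ids (leaves only)
    children: list = field(default_factory=list)   # sub-nodes (inner nodes only)


def prune(node, used_contexts):
    """Return a pruned copy of the tree, or None if no leaf survives.

    A leaf is kept only if its context is used by the target application;
    an inner node is kept only if at least one of its children is kept.
    """
    if not node.children:
        if node.context in used_contexts:
            return Node(node.context, list(node.units))
        return None
    kept = [c for c in (prune(ch, used_contexts) for ch in node.children)
            if c is not None]
    if not kept:
        return None
    return Node(node.context, [], kept)


def leaves(node):
    """List the leaf context labels from left to right."""
    if not node.children:
        return [node.context]
    out = []
    for ch in node.children:
        out.extend(leaves(ch))
    return out


# Hypothetical base tree with three leaf context classes.
base = Node("root", children=[
    Node("vowel", children=[
        Node("a_stressed", ["u1", "u2"]),
        Node("a_unstressed", ["u3"]),
    ]),
    Node("consonant", children=[Node("t_final", ["u4"])]),
])

# Assume the target application only ever needs these two contexts.
adapted = prune(base, {"a_stressed", "t_final"})
print(leaves(adapted))
```

The function returns a new tree instead of mutating the base tree, so the base version stays available for other target applications.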

20 Claims

1. A method for updating a Concatenative Text-To-Speech (CTTS) system with a speech database from a base version to a new version of a given target application, comprising:
- identifying segments of recorded speech, comprising segments of natural speech;
  
  dissecting the recorded speech into a plurality of synthesis units, wherein speech is synthesized by the CTTS system by a concatenation and modification of the synthesis units using a base speech database that comprises a base plurality of context classes derived from the base text;
  
  determining a new text corpus subset not completely covered by the base speech database, wherein the new text corpus is associated with a target application;
  
  creating new context classes for the target application based upon context classes derived from the base text; and
  
  automatically adapting the base speech database for the target application using the new context classes.
- Dependent claims 2 to 18:
  - 2. The method of claim 1, further comprising:
    - collecting context classes from the base speech database that are present in the target application, wherein the adapting step uses the collected context classes and the new context classes.
  - 3. The method of claim 1, further comprising:
    - determining context classes from the base speech database that are unused in the target application; and
      
      excluding the determined context classes from the adapted base speech database.
  - 4. The method of claim 1, wherein the context classes derived from the base text and the new context classes comprise acoustic context classes.
  - 5. The method of claim 1, wherein the context classes derived from the base text and the new context classes comprise prosodic context classes.
  - 6. The method of claim 1, wherein the adapting step automatically occurs without obtaining new segments of natural speech for the new text corpus subset.
  - 7. The method of claim 1, wherein the base speech database and the adapted base speech database utilize decision trees that are traversed at runtime to generate synthesized speech.
  - 8. The method of claim 7, wherein the adapted base speech database is formed by re-indexing the synthesized units to form a new decision tree associated with the adapted base speech database that includes traversal pathways for the new text corpus subset.
  - 9. The method of claim 1, wherein speech segments are organized in a clustered hierarchy of subsets of speech segments.
  - 10. The method of claim 9, wherein said hierarchy is implemented in a tree-like data structure.
  - 11. The method of claim 10, wherein the creating step splits and merges subtrees of the tree-like data structure.
  - 12. The method of claim 10, further comprising:
    - pruning subtrees of the tree-like data structure during the adapting step to remove contexts from the base speech database that are unused by the target application.
  - 13. The method of claim 10, wherein the adapting step creates new acoustic context tree leafs for the adapted base speech database, said method further comprising:
    - prioritizing between speech segments present within the new acoustic context tree leafs using a weighting function specific to the target application.
  - 14. The method of claim 1, further comprising:
    - establishing a CTTS update condition;
      
      checking for the update condition at runtime of the CTTS system; and
      
      performing the determining, creating, and the adapting steps responsive to a detection of the update condition, wherein the performing step occurs automatically without human intervention.
  - 15. The method of claim 1, further comprising:
    - providing a plurality of portlets, each producing voice output; and
      
      performing the determining, creating, and adapting steps for each of the portlets, wherein each of the portlets is the target application for a corresponding adapted base speech database.
  - 16. The method of claim 1, wherein said steps of claim 1 are performed by at least one machine in accordance with at least one computer program having a plurality of code sections that are executable by the at least one machine.
  - 17. The method of claim 1, further comprising:
    - identifying a computer usable medium comprising computer readable programs, wherein the computer readable programs cause a machine to perform the steps of claim 1.
  - 18. The method of claim 1, wherein the steps of claim 1 are performed by a synthesis engine of the CTTS system in accordance with machine readable instructions contained within a computer readable medium.
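
Claim 13 mentions prioritizing between speech segments in a leaf using a weighting function specific to the target application, but the claims do not define that function. The following is a sketch under stated assumptions: the segment fields (`pitch_hz`, `source`), the bonus value, and the helper names `make_weight` and `prioritize` are all hypothetical and chosen only to show the shape of such a ranking.

```python
from typing import Callable

# Hypothetical segment record: {"id": str, "pitch_hz": float, "source": str}
Segment = dict


def make_weight(target_pitch_hz: float, preferred_source: str) -> Callable[[Segment], float]:
    """Build an application-specific weighting function (assumed form).

    Segments recorded for the preferred source get a fixed bonus, and every
    segment is penalized by its distance from the target pitch.
    """
    def weight(seg: Segment) -> float:
        pitch_penalty = abs(seg["pitch_hz"] - target_pitch_hz)
        bonus = 50.0 if seg["source"] == preferred_source else 0.0
        return bonus - pitch_penalty
    return weight


def prioritize(leaf_segments, weight):
    """Order the segments of one leaf from highest to lowest weight."""
    return sorted(leaf_segments, key=weight, reverse=True)


# Hypothetical contents of one newly created leaf.
leaf = [
    {"id": "s1", "pitch_hz": 120.0, "source": "news"},
    {"id": "s2", "pitch_hz": 180.0, "source": "navigation"},
    {"id": "s3", "pitch_hz": 150.0, "source": "navigation"},
]

ranked = prioritize(leaf, make_weight(150.0, "navigation"))
print([seg["id"] for seg in ranked])
```

Keeping the weighting function separate from the tree means the same adapted database could be ranked differently per application without rebuilding it.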

19. A method for adapting a concatenative text to speech database for a new application comprising:
- identifying a decision tree including synthesis units of a concatenative text to speech system, wherein speech is generated at runtime for a first application based on traversing the identified decision tree, wherein the synthesis units are dissected units obtained from previously recorded speech based upon a recording of base text;
  
  determining a target application that includes a new text corpus subset not completely covered by the identified decision tree; and
  
  re-indexing the decision tree to generate a new decision tree for the target application that completely covers the new text corpus subset, wherein the re-indexing is generated automatically using at least one newly generated context class, and wherein the new decision tree is generated without human intervention and without requiring a new recording of speech for the new text corpus subset.
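
The re-indexing step of claim 19 can be sketched with a flat index in place of a real decision tree. Everything here is an assumption made for illustration: contexts are modelled as hypothetical `(left, phone, right)` triples, and an uncovered context gets a new class by merging the units of all base classes that share its center phone. The claim itself does not prescribe this back-off rule.

```python
def reindex(base_index, needed_contexts):
    """Build a new index that covers the contexts of the target application.

    base_index maps (left, phone, right) triples to lists of unit ids.
    Contexts already present are copied; for an uncovered context a new
    class is created from every base class with the same center phone.
    Base classes the target application never needs are left out.
    """
    new_index = {}
    for ctx in needed_contexts:
        if ctx in base_index:
            new_index[ctx] = list(base_index[ctx])
            continue
        _, phone, _ = ctx
        merged = [
            unit
            for (_left, p, _right), units in sorted(base_index.items())
            if p == phone
            for unit in units
        ]
        if merged:  # a phone missing from the base recordings stays uncovered
            new_index[ctx] = merged
    return new_index


# Hypothetical base index derived from the base text.
base_index = {
    ("s", "a", "t"): ["u1"],
    ("k", "a", "p"): ["u2"],
    ("a", "t", "#"): ["u3"],
}

# Contexts assumed to occur in the new text corpus subset.
needed = [("s", "a", "t"), ("m", "a", "n"), ("a", "t", "#")]

new_index = reindex(base_index, needed)
print(new_index[("m", "a", "n")])
```

No new recording is involved: the new class for `("m", "a", "n")` reuses units that already exist in the base database.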

20. A method for updating a Concatenative Text-To-Speech (CTTS) system with a speech database from a base version to a new version of a given target application, wherein the CTTS system uses segments of natural speech stored in its original form which is obtained by recording a base text, wherein recorded speech is dissected into a plurality of synthesis units, wherein speech is synthesized by a concatenation and modification of said synthesis units, and wherein the base speech database comprises a base plurality of context classes derived from and thus matching said base text, said method being characterized by the steps of:
- collecting CTTS-quality data during runtime of said CTTS system;
  
  checking a predetermined CTTS-update condition; and
  
  performing a speech database update procedure according to the following steps without human intervention when said predetermined CTTS update condition is met;
  
  specifying a new text corpus subset not completely covered by the base speech database for a target application;
  
  collecting acoustic context classes from the base speech database that are present in said target application;
  
  removing acoustic context classes with speech segments that remain unused when the CTTS system is used for synthesizing new text of said target application, wherein said removal of unused acoustic context classes is implemented by pruning subtrees of a tree-like data structure;
  
  creating new context classes from the removed context classes by splitting and merging subtrees; and
  
  re-indexing the base speech database to reflect the newly created context classes and context classes of the base speech database not included in the removal step.
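
The runtime part of claim 20 (collecting quality data, checking a predetermined update condition, and updating without human intervention) can be sketched as follows. The claim does not say which quality data is collected or what the condition is, so the miss-ratio statistic, the thresholds, and the names `QualityStats`, `update_due`, and `run` are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class QualityStats:
    """Hypothetical runtime statistics: how often a needed context was missing."""
    requests: int = 0
    misses: int = 0

    def record(self, covered: bool) -> None:
        self.requests += 1
        if not covered:
            self.misses += 1

    @property
    def miss_ratio(self) -> float:
        return self.misses / self.requests if self.requests else 0.0


def update_due(stats, min_requests=100, max_miss_ratio=0.2):
    """Assumed update condition: enough data and too many uncovered contexts."""
    return stats.requests >= min_requests and stats.miss_ratio > max_miss_ratio


def run(covered_flags, update):
    """Process synthesis requests and trigger updates automatically.

    covered_flags yields one bool per request (True if the database covered
    it); update is the callable that performs the database update procedure.
    Returns the number of updates that were triggered.
    """
    stats = QualityStats()
    updates = 0
    for covered in covered_flags:
        stats.record(covered)
        if update_due(stats):
            update()
            stats = QualityStats()  # start collecting again after an update
            updates += 1
    return updates


calls = []
triggered = run([True] * 70 + [False] * 30, lambda: calls.append("update"))
print(triggered, calls)
```

In this sketch the `update` callable stands in for the specifying, collecting, removing, creating, and re-indexing steps listed in the claim.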

Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: International Business Machines Corporation
Inventors: Fischer, Volker; Kunzmann, Siegfried

Granted Patent: US 8,412,528 B2
US Class Current: 704/260
CPC Class Codes: G10L 13/06 Elementary speech units use...
