Supporting a concatenative text-to-speech synthesis

US 20070011009A1
Filed: 07/08/2005
Published: 01/11/2007
Est. Priority Date: 07/08/2005
Status: Abandoned Application

Abstract

The invention relates to a support of a concatenative TTS synthesis. In order to generate a speech database as a basis for the TTS synthesis, first, a speech processing including a segmental parametric speech encoding of speech data based on a parametric modeling of speech is performed, which results in compressed parameterized speech segments. Then, the compressed parameterized speech segments are assembled in a speech database. In order to synthesize output speech, compressed parameterized speech segments are selected from the speech database based on an available text and decompressed to regain parameterized speech segments. The parameterized speech segments are then concatenated in a parameter domain. The output speech is synthesized based on these concatenated parametric speech segments.
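The flow described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the application: each frame is reduced to a pitch and a gain value, "compression" is a plain 8-bit quantization of those two parameters, the unit labels are arbitrary strings, and the synthesizer is a single sinusoid per frame. A real system would use a far richer parametric model and a proper very low bit rate codec.

```python
import math
import struct

SAMPLE_RATE = 8000     # assumed output sampling rate in Hz
FRAME_SAMPLES = 80     # assumed frame length (10 ms at 8 kHz)


def compress(frames):
    """Quantize (f0_hz, gain) frames to two bytes each (toy stand-in for a codec)."""
    out = bytearray()
    for f0, gain in frames:
        q_f0 = max(0, min(255, round(f0 / 2)))        # 2 Hz steps, 0..510 Hz
        q_gain = max(0, min(255, round(gain * 255)))  # 1/255 steps, 0..1
        out += struct.pack("BB", q_f0, q_gain)
    return bytes(out)


def decompress(blob):
    """Regain (f0_hz, gain) frames from the quantized bytes."""
    return [(q_f0 * 2.0, q_gain / 255.0)
            for q_f0, q_gain in struct.iter_unpack("BB", blob)]


def build_database(recordings):
    """Assemble compressed parameterized segments, keyed by a unit label."""
    return {label: compress(frames) for label, frames in recordings.items()}


def synthesize(unit_labels, database):
    """Select, decompress, concatenate in the parameter domain, then synthesize."""
    params = []
    for label in unit_labels:
        params.extend(decompress(database[label]))  # concatenation of parameters
    samples, phase = [], 0.0
    for f0, gain in params:
        for _ in range(FRAME_SAMPLES):
            phase += 2 * math.pi * f0 / SAMPLE_RATE  # keep phase continuous
            samples.append(gain * math.sin(phase))
    return params, samples


database = build_database({
    "a": [(120.0, 1.0), (124.0, 1.0)],
    "b": [(200.0, 0.0)],
})
params, samples = synthesize(["a", "b"], database)
```

Note that the waveform is produced only once, after the parameter tracks have been joined, which is the point of concatenating in the parameter domain rather than splicing audio.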

25 Claims

1. A method of generating a speech database as a basis for a concatenative text-to-speech synthesis, said method comprising:
- performing a speech processing including a segmental parametric speech encoding of speech data based on a parametric modeling of speech and resulting in compressed parameterized speech segments; and
  
  assembling said compressed parameterized speech segments in a speech database.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The method according to claim 1, wherein said parametric modeling of speech is one of a sinusoidal modeling and a waveform interpolation modeling.
  - 3. The method according to claim 1, wherein said segmental parametric speech encoding is a very low bit rate encoding.
  - 4. The method according to claim 1, wherein said speech processing is performed by an encoder that is retrained for said speech processing based on said speech data.
  - 5. The method according to claim 1, wherein said speech processing includes a compression performed on non-continuous parameterized speech segments and a natural acoustic context for the respective parameterized speech segments.
  - 6. The method according to claim 1, wherein said speech processing includes a compression performed on continuous speech data.
  - 7. The method according to claim 1, wherein said compressed parameterized speech segments are distributed in said speech database to speech units, and wherein said assembling of compressed speech segments in a speech database comprises grouping said speech units by speech sounds in said speech database.
  - 8. The method according to claim 1, wherein said compressed parameterized speech segments are distributed in said speech database to speech units, and wherein said assembling of compressed speech segments in a speech database comprises assembling said speech units by sentences in said speech database.
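Claims 7 and 8 describe two ways of organizing the speech units in the database: grouped by speech sound, or assembled by sentence. A minimal sketch of both layouts follows. The `SpeechUnit` fields, the label strings, and the dictionary-based storage are hypothetical choices for illustration; the claims do not prescribe any particular data structure.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechUnit:
    sound: str        # hypothetical speech-sound label, e.g. a phoneme name
    sentence_id: int  # index of the recorded sentence the unit came from
    position: int     # position of the unit within that sentence
    payload: bytes    # compressed parameterized speech segment(s)


def group_by_sound(units):
    """Layout in the spirit of claim 7: all units of one speech sound together."""
    database = defaultdict(list)
    for unit in units:
        database[unit.sound].append(unit)
    return dict(database)


def group_by_sentence(units):
    """Layout in the spirit of claim 8: units kept in their original sentence order."""
    database = defaultdict(list)
    for unit in sorted(units, key=lambda u: (u.sentence_id, u.position)):
        database[unit.sentence_id].append(unit)
    return dict(database)


units = [
    SpeechUnit("a", 0, 1, b"\x01"),
    SpeechUnit("k", 0, 0, b"\x02"),
    SpeechUnit("a", 1, 0, b"\x03"),
]
by_sound = group_by_sound(units)
by_sentence = group_by_sentence(units)
```

Grouping by sound makes it cheap to enumerate all candidates for a target sound, while the sentence layout keeps each unit next to its natural acoustic neighbors.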

9. A database generator for generating a speech database as a basis for a concatenative text-to-speech synthesis, said database generator comprising:
- processing means adapted to perform a speech processing including a segmental parametric speech encoding of speech data based on a parametric modeling of speech and resulting in compressed parameterized speech segments; and
  
  processing means adapted to assemble said compressed parameterized speech segments in a speech database.
- Dependent claims (10):
  - 10. An electronic device comprising the database generator of claim 9.

11. A software program product in which a software code for generating a speech database as a basis for a concatenative text-to-speech synthesis is stored, said software code realizing the following steps when being executed in a processing unit of an electronic device:
- performing a speech processing including a segmental parametric speech encoding of speech data based on a parametric modeling of speech and resulting in compressed parameterized speech segments; and
  
  assembling said compressed parameterized speech segments in a speech database.

12. A method enabling a concatenative text-to-speech synthesis based on a speech database comprising compressed parameterized speech segments obtained in a speech processing, said speech processing including a segmental parametric speech encoding of speech data using a parametric modeling of speech, said method comprising:
- selecting compressed parameterized speech segments from said speech database based on an available text;
  
  decompressing said selected compressed parameterized speech segments to regain parameterized speech segments;
  
  concatenating said parameterized speech segments in a parameter domain; and
  
  synthesizing output speech based on said concatenated parametric speech segments.
- Dependent claims (13, 14, 15, 16, 17, 18, 19, 20, 21):
  - 13. The method according to claim 12, wherein said parametric modeling of speech is one of a sinusoidal modeling and a waveform interpolation modeling.
  - 14. The method according to claim 12, wherein said compressed parameterized speech segments are distributed in said speech database to compressed speech units, and wherein selecting compressed parameterized speech segments from said speech database comprises evaluating parameters of said speech units as a basis for said selection.
  - 15. The method according to claim 12, wherein said compressed parameterized speech segments are distributed in said speech database to compressed speech units, and wherein selected compressed parameterized speech segments are retrieved from said speech database for decompression based at least partly on information in said speech units.
  - 16. The method according to claim 12, comprising a further processing of said parameterized speech segments in said parameter domain.
  - 17. The method according to claim 16, wherein said further processing comprises deleting unnecessary parts of said parameterized speech segments.
  - 18. The method according to claim 16, wherein said further processing comprises smoothing parameters at concatenation boundaries between respectively two parameterized speech segments.
  - 19. The method according to claim 16, wherein said further processing comprises modifying voice characteristics of said parameterized speech segments.
  - 20. The method according to claim 12, wherein synthesizing said output speech is performed using a parametric speech codec.
  - 21. The method according to claim 12, wherein synthesizing said output speech is based on a very low bit rate decoding.
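Claims 17 and 18 mention deleting unnecessary parts of segments and smoothing parameters at concatenation boundaries. The sketch below shows one simple way this could look for a single parameter track (for example, pitch per frame). The linear interpolation, the `width` parameter, and the function names are assumptions for illustration; the claims do not specify a smoothing method.

```python
def trim(track, start, end):
    """Drop frames outside [start, end), e.g. unneeded context around a unit."""
    return track[start:end]


def concatenate_smoothed(left, right, width=1):
    """Join two parameter tracks and linearly smooth `width` frames on each
    side of the boundary between them (an assumed, simple smoothing rule)."""
    joined = list(left) + list(right)
    boundary = len(left)
    lo = max(0, boundary - width - 1)             # last untouched frame on the left
    hi = min(len(joined) - 1, boundary + width)   # first untouched frame on the right
    if hi - lo < 2:
        return joined                             # nothing to interpolate
    a, c = joined[lo], joined[hi]
    for i in range(lo + 1, hi):
        t = (i - lo) / (hi - lo)
        joined[i] = (1 - t) * a + t * c
    return joined


left = trim([90.0, 100.0, 100.0, 100.0, 100.0], 1, 5)   # drop the first frame
right = [200.0, 200.0, 200.0, 200.0]
smoothed = concatenate_smoothed(left, right, width=1)
```

Because the operation works on parameters rather than samples, the jump from 100 to 200 at the boundary is replaced by a short ramp before any waveform is generated.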

22. A text-to-speech synthesizer enabling a concatenative text-to-speech synthesis based on a speech database, said text-to-speech synthesizer comprising:
- a memory storing a speech database comprising compressed parameterized speech segments obtained in a speech processing, said speech processing including a segmental parametric speech encoding of speech data using a parametric modeling of speech;
  
  processing means adapted to select compressed parameterized speech segments from said speech database based on an available text;
  
  processing means adapted to decompress said selected compressed parameterized speech segments to regain parameterized speech segments;
  
  processing means adapted to concatenate said parameterized speech segments in a parameter domain; and
  
  processing means adapted to synthesize output speech based on said concatenated parametric speech segments.
- Dependent claims (23, 25):
  - 23. An electronic device comprising the text-to-speech synthesizer of claim 22.
  - 25. A system comprising the database generator of claim 9 and the text-to-speech synthesizer of claim 22.

24. A software program product in which a software code is stored on a readable medium, the software code for enabling a concatenative text-to-speech synthesis based on a speech database comprising compressed parameterized speech segments obtained in a speech processing, said speech processing including a segmental parametric speech encoding of speech data using a parametric modeling of speech, said software code realizing the following steps when being executed in a processing unit of an electronic device:
- selecting compressed parameterized speech segments from said speech database based on an available text;
  
  decompressing said selected compressed parameterized speech segments to regain parameterized speech segments;
  
  concatenating said parameterized speech segments in a parameter domain; and
  
  synthesizing output speech based on said concatenated parametric speech segments.

Current Assignee: Nokia Technologies Oy (Nokia Corporation)
Original Assignee: Nokia Corporation
Inventors: Ramo, Anssi; Vainio, Janne; Himanen, Sakari; Nurminen, Jani
Application Number: US11/177,250
Publication Number: US 20070011009A1
US Class Current: 704/260
CPC Class Codes: G10L 13/06 Elementary speech units use...
