Text-to-speech system and method

US 20060041429A1
Filed: 08/10/2005
Published: 02/23/2006
Est. Priority Date: 08/11/2004
Status: Active Grant

First Claim

1. A Text-To-Speech system comprising:

means for storing a plurality of speech segments;

means for creating a plurality of phonetic transcriptions for each word of an input text; and

means coupled to the storing means and to the creating means for selecting preferred phonetic transcriptions by operating a cost function on the plurality of speech segments.

Abstract

A system and method for generating synthetic speech in a computer-implemented Text-To-Speech (TTS) system. The system comprises at least a speaker database previously created from user recordings, a Front-End system that receives an input text, and a TTS engine. The Front-End system generates multiple phonetic transcriptions for each word of the input text, and the TTS engine uses a cost function to select the phonetic transcription most appropriate for searching the speaker database for the speech segments to be concatenated and synthesized.

20 Claims

1. A Text-To-Speech system comprising:
- means for storing a plurality of speech segments;
  
  means for creating a plurality of phonetic transcriptions for each word of an input text; and
  
  means coupled to the storing means and to the creating means for selecting preferred phonetic transcriptions by operating a cost function on the plurality of speech segments.
  - 2. The system of claim 1, wherein the means for selecting preferred phonetic transcriptions comprises means for computing a cost score for each phonetic transcription of the plurality of phonetic transcriptions and means for sorting the plurality of phonetic transcriptions according to the computed cost scores.
  - 3. The system of claim 1, wherein the means for creating a plurality of phonetic transcriptions comprises rule-based means.
  - 4. The system of claim 1, wherein the means for creating a plurality of phonetic transcriptions comprises statistical means.
  - 5. The system of claim 1, wherein the means for creating a plurality of phonetic transcriptions further comprises means to normalize the input text.
  - 6. The system of claim 1, wherein the means for creating a plurality of phonetic transcriptions further comprises means to generate prosody parameters.
  - 7. The system of claim 6, wherein the prosody parameters are input to the means for selecting the preferred phonetic transcriptions.
  - 8. The system of claim 1, wherein the means for selecting the preferred phonetic transcriptions further comprises means for selecting preferred speech segments associated to the preferred phonetic transcriptions.
  - 9. The system of claim 8, further comprising concatenation means to concatenate the preferred speech segments.
  - 10. The system of claim 9, further comprising means coupled to the concatenation means to output synthetic speech from the concatenated speech segments.
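
Claims 2, 6 and 7 describe computing a cost score per candidate transcription, sorting the candidates by that score, and feeding prosody parameters into the selection. The following is a minimal sketch of that idea, not the patented implementation: the `Segment` class, the toy `DATABASE`, the pitch-distance cost, and `MISSING_PENALTY` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Segment:
    """Hypothetical stored speech segment: a phone label and its recorded pitch."""
    phone: str
    pitch_hz: float


# Toy speaker database (assumed contents, for illustration only).
DATABASE: List[Segment] = [
    Segment("r", 110.0),
    Segment("iy", 120.0),
    Segment("eh", 180.0),
    Segment("d", 115.0),
]

# Assumed penalty for a phone that has no stored segment at all.
MISSING_PENALTY = 1000.0


def phone_cost(phone: str, target_pitch_hz: float, database: Sequence[Segment]) -> float:
    """Distance between the target pitch and the closest stored segment for this phone."""
    distances = [abs(s.pitch_hz - target_pitch_hz) for s in database if s.phone == phone]
    return min(distances) if distances else MISSING_PENALTY


def cost_score(transcription: Sequence[str], target_pitch_hz: float,
               database: Sequence[Segment]) -> float:
    """Sum the per-phone costs to get one score per candidate transcription."""
    return sum(phone_cost(p, target_pitch_hz, database) for p in transcription)


def rank_transcriptions(candidates: Sequence[Tuple[str, ...]], target_pitch_hz: float,
                        database: Sequence[Segment]) -> List[Tuple[float, Tuple[str, ...]]]:
    """Return (score, transcription) pairs sorted from lowest to highest cost."""
    scored = [(cost_score(t, target_pitch_hz, database), t) for t in candidates]
    return sorted(scored, key=lambda pair: pair[0])


# Two candidate transcriptions of the word "read" and a target pitch of 115 Hz.
ranked = rank_transcriptions([("r", "eh", "d"), ("r", "iy", "d")], 115.0, DATABASE)
```

With this toy data the first entry of `ranked` is the transcription whose phones sit closest to the target pitch in the database; a real cost function would combine many more acoustic and prosodic features.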

11. A method for selecting preferred phonetic transcriptions of an input text in a Text-To-Speech system, the method comprising the steps of:
- storing a plurality of speech segments;
  
  creating a plurality of phonetic transcriptions for each word of an input text;
  
  computing a cost score for each phonetic transcription by operating a cost function on the plurality of speech segments; and
  
  sorting the plurality of phonetic transcriptions according to the computed cost scores.
  - 12. The method of claim 11, further comprising the step of normalizing the input text before creating the plurality of phonetic transcriptions.
  - 13. The method of claim 11, further comprising the step of generating prosody parameters after the step of creating a plurality of phonetic transcriptions.
  - 14. The method of claim 11, further comprising the step of selecting preferred speech segments after the step of sorting the plurality of phonetic transcriptions.
  - 15. The method of claim 14, further comprising the step of concatenating the preferred speech segments.
  - 16. The method of claim 15, further comprising the step of outputting synthetic speech after the concatenating step.
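
The method steps of claims 11 to 16 can be read as a pipeline: normalize, create candidate transcriptions, score, sort, select segments, concatenate, and output. The sketch below chains those steps under strong simplifying assumptions: `LEXICON` and `SEGMENTS` are hypothetical toy tables, the cost simply counts phones missing from the database, and "audio" is a plain list of sample values.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical lexicon: each word maps to several candidate transcriptions.
LEXICON: Dict[str, List[Tuple[str, ...]]] = {
    "hi": [("hh", "iy"), ("hh", "ay")],
}

# Hypothetical stored speech segments: phone label -> audio samples.
SEGMENTS: Dict[str, List[float]] = {
    "hh": [0.1, 0.2],
    "ay": [0.3, 0.4],
}


def normalize(text: str) -> List[str]:
    """Lowercase the text, drop everything except letters and spaces, split into words."""
    return re.sub(r"[^a-z\s]", "", text.lower()).split()


def cost(transcription: Tuple[str, ...]) -> int:
    """Assumed cost: number of phones that have no stored segment."""
    return sum(1 for phone in transcription if phone not in SEGMENTS)


def synthesize(text: str) -> List[float]:
    """Run the whole pipeline and return the concatenated samples."""
    samples: List[float] = []
    for word in normalize(text):
        # Score and sort the candidate transcriptions for this word.
        ranked = sorted(LEXICON.get(word, []), key=cost)
        if not ranked:
            continue  # unknown word: skipped in this sketch
        preferred = ranked[0]
        # Select the segment for each phone and concatenate.
        for phone in preferred:
            samples.extend(SEGMENTS.get(phone, []))
    return samples


output = synthesize("Hi!")
```

Here the transcription `("hh", "ay")` is preferred because both of its phones exist in the toy database, so `output` is the concatenation of those two segments.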

17. A machine-readable storage having stored thereon, a computer program having a plurality of code sections, said code sections executable by a machine for causing the machine to perform the steps of:
- storing a plurality of speech segments;
  
  creating a plurality of phonetic transcriptions for each word of an input text;
  
  computing a cost score for each phonetic transcription by operating a cost function on the plurality of speech segments; and
  
  sorting the plurality of phonetic transcriptions according to the computed cost scores.

18. The machine-readable storage computer system for generating synthetic speech comprising the step of:
- normalizing the input text before creating the plurality of phonetic transcriptions.

19. A computer system for generating synthetic speech comprising:
- (a) a speaker database to store speech segments;
  
  (b) a front-end interface to receive an input text made of a plurality of words;
  
  (c) an output interface to audibly output the synthetic speech; and
  
  (d) computer readable program means executable by the computer for performing actions, including:
  
  (i) creating a plurality of phonetic transcriptions for each word of the input text;
  
  (ii) computing a cost score for each phonetic transcription by operating a cost function on the plurality of speech segments; and
  
  (iii) sorting the plurality of phonetic transcriptions according to the computed cost scores.
  - 20. The system of claim 19 wherein the computer readable program means is embodied on a program storage device readable by a computer machine.

Current Assignee
Cerence Operating Company (Cerence Inc.)
Original Assignee
International Business Machines Corporation
Inventors
Crepy, Hubert; Amato, Christel; Revelin, Stephane; Waast-Richard, Claire

Granted Patent

US 7,869,999 B2
US Class Current

704/260
CPC Class Codes

G10L 13/08 Text analysis or generation...
