Method and system for generating natural sounding concatenative synthetic speech

US 20040176957A1
Filed: 03/03/2003
Published: 09/09/2004
Est. Priority Date: 03/03/2003
Status: Active Grant

Abstract

A method for generating synthetic speech can include identifying a recording of conversational speech and creating a transcription of the conversational speech. Using the transcription, rather than a predefined script, the recording can be analyzed and acoustic units extracted. Each acoustic unit can include a phoneme and/or a sub-phoneme. The acoustic units can be stored so that a concatenative text-to-speech engine can later splice the acoustic units together to produce synthetic speech.
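
The flow described in the abstract can be illustrated with a minimal Python sketch. It assumes the transcription has already been time-aligned to the recording (for example by forced alignment, which the abstract does not specify). The names `AlignedLabel`, `extract_units`, and `synthesize` are hypothetical, and the toy integer samples stand in for real audio.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedLabel:
    phoneme: str  # phoneme label taken from the transcription
    start: int    # first sample index (inclusive)
    end: int      # last sample index (exclusive)


def extract_units(samples, alignment):
    """Cut the recording into per-phoneme sample slices, keyed by phoneme."""
    store = defaultdict(list)
    for label in alignment:
        store[label.phoneme].append(samples[label.start:label.end])
    return dict(store)


def synthesize(store, phonemes):
    """Splice together the first stored unit for each requested phoneme."""
    out = []
    for phoneme in phonemes:
        out.extend(store[phoneme][0])
    return out


# Toy "recording" and hand-made alignment (both hypothetical).
recording = list(range(12))
alignment = [
    AlignedLabel("HH", 0, 3),
    AlignedLabel("AY", 3, 8),
    AlignedLabel("HH", 8, 12),
]

unit_store = extract_units(recording, alignment)
speech = synthesize(unit_store, ["AY", "HH"])
```

A real engine would choose among several candidate units per phoneme and smooth the joins; this sketch always takes the first candidate to keep the splice step visible.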

21 Claims

1. A method for generating synthetic speech comprising the steps of:
- identifying a recording of conversational speech;
  
  identifying a plurality of acoustic units from said recording, wherein each said acoustic unit includes at least one of a phoneme and a sub-phoneme;
  
  extracting said acoustic units from said recording; and
  
  storing said acoustic units for use by a concatenative text-to-speech engine to generate synthetic speech.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9)
- - 2. The method of claim 1, further comprising the steps of:
    - determining prosodic information from said recording; and
      
      storing said prosodic information so that said prosodic information can be used by said text-to-speech engine when concatenating said acoustic units to form synthetic speech.
  - 3. The method of claim 2, further comprising the steps of:
    - generating a textual transcription from said recording, wherein said textual transcription is utilized in extracting said acoustic units, and wherein said textual transcription is utilized in determining said prosodic information.
  - 4. The method of claim 1, further comprising the step of:
    - generating synthetic speech using said concatenative text-to-speech engine using said acoustic units.
  - 5. The method of claim 1, wherein said identifying step further comprises the steps of:
    - receiving conversational speech generated by a speaker; and
      
      recording at least a portion of said conversational speech as said recording.
  - 6. The method of claim 5, wherein said receiving step further comprises the steps of:
    - establishing an acoustic environment;
      
      disposing said speaker within said acoustic environment, wherein the signal-to-noise ratio of said recorded conversational speech to other ambient noise recorded in said acoustic environment is at least 10 decibels; and
      
      prompting said speaker to produce free form speech.
  - 7. The method of claim 6, wherein said prompting step further comprises the step of:
    - establishing a conversation between said speaker and a second speaker.
  - 8. The method of claim 6, wherein said prompting step further comprises prompting said speaker using a prompting apparatus.
  - 9. The method of claim 6, wherein said signal-to-noise ratio is at least 30 decibels.
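
Claims 6 and 9 set signal-to-noise thresholds of 10 and 30 decibels. The claims do not say how the ratio is measured, so the following sketch is only one possible check: it assumes a speech segment and a separately captured noise-only segment, and computes the ratio from their RMS amplitudes. The function names and sample values are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a non-empty list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(speech, noise):
    """SNR in decibels from RMS amplitudes: 20 * log10(rms_speech / rms_noise)."""
    return 20.0 * math.log10(rms(speech) / rms(noise))


def meets_threshold(speech, noise, min_db=10.0):
    """True when the measured SNR is at least min_db decibels."""
    return snr_db(speech, noise) >= min_db


# Toy segments (hypothetical): speech amplitude is ten times the noise amplitude.
speech_segment = [0.5, -0.5, 0.5, -0.5]
noise_segment = [0.05, -0.05, 0.05, -0.05]

measured = snr_db(speech_segment, noise_segment)
passes_10_db = meets_threshold(speech_segment, noise_segment, 10.0)
passes_30_db = meets_threshold(speech_segment, noise_segment, 30.0)
```

With a tenfold amplitude ratio the measured value is about 20 dB, so this toy recording would satisfy the 10 dB limit of claim 6 but not the 30 dB limit of claim 9. The sketch does not guard against a silent (all-zero) noise segment.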

10. A system for synthetically generating speech comprising:
- a training corpus containing at least one conversational speech recording and at least one associated transcription;
  
  an acoustic unit store configured to store a plurality of acoustic units, wherein at least a portion of said acoustic units are generated from data contained within said training corpus, and wherein at least a portion of said acoustic units are derived from said conversational speech recording; and
  
  a concatenative text-to-speech engine configured to utilize said acoustic unit store to synthetically generate speech.
- Dependent claims (11, 12)
- - 11. The system of claim 10, wherein said concatenative text-to-speech engine utilizes prosodic information extracted from said training corpus to synthetically generate speech.
  - 12. The system of claim 10, further comprising:
    - an acoustic environment within which conversational speech is recorded, wherein the signal-to-noise ratio of said recorded conversational speech to other ambient noise recorded in said acoustic environment is at least 10 decibels.
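
Claim 10 requires only that at least a portion of the stored units be derived from the conversational recording. One way to picture that is a store that tags each unit with its origin. The sketch below is illustrative only: the `source` tag, the class names, and the policy of preferring conversational units are assumptions, not part of the claims, and the training corpus itself is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    phoneme: str
    samples: list
    source: str  # hypothetical tag: "conversational" or "scripted"


@dataclass
class AcousticUnitStore:
    units: dict = field(default_factory=dict)

    def add(self, unit):
        self.units.setdefault(unit.phoneme, []).append(unit)

    def candidates(self, phoneme):
        return self.units.get(phoneme, [])

    def has_conversational_units(self):
        return any(u.source == "conversational"
                   for group in self.units.values() for u in group)


class ConcatenativeEngine:
    def __init__(self, store):
        self.store = store

    def synthesize(self, phonemes):
        out = []
        for phoneme in phonemes:
            cands = self.store.candidates(phoneme)
            if not cands:
                raise KeyError(f"no stored unit for phoneme {phoneme!r}")
            # Assumed policy: prefer conversational units when available.
            best = min(cands, key=lambda u: u.source != "conversational")
            out.extend(best.samples)
        return out


store = AcousticUnitStore()
store.add(Unit("K", [1, 2], "scripted"))
store.add(Unit("K", [7, 8], "conversational"))
store.add(Unit("AE", [3], "scripted"))

engine = ConcatenativeEngine(store)
result = engine.synthesize(["K", "AE"])
```

Here the engine picks the conversational "K" unit and falls back to the scripted "AE" unit, which matches a store that mixes both kinds of material.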

13. A machine-readable storage having stored thereon, a computer program having a plurality of code sections, said code sections executable by a machine for causing the machine to perform the steps of:
- identifying a recording of conversational speech;
  
  identifying a plurality of acoustic units from said recording, wherein each said acoustic unit includes at least one of a phoneme and a sub-phoneme;
  
  extracting said acoustic units from said recording; and
  
  storing said acoustic units for use by a concatenative text-to-speech engine to generate synthetic speech.
- Dependent claims (14, 15, 16, 17, 18, 19, 20, 21)
- - 14. The machine-readable storage of claim 13, further comprising the steps of:
    - determining prosodic information from said recording; and
      
      storing said prosodic information so that said prosodic information can be used by said text-to-speech engine when concatenating said acoustic units to form synthetic speech.
  - 15. The machine-readable storage of claim 14, further comprising the steps of:
    - generating a textual transcription from said recording, wherein said textual transcription is utilized in extracting said acoustic units, and wherein said textual transcription is utilized in determining said prosodic information.
  - 16. The machine-readable storage of claim 13, further comprising the step of:
    - generating synthetic speech using said concatenative text-to-speech engine using said acoustic units.
  - 17. The machine-readable storage of claim 13, wherein said identifying step further comprises the steps of:
    - receiving conversational speech generated by a speaker; and
      
      recording at least a portion of said conversational speech as said recording.
  - 18. The machine-readable storage of claim 17, wherein said receiving step further comprises the steps of:
    - establishing an acoustic environment;
      
      disposing said speaker within said acoustic environment, wherein the signal-to-noise ratio of said recorded conversational speech to other ambient noise recorded in said acoustic environment is at least 10 decibels; and
      
      prompting said speaker to produce free form speech.
  - 19. The machine-readable storage of claim 18, wherein said prompting step further comprises the step of:
    - establishing a conversation between said speaker and a second speaker.
  - 20. The machine-readable storage of claim 18, wherein said prompting step further comprises prompting said speaker using a prompting apparatus.
  - 21. The machine-readable storage of claim 18, wherein said signal-to-noise ratio is at least 30 decibels.
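
Claims 2 and 14 recite determining prosodic information from the recording and storing it, without listing which features are meant. As a rough sketch, the code below assumes prosody is reduced to a per-unit duration and a pitch estimate taken from the autocorrelation peak. The function names, the 50 to 400 Hz search range, and the synthetic 200 Hz test tone are all assumptions made for illustration.

```python
import math


def estimate_f0(samples, sample_rate, fmin=50.0, fmax=400.0):
    """Estimate pitch in Hz from the lag with the largest autocorrelation."""
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(samples) - 1)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None


def prosody_for_unit(samples, sample_rate):
    """Build a small prosody record that could be stored next to a unit."""
    return {
        "duration_s": len(samples) / sample_rate,
        "f0_hz": estimate_f0(samples, sample_rate),
    }


# Synthetic stand-in for a voiced unit: 50 ms of a 200 Hz sine at 8 kHz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(400)]

record = prosody_for_unit(tone, SAMPLE_RATE)
```

A record like this could be saved alongside each acoustic unit so that the engine can consult it when concatenating. Real speech would need a more robust pitch tracker and handling for unvoiced units, where `estimate_f0` may return `None`.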

Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: International Business Machines Corporation
Inventors: Reich, David E.
Granted Patent: US 7,308,407 B2
US Class Current: 704/260
CPC Class Codes:
G10L 13/07 Concatenation rules
G10L 15/02 Feature extraction for spee...
