Method and system for generating natural sounding concatenative synthetic speech
Abstract
A method for generating synthetic speech can include identifying a recording of conversational speech and creating a transcription of the conversational speech. Using the transcription, rather than a predefined script, the recording can be analyzed and acoustic units extracted. Each acoustic unit can include a phoneme and/or a sub-phoneme. The acoustic units can be stored so that a concatenative text-to-speech engine can later splice the acoustic units together to produce synthetic speech.
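As a rough illustration of the flow described in the abstract, the sketch below assumes the transcription has already been aligned to the recording, giving one (label, start, end) triple per phoneme or sub-phoneme, with positions in sample indices. The alignment format, the `AcousticUnit` type, and the `extract_units` function are hypothetical names introduced here for illustration and are not taken from the patent.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AcousticUnit:
    label: str      # phoneme or sub-phoneme label, e.g. "HH" (assumed labeling)
    samples: tuple  # audio samples cut from the recording


def extract_units(recording, alignment):
    """Cut one acoustic unit per aligned segment and group the units by label.

    recording: sequence of audio samples
    alignment: iterable of (label, start, end), end exclusive
    """
    store = defaultdict(list)
    for label, start, end in alignment:
        # Reject segments that fall outside the recording or are empty.
        if not (0 <= start < end <= len(recording)):
            raise ValueError(f"bad segment for {label!r}: {start}-{end}")
        store[label].append(AcousticUnit(label, tuple(recording[start:end])))
    return dict(store)


# Toy data: ten integers standing in for a conversational recording.
recording = list(range(10))
alignment = [("HH", 0, 3), ("AY", 3, 10)]
unit_store = extract_units(recording, alignment)
```

The resulting `unit_store` maps each label to the units cut from the recording, which is the form a concatenative engine would later read from. How the transcription is aligned to the audio is outside the scope of this sketch.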
21 Claims
1. A method for generating synthetic speech comprising the steps of:
identifying a recording of conversational speech;
identifying a plurality of acoustic units from said recording, wherein each said acoustic unit includes at least one of a phoneme and a sub-phoneme;
extracting said acoustic units from said recording; and
storing said acoustic units for use by a concatenative text-to-speech engine to generate synthetic speech.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A system for synthetically generating speech comprising:
a training corpus containing at least one conversational speech recording and at least one associated transcription;
an acoustic unit store configured to store a plurality of acoustic units, wherein at least a portion of said acoustic units are generated from data contained within said training corpus, and wherein at least a portion of said acoustic units are derived from said conversational speech recording; and
a concatenative text-to-speech engine configured to utilize said acoustic unit store to synthetically generate speech.
Dependent claims: 11, 12
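The arrangement recited in claim 10 can be pictured, very loosely, as a store of acoustic units and an engine that reads from it. The sketch below models only those two parts; the training corpus is omitted, and the input is assumed to be a phoneme sequence already derived from text. The class names `AcousticUnitStore` and `ConcatenativeEngine`, and the choice to always take the first stored candidate, are assumptions made for illustration, not details from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class AcousticUnitStore:
    # Maps a phoneme or sub-phoneme label to the list of stored sample runs.
    units: dict = field(default_factory=dict)

    def add(self, label, samples):
        self.units.setdefault(label, []).append(list(samples))

    def get(self, label):
        candidates = self.units.get(label)
        if not candidates:
            raise KeyError(f"no stored unit for {label!r}")
        # Simplification: take the first candidate. A fuller engine would
        # choose among candidates, for example by how well they join.
        return candidates[0]


class ConcatenativeEngine:
    def __init__(self, store):
        self.store = store

    def synthesize(self, phonemes):
        """Splice stored units end to end, in the order given."""
        audio = []
        for label in phonemes:
            audio.extend(self.store.get(label))
        return audio


store = AcousticUnitStore()
store.add("HH", [0.1, 0.2])
store.add("AY", [0.3, 0.4, 0.5])
engine = ConcatenativeEngine(store)
audio = engine.synthesize(["HH", "AY"])
```

Keeping the store separate from the engine mirrors the claim's wording, in which the engine is "configured to utilize" the store rather than own the units itself.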
13. A machine-readable storage having stored thereon, a computer program having a plurality of code sections, said code sections executable by a machine for causing the machine to perform the steps of:
identifying a recording of conversational speech;
identifying a plurality of acoustic units from said recording, wherein each said acoustic unit includes at least one of a phoneme and a sub-phoneme;
extracting said acoustic units from said recording; and
storing said acoustic units for use by a concatenative text-to-speech engine to generate synthetic speech.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21
Specification