Utilization of multiple voice sources in a speech synthesizer

US 5,704,007 A
Filed: 10/04/1996
Issued: 12/30/1997
Est. Priority Date: 03/11/1994
Status: Expired due to Term

Abstract

Utilization of one or more voice sources in a speech synthesizer to provide improved synthetic speech. Having a speech synthesizer with the capability to select among and between a multiplicity of voice sources provides a higher quality and greater variety of possible synthetic speech sounds. This is particularly true when the multiplicity of voice sources are predetermined to have particular speech qualities and spectral content such as may be desired to convey emotional vocal content in synthetic speech.

21 Claims

1. A synthetic text-to-speech generating method comprising:
- generating a set of speech synthesizer control parameters representative of text to be spoken; and
  
  converting the speech synthesizer control parameters into output wave forms representative of the synthetic speech to be spoken by selecting and combining at least two voice sources from a multiplicity of voice sources in a speech synthesizer to generate a combined voice source and by passing the combined voice source through an acoustic model of a human vocal tract.
- Dependent claims (2, 3, 4):
  - 2. The method of claim 1 wherein said step of selecting is based upon which of the multiplicity of voice sources has spectral content which most closely matches that of the generated set of speech synthesizer control parameters.
  - 3. The method of claim 1, wherein said multiplicity of voice sources includes a normal voice source and a bright voice source.
  - 4. The method of claim 3, wherein said multiplicity of voice sources includes a glottal voice source.
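
Claim 2 (like claims 6 and 11) selects voice sources by how closely their spectral content matches that implied by the control parameters, but the claims do not say how that match is measured. A minimal sketch, assuming each source and the target are summarized as a few band levels in dB and compared by Euclidean distance; the band layout, the numbers in VOICE_SOURCES and the function names are hypothetical:

```python
import math

# Hypothetical per-source spectral envelopes: average level in dB for four
# frequency bands. Band layout and values are illustrative only; the patent
# does not specify how spectral content is represented.
VOICE_SOURCES = {
    "normal":  [0.0, -6.0, -12.0, -18.0],
    "bright":  [0.0, -3.0,  -6.0,  -9.0],
    "glottal": [0.0, -9.0, -18.0, -27.0],
}


def spectral_distance(a, b):
    """Euclidean distance between two equal-length band-level vectors (dB)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_sources(target, sources, count=2):
    """Return the names of the `count` sources whose envelopes lie closest
    to the target envelope derived from the control parameters.
    Ties keep the dictionary's insertion order (sorted() is stable)."""
    ranked = sorted(sources, key=lambda name: spectral_distance(sources[name], target))
    return ranked[:count]


# A target envelope somewhat brighter than "normal":
print(select_sources([0.0, -4.0, -8.0, -12.0], VOICE_SOURCES))  # ['bright', 'normal']
```

Euclidean distance is only one plausible choice; a weighted distance that emphasizes the bands most important for perceived brightness would slot into spectral_distance without changing the selection logic.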

5. An apparatus for generating synthetic text-to-speech, the apparatus comprising:
- means for generating a set of speech synthesizer control parameters representative of text to be spoken; and
  
  means for converting the speech synthesizer control parameters into output wave forms representative of the synthetic speech to be spoken by means for selecting and combining at least two voice sources from a multiplicity of voice sources in a speech synthesizer to generate a combined voice source and means for passing the combined voice source through an acoustic model of a human vocal tract.
- Dependent claims (6, 7, 8):
  - 6. The apparatus of claim 5 wherein said means for selecting is based upon which of the multiplicity of voice sources has spectral content which most closely matches that of the generated set of speech synthesizer control parameters.
  - 7. The apparatus of claim 5, wherein said multiplicity of voice sources includes a normal voice source and a bright voice source.
  - 8. The apparatus of claim 7, wherein said multiplicity of voice sources includes a glottal voice source.

9. A method of generating synthetic speech in a synthetic speech system comprising a speech synthesizer, said synthetic speech generating method comprising the steps of:
- a) providing a multiplicity of synthetic voice sources to said speech synthesizer;
  
  b) providing a set of speech synthesizer control parameters to said speech synthesizer;
  
  c) said speech synthesizer selecting at least two of said multiplicity of voice sources based upon said set of speech synthesizer control parameters;
  
  d) said speech synthesizer combining the selected voice sources to generate a combined voice source; and
  
  e) generating said synthetic speech based upon said set of speech synthesizer control parameters and using said combined voice source.
- Dependent claims (10, 11, 12, 13):
  - 10. The synthetic speech generating method of claim 9 wherein said multiplicity of voice sources are predetermined to have desired spectral content.
  - 11. The synthetic speech generating method of claim 10 wherein said step of selecting at least two of said multiplicity of voice sources comprises selecting at least one voice source having spectral content which most closely matches that of the provided set of speech synthesizer control parameters.
  - 12. The method of claim 9, wherein said multiplicity of voice sources includes a normal voice source and a bright voice source.
  - 13. The method of claim 12, wherein said multiplicity of voice sources includes a glottal voice source.

14. A text-to-speech synthesizer system for generating a synthetic speech signal, the synthesizer system comprising:
- a phonetic translation of text to be spoken by the text-to-speech synthesizer system;
  
  a multiplicity of audio signals to be used as voice sources by the text-to-speech synthesizer system; and
  
  an acoustic model of a human vocal tract, the acoustic model selectively receiving as input at least two of the multiplicity of audio signals and the phonetic translation, the acoustic model acoustically modifying the received audio signals based upon the phonetic translation to generate a modified voice source, and the acoustic model outputting the modified voice source as the synthetic speech signal.

15. A parametric synthetic text-to-speech system comprising:
- a memory containing a multiplicity of digitally sampled voice sources and a set of text-to-speech parameters indicative of text to be spoken by the synthetic text-to-speech system;
  
  a filter network for modulating two or more of the multiplicity of voice sources in accordance with the set of text-to-speech parameters to generate a modulated voice source, the filter network modeling the acoustic aspects of the human vocal tract;
  
  a loudspeaker for generating a waveform of the synthetic speech utilizing the modulated voice source.
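
Claim 15 describes the vocal tract model only as a filter network that modulates the voice sources according to the text-to-speech parameters. One common way to build such a network in formant synthesis is a cascade of second-order digital resonators, one per formant, as in Klatt's cascade formant synthesizer. The sketch below swaps in that technique as an assumption, not as the patent's disclosed design; the formant frequencies, bandwidths, sample rate and function names are hypothetical:

```python
import math


def resonator(signal, freq_hz, bw_hz, fs_hz):
    """Filter `signal` through one second-order digital resonator (one formant).

    Difference equation: y[n] = a*x[n] + b*y[n-1] + c*y[n-2], with
    coefficients chosen so that the gain at 0 Hz is exactly 1.
    """
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    y1 = y2 = 0.0  # the two previous output samples
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def vocal_tract(source, formants, fs_hz):
    """Pass a voice source through a cascade of formant resonators."""
    signal = list(source)
    for freq_hz, bw_hz in formants:
        signal = resonator(signal, freq_hz, bw_hz, fs_hz)
    return signal


# Illustrative (frequency, bandwidth) pairs in Hz, roughly vowel-like.
FORMANTS = [(700.0, 130.0), (1220.0, 70.0), (2600.0, 160.0)]
FS_HZ = 10000.0

# A crude periodic source: one impulse every 100 samples (100 Hz pitch).
source = [1.0 if n % 100 == 0 else 0.0 for n in range(1000)]
speech = vocal_tract(source, FORMANTS, FS_HZ)
print(len(speech))  # 1000
```

In a complete synthesizer the formant list would be updated frame by frame from the parameters; this sketch holds it fixed for a single segment.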

16. A text-to-speech synthesizer system for generating a synthetic speech signal, the synthesizer system comprising:
- a phonetic translation of text to be spoken by the text-to-speech synthesizer system;
  
  two audio signals to be used as voice sources by the text-to-speech synthesizer system;
  
  an acoustic model of a human vocal tract for receiving the two audio signals and the phonetic translation, combining and modifying the two audio signals based upon the phonetic translation, and outputting the combined and modified two audio signals as the synthetic speech signal.
- Dependent claims (17, 18, 19, 20, 21):
  - 17. The system of claim 16 wherein the two audio signals each has different spectral qualities.
  - 18. The system of claim 17 wherein the acoustic model uses proportionately more of one of the two audio signals than another of the two audio signals when combining the two audio signals.
  - 19. The system of claim 18 wherein the proportionate usage of the two audio signals by the acoustic model is variable.
  - 20. The system of claim 17, wherein a first of said audio signals has spectral qualities of a normal voice and a second of said audio signals has spectral qualities of a bright voice.
  - 21. The apparatus of claim 20, further including a third audio signal to be used as a voice source by the text-to-speech synthesizer system, said third audio signal having spectral qualities of a glottal voice.
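
Claims 18 and 19 have the acoustic model use proportionately more of one audio signal than the other, with a proportion that may vary. A minimal sketch, assuming the two sources are equal-length sample lists and the proportion is supplied per sample; combine_sources and linear_ramp are hypothetical helpers, and the toy signals merely stand in for real normal and bright voice sources:

```python
def combine_sources(first, second, weights):
    """Mix two equal-length voice-source signals sample by sample.

    weights[n] is the proportion of `first` at sample n (0.0 to 1.0);
    the remaining 1 - weights[n] is taken from `second`.
    """
    if not (len(first) == len(second) == len(weights)):
        raise ValueError("signals and weights must have equal length")
    return [w * a + (1.0 - w) * b for a, b, w in zip(first, second, weights)]


def linear_ramp(start, end, length):
    """Weights that move linearly from `start` to `end` over `length` samples."""
    if length < 1:
        raise ValueError("length must be positive")
    if length == 1:
        return [start]
    step = (end - start) / (length - 1)
    return [start + i * step for i in range(length)]


# Toy stand-ins for a "normal" and a "bright" source (hypothetical values).
normal = [2.0] * 5
bright = [-2.0] * 5

# Shift gradually from all-normal to all-bright.
weights = linear_ramp(1.0, 0.0, 5)
print(weights)                                   # [1.0, 0.75, 0.5, 0.25, 0.0]
print(combine_sources(normal, bright, weights))  # [2.0, 1.0, 0.0, -1.0, -2.0]
```

A constant weight list covers the fixed proportion of claim 18; a ramp, or any other per-sample schedule, covers the variable proportion of claim 19.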

Current Assignee: Apple Computer Incorporated (Apple Inc.)
Original Assignee: Apple Computer Incorporated (Apple Inc.)
Inventors: Cecys, Mark L.
Primary Examiner(s): Hafiz, Tariq R.
Application Number: US08/727,845
Time in Patent Office: 452 Days
Field of Search: 395/2.1, 395/2.38, 395/2.67, 395/2.69-2.78
US Class Current: 704/260
CPC Class Codes:
G10L 13/04 Details of speech synthesis...
G10L 13/047 Architecture of speech synt...