Method and apparatus for controlling a speech synthesis system to provide multiple styles of speech

US 6,810,378 B2
Filed: 09/24/2001
Issued: 10/26/2004
Est. Priority Date: 08/22/2001
Status: Expired due to Term

Abstract

A method and apparatus for synthesizing speech from text whereby the speech may be generated in a manner so as to effectively convey a particular, selectable style. Repeated patterns of one or more prosodic features—such as, for example, pitch, amplitude, spectral tilt, and/or duration—occurring at characteristic locations in the synthesized speech, are advantageously used to convey a particular chosen style. For example, one or more of such feature patterns may be used to define a particular speaking style, and an illustrative text-to-speech system then makes use of such a defined style to adjust the specified parameter or parameters of the synthesized speech in a non-uniform manner (i.e., in accordance with the defined feature pattern or patterns).
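
The non-uniform adjustment described in the abstract can be illustrated with a minimal sketch. It assumes a pitch contour stored as one value per frame and a feature pattern applied as multiplicative scale factors at chosen frame locations. The function name `apply_pattern`, the frame representation, and the multiplicative form are illustrative assumptions, not details taken from the patent.

```python
from typing import List, Sequence


def apply_pattern(contour: List[float],
                  pattern: Sequence[float],
                  locations: Sequence[int]) -> List[float]:
    """Scale a per-frame pitch contour by `pattern` starting at each location.

    Frames not covered by any pattern instance are left unchanged, so the
    adjustment is non-uniform across the utterance.
    """
    out = list(contour)
    for start in locations:
        for offset, scale in enumerate(pattern):
            i = start + offset
            if 0 <= i < len(out):  # ignore pattern frames that fall off the end
                out[i] *= scale
    return out


# Hypothetical flat 100 Hz contour with the same rise pattern at two locations.
flat = [100.0] * 8
styled = apply_pattern(flat, pattern=[1.0, 1.2, 1.1], locations=[1, 5])
```

Only the frames covered by the pattern change; the same pattern recurring at characteristic locations is what the abstract associates with a recognizable style.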


16 Claims

1. A method for synthesizing a voice signal based on a predetermined voice control information stream, the voice signal selectively synthesized to have a particular prosodic style, the method comprising the steps of:
- analyzing said predetermined voice control information stream to identify one or more portions thereof for prosody control;
  
  selecting one or more prosody control templates based on the particular prosodic style selected for said voice signal synthesis;
  
  applying said one or more selected prosody control templates to said one or more identified portions of said predetermined voice control information stream, thereby generating a stylized voice control information stream; and
  
  synthesizing said voice signal based on said stylized voice control information stream so that said synthesized voice signal has said particular prosodic style, wherein said one or more prosody control templates comprise tag templates which are selected from a tag template database and wherein said step of applying said selected prosody control templates to said identified portions of said predetermined voice control information stream comprises the steps of;
  
  expanding each of said tag templates into one or more tags;
  
  converting said one or more tags into a time series of prosodic features; and
  
  generating said stylized voice control information stream based on said time series of prosodic features.
  - 2. The method of claim 1 wherein said voice signal comprises a speech signal and wherein said predetermined voice control information stream comprises predetermined text.
  - 3. The method of claim 1 wherein said voice signal comprises a speech signal and wherein said predetermined voice control information stream comprises predetermined annotated text.
  - 4. The method of claim 1 wherein said voice signal comprises a singing voice signal and wherein said predetermined voice control information stream comprises a predetermined musical score.
  - 5. The method of claim 1 wherein said particular prosodic style is representative of a specific person.
  - 6. The method of claim 1 wherein said particular prosodic style is representative of a particular group of people.
  - 7. The method of claim 1 wherein said step of analyzing said predetermined voice control information stream comprises parsing said predetermined voice control information stream and extracting one or more features therefrom.
  - 8. The method of claim 1 further comprising the step of computing one or more phoneme durations, and wherein said step of synthesizing said voice signal is also based on said one or more phoneme durations.
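
The steps recited in claim 1 can be sketched as a small pipeline. This is only one possible reading under stated assumptions: the input stream is a list of tokens in which a leading `*` marks a stressed word, every token spans a fixed number of frames, and each tag contributes a triangular pitch offset to a flat baseline. The database contents, the `Tag` fields, and every function name are hypothetical, and the final waveform synthesis step is not implemented.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

FRAMES_PER_TOKEN = 4    # assumed fixed token length, in frames
BASE_PITCH_HZ = 100.0   # assumed flat baseline pitch


@dataclass
class Tag:
    position: int      # absolute frame index of the tag's peak
    offset_hz: float   # pitch offset at the peak
    half_width: int    # frames over which the offset tapers to zero


# Hypothetical tag template database:
# style -> portion label -> list of (relative_frame, offset_hz, half_width)
TAG_TEMPLATE_DB: Dict[str, Dict[str, List[Tuple[int, float, int]]]] = {
    "emphatic": {"stress": [(1, 30.0, 2)], "final": [(2, -20.0, 2)]},
    "neutral": {"stress": [(1, 10.0, 2)], "final": [(2, -10.0, 2)]},
}


def analyze(tokens: List[str]) -> List[Tuple[str, int]]:
    """Identify portions for prosody control as (label, start_frame) pairs."""
    portions = [("stress", i * FRAMES_PER_TOKEN)
                for i, tok in enumerate(tokens) if tok.startswith("*")]
    if tokens:
        portions.append(("final", (len(tokens) - 1) * FRAMES_PER_TOKEN))
    return portions


def expand(template: List[Tuple[int, float, int]], start: int) -> List[Tag]:
    """Expand one tag template into tags with absolute frame positions."""
    return [Tag(start + rel, off, hw) for rel, off, hw in template]


def tags_to_series(tags: List[Tag], n_frames: int) -> List[float]:
    """Convert tags into a per-frame pitch series using triangular bumps."""
    series = [BASE_PITCH_HZ] * n_frames
    for tag in tags:
        for t in range(n_frames):
            weight = max(0.0, 1.0 - abs(t - tag.position) / (tag.half_width + 1))
            series[t] += tag.offset_hz * weight
    return series


def stylize(tokens: List[str], style: str) -> List[Tuple[str, List[float]]]:
    """Return a stylized stream: each token paired with its pitch frames."""
    templates = TAG_TEMPLATE_DB[style]  # select templates for the chosen style
    tags: List[Tag] = []
    for label, start in analyze(tokens):
        tags.extend(expand(templates.get(label, []), start))
    series = tags_to_series(tags, len(tokens) * FRAMES_PER_TOKEN)
    return [(tok.lstrip("*"),
             series[i * FRAMES_PER_TOKEN:(i + 1) * FRAMES_PER_TOKEN])
            for i, tok in enumerate(tokens)]


stream = stylize(["this", "*really", "works"], "emphatic")
```

Changing the style argument to `"neutral"` reuses the same analysis and expansion code with different templates, which mirrors the claim's separation between identifying portions of the stream and selecting templates for a style.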

9. An apparatus for synthesizing a voice signal based on a predetermined voice control information stream, the voice signal selectively synthesized to have a particular prosodic style, the apparatus comprising:
- means for analyzing said predetermined voice control information stream to identify one or more portions thereof for prosody control;
  
  means for selecting one or more prosody control templates based on the particular prosodic style selected for said voice signal synthesis;
  
  means for applying said one or more selected prosody control templates to said one or more identified portions of said predetermined voice control information stream, thereby generating a stylized voice control information stream; and
  
  means for synthesizing said voice signal based on said stylized voice control information stream so that said synthesized voice signal has said particular prosodic style, wherein said one or more prosody control templates comprise tag templates which are selected from a tag template database and wherein said means for applying said selected prosody control templates to said identified portions of said predetermined voice control information stream comprises;
  
  means for expanding each of said tag templates into one or more tags;
  
  means for converting said one or more tags into a time series of prosodic features; and
  
  means for generating said stylized voice control information stream based on said time series of prosodic features.
  - 10. The apparatus of claim 9 wherein said voice signal comprises a speech signal and wherein said predetermined voice control information stream comprises predetermined text.
  - 11. The apparatus of claim 9 wherein said voice signal comprises a speech signal and wherein said predetermined voice control information stream comprises predetermined annotated text.
  - 12. The apparatus of claim 9 wherein said voice signal comprises a singing voice signal and wherein said predetermined voice control information stream comprises a predetermined musical score.
  - 13. The apparatus of claim 9 wherein said particular prosodic style is representative of a specific person.
  - 14. The apparatus of claim 9 wherein said particular prosodic style is representative of a particular group of people.
  - 15. The apparatus of claim 9 wherein said means for analyzing said predetermined voice control information stream comprises means for parsing said predetermined voice control information stream and means for extracting one or more features therefrom.
  - 16. The apparatus of claim 9 further comprising means for computing one or more phoneme durations, and wherein said means for synthesizing said voice signal is also based on said one or more phoneme durations.


Current Assignee: Alcatel-Lucent USA, Inc. (Nokia Corporation)
Original Assignee: Lucent Technologies, Inc. (Nokia Corporation)
Inventors: Kochanski, Gregory P.; Shih, Chi-Lin
Primary Examiner(s): ABEBE, DANIEL DEMELASH
Application Number: US09/961,923
Publication Number: US 20030078780A1
Time in Patent Office: 1,128 Days
Field of Search: 704/260
US Class Current: 704/258
CPC Class Codes: G10L 13/10 Prosody rules derived from ...
