Method and apparatus for automatic generation of vocal emotion in a synthetic text-to-speech system

US 5,860,064 A
Filed: 02/24/1997
Issued: 01/12/1999
Est. Priority Date: 05/13/1993
Status: Expired due to Term

Abstract

A method and apparatus for the automatic application of vocal emotion parameters to text in a text-to-speech system. Predefining vocal parameters for various vocal emotions allows simple selection and application of vocal emotions to text to be output from a text-to-speech system. Further, the present invention is capable of generating vocal emotion with the limited prosodic controls available in a concatenative synthesizer.

28 Claims

1. A method for automatic application of vocal emotion to previously entered text to be outputted by a synthetic text-to-speech system, said method comprising:
- selecting a portion of said previously entered text;
  
  manipulating a visual appearance of the selected text to selectively choose a vocal emotion to be applied to said selected text;
  
  obtaining vocal emotion parameters associated with said selected vocal emotion; and
  
  applying said obtained vocal emotion parameters to said selected text to be outputted by said synthetic text-to-speech system.
  - 2. The method of claim 1 wherein said vocal emotion parameters comprise pitch mean, pitch range, volume and speaking rate.
  - 3. The method of claim 2 wherein said text-to-speech system is a concatenative system.
  - 4. The method of claim 3 wherein said vocal emotion is one of multiple vocal emotions available for selection.
  - 5. The method of claim 4 wherein said multiple vocal emotions comprises anger, happiness, curiosity, sadness, boredom, aggressiveness, tiredness and disinterest.
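
Claims 1 to 5 describe a lookup: a change in the text's appearance selects an emotion, and the emotion selects a predefined set of four parameters. The following is a minimal sketch of that flow, not the patented implementation. The names `EmotionParams`, `PRESETS`, `COLOR_TO_EMOTION` and `apply_emotion` are hypothetical, the use of color as the visual cue is an assumption for illustration (the independent claim only says "visual appearance"), and every numeric value is a placeholder because the claims give no figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmotionParams:
    """The four parameters named in claim 2 (units are assumptions)."""
    pitch_mean: float     # Hz
    pitch_range: float    # Hz
    volume: float         # 0.0 (silent) to 1.0 (full)
    speaking_rate: float  # words per minute


# Placeholder presets for the eight emotions listed in claim 5.
# The numbers are illustrative only and do not come from the patent.
PRESETS = {
    "anger":          EmotionParams(140.0, 80.0, 0.9, 200.0),
    "happiness":      EmotionParams(160.0, 90.0, 0.8, 190.0),
    "curiosity":      EmotionParams(150.0, 70.0, 0.7, 175.0),
    "sadness":        EmotionParams(100.0, 30.0, 0.5, 130.0),
    "boredom":        EmotionParams(105.0, 20.0, 0.5, 140.0),
    "aggressiveness": EmotionParams(135.0, 75.0, 1.0, 195.0),
    "tiredness":      EmotionParams(95.0, 25.0, 0.4, 125.0),
    "disinterest":    EmotionParams(110.0, 20.0, 0.5, 150.0),
}

# Hypothetical mapping from a visual attribute (here, text color) to an emotion.
COLOR_TO_EMOTION = {"red": "anger", "yellow": "happiness", "blue": "sadness"}


def apply_emotion(text, start, end, color):
    """Split text into (segment, params) pairs; only text[start:end] is tagged."""
    params = PRESETS[COLOR_TO_EMOTION[color]]
    return [
        (text[:start], None),        # unselected text keeps default prosody
        (text[start:end], params),   # selected text carries the emotion preset
        (text[end:], None),
    ]


segments = apply_emotion("I said go away now", 7, 14, "red")
```

Here `segments[1]` pairs the selected words `"go away"` with the `"anger"` preset, while the surrounding text carries no parameters. A downstream synthesizer would read the preset attached to each segment.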

6. A method for providing vocal emotion to previously entered text in a concatenative synthetic text-to-speech system, said method comprising:
- selecting said previously entered text;
  
  manipulating a visual appearance of the selected text to select a vocal emotion from a set of vocal emotions;
  
  obtaining vocal emotion parameters predetermined to be associated with said selected vocal emotion, said vocal emotion parameters specifying pitch mean, pitch range, volume and speaking rate;
  
  applying said obtained vocal emotion parameters to said selected text; and
  
  synthesizing speech from the selected text.
  - 7. The method of claim 6 wherein said set of vocal emotions comprises anger, happiness, curiosity, sadness, boredom, aggressiveness, tiredness and disinterest.

8. An apparatus for automatic application of vocal emotion parameters to previously entered text to be outputted by a synthetic text-to-speech system, said apparatus comprising:
- a display device for displaying said previously entered text;
  
  an input device for permitting a user to selectively manipulate a visual appearance of the entered text and thereby select a vocal emotion;
  
  memory for holding said vocal emotion parameters associated with said selected vocal emotion; and
  
  logic circuitry for obtaining said vocal emotion parameters associated with said selected vocal emotion from said memory and for applying said obtained vocal emotion parameters to the manipulated text to be outputted by said synthetic text-to-speech system.
  - 9. The apparatus of claim 8 wherein said vocal emotion parameters comprise pitch mean, pitch range, volume and speaking rate.
  - 10. The apparatus of claim 9 wherein said text-to-speech system is a concatenative system.
  - 11. The apparatus of claim 10 wherein said vocal emotion is one of multiple vocal emotions available for selection.
  - 12. The apparatus of claim 11 wherein said multiple vocal emotions comprises anger, happiness, curiosity, sadness, boredom, aggressiveness, tiredness and disinterest.

13. A method for converting text to speech that enables a user to interactively apply vocal parameters to user-selectable text, comprising the steps of:
- selecting a portion of visually displayed text;
  
  selectively manipulating the selected portion of text to modify a visual appearance of the selected portion of text and to modify certain vocal parameters associated with the selected portion of text; and
  
  applying the modified vocal parameters associated with the selected portion of text to synthesize speech from the modified text.
  - 14. The method of claim 13 further comprising the step of, in response to manipulation, generating corresponding vocal parameter control data for transfer, in conjunction with said text, to an electronic text-to-speech synthesizer.
  - 15. The method of claim 13 wherein said vocal parameters include a volume parameter, said control means include a volume handle and the step of responding includes, in response to said user vertically dragging said volume handle, the step of manipulating said volume parameter and modifying said selected portion of text to occupy a different amount of vertical space.
  - 16. The method of claim 15 wherein said step of manipulating modifies a text-height display characteristic.
  - 17. The method of claim 13 wherein the step of manipulation is performed by control means, said vocal parameters include a rate parameter, said control means include a rate handle and the step of responding includes, in response to said user horizontally dragging said rate handle, modifying said rate parameter and modifying said selected portion of text to occupy a different amount of horizontal space.
  - 18. The method of claim 17 wherein said step of manipulating modifies a text-width display characteristic.
  - 19. The method of claim 13 wherein said vocal parameters include a volume parameter and a rate parameter, said control means include a volume/rate handle and the step of manipulating includes, in response to said user vertically dragging said volume/rate handle, modifying said volume parameter and modifying said selected portion of text to occupy a different amount of vertical space, and, in response to said user horizontally dragging said volume/rate handle, modifying said rate parameter and modifying said selected portion of text to occupy a different amount of horizontal space.
  - 20. The method of claim 13 wherein said vocal parameters include volume, rate and pitch, each of said vocal parameters has a predetermined base value, and a plurality of predetermined combinations of said vocal parameters each defines a respective emotion grouping.
  - 21. The method of claim 20 wherein the step of manipulation is performed by control means, and said control means include a plurality of emotion controls which are each user activatable to select a corresponding one of said emotion groupings.
  - 22. The method of claim 21 wherein said emotion controls include a plurality of differently colored emotion buttons each indicating a different emotion.
  - 23. The method of claim 22 wherein said user selecting one of said emotion buttons selects one of said emotion groupings and correspondingly modifies a color characteristic of said selected portion of text.
  - 24. The method of claim 13 wherein said vocal parameters are specified as a variance from a predetermined base value.
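
Claims 15 to 19 and 24 tie two drag gestures to two parameters: vertical dragging changes volume and text height, horizontal dragging changes rate and text width, and each parameter is stored as a variance from a base value. The sketch below illustrates one way this coupling could work. All names, the base values, the `gain` factor and the clamping floor are hypothetical, and the direction of the rate mapping (wider text read as slower speech) is an assumption that the claims do not state.

```python
from dataclasses import dataclass

# Hypothetical base values in relative units; the claims only say that each
# parameter has a predetermined base value (claim 20).
BASE_VOLUME = 1.0
BASE_RATE = 1.0


@dataclass
class StyledSpan:
    text: str
    height_scale: float = 1.0  # displayed text height relative to normal
    width_scale: float = 1.0   # displayed text width relative to normal
    volume_delta: float = 0.0  # variance from BASE_VOLUME (claim 24)
    rate_delta: float = 0.0    # variance from BASE_RATE (claim 24)


def drag_volume_handle(span, dy, gain=0.01):
    """Vertical drag of dy pixels: taller text and louder speech (claims 15, 16)."""
    span.height_scale = max(0.1, span.height_scale + dy * gain)
    span.volume_delta = span.height_scale - 1.0


def drag_rate_handle(span, dx, gain=0.01):
    """Horizontal drag of dx pixels: wider text (claims 17, 18).

    Assumption: wider text is spoken more slowly, so the rate variance falls
    as the width grows. The claims do not fix this direction.
    """
    span.width_scale = max(0.1, span.width_scale + dx * gain)
    span.rate_delta = 1.0 - span.width_scale


def drag_volume_rate_handle(span, dx, dy):
    """Combined handle of claim 19: one drag updates both parameters."""
    drag_volume_handle(span, dy)
    drag_rate_handle(span, dx)


def effective(span):
    """Resolve the stored variances against the base values."""
    return BASE_VOLUME + span.volume_delta, BASE_RATE + span.rate_delta


span = StyledSpan("hello")
drag_volume_rate_handle(span, dx=20, dy=50)
volume, rate = effective(span)
```

Storing variances instead of absolute values means the same styled text can be replayed against a different voice simply by changing the base values.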

25. A computer-readable storage medium storing program code for causing a computer to perform the steps of:
- permitting a user to select a portion of text;
  
  permitting a user to manipulate the selected text with a plurality of user-manipulatable control means;
  
  responding to each user-manipulation of one of said control means by modifying a plurality of corresponding vocal parameters of the selected text and modifying a displayed appearance of said portion of text; and
  
  synthesizing speech from the modified text.

26. A system for converting text to speech that enables a user to interactively apply vocal parameters to user-selectable text, comprising:
- means for a user to select a portion of text;
  
  a plurality of interactive user manipulatable means for controlling vocal parameters associated with the selected portion of text;
  
  means, responsive to said control means, for modifying a plurality of vocal parameters associated with the portion of text and for modifying a displayed appearance of said portion of text; and
  
  means for synthesizing speech from the modified text.

27. A method of converting text to speech, comprising:
- entering text;
  
  displaying a portion of the entered text;
  
  selecting a portion of the displayed text;
  
  manipulating an appearance of the selected text to selectively change a set of vocal emotion parameters associated with the selected text; and
  
  synthesizing speech having a vocal emotion from the manipulated portion of text;
  
  whereby the vocal emotion of the synthesized speech depends on the manner in which the appearance of the text is manipulated.
  - 28. A method according to claim 27 wherein the step entering is followed immediately by the step of displaying.
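
Claim 27 makes the synthesized emotion depend on how the text's appearance was manipulated, and claim 14 mentions generating control data that travels with the text to the synthesizer. One possible shape for that control data is sketched below. The bracketed tag syntax, the `[reset]` marker and the function name `to_control_stream` are invented for illustration and are not the command format of any real synthesizer.

```python
def to_control_stream(segments):
    """Serialize (text, params) pairs into text with inline control tags.

    params is either None or a dict mapping a parameter name to its variance
    from the base value. The tag syntax is hypothetical.
    """
    parts = []
    for text, params in segments:
        if params:
            # Emit tags in sorted order so the output is deterministic.
            tags = "".join(
                f"[{name}{delta:+.2f}]" for name, delta in sorted(params.items())
            )
            # Restore default prosody once the marked text ends.
            parts.append(f"{tags}{text}[reset]")
        else:
            parts.append(text)
    return "".join(parts)


stream = to_control_stream([
    ("Please ", None),
    ("hurry", {"volume": 0.5, "rate": 0.25}),
    (" up.", None),
])
print(stream)  # Please [rate+0.25][volume+0.50]hurry[reset] up.
```

Keeping the control data inline with the text means a single string carries everything the synthesizer needs, at the cost of requiring the synthesizer to parse the tags.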

Current Assignee: Apple Computer Incorporated (Apple Inc.)
Original Assignee: Apple Computer Incorporated (Apple Inc.)
Inventors: Henton, Caroline G.
Primary Examiner(s): Dorvil, Richemond
Application Number: US08/805,893
Time in Patent Office: 687 Days
Field of Search: 395/2.09, 395/2.69, 395/2.79, 395/2.67, 704/260, 704/259, 704/270, 704/200, 704/266, 704/272, 704/276
US Class Current: 704/260
CPC Class Codes: G10L 13/033 Voice editing, e.g. manipul...; G10L 13/04 Details of speech synthesis...
