STYLIZED PROSODY FOR SPEECH SYNTHESIS-BASED APPLICATIONS

US 20100066742A1
Filed: 09/18/2008
Published: 03/18/2010
Est. Priority Date: 09/18/2008
Status: Abandoned Application

Abstract

Described is a technology by which the prosody of synthesized speech may be changed by varying data associated with that speech. An interface displays a visual representation of synthesized speech as one or more waveforms, along with the corresponding text from which the speech was synthesized. The user may interact with the visual representation to change data corresponding to the prosody, e.g., to change duration, pitch and/or loudness data, with respect to a part (or all) of the speech. The part of the speech that may be varied may comprise a phoneme, a morpheme, a syllable, a word, a phrase, and/or a sentence. The changed speech can be played back to hear the change in prosody resulting from the interactive changes. The user can also change the text and hear/see newly synthesized speech, which may then be similarly edited to change data that corresponds to that speech's prosody.
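As an illustration only, the idea of "data corresponding to the prosody" can be sketched as a small record per part of the speech. The `Segment` type, its field names and units, and the `edit_prosody` helper below are all hypothetical; the application does not specify a data layout, so this is one possible reading of the abstract rather than the described implementation.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Segment:
    """One editable part of the speech (hypothetical: a word, syllable, etc.)."""
    text: str
    duration_ms: float   # assumed unit: milliseconds
    pitch_hz: float      # assumed: average fundamental frequency
    loudness_db: float   # assumed: gain relative to the synthesized level


def edit_prosody(
    segments: List[Segment],
    index: int,
    *,
    duration_scale: float = 1.0,
    pitch_shift_hz: float = 0.0,
    loudness_shift_db: float = 0.0,
) -> List[Segment]:
    """Return a new list in which only the segment at `index` is changed."""
    if not 0 <= index < len(segments):
        raise IndexError("segment index out of range")
    if duration_scale <= 0:
        raise ValueError("duration_scale must be positive")
    target = segments[index]
    edited = replace(
        target,
        duration_ms=target.duration_ms * duration_scale,
        pitch_hz=target.pitch_hz + pitch_shift_hz,
        loudness_db=target.loudness_db + loudness_shift_db,
    )
    # The input list is left untouched so the original synthesis stays available.
    return segments[:index] + [edited] + segments[index + 1:]


utterance = [Segment("hello", 400.0, 120.0, 0.0), Segment("world", 500.0, 110.0, 0.0)]
stretched = edit_prosody(utterance, 1, duration_scale=1.5, pitch_shift_hz=20.0)
```

Returning a new list instead of mutating in place is a design choice of this sketch; it makes it easy to compare or play back the speech before and after an edit.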

20 Claims

1. In a computing environment, a method comprising, outputting a visual representation including a set of one or more waveforms and corresponding text, and changing prosody of the speech based on interaction with the visual representation to change data corresponding to the prosody.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8)
  - 2. The method of claim 1 wherein changing the prosody of the speech comprises changing the data corresponding to a phoneme, a morpheme, a syllable, a word, a phrase, or a sentence, or any combination of a phoneme, a morpheme, a syllable, a word, a phrase, or a sentence.
  - 3. The method of claim 1 wherein changing the prosody of the speech comprises changing the data corresponding to duration, pitch or loudness, or any combination of duration, pitch or loudness, with respect to at least one part of the speech.
  - 4. The method of claim 2 wherein changing the prosody of the speech comprises changing the data corresponding to the duration, pitch or loudness, or any combination of duration, pitch or loudness, of a phoneme, a morpheme, a syllable, a word, a phrase, or a sentence, or any combination of a phoneme, a morpheme, a syllable, a word, a phrase, or a sentence.
  - 5. The method of claim 1 further comprising, playing back at least part of the speech after changing the data corresponding to the prosody.
  - 6. The method of claim 1 further comprising, receiving the text, and generating speech from the text.
  - 7. The method of claim 6 further comprising, receiving changed text, and generating new speech from the changed text.
  - 8. The method of claim 6 further comprising, receiving changed text, and automatically changing the prosody in response to receiving the changed text.

9. In a computing environment, a system comprising, a speech synthesis mechanism that outputs speech from text, and an interface coupled to the speech synthesis mechanism, the interface configured to output a visual representation including a set of one or more waveforms and corresponding text, and to receive input, including input that changes data corresponding to prosody of the speech.
- Dependent Claims (10, 11, 12, 13, 14, 15, 16)
  - 10. The system of claim 9 wherein the speech synthesis mechanism is based upon a Hidden Markov Model system.
  - 11. The system of claim 9 wherein the data corresponding to prosody of the speech comprises duration-related data, pitch-related data or loudness related data, or any combination of duration-related data, pitch-related data or loudness related data, and wherein the interface provides interaction to change the prosody of a phoneme, a morpheme, a syllable, a word, a phrase, or a sentence, or any combination of a phoneme, a morpheme, a syllable, a word, a phrase, or a sentence.
  - 12. The system of claim 9 wherein the data corresponding to prosody of the speech comprises duration-related data, wherein the interface displays the duration-related data corresponding to parts of the speech, and wherein the interface allows interaction with the duration-related data to independently vary the duration of at least one part of the speech to change the prosody.
  - 13. The system of claim 9 wherein the data corresponding to prosody of the speech comprises pitch-related data, wherein the interface displays the pitch-related data corresponding to parts of the speech, and wherein the interface allows interaction with the pitch-related data to independently vary the pitch of at least one part of the speech to change the prosody.
  - 14. The system of claim 9 wherein the data corresponding to prosody of the speech comprises loudness-related data, wherein the interface displays the loudness-related data corresponding to parts of the speech, and wherein the interface allows interaction with the loudness-related data to independently vary the loudness of separate parts of the speech to change the prosody.
  - 15. The system of claim 9 wherein the interface displays loudness-related data corresponding to a set of speech, and wherein the interface allows interaction with the loudness-related data to vary the loudness of the corresponding speech.
  - 16. The system of claim 9 wherein the interface provides interaction to change the prosody of a phoneme, a morpheme, a syllable, a word, a phrase, or a sentence, or any combination of a phoneme, a morpheme, a syllable, a word, a phrase, or a sentence.

17. One or more computer-readable media having computer-executable instructions, which when executed perform steps, comprising:
  outputting a visible representation of speech and corresponding text;
  receiving user interaction corresponding to at least part of the speech; and
  changing data corresponding to prosody associated with the speech based on the user interaction.
- Dependent Claims (18, 19, 20)
  - 18. The one or more computer-readable media of claim 17 wherein changing the data corresponding to prosody associated with the speech comprises changing duration, pitch or loudness, or any combination of duration, pitch or loudness, with respect to at least one part of the speech.
  - 19. The one or more computer-readable media of claim 17 wherein changing the data corresponding to prosody associated with the speech comprises changing data corresponding to a phoneme, a morpheme, a syllable, a word, a phrase, or a sentence, or any combination of a phoneme, a morpheme, a syllable, a word, a phrase, or a sentence.
  - 20. The one or more computer-readable media of claim 17 having further computer-executable instructions comprising, playing back changed speech corresponding to the speech after changing the data.
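
The steps recited in claim 17 (receiving interaction on part of the speech, then changing prosody data) might be sketched as follows. This is a hedged illustration, not the claimed implementation: the function names `segment_at` and `apply_pitch_drag`, the use of a click time in milliseconds, and the per-part lists of durations and pitches are all assumptions introduced here.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Sequence


def segment_at(durations_ms: Sequence[float], click_ms: float) -> int:
    """Return the index of the part whose time span contains `click_ms`.

    Part i is assumed to cover the half-open interval [start_i, end_i).
    """
    ends = list(accumulate(durations_ms))
    if not ends or not 0 <= click_ms < ends[-1]:
        raise ValueError("click position is outside the displayed speech")
    return bisect_right(ends, click_ms)


def apply_pitch_drag(
    durations_ms: Sequence[float],
    pitches_hz: Sequence[float],
    click_ms: float,
    delta_hz: float,
) -> List[float]:
    """Shift the pitch of the part under the pointer; other parts are unchanged."""
    i = segment_at(durations_ms, click_ms)
    updated = list(pitches_hz)
    updated[i] = max(0.0, updated[i] + delta_hz)  # pitch is clamped at zero
    return updated


durations = [400.0, 500.0, 300.0]   # hypothetical per-word durations
pitches = [120.0, 110.0, 100.0]     # hypothetical per-word pitch values
new_pitches = apply_pitch_drag(durations, pitches, click_ms=450.0, delta_hz=15.0)
```

Here a click at 450 ms falls in the second part, so only that part's pitch value changes; the changed data would then be handed back to a synthesizer for playback, which is outside the scope of this sketch.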

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Soong, Frank Kao-ping; Qian, Yao
Application Number: US12/212,651
Publication Number: US 20100066742A1
US Class Current: 345/440.100
CPC Class Codes: G10L 13/10 Prosody rules derived from ...
