METHOD AND APPARATUS FOR GENERATING SYNTHETIC SPEECH WITH CONTRASTIVE STRESS

US 20140025384A1
Filed: 09/24/2013
Published: 01/23/2014
Est. Priority Date: 02/12/2010
Status: Active Grant

Abstract

Techniques for generating synthetic speech with contrastive stress. In one aspect, a speech-enabled application generates a text input including a text transcription of a desired speech output, and inputs the text input to a speech synthesis system. The synthesis system generates an audio speech output corresponding to at least a portion of the text input, with at least one portion carrying contrastive stress, and provides the audio speech output for the speech-enabled application. In another aspect, a speech-enabled application inputs a plurality of text strings, each corresponding to a portion of a desired speech output, to a software module for rendering contrastive stress. The software module identifies a plurality of audio recordings that render at least one portion of at least one of the text strings as speech carrying contrastive stress. The speech-enabled application generates an audio speech output corresponding to the desired speech output using the audio recordings.
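The second aspect in the abstract (a module that picks pre-recorded audio for each text string, using a contrastive rendition where stress is wanted) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the `select_recordings` function, the `(text, stressed)` lookup key, and the file names are hypothetical and do not come from the patent. The choice of which string carries stress is passed in by the caller rather than decided by the module.

```python
def select_recordings(text_strings, stressed_index, library):
    """Return one recording per text string.

    `library` is a hypothetical mapping from (text, is_stressed) to a
    recording name. The string at `stressed_index` is looked up in its
    contrastive rendition; all others use the neutral rendition.
    """
    recordings = []
    for index, text in enumerate(text_strings):
        key = (text, index == stressed_index)
        recordings.append(library[key])
    return recordings


# Hypothetical prompt library with neutral and contrastive renditions.
LIBRARY = {
    ("Did you say", False): "did_you_say.wav",
    ("Boston", False): "boston_neutral.wav",
    ("Boston", True): "boston_contrast.wav",
    ("or", False): "or.wav",
    ("Austin", False): "austin_neutral.wav",
    ("Austin", True): "austin_contrast.wav",
}

playlist = select_recordings(
    ["Did you say", "Boston", "or", "Austin"], stressed_index=3, library=LIBRARY
)
```

The application would then play or concatenate the returned recordings in order to form the full speech output.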

30 Claims

1. A method for providing speech output for a speech-enabled application, the method comprising:
- receiving from the speech-enabled application a text input comprising a text transcription of a desired speech output;
  
  generating, using at least one computer system, an audio speech output corresponding to at least a portion of the text input, the audio speech output comprising at least one portion carrying contrastive stress to contrast with at least one other portion of the audio speech output; and
  
  providing the audio speech output for the speech-enabled application.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of claim 1, wherein the generating comprises:
    - identifying a plurality of tokens of the text input of a same text normalization type for which a contrastive stress pattern is to be applied;
      
      identifying at least one token of the plurality of tokens to be rendered with contrastive stress; and
      
      assigning contrastive stress to be carried by at least one portion of the audio speech output corresponding to at least one portion of the at least one token of the text input.
  - 3. The method of claim 2, wherein the same text normalization type is selected from the group consisting of:
    - an alphanumeric sequence type, an address type, a Boolean value type, a currency type, a date type, a digit sequence type, a fractional number type, a proper name type, a number type, an ordinal number type, a telephone number type, a flight number type, a state name type, a street name type, a street number type, a time type and a zipcode type.
  - 4. The method of claim 2, wherein the plurality of tokens are identified based at least in part on at least one indication in the text input that the contrastive stress pattern is desired in association with the plurality of tokens.
  - 5. The method of claim 4, wherein the at least one indication comprises at least one Speech Synthesis Markup Language tag.
  - 6. The method of claim 2, wherein identifying the plurality of tokens comprises:
    - tokenizing the text input;
      
      automatically identifying the text normalization type of the plurality of tokens; and
      
      automatically determining that the contrastive stress pattern is to be applied for the plurality of tokens.
  - 7. The method of claim 2, wherein the at least one token to be rendered with contrastive stress is identified based at least in part on an order of the plurality of tokens in the text input.
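The steps recited in claims 2, 6 and 7 (tokenize, identify each token's text normalization type, decide that a contrastive pattern applies, and pick the stressed token by order) can be sketched as follows. This is only an illustration under stated assumptions: the regular expressions cover three of the listed types, the rule "two or more tokens of the same type trigger the pattern" is invented for the example, and stressing the last token of each group is one possible order-based choice, not the rule the patent specifies.

```python
import re
from collections import defaultdict

# Hypothetical patterns for a few of the normalization types named in claim 3.
TYPE_PATTERNS = [
    ("time", re.compile(r"^\d{1,2}:\d{2}(am|pm)?$", re.IGNORECASE)),
    ("currency", re.compile(r"^\$\d+(\.\d{2})?$")),
    ("number", re.compile(r"^\d+$")),
]


def tokenize(text):
    """Split on whitespace and strip surrounding sentence punctuation."""
    tokens = [t.strip(".,;!?") for t in text.split()]
    return [t for t in tokens if t]


def classify(token):
    """Return the first matching normalization type, or None."""
    for name, pattern in TYPE_PATTERNS:
        if pattern.match(token):
            return name
    return None


def assign_contrastive_stress(text):
    """Return (token, is_stressed) pairs for the text input."""
    tokens = tokenize(text)
    groups = defaultdict(list)
    for index, token in enumerate(tokens):
        kind = classify(token)
        if kind is not None:
            groups[kind].append(index)

    stressed = set()
    for indices in groups.values():
        # Assumed rule: a pattern applies when a type occurs at least twice,
        # and the last token in text order carries the stress.
        if len(indices) >= 2:
            stressed.add(indices[-1])
    return [(token, i in stressed) for i, token in enumerate(tokens)]


result = assign_contrastive_stress("Your flight moved from 3:00pm to 5:30pm.")
```

A synthesis back end would then map each stressed token to whatever prosodic change it uses to realize contrastive stress; that step is outside this sketch.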

8-11. (canceled)

12. Apparatus for providing speech output for a speech-enabled application, the apparatus comprising:
- a memory storing a plurality of processor-executable instructions; and
  
  at least one processor, operatively coupled to the memory, that executes the instructions to:
  
  receive from the speech-enabled application a text input comprising a text transcription of a desired speech output;
  
  generate an audio speech output corresponding to at least a portion of the text input, the audio speech output comprising at least one portion carrying contrastive stress to contrast with at least one other portion of the audio speech output; and
  
  provide the audio speech output for the speech-enabled application.
- Dependent claims (13, 14, 15, 16, 17):
  - 13. The apparatus of claim 13, wherein the at least one processor executes the instructions to generate the audio speech output at least in part by:
    - identifying a plurality of tokens of the text input of a same text normalization type for which a contrastive stress pattern is to be applied;
      
      identifying at least one token of the plurality of tokens to be rendered with contrastive stress; and
      
      assigning contrastive stress to be carried by at least one portion of the audio speech output corresponding to at least one portion of the at least one token of the text input.
  - 14. The apparatus of claim 13, wherein the same text normalization type is selected from the group consisting of:
    - an alphanumeric sequence type, an address type, a Boolean value type, a currency type, a date type, a digit sequence type, a fractional number type, a proper name type, a number type, an ordinal number type, a telephone number type, a flight number type, a state name type, a street name type, a street number type, a time type and a zipcode type.
  - 15. The apparatus of claim 13, wherein the at least one processor executes the instructions to identify the plurality of tokens based at least in part on at least one indication in the text input that the contrastive stress pattern is desired in association with the plurality of tokens.
  - 16. The apparatus of claim 15, wherein the at least one indication comprises at least one Speech Synthesis Markup Language tag.
  - 17. The apparatus of claim 13, wherein the at least one processor executes the instructions to identify the plurality of tokens at least in part by:
    - tokenizing the text input;
      
      automatically identifying the text normalization type of the plurality of tokens; and
      
      automatically determining that the contrastive stress pattern is to be applied for the plurality of tokens.
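Claims 15 and 16 let the application mark the tokens explicitly with a Speech Synthesis Markup Language tag instead of relying on automatic detection. The claims do not say which tag is used, so the sketch below makes assumptions: `<contrast>` is a hypothetical extension element invented for this example, while `<say-as interpret-as="...">` is a standard SSML element used here to carry the normalization type. The input omits the SSML XML namespace to keep the parsing short.

```python
import xml.etree.ElementTree as ET


def find_contrast_tokens(ssml):
    """Return (normalization_type, text) for each say-as inside <contrast>.

    <contrast> is a hypothetical tag; it is not part of standard SSML.
    """
    root = ET.fromstring(ssml)
    found = []
    for region in root.iter("contrast"):
        for say_as in region.iter("say-as"):
            text = (say_as.text or "").strip()
            found.append((say_as.get("interpret-as"), text))
    return found


SSML = (
    "<speak>Your flight moved from "
    '<contrast><say-as interpret-as="time">3:00pm</say-as> to '
    '<say-as interpret-as="time">5:30pm</say-as></contrast>.</speak>'
)

marked = find_contrast_tokens(SSML)
```

The returned pairs identify the tokens for which the application requested a contrastive stress pattern; choosing which of them actually carries the stress is a separate step.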

18-22. (canceled)

23. At least one non-transitory computer-readable storage medium encoded with a plurality of computer-executable instructions that, when executed, perform a method for providing speech output for a speech-enabled application, the method comprising:
- receiving from the speech-enabled application a text input comprising a text transcription of a desired speech output;
  
  generating an audio speech output corresponding to at least a portion of the text input, the audio speech output comprising at least one portion carrying contrastive stress to contrast with at least one other portion of the audio speech output; and
  
  providing the audio speech output for the speech-enabled application.
- Dependent claims (24, 25, 26, 27, 28, 29):
  - 24. The at least one non-transitory computer-readable storage medium of claim 23, wherein the generating comprises:
    - identifying a plurality of tokens of the text input of a same text normalization type for which a contrastive stress pattern is to be applied;
      
      identifying at least one token of the plurality of tokens to be rendered with contrastive stress; and
      
      assigning contrastive stress to be carried by at least one portion of the audio speech output corresponding to at least one portion of the at least one token of the text input.
  - 25. The at least one non-transitory computer-readable storage medium of claim 24, wherein the same text normalization type is selected from the group consisting of:
    - an alphanumeric sequence type, an address type, a Boolean value type, a currency type, a date type, a digit sequence type, a fractional number type, a proper name type, a number type, an ordinal number type, a telephone number type, a flight number type, a state name type, a street name type, a street number type, a time type and a zipcode type.
  - 26. The at least one non-transitory computer-readable storage medium of claim 24, wherein the plurality of tokens are identified based at least in part on at least one indication in the text input that the contrastive stress pattern is desired in association with the plurality of tokens.
  - 27. The at least one non-transitory computer-readable storage medium of claim 26, wherein the at least one indication comprises at least one Speech Synthesis Markup Language tag.
  - 28. The at least one non-transitory computer-readable storage medium of claim 24, wherein identifying the plurality of tokens comprises:
    - tokenizing the text input;
      
      automatically identifying the text normalization type of the plurality of tokens; and
      
      automatically determining that the contrastive stress pattern is to be applied for the plurality of tokens.
  - 29. The at least one non-transitory computer-readable storage medium of claim 24, wherein the at least one token to be rendered with contrastive stress is identified based at least in part on an order of the plurality of tokens in the text input.

30-57. (canceled)

Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Meyer, Darren C.; Springer, Stephen R.
Granted Patent: US 8,914,291 B2
US Class Current: 704/260
CPC Class Codes:
G10L 13/02 Methods for producing synth...
G10L 13/10 Prosody rules derived from ...