METHOD AND APPARATUS FOR GENERATING SYNTHETIC SPEECH WITH CONTRASTIVE STRESS

US 20110202345A1
Filed: 08/09/2010
Published: 08/18/2011
Est. Priority Date: 02/12/2010
Status: Active Grant

Abstract

Techniques for generating synthetic speech with contrastive stress. In one aspect, a speech-enabled application generates a text input including a text transcription of a desired speech output, and inputs the text input to a speech synthesis system. The synthesis system generates an audio speech output corresponding to at least a portion of the text input, with at least one portion carrying contrastive stress, and provides the audio speech output for the speech-enabled application. In another aspect, a speech-enabled application inputs a plurality of text strings, each corresponding to a portion of a desired speech output, to a software module for rendering contrastive stress. The software module identifies a plurality of audio recordings that render at least one portion of at least one of the text strings as speech carrying contrastive stress. The speech-enabled application generates an audio speech output corresponding to the desired speech output using the audio recordings.
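
As a rough illustration of the second aspect described above, the following Python sketch shows one way a software module might identify audio recordings for a set of text strings, choosing a stressed recording wherever a word differs from the word at the same position in another string. The function names, the `.wav` naming scheme, and the position-by-position comparison rule are all assumptions made for illustration; the abstract does not say how the contrast is detected or how recordings are named.

```python
from typing import List


def recording_name(word: str, stressed: bool) -> str:
    """Build a hypothetical recording file name for a word."""
    suffix = "_stressed" if stressed else ""
    return f"{word.lower()}{suffix}.wav"


def select_recordings(text_strings: List[str]) -> List[List[str]]:
    """Identify one recording per word for each text string.

    Assumed rule: a word carries contrastive stress when another string
    of the same length has a different word at the same position.
    """
    tokenized = [s.split() for s in text_strings]
    output = []
    for i, words in enumerate(tokenized):
        names = []
        for pos, word in enumerate(words):
            stressed = any(
                j != i
                and len(other) == len(words)
                and other[pos].lower() != word.lower()
                for j, other in enumerate(tokenized)
            )
            names.append(recording_name(word, stressed))
        output.append(names)
    return output


if __name__ == "__main__":
    for names in select_recordings(
        ["the flight leaves at nine", "the flight leaves at ten"]
    ):
        print(names)
```

In this sketch only "nine" and "ten" map to stressed recordings, because they are the only words that differ between the two strings.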

55 Citations

6 Claims

1. A method for use with a speech-enabled application, the method comprising:
- receiving input from the speech-enabled application comprising a plurality of text strings;
- generating, using at least one computer system, speech synthesis output corresponding to the plurality of text strings, the speech synthesis output identifying a plurality of audio recordings to render the plurality of text strings as speech, at least one of the plurality of audio recordings being selected to render at least one portion of at least one of the plurality of text strings as speech carrying contrastive stress, to contrast with at least one rendering of at least one other of the plurality of text strings; and
- providing the speech synthesis output for the speech-enabled application.

2. Apparatus for use with a speech-enabled application, the apparatus comprising:
- a memory storing a plurality of processor-executable instructions; and
- at least one processor, operatively coupled to the memory, that executes the instructions to:
  - receive input from the speech-enabled application comprising a plurality of text strings;
  - generate speech synthesis output corresponding to the plurality of text strings, the speech synthesis output identifying a plurality of audio recordings to render the plurality of text strings as speech, at least one of the plurality of audio recordings being selected to render at least one portion of at least one of the plurality of text strings as speech carrying contrastive stress, to contrast with at least one rendering of at least one other of the plurality of text strings; and
  - provide the speech synthesis output for the speech-enabled application.

3. At least one non-transitory computer-readable storage medium encoded with a plurality of computer-executable instructions that, when executed, perform a method for use with a speech-enabled application, the method comprising:
- receiving input from the speech-enabled application comprising a plurality of text strings;
- generating speech synthesis output corresponding to the plurality of text strings, the speech synthesis output identifying a plurality of audio recordings to render the plurality of text strings as speech, at least one of the plurality of audio recordings being selected to render at least one portion of at least one of the plurality of text strings as speech carrying contrastive stress, to contrast with at least one rendering of at least one other of the plurality of text strings; and
- providing the speech synthesis output for the speech-enabled application.

4. A method for generating speech output via a speech-enabled application, the method comprising:
- generating, using at least one computer system executing the speech-enabled application, a plurality of text strings, each of the plurality of text strings corresponding to a portion of a desired speech output;
- inputting the plurality of text strings to at least one software module for rendering contrastive stress;
- receiving output from the at least one software module, the output identifying a plurality of audio recordings to render the plurality of text strings as speech, at least one of the plurality of audio recordings being selected to render at least one portion of at least one of the plurality of text strings as speech carrying contrastive stress, to contrast with at least one rendering of at least one other of the plurality of text strings; and
- generating, using the plurality of audio recordings, an audio speech output corresponding to the desired speech output.

5. Apparatus for generating speech output via a speech-enabled application, the apparatus comprising:
- a memory storing a plurality of processor-executable instructions; and
- at least one processor, operatively coupled to the memory, that executes the instructions to:
  - generate a plurality of text strings, each of the plurality of text strings corresponding to a portion of a desired speech output;
  - input the plurality of text strings to at least one software module for rendering contrastive stress;
  - receive output from the at least one software module, the output identifying a plurality of audio recordings to render the plurality of text strings as speech, at least one of the plurality of audio recordings being selected to render at least one portion of at least one of the plurality of text strings as speech carrying contrastive stress, to contrast with at least one rendering of at least one other of the plurality of text strings; and
  - generate, using the plurality of audio recordings, an audio speech output corresponding to the desired speech output.

6. At least one non-transitory computer-readable storage medium encoded with a plurality of computer-executable instructions that, when executed, perform a method for generating speech output via a speech-enabled application, the method comprising:
- generating a plurality of text strings, each of the plurality of text strings corresponding to a portion of a desired speech output;
- inputting the plurality of text strings to at least one software module for rendering contrastive stress;
- receiving output from the at least one software module, the output identifying a plurality of audio recordings to render the plurality of text strings as speech, at least one of the plurality of audio recordings being selected to render at least one portion of at least one of the plurality of text strings as speech carrying contrastive stress, to contrast with at least one rendering of at least one other of the plurality of text strings; and
- generating, using the plurality of audio recordings, an audio speech output corresponding to the desired speech output.
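
The application-side flow recited in claims 4 to 6 (generate text strings, input them to a module, receive output identifying recordings, then generate audio from those recordings) could be sketched roughly as below. Everything named here is hypothetical: `AUDIO_STORE`, `CONTRAST_SET`, `stub_stress_module`, and `speak` are illustrative stand-ins, the single-byte values are placeholders for real recorded audio, and the rule used by the stub is an assumption, not the claimed module's logic.

```python
from typing import Callable, Dict, List

# Hypothetical store of recordings keyed by recording ID.
# Single bytes stand in for real audio data.
AUDIO_STORE: Dict[str, bytes] = {
    "did_you_say": b"D",
    "small": b"s",
    "small_stressed": b"S",
    "or": b"o",
    "large": b"l",
    "large_stressed": b"L",
}

# Hypothetical set of alternatives that can contrast with one another.
CONTRAST_SET = {"small", "large"}


def stub_stress_module(text_strings: List[str]) -> List[str]:
    """Stand-in for the contrastive-stress module.

    Assumed rule: when two or more different members of CONTRAST_SET
    appear among the strings, each of them gets a stressed recording.
    """
    alternatives = {s for s in text_strings if s in CONTRAST_SET}
    needs_contrast = len(alternatives) > 1
    ids = []
    for s in text_strings:
        rid = s.replace(" ", "_")
        if needs_contrast and s in CONTRAST_SET:
            rid += "_stressed"
        ids.append(rid)
    return ids


def speak(
    text_strings: List[str],
    module: Callable[[List[str]], List[str]],
    store: Dict[str, bytes],
) -> bytes:
    """Input the strings to the module, then join the identified recordings."""
    recording_ids = module(text_strings)
    return b"".join(store[rid] for rid in recording_ids)


if __name__ == "__main__":
    portions = ["did you say", "small", "or", "large"]
    print(speak(portions, stub_stress_module, AUDIO_STORE))
```

Passing the module in as a callable keeps the application code separate from whatever logic selects the stressed recordings, which mirrors the division of labor between the application and the software module in these claims.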

Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Meyer, Darren C.; Springer, Stephen R.

Granted Patent: US 8,447,610 B2
US Class Current: 704/260
CPC Class Codes:
- G10L 13/00   Speech synthesis; Text to s...
- G10L 13/02   Methods for producing synth...
- G10L 13/033   Voice editing, e.g. manipul...
- G10L 13/04   Details of speech synthesis...
