Method and apparatus for generating synthetic speech with contrastive stress

US 8,447,610 B2
Filed: 08/09/2010
Issued: 05/21/2013
Est. Priority Date: 02/12/2010
Status: Active Grant

Abstract

Techniques for generating synthetic speech with contrastive stress. In one aspect, a speech-enabled application generates a text input including a text transcription of a desired speech output, and inputs the text input to a speech synthesis system. The synthesis system generates an audio speech output corresponding to at least a portion of the text input, with at least one portion carrying contrastive stress, and provides the audio speech output for the speech-enabled application. In another aspect, a speech-enabled application inputs a plurality of text strings, each corresponding to a portion of a desired speech output, to a software module for rendering contrastive stress. The software module identifies a plurality of audio recordings that render at least one portion of at least one of the text strings as speech carrying contrastive stress. The speech-enabled application generates an audio speech output corresponding to the desired speech output using the audio recordings.

15 Claims

1. A method for use with a speech-enabled application, the method comprising:
- receiving, from the speech-enabled application, input comprising a plurality of text strings;
  
  identifying a first portion of a first text string of the plurality of text strings that differs from a corresponding first portion of a second text string of the plurality of text strings, and a second portion of the first text string that does not differ from a corresponding second portion of the second text string;
  
  assigning contrastive stress to the identified first portion of the first text string, but not to the identified second portion of the first text string;
  
  generating, using at least one computer system, speech synthesis output corresponding to the plurality of text strings, the speech synthesis output identifying a plurality of audio recordings to render the plurality of text strings as speech, at least one of the plurality of audio recordings being selected to render the first portion of the first text string as speech carrying contrastive stress, to contrast with the rendering of the second text string; and
  
  providing the speech synthesis output for the speech-enabled application.

2. The method of claim 1, wherein the identifying comprises identifying the first portion of the first text string that differs from the corresponding first portion of the second text string based at least in part on a normalized orthography of the first and second text strings.

3. The method of claim 1, wherein the first and second text strings represent different numerical fields within a larger text string.

4. The method of claim 1, wherein the receiving comprises receiving the first and second text strings as first and second parameters passed to a function called by the speech-enabled application to render the first and second text strings with a contrastive stress pattern.
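
The identifying and assigning steps of claims 1, 2 and 4 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `normalize` and `assign_contrastive_stress` are hypothetical, lowercasing plus punctuation stripping is only a crude stand-in for "normalized orthography", and the standard-library `difflib.SequenceMatcher` is an assumed way of aligning the two strings.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and split into alphanumeric tokens.

    Assumption: a crude stand-in for the "normalized orthography" of claim 2.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def assign_contrastive_stress(first, second):
    """Return (token, stressed) pairs for `first`.

    Tokens of `first` that differ from the aligned tokens of `second`
    are marked stressed; matching tokens are left unstressed.
    """
    a, b = normalize(first), normalize(second)
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    marked = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        stressed = tag != "equal"  # "replace" or "delete" spans differ
        marked.extend((tok, stressed) for tok in a[i1:i2])
    return marked


# The two strings are passed as two parameters, as in claim 4.
print(assign_contrastive_stress("flight 1345 to Boston", "flight 1346 to Boston"))
# [('flight', False), ('1345', True), ('to', False), ('boston', False)]
```

Only the differing token (`1345`) is flagged, so a later recording-selection step could pick a stressed take for it and ordinary takes for the rest.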

5. Apparatus for use with a speech-enabled application, the apparatus comprising:
- a memory storing a plurality of processor-executable instructions; and
  
  at least one processor, operatively coupled to the memory, configured to execute the instructions to:
  
  receive, from the speech-enabled application, input comprising a plurality of text strings;
  
  identify a first portion of a first text string of the plurality of text strings that differs from a corresponding first portion of a second text string of the plurality of text strings, and a second portion of the first text string that does not differ from a corresponding second portion of the second text string;
  
  assign contrastive stress to the identified first portion of the first text string, but not to the identified second portion of the first text string;
  
  generate speech synthesis output corresponding to the plurality of text strings, the speech synthesis output identifying a plurality of audio recordings to render the plurality of text strings as speech, at least one of the plurality of audio recordings being selected to render the first portion of the first text string as speech carrying contrastive stress, to contrast with the rendering of the second text string; and
  
  provide the speech synthesis output for the speech-enabled application.

6. The apparatus of claim 5, wherein the at least one processor is configured to execute the instructions to identify the first portion of the first text string that differs from the corresponding first portion of the second text string based at least in part on a normalized orthography of the first and second text strings.

7. The apparatus of claim 5, wherein the first and second text strings represent different numerical fields within a larger text string.

8. The apparatus of claim 5, wherein the at least one processor is configured to execute the instructions to receive the first and second text strings as first and second parameters passed to a function called by the speech-enabled application to render the first and second text strings with a contrastive stress pattern.

9. At least one non-transitory computer-readable storage medium encoded with a plurality of computer-executable instructions that, when executed, perform a method for use with a speech-enabled application, the method comprising:
- receiving, from the speech-enabled application, input comprising a plurality of text strings;
  
  identifying a first portion of a first text string of the plurality of text strings that differs from a corresponding first portion of a second text string of the plurality of text strings, and a second portion of the first text string that does not differ from a corresponding second portion of the second text string;
  
  assigning contrastive stress to the identified first portion of the first text string, but not to the identified second portion of the first text string;
  
  generating speech synthesis output corresponding to the plurality of text strings, the speech synthesis output identifying a plurality of audio recordings to render the plurality of text strings as speech, at least one of the plurality of audio recordings being selected to render the first portion of the first text string as speech carrying contrastive stress, to contrast with the rendering of the second text string; and
  
  providing the speech synthesis output for the speech-enabled application.

10. The at least one non-transitory computer-readable storage medium of claim 9, wherein the identifying comprises identifying the first portion of the first text string that differs from the corresponding first portion of the second text string based at least in part on a normalized orthography of the first and second text strings.

11. The at least one non-transitory computer-readable storage medium of claim 9, wherein the first and second text strings represent different numerical fields within a larger text string.

12. The at least one non-transitory computer-readable storage medium of claim 9, wherein the receiving comprises receiving the first and second text strings as first and second parameters passed to a function called by the speech-enabled application to render the first and second text strings with a contrastive stress pattern.

13. A method for generating speech output via a speech-enabled application, the method comprising:
- generating, using at least one computer system executing the speech-enabled application, a plurality of text strings, each of the plurality of text strings corresponding to a portion of a desired speech output, wherein a first portion of a first text string of the plurality of text strings differs from a corresponding first portion of a second text string of the plurality of text strings, and a second portion of the first text string does not differ from a corresponding second portion of the second text string;
  
  inputting the plurality of text strings to at least one software module for rendering contrastive stress;
  
  receiving output from the at least one software module, the output identifying a plurality of audio recordings to render the plurality of text strings as speech, at least one of the plurality of audio recordings being selected to render the first portion of the first text string as speech carrying contrastive stress, to contrast with the rendering of the second text string, and at least one other of the plurality of audio recordings being selected to render the second portion of the first text string as speech not carrying contrastive stress; and
  
  generating, using the plurality of audio recordings, an audio speech output corresponding to the desired speech output.
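
The application-side flow of claim 13 might look roughly like the following Python sketch. Everything named here is hypothetical: `contrast_module` stands in for the "software module for rendering contrastive stress", the `_stressed` suffix is an invented naming scheme for recordings, and raw bytes stand in for audio samples. For brevity the sketch compares words position by position (so it assumes both strings have the same number of words) and returns recording identifiers for the first string only.

```python
from typing import Dict, List


def contrast_module(first: str, second: str) -> List[str]:
    """Hypothetical software module: identify recordings for `first`.

    A stressed take is chosen where a word differs from the word at the
    same position in `second`; an unstressed take is chosen otherwise.
    Assumes both strings contain the same number of words.
    """
    ids = []
    for w1, w2 in zip(first.lower().split(), second.lower().split()):
        suffix = "_stressed" if w1 != w2 else ""
        ids.append(f"{w1}{suffix}")
    return ids


def build_speech_output(recording_ids: List[str], library: Dict[str, bytes]) -> bytes:
    """Application side: concatenate the identified recordings in order."""
    return b"".join(library[rid] for rid in recording_ids)


# Toy recording library; single bytes stand in for audio data.
LIBRARY = {
    "gate": b"\x01",
    "a12": b"\x02",
    "a12_stressed": b"\x03",
}

# The application generates the text strings, passes them to the module,
# and assembles the audio from the recordings the module identifies.
ids = contrast_module("gate A12", "gate B12")
audio = build_speech_output(ids, LIBRARY)
print(ids)  # ['gate', 'a12_stressed']
```

Keeping recording selection in the module and concatenation in the application mirrors the division of labor the claim describes, where the module's output only identifies recordings and the application produces the audio.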

14. Apparatus for generating speech output via a speech-enabled application, the apparatus comprising:
- a memory storing a plurality of processor-executable instructions; and
  
  at least one processor, operatively coupled to the memory, configured to execute the instructions to:
  
  generate a plurality of text strings, each of the plurality of text strings corresponding to a portion of a desired speech output, wherein a first portion of a first text string of the plurality of text strings differs from a corresponding first portion of a second text string of the plurality of text strings, and a second portion of the first text string does not differ from a corresponding second portion of the second text string;
  
  input the plurality of text strings to at least one software module for rendering contrastive stress;
  
  receive output from the at least one software module, the output identifying a plurality of audio recordings to render the plurality of text strings as speech, at least one of the plurality of audio recordings being selected to render the first portion of the first text string as speech carrying contrastive stress, to contrast with the rendering of the second text string, and at least one other of the plurality of audio recordings being selected to render the second portion of the first text string as speech not carrying contrastive stress; and
  
  generate, using the plurality of audio recordings, an audio speech output corresponding to the desired speech output.

15. At least one non-transitory computer-readable storage medium encoded with a plurality of computer-executable instructions that, when executed, perform a method for generating speech output via a speech-enabled application, the method comprising:
- generating a plurality of text strings, each of the plurality of text strings corresponding to a portion of a desired speech output, wherein a first portion of a first text string of the plurality of text strings differs from a corresponding first portion of a second text string of the plurality of text strings, and a second portion of the first text string does not differ from a corresponding second portion of the second text string;
  
  inputting the plurality of text strings to at least one software module for rendering contrastive stress;
  
  receiving output from the at least one software module, the output identifying a plurality of audio recordings to render the plurality of text strings as speech, at least one of the plurality of audio recordings being selected to render the first portion of the first text string as speech carrying contrastive stress, to contrast with the rendering of the second text string, and at least one other of the plurality of audio recordings being selected to render the second portion of the first text string as speech not carrying contrastive stress; and
  
  generating, using the plurality of audio recordings, an audio speech output corresponding to the desired speech output.

Current Assignee: Cerence Operating Company (Cerence Inc.)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Meyer, Darren C.; Springer, Stephen R.
Primary Examiner(s): COLUCCI, MICHAEL C
Application Number: US12/853,026
Publication Number: US 20110202345A1
Time in Patent Office: 1,016 Days
Field of Search: 704/260, 704/271, 704/258, 704/234, 704/209, 434/236, 434/178
US Class Current: 704/260
CPC Class Codes:
G10L 13/00   Speech synthesis; Text to s...
G10L 13/02   Methods for producing synth...
G10L 13/033   Voice editing, e.g. manipul...
G10L 13/04   Details of speech synthesis...
