Methods and systems for adaptation of synthetic speech in an environment

US 8,571,871 B1
Filed: 10/02/2012
Issued: 10/29/2013
Est. Priority Date: 10/02/2012
Status: Active Grant

First Claim

Patent Images

1. A method, comprising:

determining one or more characteristics of an environment of a device, wherein the device includes a text-to-speech module, wherein the one or more characteristics include one or more characteristics of a background sound in the environment of the device, and wherein the one or more characteristics of the environment are time-varying;

determining, based on the one or more characteristics of the environment, one or more speech parameters that characterize a voice output of the text-to-speech module, wherein determining the one or more speech parameters comprises;

determining a transform to convert a first set of speech parameters determined for a substantially sound-free background environment to a second set of speech parameters that includes Lombard parameters determined for a given environment with a previously determined background sound condition, wherein the Lombard parameters are determined such that the voice output is intelligible in the previously determined background sound condition,modifying, based on the one or more characteristics, the transform, andapplying the modified transform to one of (i) the first set of speech parameters, and (ii) the Lombard parameters to obtain the one or more speech parameters; and

processing, by the text-to-speech module, a text to obtain the voice output corresponding to the text based on the one or more speech parameters to account for the one or more characteristics of the environment.

View all claims

2 Assignments

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

Methods and systems for adaptation of synthetic speech in an environment are described. In an example, a device, which may include a text-to-speech (TTS) module, may be configured to determine characteristics of an environment of the device. The device also may be configured to determine, based on the one or more characteristics of the environment, speech parameters that characterize a voice output of the text-to-speech module. Further, the device may be configured to process a text to obtain the voice output corresponding to the text based on the speech parameters to account for the one or more characteristics of the environment.

Citations

17 Claims

1. A method, comprising:
- determining one or more characteristics of an environment of a device, wherein the device includes a text-to-speech module, wherein the one or more characteristics include one or more characteristics of a background sound in the environment of the device, and wherein the one or more characteristics of the environment are time-varying;
  
  determining, based on the one or more characteristics of the environment, one or more speech parameters that characterize a voice output of the text-to-speech module, wherein determining the one or more speech parameters comprises;
  
  determining a transform to convert a first set of speech parameters determined for a substantially sound-free background environment to a second set of speech parameters that includes Lombard parameters determined for a given environment with a previously determined background sound condition, wherein the Lombard parameters are determined such that the voice output is intelligible in the previously determined background sound condition,modifying, based on the one or more characteristics, the transform, andapplying the modified transform to one of (i) the first set of speech parameters, and (ii) the Lombard parameters to obtain the one or more speech parameters; and
  
  processing, by the text-to-speech module, a text to obtain the voice output corresponding to the text based on the one or more speech parameters to account for the one or more characteristics of the environment.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
- - 2. The method of claim 1, wherein the one or more speech parameters include one or more of volume, duration, pitch, and spectrum.
  - 3. The method of claim 1, wherein the one or more characteristics of the background sound include one or more of (i) signal-to-noise ratio (SNR) relating to the background sound, (ii) background sound pressure level, or (iii) type of the background sound.
  - 4. The method of claim 1, wherein determining the one or more speech parameters comprises extrapolating or interpolating between the first set of speech parameters and the Lombard parameters, based on the one or more characteristics.
  - 5. The method of claim 1, wherein processing the text comprises:
    - synthesizing a voice signal from the text based on one or more speech waveforms pre-recorded in a given environment having one or more predetermined characteristics; and
      
      modifying, using the one or more speech parameters, the voice signal to obtain the voice output of the text.
  - 6. The method of claim 5, wherein the one or more speech waveforms are pre-recorded in the substantially sound-free background environment.
  - 7. The method of claim 5, wherein the voice output corresponding to the text differs from the synthesized voice signal in one or more of volume, duration, pitch, and spectrum to account for the one or more characteristics of the environment of the device.
  - 8. The method of claim 5, wherein modifying the synthesized voice signal comprises scaling, based on the one or more speech parameters, one or more signal parameters of the synthesized voice signal by a factor, wherein the one or more speech parameters include one or more of volume, duration, pitch, and spectrum.
  - 9. The method of claim 1, wherein processing the text comprises:
    - determining, using the one or more speech parameters, a Hidden Markov Model generated to model a parametric representation of spectral and excitation parameters of speech; and
      
      synthesizing, using the Hidden Markov Model, a speech waveform to generate the voice output corresponding to the text.
  - 10. The method of claim 1, wherein the one or more speech parameters are time-varying.
  - 11. The method of claim 10, wherein determining the one or more speech parameters comprises determining the one or more speech parameters in real-time to account for the time-varying characteristics of the environment.

12. A system comprising:
- a device including a text-to-speech module; and
  
  a processor coupled to the device, and the processor is configured to;
  
  determine one or more characteristics of an environment of the device, wherein the one or more characteristics include one or more characteristics of a background sound in the environment of the device, and wherein the one or more characteristics of the environment are time-varying;
  
  determine, based on the one or more characteristics of the environment, one or more speech parameters that characterize a voice output of the text-to-speech module, wherein, to determine the one or more speech parameters, the processor is configured to;
  
  determine a transform to convert a first set of speech parameters determined for a substantially sound-free background environment to a second set of speech parameters that includes Lombard parameters determined for a given environment with a previously determined background sound condition, wherein the Lombard parameters are determined such that the voice output is intelligible in the previously determined background sound condition,modify, based on the one or more characteristics, the transform, andapply the modified transform to one of (i) the first set of speech parameters, and (ii) the Lombard parameters to obtain the one or more speech parameters; and
  
  process a text to obtain the voice output corresponding to the text based on the one or more speech parameters to account for the one or more characteristics of the environment.
- View Dependent Claims (13)
- - 13. The system of claim 12, further comprising:
    - an audio capture unit coupled to the device, wherein the one or more characteristics include one or more characteristics of a background sound received from the audio capture unit; and
      
      a memory coupled to the processor, and the memory is configured to store (i) the first set of speech parameters corresponding to the substantially sound-free background environment, and (ii) the second set of speech parameters that are Lombard parameters determined for the given environment with the previously determined background sound condition.

14. A non-transitory computer readable medium having stored thereon instructions that, when executed by a computing device, cause the computing device to perform functions comprising:
- determining one or more characteristics of an environment, wherein the one or more characteristics include one or more characteristics of a background sound in the environment of the device, and wherein the one or more characteristics of the environment are time-varying;
  
  determining, based on the one or more characteristics of the environment, one or more speech parameters that characterize a voice output of a text-to-speech module coupled to the computing device, wherein determining the one or more speech parameters comprises extrapolating or interpolating, based on the one or more characteristics, between a first set of speech parameters determined for a substantially background sound-free environment and a second set of speech parameters that are Lombard parameters determined for a given environment with a previously determined background sound condition, wherein the Lombard parameters are determined such that the voice output is intelligible in the previously determined background sound condition;
  
  processing, by the text-to-speech module, a text to obtain the voice output corresponding to the text based on the one or more speech parameters to account for the one or more characteristics of the environment.
- View Dependent Claims (15, 16, 17)
- - 15. The non-transitory computer readable medium of claim 14, wherein the function of processing the text to obtain the voice output comprises:
    - synthesizing a voice signal from the text based on one or more speech waveforms pre-recorded in a substantially sound-free background environment; and
      
      modifying, using the one or more speech parameters, the synthesized voice signal to obtain the voice output corresponding to the text such that the voice output corresponding to the text is intelligible in the environment.
  - 16. The non-transitory computer readable medium of claim 15, wherein the function of modifying the synthesized voice signal comprises scaling, based on the one or more speech parameters, one or more signal parameters of the synthesized voice signal by a factor, wherein the one or more speech parameters include one or more of volume, duration, pitch, and spectrum.
  - 17. The non-transitory computer readable medium of claim 14, wherein the one or more characteristics of the background sound include one or more of (i) signal-to-noise ratio (SNR) relating to the background sound, and (ii) type of the background sound, and wherein the functions further comprise updating the one or more speech parameters in real-time to account for the time-varying characteristics of the environment.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google Inc. (Alphabet Inc.)
Inventors
Stuttle, Matthew Nicholas, Agiomyrgiannakis, Ioannis
Primary Examiner(s)
JACKSON, JAKIEDA R

Application Number

US13/633,231
Time in Patent Office

392 Days
Field of Search

704/260
US Class Current

704/260
CPC Class Codes

G10L 13/033 Voice editing, e.g. manipul...

G10L 21/003 Changing voice quality, e.g...

Methods and systems for adaptation of synthetic speech in an environment

First Claim

2 Assignments

0 Petitions

Accused Products

Abstract

Citations

17 Claims

Specification

Solutions

Use Cases

Quick Links

Methods and systems for adaptation of synthetic speech in an environment

First Claim

2 Assignments

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

Citations

17 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links