Methods and systems for adaptation of synthetic speech in an environment
Abstract
Methods and systems for adaptation of synthetic speech in an environment are described. In an example, a device, which may include a text-to-speech (TTS) module, may be configured to determine one or more characteristics of an environment of the device. The device may also be configured to determine, based on the one or more characteristics of the environment, speech parameters that characterize a voice output of the text-to-speech module. Further, the device may be configured to process a text to obtain the voice output corresponding to the text based on the speech parameters to account for the one or more characteristics of the environment.
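As an illustration of what "determining characteristics of an environment" could look like in practice, the following minimal sketch tracks a smoothed background sound level over successive audio frames. The choice of RMS level in dB as the characteristic, the exponential smoothing, and the names `rms_level_db` and `track_noise_level` are assumptions made for this example; the abstract does not specify them.

```python
import math
from typing import Iterable, List, Optional, Sequence


def rms_level_db(frame: Sequence[float], floor_db: float = -120.0) -> float:
    """Return the RMS level of one frame in dB relative to full scale (1.0).

    Assumption: samples are floats in [-1.0, 1.0]. Silent or empty frames
    are reported at floor_db instead of negative infinity.
    """
    if not frame:
        return floor_db
    mean_square = sum(sample * sample for sample in frame) / len(frame)
    if mean_square <= 0.0:
        return floor_db
    return max(floor_db, 10.0 * math.log10(mean_square))


def track_noise_level(
    frames: Iterable[Sequence[float]], smoothing: float = 0.9
) -> List[float]:
    """Exponentially smooth per-frame levels to follow a time-varying background.

    smoothing is a hypothetical tuning knob: values near 1.0 react slowly,
    values near 0.0 follow each frame closely.
    """
    levels: List[float] = []
    smoothed: Optional[float] = None
    for frame in frames:
        level = rms_level_db(frame)
        if smoothed is None:
            smoothed = level  # first frame initialises the estimate
        else:
            smoothed = smoothing * smoothed + (1.0 - smoothing) * level
        levels.append(smoothed)
    return levels
```

The returned list holds one smoothed level per frame, so a downstream step can re-derive speech parameters whenever the estimate changes.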
17 Claims
1. A method, comprising:
determining one or more characteristics of an environment of a device, wherein the device includes a text-to-speech module, wherein the one or more characteristics include one or more characteristics of a background sound in the environment of the device, and wherein the one or more characteristics of the environment are time-varying;
determining, based on the one or more characteristics of the environment, one or more speech parameters that characterize a voice output of the text-to-speech module, wherein determining the one or more speech parameters comprises:
    determining a transform to convert a first set of speech parameters determined for a substantially sound-free background environment to a second set of speech parameters that includes Lombard parameters determined for a given environment with a previously determined background sound condition, wherein the Lombard parameters are determined such that the voice output is intelligible in the previously determined background sound condition,
    modifying, based on the one or more characteristics, the transform, and
    applying the modified transform to one of (i) the first set of speech parameters, and (ii) the Lombard parameters to obtain the one or more speech parameters; and
processing, by the text-to-speech module, a text to obtain the voice output corresponding to the text based on the one or more speech parameters to account for the one or more characteristics of the environment.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
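The three sub-steps of claim 1 (determine a transform, modify it, apply it) can be sketched as follows. This is only one possible reading: it assumes the transform is a per-parameter multiplicative gain, that the environment characteristic is a single noise level in dB, and that all parameter values are positive. The parameter names and the functions `determine_transform`, `modify_transform`, and `apply_transform` are hypothetical and do not come from the claim.

```python
from typing import Dict

Params = Dict[str, float]


def determine_transform(clean: Params, lombard: Params) -> Params:
    """Per-parameter gain that maps the clean set onto the Lombard set.

    Assumption: every clean value is positive, so the ratio is defined.
    """
    return {name: lombard[name] / clean[name] for name in clean}


def modify_transform(
    transform: Params, noise_db: float, quiet_db: float, reference_db: float
) -> Params:
    """Rescale the gains according to the current noise level.

    weight is 0 at the quiet level, 1 at the reference (previously
    determined) noise condition, and above 1 for louder environments.
    """
    if reference_db == quiet_db:
        raise ValueError("reference_db and quiet_db must differ")
    weight = (noise_db - quiet_db) / (reference_db - quiet_db)
    weight = max(0.0, weight)  # assumption: never go below the clean setting
    return {name: gain ** weight for name, gain in transform.items()}


def apply_transform(params: Params, transform: Params) -> Params:
    """Apply the (modified) gains to a parameter set."""
    return {name: value * transform[name] for name, value in params.items()}


# Usage with made-up values: noise halfway between quiet and reference.
clean = {"pitch_hz": 100.0, "energy": 1.0}
lombard = {"pitch_hz": 121.0, "energy": 4.0}
gains = determine_transform(clean, lombard)
adapted = apply_transform(clean, modify_transform(gains, 50.0, 30.0, 70.0))
```

Raising each gain to a fractional power is a design choice for this sketch; it keeps the adapted values between the clean and Lombard settings on a ratio scale, but the claim does not prescribe any particular form for the modification.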
12. A system comprising:
a device including a text-to-speech module; and
a processor coupled to the device, and the processor is configured to:
    determine one or more characteristics of an environment of the device, wherein the one or more characteristics include one or more characteristics of a background sound in the environment of the device, and wherein the one or more characteristics of the environment are time-varying;
    determine, based on the one or more characteristics of the environment, one or more speech parameters that characterize a voice output of the text-to-speech module, wherein, to determine the one or more speech parameters, the processor is configured to:
        determine a transform to convert a first set of speech parameters determined for a substantially sound-free background environment to a second set of speech parameters that includes Lombard parameters determined for a given environment with a previously determined background sound condition, wherein the Lombard parameters are determined such that the voice output is intelligible in the previously determined background sound condition,
        modify, based on the one or more characteristics, the transform, and
        apply the modified transform to one of (i) the first set of speech parameters, and (ii) the Lombard parameters to obtain the one or more speech parameters; and
    process a text to obtain the voice output corresponding to the text based on the one or more speech parameters to account for the one or more characteristics of the environment.
Dependent claims: 13
14. A non-transitory computer readable medium having stored thereon instructions that, when executed by a computing device, cause the computing device to perform functions comprising:
determining one or more characteristics of an environment, wherein the one or more characteristics include one or more characteristics of a background sound in the environment of the device, and wherein the one or more characteristics of the environment are time-varying;
determining, based on the one or more characteristics of the environment, one or more speech parameters that characterize a voice output of a text-to-speech module coupled to the computing device, wherein determining the one or more speech parameters comprises extrapolating or interpolating, based on the one or more characteristics, between a first set of speech parameters determined for a substantially background sound-free environment and a second set of speech parameters that are Lombard parameters determined for a given environment with a previously determined background sound condition, wherein the Lombard parameters are determined such that the voice output is intelligible in the previously determined background sound condition;
processing, by the text-to-speech module, a text to obtain the voice output corresponding to the text based on the one or more speech parameters to account for the one or more characteristics of the environment.
Dependent claims: 15, 16, 17
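The interpolating or extrapolating step recited in claim 14 can be illustrated with a simple linear blend between the two parameter sets. This sketch assumes a single scalar noise level in dB as the environment characteristic and a linear mapping from that level to the blend factor; the function name `blend_parameters` and the parameter names are hypothetical and are not taken from the claim.

```python
from typing import Dict

Params = Dict[str, float]


def blend_parameters(
    clean: Params,
    lombard: Params,
    noise_db: float,
    quiet_db: float,
    reference_db: float,
) -> Params:
    """Linearly interpolate or extrapolate between clean and Lombard sets.

    t = 0 reproduces the clean parameters, t = 1 reproduces the Lombard
    parameters, 0 < t < 1 interpolates, and t > 1 extrapolates beyond the
    previously determined background sound condition.
    """
    if reference_db == quiet_db:
        raise ValueError("reference_db and quiet_db must differ")
    t = (noise_db - quiet_db) / (reference_db - quiet_db)
    return {
        name: clean[name] + t * (lombard[name] - clean[name]) for name in clean
    }


# Usage with made-up values.
clean_set = {"pitch_hz": 100.0, "energy": 1.0}
lombard_set = {"pitch_hz": 120.0, "energy": 3.0}
midway = blend_parameters(clean_set, lombard_set, 50.0, 30.0, 70.0)   # t = 0.5
louder = blend_parameters(clean_set, lombard_set, 90.0, 30.0, 70.0)   # t = 1.5
```

A practical system would likely clamp `t` to a safe range so that extrapolated values stay within what the synthesizer can render; that limit is left out here because the claim does not state one.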
Specification