Devices and methods for noise modulation in a universal vocoder synthesizer
Abstract
A device may receive an input indicative of acoustic feature parameters associated with speech. The device may determine a modulated noise representation for noise pertaining to one or more of an aspirate or a fricative in the speech based on the acoustic feature parameters. The aspirate may be associated with a characteristic of an exhalation of at least a threshold amount of breath. The fricative may be associated with a characteristic of airflow between two or more vocal tract articulators. The device may also provide an audio signal indicative of a synthetic audio pronunciation of the speech based on the modulated noise representation.
18 Claims
1. A method comprising:
receiving, by a device that includes one or more processors, an input indicative of acoustic feature parameters associated with speech;
identifying, using the input, a speech frame having an acoustic feature representation of the speech at a given time within a duration of the speech, wherein identifying the speech frame includes determining the acoustic feature parameters based on samples of the acoustic feature representation at harmonic frequencies associated with the speech frame;
based on the speech frame being a voiced speech frame, modifying aperiodicity parameters of the speech frame to correspond to:
a first value for first harmonic frequencies greater than a first threshold, a second value for second harmonic frequencies less than a second threshold, and one or more values between the first value and the second value for given harmonic frequencies less than the first threshold and greater than the second threshold;
based on the modified aperiodicity parameters, determining a dispersion factor for phase parameters of the speech frame, wherein determining the dispersion factor includes modifying the phase parameters of the speech frame based on the determined dispersion factor;
determining, for a harmonic frequency of the speech, based on the acoustic feature parameters, the modified phase parameters and the modified aperiodicity parameters, a modulated noise representation for modulating noise pertaining to one or more of an aspirate or a fricative in the speech, wherein the aspirate is associated with a characteristic of an exhalation of at least a threshold amount of breath, and wherein the fricative is associated with a characteristic of airflow between two or more vocal tract articulators; and
providing, by the device, an audio signal indicative of a synthetic audio pronunciation of the speech based on the modulated noise representation.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A non-transitory computer readable medium having stored therein instructions, that when executed by a computing device, cause the computing device to perform functions comprising:
receiving an input indicative of acoustic feature parameters associated with speech;
identifying, using the input, a speech frame having an acoustic feature representation of the speech at a given time within a duration of the speech, wherein identifying the speech frame includes determining the acoustic feature parameters based on samples of the acoustic feature representation at harmonic frequencies associated with the speech frame;
based on the speech frame being a voiced speech frame, modifying aperiodicity parameters of the speech frame to correspond to:
a first value for first harmonic frequencies greater than a first threshold, a second value for second harmonic frequencies less than a second threshold, and one or more values between the first value and the second value for given harmonic frequencies less than the first threshold and greater than the second threshold;
based on the modified aperiodicity parameters, determining a dispersion factor for phase parameters of the speech frame, wherein determining the dispersion factor includes modifying the phase parameters of the speech frame based on the determined dispersion factor;
determining, for a harmonic frequency of the speech, based on the acoustic feature parameters, the modified phase parameters and the modified aperiodicity parameters, a modulated noise representation for modulating noise pertaining to one or more of an aspirate or a fricative in the speech, wherein the aspirate is associated with a characteristic of an exhalation of at least a threshold amount of breath, and wherein the fricative is associated with a characteristic of airflow between two or more vocal tract articulators; and
providing an audio signal indicative of a synthetic audio pronunciation of the speech based on the modulated noise representation.
Dependent claims: 14, 15
16. A device comprising:
one or more processors; and
data storage configured to store instructions executable by the one or more processors to cause the device to:
receive an input indicative of acoustic feature parameters associated with speech;
identify, using the input, a speech frame having an acoustic feature representation of the speech at a given time within a duration of the speech, wherein identifying the speech frame includes determining the acoustic feature parameters based on samples of the acoustic feature representation at harmonic frequencies associated with the speech frame;
based on the speech frame being a voiced speech frame, modify aperiodicity parameters of the speech frame to correspond to:
a first value for first harmonic frequencies greater than a first threshold, a second value for second harmonic frequencies less than a second threshold, and one or more values between the first value and the second value for given harmonic frequencies less than the first threshold and greater than the second threshold;
based on the modified aperiodicity parameters, determine a dispersion factor for phase parameters of the speech frame, wherein determining the dispersion factor includes modifying the phase parameters of the speech frame based on the determined dispersion factor;
determine, for a harmonic frequency of the speech, based on the acoustic feature parameters, the modified phase parameters and the modified aperiodicity parameters, a modulated noise representation for modulating noise pertaining to one or more of an aspirate or a fricative in the speech, wherein the aspirate is associated with a characteristic of an exhalation of at least a threshold amount of breath, and wherein the fricative is associated with a characteristic of airflow between two or more vocal tract articulators; and
provide an audio signal indicative of a synthetic audio pronunciation of the speech based on the modulated noise representation.
Dependent claims: 17, 18
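For illustration only, the aperiodicity-modification step recited in the independent claims can be sketched as a piecewise mapping over the harmonic frequencies of a voiced frame. The claims do not state how the in-between values are chosen, so the linear interpolation used here is an assumption, as are the function names, parameter names, and the example numbers. The sketch is not the claimed implementation.

```python
import numpy as np


def harmonic_frequencies(f0_hz, max_hz):
    """Return the harmonics k * f0 (k = 1, 2, ...) at or below max_hz."""
    count = int(max_hz // f0_hz)
    return f0_hz * np.arange(1, count + 1)


def modify_aperiodicity(harmonics_hz, upper_hz, lower_hz, first_value, second_value):
    """Hypothetical piecewise aperiodicity for a voiced frame.

    Harmonics above upper_hz (the "first threshold") get first_value,
    harmonics below lower_hz (the "second threshold") get second_value,
    and harmonics in between get a value between the two.
    Linear interpolation is an assumption; the claims only require
    "one or more values between the first value and the second value".
    """
    if lower_hz >= upper_hz:
        raise ValueError("lower_hz must be below upper_hz")
    # np.interp holds the endpoint values outside [lower_hz, upper_hz].
    return np.interp(harmonics_hz, [lower_hz, upper_hz], [second_value, first_value])


# Example with made-up numbers: a 200 Hz voiced frame, harmonics up to 1 kHz.
harmonics = harmonic_frequencies(200.0, 1000.0)
aperiodicity = modify_aperiodicity(
    harmonics, upper_hz=800.0, lower_hz=400.0, first_value=1.0, second_value=0.0
)
```

In this sketch the low harmonics stay fully periodic (value 0.0), the high harmonics become fully aperiodic (value 1.0), and the harmonic at 600 Hz falls halfway between the two.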
Specification