Devices and methods for noise modulation in a universal vocoder synthesizer

US 9,607,610 B2
Filed: 02/26/2015
Issued: 03/28/2017
Est. Priority Date: 07/03/2014
Status: Active Grant

Abstract

A device may receive an input indicative of acoustic feature parameters associated with speech. The device may determine a modulated noise representation for noise pertaining to one or more of an aspirate or a fricative in the speech based on the acoustic feature parameters. The aspirate may be associated with a characteristic of an exhalation of at least a threshold amount of breath. The fricative may be associated with a characteristic of airflow between two or more vocal tract articulators. The device may also provide an audio signal indicative of a synthetic audio pronunciation of the speech based on the modulated noise representation.

18 Claims

1. A method comprising:
- receiving, by a device that includes one or more processors, an input indicative of acoustic feature parameters associated with speech;
  
  identifying, using the input, a speech frame having an acoustic feature representation of the speech at a given time within a duration of the speech, wherein identifying the speech frame includes determining the acoustic feature parameters based on samples of the acoustic feature representation at harmonic frequencies associated with the speech frame;
  
  based on the speech frame being a voiced speech frame, modifying aperiodicity parameters of the speech frame to correspond to: a first value for first harmonic frequencies greater than a first threshold, a second value for second harmonic frequencies less than a second threshold, and one or more values between the first value and the second value for given harmonic frequencies less than the first threshold and greater than the second threshold;
  
  based on the modified aperiodicity parameters, determining a dispersion factor for phase parameters of the speech frame, wherein determining the dispersion factor includes modifying the phase parameters of the speech frame based on the determined dispersion factor;
  
  determining, for a harmonic frequency of the speech, based on the acoustic feature parameters, the modified phase parameters and the modified aperiodicity parameters, a modulated noise representation for modulating noise pertaining to one or more of an aspirate or a fricative in the speech, wherein the aspirate is associated with a characteristic of an exhalation of at least a threshold amount of breath, and wherein the fricative is associated with a characteristic of airflow between two or more vocal tract articulators; and
  
  providing, by the device, an audio signal indicative of a synthetic audio pronunciation of the speech based on the modulated noise representation.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12):
  - 2. The method of claim 1, further comprising:
    - determining a representation of the speech that includes the acoustic feature parameters mapped to harmonic frequencies of the speech, wherein the representation includes modulated noise representations mapped also to the harmonic frequencies, and wherein the audio signal is based on the representation of the speech.
  - 3. The method of claim 1, further comprising:
    - determining, based on the input, the acoustic feature parameters including spectral parameters associated with the speech, aperiodicity parameters associated with the speech, and phase parameters associated with the speech.
  - 4. The method of claim 3, wherein the phase parameters are based on measured phase values indicated in the input and associated with one or more particular times within a duration of the speech.
  - 5. The method of claim 3, further comprising:
    - receiving, by the device, a selection indicative of selected types of the acoustic feature parameters from one or more of Cepstrum, Mel-Cepstrum, Generalized-Mel-Cepstrum, Discrete Mel-Cepstrum, Log-Spectral, Auto-Regressive, Line-Spectrum-Pairs, Line-Spectrum-Frequencies, Mel-Line-Spectrum-Pairs, Reflection Coefficients, Log-Area-Ratio Coefficients, minimum-phase, maximum-phase, sum-of-cosines pulse, sum-of-sines pulse, constant random pulse, log-aperiodicity, filterbank-based quantization, or maximum voiced frequency, wherein determining the acoustic feature parameters is based on the selection.
  - 6. The method of claim 1, wherein the given time corresponds to one or more of a time-instant associated with a characteristic of a glottal cycle of the speech or a given time-instant associated with an unvoiced portion of the speech.
  - 7. The method of claim 6, further comprising:
    - determining, based on the input, a voiced glottal closure time-instant of the speech, wherein identifying the given speech frame is based on the given time corresponding to the voiced glottal closure time-instant, and wherein the voiced glottal closure time-instant is associated with a characteristic of a closure of at least a portion of a glottis for articulation of at least a portion of the speech.
  - 8. The method of claim 6, further comprising:
    - determining, based on the input, an unvoiced time-instant of the speech, wherein identifying the given speech frame is based on the given time corresponding to the unvoiced time-instant.
  - 9. The method of claim 1, further comprising:
    - based on the given speech frame being an unvoiced speech frame, modifying the acoustic feature parameters of the given speech frame for given harmonic frequencies less than a threshold; and
      
      modifying phase parameters of the given speech frame to correspond to random phase values, wherein determining the modulated noise representation is based on modifying the acoustic feature parameters and modifying the phase parameters.
  - 10. The method of claim 1, wherein modifying the aperiodicity parameters includes monotonically increasing the one or more values associated with the given harmonic frequencies.
  - 11. The method of claim 1, further comprising:
    - receiving a sequence of speech frames indicative of the speech, wherein a first speech frame includes a first acoustic feature representation of the speech at a first time within a duration of the speech, and wherein receiving the input includes receiving the sequence, and wherein the sequence is associated with a given time-period between adjacent speech frames of the sequence;
      
      based on the first speech frame being a voiced speech frame, determining a pitch period of the first speech frame based on a pitch frequency indicated by the first acoustic feature representation;
      
      based on the first speech frame being an unvoiced speech frame, providing a given pitch period as the pitch period of the first speech frame; and
      
      identifying, from within the sequence, a second speech frame associated with a second time within the duration, wherein the second time is based on a sum of the first time and the pitch period, and wherein determining the modulated noise representation is based on the first acoustic feature representation and a second acoustic feature representation of the second speech frame.
  - 12. The method of claim 11, further comprising:
    - determining a plurality of synthetic audio sounds associated with portions of the speech, wherein a given synthetic audio sound has a given duration that corresponds to the given time-period between the adjacent speech frames in the sequence, and wherein providing the audio signal includes providing the plurality of synthetic audio sounds.
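
The pitch-synchronous frame selection recited in claim 11 can be illustrated with a short sketch. This is only one possible reading of the claim, not the patented implementation. The function name `synthesis_instants`, the use of integer sample units, the 16 kHz sample rate, the 80-sample frame step, the convention that a pitch frequency of 0 marks an unvoiced frame, and the floor-based choice of the frame nearest each time are all assumptions made for illustration.

```python
def synthesis_instants(f0_track, sample_rate=16000, frame_step=80,
                       unvoiced_period=80):
    """Walk a sequence of frames pitch-synchronously.

    f0_track holds one pitch frequency (Hz) per frame; 0 marks an unvoiced
    frame (hypothetical convention). Times and periods are in samples.
    Returns a list of (time, frame_index, pitch_period) tuples.
    """
    total = len(f0_track) * frame_step
    instants = []
    t = 0
    while t < total:
        # Frame associated with the current time (floor choice is assumed).
        idx = t // frame_step
        f0 = f0_track[idx]
        if f0 > 0:
            # Voiced frame: pitch period derived from the pitch frequency.
            period = max(1, round(sample_rate / f0))
        else:
            # Unvoiced frame: a given, fixed pitch period is used instead.
            period = unvoiced_period
        instants.append((t, idx, period))
        # Second time = first time + pitch period, as in claim 11.
        t += period
    return instants


# Two unvoiced frames followed by four voiced frames at 100 Hz.
instants = synthesis_instants([0.0, 0.0, 100.0, 100.0, 100.0, 100.0])
```

In this example the unvoiced frames advance by the fixed 80-sample period, while the voiced frames advance by 160 samples (16000 / 100), so some frames in the sequence are skipped.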

13. A non-transitory computer readable medium having stored therein instructions, that when executed by a computing device, cause the computing device to perform functions comprising:
- receiving an input indicative of acoustic feature parameters associated with speech;
  
  identifying, using the input, a speech frame having an acoustic feature representation of the speech at a given time within a duration of the speech, wherein identifying the speech frame includes determining the acoustic feature parameters based on samples of the acoustic feature representation at harmonic frequencies associated with the speech frame;
  
  based on the speech frame being a voiced speech frame, modifying aperiodicity parameters of the speech frame to correspond to: a first value for first harmonic frequencies greater than a first threshold, a second value for second harmonic frequencies less than a second threshold, and one or more values between the first value and the second value for given harmonic frequencies less than the first threshold and greater than the second threshold;
  
  based on the modified aperiodicity parameters, determining a dispersion factor for phase parameters of the speech frame, wherein determining the dispersion factor includes modifying the phase parameters of the speech frame based on the determined dispersion factor;
  
  determining, for a harmonic frequency of the speech, based on the acoustic feature parameters, the modified phase parameters and the modified aperiodicity parameters, a modulated noise representation for modulating noise pertaining to one or more of an aspirate or a fricative in the speech, wherein the aspirate is associated with a characteristic of an exhalation of at least a threshold amount of breath, and wherein the fricative is associated with a characteristic of airflow between two or more vocal tract articulators; and
  
  providing an audio signal indicative of a synthetic audio pronunciation of the speech based on the modulated noise representation.
- Dependent claims (14, 15):
  - 14. The non-transitory computer readable medium of claim 13, the functions further comprising:
    - determining a representation of the speech that includes the acoustic feature parameters mapped to harmonic frequencies of the speech, wherein the representation includes modulated noise representations mapped also to the harmonic frequencies, and wherein the audio signal is based on the representation of the speech.
  - 15. The non-transitory computer readable medium of claim 13, the functions further comprising:
    - determining, based on the input, the acoustic feature parameters including spectral parameters associated with the speech, aperiodicity parameters associated with the speech, and phase parameters associated with the speech.

16. A device comprising:
- one or more processors; and
  
  data storage configured to store instructions executable by the one or more processors to cause the device to:
  
  receive an input indicative of acoustic feature parameters associated with speech;
  
  identify, using the input, a speech frame having an acoustic feature representation of the speech at a given time within a duration of the speech, wherein identifying the speech frame includes determining the acoustic feature parameters based on samples of the acoustic feature representation at harmonic frequencies associated with the speech frame;
  
  based on the speech frame being a voiced speech frame, modify aperiodicity parameters of the speech frame to correspond to: a first value for first harmonic frequencies greater than a first threshold, a second value for second harmonic frequencies less than a second threshold, and one or more values between the first value and the second value for given harmonic frequencies less than the first threshold and greater than the second threshold;
  
  based on the modified aperiodicity parameters, determine a dispersion factor for phase parameters of the speech frame, wherein determining the dispersion factor includes modifying the phase parameters of the speech frame based on the determined dispersion factor;
  
  determine, for a harmonic frequency of the speech, based on the acoustic feature parameters, the modified phase parameters and the modified aperiodicity parameters, a modulated noise representation for modulating noise pertaining to one or more of an aspirate or a fricative in the speech, wherein the aspirate is associated with a characteristic of an exhalation of at least a threshold amount of breath, and wherein the fricative is associated with a characteristic of airflow between two or more vocal tract articulators; and
  
  provide an audio signal indicative of a synthetic audio pronunciation of the speech based on the modulated noise representation.
- Dependent claims (17, 18):
  - 17. The device of claim 16, wherein the instructions further cause the device to:
    - determine a representation of the speech that includes the acoustic feature parameters mapped to harmonic frequencies of the speech, wherein the representation includes modulated noise representations mapped also to the harmonic frequencies, and wherein the audio signal is based on the representation of the speech.
  - 18. The device of claim 16, wherein the instructions further cause the device to:
    - determine, based on the input, the acoustic feature parameters including spectral parameters associated with the speech, aperiodicity parameters associated with the speech, and phase parameters associated with the speech.
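
The aperiodicity modification shared by independent claims 1, 13, and 16 (with the monotonic variant of claim 10) can be pictured as a piecewise function over the harmonic frequencies of a voiced frame. The sketch below is illustrative only. The function name `modify_aperiodicity`, every numeric default, the handling of frequencies exactly at a threshold, and the use of linear interpolation in the transition band are assumptions; the claims only require values between the first and second value in that band.

```python
def modify_aperiodicity(harmonics_hz, first_threshold_hz=6000.0,
                        second_threshold_hz=2000.0,
                        first_value=1.0, second_value=0.0):
    """Return one aperiodicity value per harmonic of a voiced frame.

    All numeric defaults are hypothetical; the claims give no values.
    """
    if second_threshold_hz >= first_threshold_hz:
        raise ValueError("second threshold must lie below the first threshold")
    band = first_threshold_hz - second_threshold_hz
    values = []
    for f in harmonics_hz:
        if f >= first_threshold_hz:
            # Above the first threshold: the first value.
            # (Treating f == threshold this way is an assumption.)
            values.append(first_value)
        elif f <= second_threshold_hz:
            # Below the second threshold: the second value.
            values.append(second_value)
        else:
            # Between the thresholds: a value between the two.
            # Linear interpolation is assumed; it increases monotonically
            # with frequency when first_value > second_value (claim 10).
            t = (f - second_threshold_hz) / band
            values.append(second_value + t * (first_value - second_value))
    return values


# Harmonics of a 200 Hz pitch, from 200 Hz up to 8000 Hz.
f0_hz = 200.0
harmonics = [f0_hz * k for k in range(1, 41)]
aperiodicity = modify_aperiodicity(harmonics)
```

With these assumed defaults, low harmonics stay fully periodic, harmonics above 6 kHz are treated as fully aperiodic, and the band in between ramps smoothly from one to the other.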

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Agiomyrgiannakis, Ioannis
Primary Examiner(s): WOZNIAK, JAMES S
Application Number: US14/632,890
Publication Number: US 20160005392A1
Time in Patent Office: 761 Days
Field of Search: 704/258, 704/260, 704/261, 704/268
US Class Current: 1/1
CPC Class Codes:
- G10L 13/027   Concept to speech synthesis...
- G10L 13/033   Voice editing, e.g. manipul...
- G10L 19/02   using spectral analysis, e....
- G10L 19/08   Determination or coding of ...
- G10L 19/16   Vocoder architecture
- G10L 25/75   for modelling vocal tract p...
