Wideband speech parameterization for high quality synthesis, transformation and quantization

US 9,224,402 B2
Filed: 09/30/2013
Issued: 12/29/2015
Est. Priority Date: 09/30/2013
Status: Expired due to Fees


Assignments: 1
Petitions: 0

Abstract

A method for speech parameterization and coding of a continuous speech signal. The method comprises dividing said speech signal into a plurality of speech frames, and for each one of the plurality of speech frames, modeling said speech frame by a first harmonic modeling to produce a plurality of harmonic model parameters, reconstructing an estimated frame signal from the plurality of harmonic model parameters, subtracting the estimated frame signal from the speech frame to produce a harmonic model residual, performing at least one second harmonic modeling analysis on the first harmonic model residual to determine at least one set of second harmonic model components, removing the at least one set of second harmonic model components from the first harmonic model residual to produce a harmonically-filtered residual signal, and processing the harmonically-filtered residual signal with analysis by synthesis techniques to produce vectors of codebook indices and corresponding gains.
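The first-stage analysis described in the abstract (fit a harmonic model, reconstruct the frame, subtract to get the residual) can be illustrated with a short least-squares sketch. It assumes the fundamental frequency f0 is already known. It also takes the claimed cost function to be the squared error between the frame and a sum of sinusoids at k·f0. The patent text shown here specifies neither choice, and all function names are hypothetical. Each sinusoid is fitted as a cosine/sine pair, which is equivalent to solving for an amplitude and a phase.

```python
import math


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_harmonics(frame, f0, fs, n_harm):
    """Least-squares fit of sum_k A_k*cos(2*pi*k*f0*t/fs + phi_k) to frame.

    Returns (frequency, amplitude, phase) triples for k = 1..n_harm."""
    n = len(frame)
    basis = []
    for k in range(1, n_harm + 1):
        w = 2 * math.pi * k * f0 / fs
        basis.append([math.cos(w * t) for t in range(n)])
        basis.append([math.sin(w * t) for t in range(n)])
    # Normal equations (B^T B) c = B^T x minimize the squared-error cost.
    gram = [[sum(p * q for p, q in zip(u, v)) for v in basis] for u in basis]
    rhs = [sum(p * s for p, s in zip(u, frame)) for u in basis]
    coef = solve_linear(gram, rhs)
    params = []
    for k in range(n_harm):
        a, b = coef[2 * k], coef[2 * k + 1]
        # a*cos(wt) + b*sin(wt) == A*cos(wt + phi) with A = hypot(a, b), phi = atan2(-b, a)
        params.append(((k + 1) * f0, math.hypot(a, b), math.atan2(-b, a)))
    return params


def synthesize(params, fs, n):
    """Rebuild the estimated frame from (frequency, amplitude, phase) triples."""
    return [sum(amp * math.cos(2 * math.pi * f * t / fs + ph) for f, amp, ph in params)
            for t in range(n)]


def harmonic_residual(frame, f0, fs, n_harm):
    """Fit, reconstruct and subtract: returns (params, residual)."""
    params = fit_harmonics(frame, f0, fs, n_harm)
    estimate = synthesize(params, fs, len(frame))
    return params, [x - y for x, y in zip(frame, estimate)]
```

For example, a 160-sample frame at 8 kHz built from 200 Hz and 400 Hz cosines is recovered with n_harm=3, and the residual is zero up to rounding. On real speech the residual would instead carry the non-harmonic energy that the later stages process.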

Citations: 18

20 Claims

1. A method for speech parameterization and coding of a continuous speech signal, comprising:
- receiving a continuous speech signal representing speech recorded by at least one microphone; dividing said continuous speech signal into a plurality of speech frames; and, for each one of said plurality of speech frames:
  
  modeling said speech frame by a first harmonic modeling to produce a plurality of harmonic model parameter values, wherein said first harmonic modeling is estimated by computing a cost function between a plurality of sine function signals and said speech frame, wherein each of said plurality of sine function signals comprises one of a plurality of harmonic frequencies, an amplitude value and a phase value;
  
  reconstructing an estimated frame signal from said plurality of harmonic model parameter values;
  
  subtracting said estimated frame signal from said speech frame to produce a harmonic model residual signal;
  
  performing at least one second harmonic modeling analysis on said first harmonic model residual to determine at least one set of second harmonic model component values;
  
  removing said at least one set of second harmonic model component values from said first harmonic model residual signal to produce a harmonically-filtered residual signal; and
  
  processing said harmonically-filtered residual signal with analysis by synthesis techniques to produce vectors of codebook indices and corresponding gains, and sending said plurality of harmonic model parameter values and said codebook vector indices and corresponding gains to a speech processor configured to compute at least one of a speech transformation, a signal compression and a conversion to an audible sound output.
- Dependent claims (2 to 10):
  - 2. The method of claim 1, wherein said harmonic modeling is performed by using the speech frame's estimated energy envelope signal.
  - 3. The method of claim 1, wherein said at least one set of second harmonic model component values is removed in a plurality of iterations so that during each one of said plurality of iterations the following is performed until a remaining harmonic component cost function is below a threshold:
    - analyzing a new harmonic model of the previous harmonic model residual to produce a new set of harmonic model component values; and removing said new set of harmonic component values from said previous harmonic model residual to produce a new harmonic model residual for further iterations.
  - 4. The method of claim 1, wherein said removed at least one set of harmonic component values is stored for later use during decoding of the signal and reconstruction of the audible output.
  - 5. The method of claim 3, wherein said new harmonic modeling uses at least one estimated energy envelope signal.
  - 6. The method of claim 1, wherein said speech frame is spectrally whitened prior to said first harmonic modeling, and said spectral whitening is reversed prior to said speech coding analysis.
  - 7. The method of claim 1, wherein said speech frame is spectrally whitened after said first harmonic modeling, and said spectral whitening is reversed prior to said speech coding analysis.
  - 8. The method of claim 1, wherein said harmonically-filtered residual signal is further processed to remove periodic energy envelope modulation by modeling using a sum of multiple instances of a periodic function at arbitrary frequencies taking into account the time-domain energy envelope signal estimate with imposed periodicity before analysis by synthesis coding.
  - 9. The method of claim 8, wherein said harmonically-filtered residual signal is frequency range filtered before performing said modeling to remove only the frequency range specific periodic energy envelope modulation.
  - 10. The method of claim 1, wherein said first harmonic model parameter values undergo further processing for speech transformation.
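Claim 3's iterative removal of further harmonic sets from the residual can be sketched as a greedy loop. None of the following assumptions come from the claims. Each iteration tries every f0 on a caller-supplied grid, fits a least-squares harmonic set at that f0, and keeps the candidate that captures the most power. The loop stops when that captured power, used here as the "remaining harmonic component cost", falls below the threshold. The grid should avoid 0 Hz and any f0 whose harmonics land on the Nyquist frequency, where the sine basis vanishes and the normal equations become singular. The function names are hypothetical.

```python
import math


def _solve(a, b):
    """Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_harmonic_set(x, f0, fs, n_harm):
    """Least-squares harmonic set at f0; returns (params, estimate)."""
    n = len(x)
    basis = []
    for k in range(1, n_harm + 1):
        w = 2 * math.pi * k * f0 / fs
        basis.append([math.cos(w * t) for t in range(n)])
        basis.append([math.sin(w * t) for t in range(n)])
    gram = [[sum(p * q for p, q in zip(u, v)) for v in basis] for u in basis]
    coef = _solve(gram, [sum(p * s for p, s in zip(u, x)) for u in basis])
    est = [sum(c * u[t] for c, u in zip(coef, basis)) for t in range(n)]
    params = [((k + 1) * f0,                                  # frequency
               math.hypot(coef[2 * k], coef[2 * k + 1]),      # amplitude
               math.atan2(-coef[2 * k + 1], coef[2 * k]))     # phase
              for k in range(n_harm)]
    return params, est


def remove_harmonic_sets(residual, fs, f0_grid, n_harm, threshold, max_iter=20):
    """Greedily strip harmonic sets until the strongest remaining one is weak.

    Cost (an assumption): mean power captured by the best-fitting set."""
    removed = []  # stored sets, e.g. for later use by a decoder
    for _ in range(max_iter):
        best = None
        for f0 in f0_grid:
            params, est = fit_harmonic_set(residual, f0, fs, n_harm)
            power = sum(e * e for e in est) / len(residual)
            if best is None or power > best[0]:
                best = (power, params, est)
        if best is None or best[0] < threshold:
            break
        removed.append(best[1])
        residual = [r - e for r, e in zip(residual, best[2])]
    return removed, residual
```

The removed sets are returned together with the final residual, in line with claim 4's statement that removed components are stored for later use during decoding.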

11. A method for speech parameterization and coding of a continuous speech signal, comprising:
- receiving a continuous speech signal representing speech recorded by at least one microphone; dividing said speech signal into a plurality of speech frames;
  
  for each one of said plurality of speech frames:
  
  modeling said speech frame by a first harmonic modeling to produce a plurality of harmonic model parameter values, wherein said first harmonic modeling is estimated by computing a cost function between a plurality of sine function signals and said speech frame, wherein each of said plurality of sine function signals comprises one of a plurality of harmonic frequencies, an amplitude value and a phase value;
  
  reconstructing an estimated frame signal from said plurality of harmonic model parameter values;
  
  subtracting said estimated frame signal from said speech frame to produce a harmonic model residual signal;
  
  removing at least one harmonic component value from said first harmonic model residual signal to produce a harmonically-filtered residual signal;
  
  removing periodic energy envelope modulation using a second modeling of said harmonically-filtered residual signal using a sum of multiple instances of a periodic function at arbitrary frequencies taking into account the time-domain energy envelope signal estimate with imposed periodicity; and
  
  processing said harmonically-filtered residual signal with analysis by synthesis techniques to produce vectors of codebook indices and corresponding gains, and sending said plurality of harmonic model parameter values and said codebook vector indices and corresponding gains to a speech processor configured to compute at least one of a speech transformation, a signal compression and a conversion to an audible sound output.
- Dependent claims (12 to 16):
  - 12. The method of claim 11, wherein said first harmonic modeling is performed by using the speech frame's estimated energy envelope signal.
  - 13. The method of claim 11, wherein said speech frame is spectrally whitened prior to said first harmonic modeling, and said spectral whitening is reversed prior to said speech coding analysis.
  - 14. The method of claim 11, wherein said harmonic model residual is spectrally whitened after said first harmonic modeling, and said spectral whitening is reversed prior to said speech coding analysis.
  - 15. The method of claim 11, wherein said harmonically-filtered residual signal is frequency range filtered before performing said second modeling to remove only the frequency range specific periodic energy envelope modulation.
  - 16. The method of claim 11, wherein said first harmonic model parameter values undergo further processing for speech transformation.
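Claims 13 and 14 whiten the signal before or after the first harmonic modeling and reverse the whitening before the coding analysis. The claims do not name a whitening method. The sketch below assumes the common linear-prediction approach:

* compute the autocorrelation;
* run the Levinson-Durbin recursion to obtain A(z);
* apply the FIR filter A(z) to whiten;
* apply the all-pole filter 1/A(z) to reverse the whitening.

All function names are illustrative.

```python
def autocorr(x, order):
    """Biased autocorrelation r[0..order] of a frame."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion.

    Returns (a, err): a[0] = 1 so that A(z) = sum_j a[j] z^-j, and err is the
    final prediction-error energy."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def whiten(x, a):
    """FIR analysis filter A(z): the output is the prediction residual."""
    p = len(a) - 1
    return [x[n] + sum(a[j] * x[n - j] for j in range(1, p + 1) if n - j >= 0)
            for n in range(len(x))]


def unwhiten(e, a):
    """All-pole synthesis filter 1/A(z), the exact inverse recursion of whiten()."""
    p = len(a) - 1
    y = []
    for n in range(len(e)):
        y.append(e[n] - sum(a[j] * y[n - j] for j in range(1, p + 1) if n - j >= 0))
    return y
```

Because unwhiten() runs the inverse recursion of whiten() with the same coefficients, the round trip reproduces the input up to floating-point error. This is the property the claims rely on when the whitening "is reversed prior to said speech coding analysis".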

17. An apparatus for speech parameterization and coding of a continuous speech signal, comprising:
- at least one input interface for receiving and digitizing said continuous speech signal;
  
  at least one processing unit for performing the actions of:
  
  receiving a continuous speech signal representing speech recorded by at least one microphone; dividing said continuous speech signal into a plurality of speech frames; and, for each one of said plurality of speech frames:
  
  modeling said speech frame by a first harmonic model to produce a plurality of frame model parameter values and a harmonic model residual, wherein said first harmonic modeling is estimated by computing a cost function between a plurality of sine function signals and said speech frame, wherein each of said plurality of sine function signals comprises one of a plurality of harmonic frequencies, an amplitude value and a phase value;
  
  performing at least one second harmonic modeling analysis on said first harmonic model residual to remove at least one set of second harmonic model component values from said first harmonic model residual signal to produce a harmonically-filtered residual signal; and
  
  processing said harmonically-filtered residual signal with analysis by synthesis techniques to produce vectors of codebook indices and corresponding gains, and sending said plurality of harmonic model parameter values and said codebook vector indices and corresponding gains to a speech processor configured to compute at least one of a speech transformation, a signal compression and a conversion to an audible sound output;
  
  at least one output interface to send said plurality of speech parameter values and codes; and
  
  a housing for containing said at least one input interface, said at least one processing unit, and said at least one output interface, said housing being configured and suitable for the apparatus environment.
- Dependent claims (18 to 20):
  - 18. The apparatus of claim 17, wherein said harmonically-filtered residual signal is further processed to remove periodic energy envelope modulation using a modeling action based on a sum of multiple instances of a periodic function at arbitrary frequencies, taking into account the time-domain energy envelope signal estimate with imposed periodicity, before analysis by synthesis coding.
  - 19. The apparatus of claim 17, wherein said at least one input interface is any member of the group comprising:
    - said at least one microphone;
      
      an analog communication interface; and
      
      a digital communication interface.
  - 20. The apparatus of claim 17, wherein said at least one output interface is any member of the group comprising:
    - a digital communication interface; and
      
      an audio output interface.
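The final step of claims 1, 11 and 17 uses analysis by synthesis to produce codebook indices and gains. This step resembles a CELP-style codebook search. The sketch below makes several assumptions: the synthesis filter is a truncated impulse response h, the codebook is a caller-supplied list of vectors, and the search is greedy and multi-stage. Each stage filters every codevector, computes its optimal gain, keeps the codevector that most reduces the squared error, and subtracts its contribution from the target. Perceptual weighting, adaptive codebooks and gain quantization, which a practical coder would add, are omitted. The function names are hypothetical.

```python
def synthesize(code, h):
    """Zero-state filtering of a codevector by the impulse response h."""
    return [sum(h[k] * code[t - k] for k in range(min(len(h), t + 1)))
            for t in range(len(code))]


def abs_search(target, codebook, h, stages):
    """Greedy multi-stage analysis-by-synthesis codebook search.

    Returns ([(index, gain), ...], remaining_target)."""
    remaining = list(target)
    chosen = []
    for _ in range(stages):
        best = None
        for idx, code in enumerate(codebook):
            y = synthesize(code, h)             # synthesized candidate
            energy = sum(v * v for v in y)
            if energy == 0.0:
                continue
            corr = sum(r * v for r, v in zip(remaining, y))
            score = corr * corr / energy        # error reduction at the optimal gain
            if best is None or score > best[0]:
                best = (score, idx, corr / energy, y)
        if best is None:                        # every codevector was silent
            break
        _, idx, gain, y = best
        chosen.append((idx, gain))
        remaining = [r - gain * v for r, v in zip(remaining, y)]
    return chosen, remaining
```

As an example, take a pulse codebook of length 8 with h = [1.0, 0.5, 0.25]. A target built as 2 times the filtered pulse at position 2 minus the filtered pulse at position 5 is decoded in two stages as [(2, 2.0), (5, -1.0)], leaving a zero remainder.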


Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Shechtman, Slava
Primary Examiner(s): Singh, Satwant
Application Number: US14/040,765
Publication Number: US 20150095035A1
Time in Patent Office: 820 days
Field of Search: 704/266, 704/209, 704/233, 704/270, 704/205, 704/220, 704/264, 704/500
US Class Current: 1/1
CPC Class Codes:
G10L 19/02   using spectral analysis, e....
G10L 19/038   Vector quantisation, e.g. T...
G10L 19/093   using sinusoidal excitation...
