Vector adaptive predictive coder for speech and audio

US 4,969,192 A
Filed: 04/06/1987
Issued: 11/06/1990
Est. Priority Date: 04/06/1987
Status: Expired due to Term

Abstract

A real-time vector adaptive predictive coder which approximates each vector of K speech samples by using each of M fixed vectors in a first codebook to excite a time-varying synthesis filter and picking the vector that minimizes distortion. Predictive analysis for each frame determines parameters used for computing, from vectors in the first codebook, zero-state response vectors that are stored at the same address (index) in a second codebook. Encoding of input speech vectors s_n is then carried out using the second codebook. When the vector that minimizes distortion is found, its index is transmitted to a decoder which has a codebook identical to the first codebook of the encoder. There the index is used to read out a vector that is used to synthesize an output speech vector ŝ_n. The parameters used in the encoder are quantized, for example by using a table, and the indices are transmitted to the decoder where they are decoded to specify transfer characteristics of filters used in producing the vector ŝ_n from the receiver codebook vector selected by the vector index transmitted.
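
The two-codebook idea in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the helper names (`all_pole_filter`, `build_zsr_codebook`, `best_index`), the toy codebook, the one-tap filter coefficient, and the use of plain squared error as the distortion measure. The perceptual weighting filter is left out to keep the example short, so this is not the patented coder itself.

```python
def all_pole_filter(x, a, state=None):
    """Filter x through 1 / (1 - sum_i a[i] z^-(i+1)).

    `state` holds the most recent outputs, newest first; None means zero state.
    Returns (output, new_state).
    """
    mem = list(state) if state is not None else [0.0] * len(a)
    y = []
    for sample in x:
        out = sample + sum(c * m for c, m in zip(a, mem))
        y.append(out)
        mem = [out] + mem[:-1]
    return y, mem


def build_zsr_codebook(codebook, gain, a):
    """Second codebook: zero-state response to every gain-scaled first-codebook vector."""
    zsr = []
    for vec in codebook:
        scaled = [gain * s for s in vec]
        # Filter memory is reset for each vector, so entries do not influence each other.
        response, _ = all_pole_filter(scaled, a, state=None)
        zsr.append(response)  # stored at the same index as the source vector
    return zsr


def best_index(target, zsr_codebook):
    """Index of the zero-state response vector closest to target (squared error)."""
    def sq_err(u, v):
        return sum((p - q) ** 2 for p, q in zip(u, v))
    return min(range(len(zsr_codebook)), key=lambda j: sq_err(target, zsr_codebook[j]))


# Toy first codebook: M = 4 vectors of K = 4 samples (illustrative values only).
codebook = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [1.0, -1.0, 1.0, -1.0],
            [-1.0, -1.0, -1.0, -1.0]]
a = [0.5]     # hypothetical one-tap synthesis filter
gain = 2.0    # hypothetical quantized gain

zsr_codebook = build_zsr_codebook(codebook, gain, a)
target = zsr_codebook[2]            # pretend the target equals entry 2
chosen = best_index(target, zsr_codebook)
```

Because the second codebook is rebuilt once per frame, the per-vector search reduces to a nearest-neighbour lookup; only the index `chosen` would need to be sent.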

12 Claims

1. An improvement in the method for compressing digitally encoded input speech or audio vectors at a transmitter by using a scaling unit controlled by a quantized residual gain factor QG, a synthesis filter controlled by a set of quantized linear predictive coefficient parameters QLPC, a pitch predictor controlled by pitch and pitch predictor parameters QP and QPP, a weighting filter controlled by a set of perceptual weighting parameters W, and a permanent indexed codebook containing a predetermined number M of codebook vectors, each having an assigned codebook index, to find an index which identifies the best match between an input speech or audio vector s_n that is to be coded and a synthesized vector ŝ_n generated from a stored vector in said indexed codebook, wherein each of said digitally encoded input vectors consists of a predetermined number K of digitally coded samples, comprising the steps of
  buffering and grouping said input speech or audio vectors into frames of vectors with a predetermined number N of vectors in each frame,
  performing an initial analysis for each successive frame, said analysis including the computation of a residual gain factor G, a set of perceptual weighting parameters W, a pitch parameter P, a pitch predictor parameter PP, and a set of said linear predictive coefficient parameters LPC, and the computation of quantized values QG, QP, QPP and QLPC of parameters G, P, PP and LPC using one or more indexed quantizing tables for the computation of each quantized parameter or set of parameters,
  for each frame transmitting indices of said quantized parameters QG, QP, QPP and QLPC determined in the initial analysis step as side information about vectors analyzed for later use in looking up in one or more identical tables said quantized parameters QG, QP, QPP and QLPC while reconstructing speech and audio vectors from encoded vectors in a frame, where each index for a quantized parameter points to a location in one or more of said identical tables where said quantized parameter may be found,
  computing a zero-state response vector from the vector output of a zero-input response filter comprising a scaling unit, synthesis filter and weighting filter identical in operation to said scaling unit, synthesis filter and weighting filter used for encoding said input vectors, said zero-state response vector being computed for each vector in said permanent codebook by first setting to zero the initial condition of said zero-input response filter so that the response computed is not influenced by a preceding one of said codebook vectors processed by said zero-input response filter, and then using said quantized values of said residual gain factor, set of linear predictive coefficient parameters, and said set of perceptual weighting parameters computed in said initial analysis step by processing each vector in said permanent codebook through said zero-input response filter to compute a zero-state response vector, and storing each zero-state response vector computed in a zero-state response codebook at or together with an index corresponding to the index of said vector in said permanent codebook used for this zero-state response computation step, and
  after thus performing an initial analysis of and computing a zero-state response codebook for each successive frame of input speech or audio vectors, encode each input vector s_n of a frame in sequence by transmitting the codebook index of the vector in said permanent codebook which corresponds to the index of a zero-state response vector in said zero-state response codebook that best matches a vector v_n obtained from an input vector s_n by
    subtracting a long term pitch prediction vector ŝ_n from the input vector s_n to produce a difference vector d_n and filtering said difference vector d_n by said perceptual weighting filter to produce a final input vector f_n, where said long term pitch prediction ŝ_n is computed by taking a vector from said permanent codebook at the address specified by the preceding particular index transmitted as a compressed vector code and performing gain scaling of this vector using said quantized gain factor QG, then synthesis filtering the vector obtained from said scaling using said quantized values QLPC of said set of linear predictive coefficient parameters to obtain a vector d̂_n and from vector d̂_n producing a long term pitch predicted vector ŝ_n of the next input vector s_n through a pitch synthesis filter using said quantized values of pitch predictor parameters QP and QPP, said long term prediction vector ŝ_n being a prediction of the next input vector s_n, and
    producing said vector v_n by subtracting from said final input vector f_n the vector output of said zero-input response filter generated in response to a permanent codebook vector at the codebook address of the last transmitted index code, said vector output being generated by processing through said zero input response filter, said permanent codebook vector located at said last transmitted index code where the output of said zero input response filter is discarded while said permanent codebook vector located at said last transmitted index code is being processed sample by sample in sequence into said zero input response filter until all samples of said codebook vector have been entered, and where the input of said zero input response filter is interrupted after all samples of said codebook vector have been entered and then the desired vector output from said zero-input response filter is processed out sample by sample for subtraction from said final vector v_n, and
  for each input vector s_n in a frame, finding the vector stored in said zero-state response codebook which best matches the vector v_n, thereby finding the best match of a codebook vector with an input vector, using an estimate vector ŝ_n produced from the best match codebook vector found for the preceding input vector,
  having found the best match of said vector v_n with a zero-state response vector in said zero-state response codebook for an input speech or audio vector s_n, transmit the zero-state response codebook index of the current best-match zero-state response vector as a compressed vector code of the current input vector, and also use said index of the current best-match zero-state response vector to select a vector from said permanent codebook for computing said long term pitch predicted input vector ŝ_n to be subtracted from the next input vector s_n of the frame.
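
The zero-input response step of claim 1 relies on the linearity of the filter: the output during the current vector is the sum of the ringing left by the previous vector and the zero-state response to the current one. The following Python sketch checks that decomposition on toy numbers. The function names, the one-tap all-pole filter, and the sample values are all hypothetical, and the gain scaling and weighting stages are omitted, so this illustrates the principle and is not the claimed filter chain.

```python
def all_pole_filter(x, a, state):
    """Filter x through 1 / (1 - sum_i a[i] z^-(i+1)); returns (output, new_state)."""
    mem = list(state)
    y = []
    for sample in x:
        out = sample + sum(c * m for c, m in zip(a, mem))
        y.append(out)
        mem = [out] + mem[:-1]
    return y, mem


def zero_input_response(prev_vector, a, k):
    """Ringing of the filter after prev_vector has been fed in.

    Assumption: the filter starts from zero state before prev_vector is loaded.
    """
    zeros = [0.0] * len(a)
    # Output is discarded while the previous codebook vector is loaded sample by sample.
    _, mem = all_pole_filter(prev_vector, a, zeros)
    # The input is then interrupted (zeros fed in) and k output samples are read out.
    ringing, _ = all_pole_filter([0.0] * k, a, mem)
    return ringing


a = [0.5]                            # hypothetical one-tap filter
prev_vec = [2.0, 0.0, 0.0, 0.0]      # vector selected for the preceding input
cur_vec = [4.0, 0.0, 0.0, 0.0]       # candidate for the current input
k = len(cur_vec)

zir = zero_input_response(prev_vec, a, k)
zsr, _ = all_pole_filter(cur_vec, a, [0.0] * len(a))

# Stand-in for the final input vector f_n: the filter output over the current
# vector when both vectors are processed back to back.
full, _ = all_pole_filter(prev_vec + cur_vec, a, [0.0] * len(a))
f_n = full[k:]

# Target for the codebook search: remove the ringing from f_n.
v_n = [f - r for f, r in zip(f_n, zir)]
```

Here `v_n` coincides with the zero-state response of `cur_vec`, which is why the search can be run against precomputed zero-state responses instead of refiltering every candidate with the carried-over filter memory.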
  - 2. An improvement as defined in claim 1, including a method for reconstructing said input speech or audio vectors from index coded vectors at a receiver, comprised of decoding said side information transmitted for each frame of index coded vectors, using the indices received to address a permanent codebook identical to said permanent codebook in said transmitter to successively obtain decoded vectors, scaling said decoded vectors by said quantized gain factor QG, and performing synthesis filtering using said set of linear predictive coefficient parameters and pitch prediction filtering using said quantized pitch parameters QP and QPP to produce approximation vectors ŝ_n of the original signal vectors s_n.
  - 3. An improvement as defined in claim 2 wherein said receiver includes postfiltering of said approximation vectors ŝ_n by long-delay postfiltering and short-delay postfiltering in cascade, said quantized pitch and quantized pitch predictor parameters controlling said long-term postfiltering and said quantized linear predictive coefficient parameters controlling said short-term postfiltering, whereby adaptive postfiltered digitally encoded speech or audio vectors are provided.
  - 4. An improvement as defined in claim 3 including automatic gain control of the adaptive postfiltered digitally encoded speech or audio signal provided by estimating the square root of the power of said postfiltered speech or audio signal to obtain a value σ₂(n) of said postfiltered speech or audio signal and estimating the square root of the power of a postfiltering speech or audio signal input to obtain a value σ₁(n) of decoded input speech or audio vectors before postfiltering, and controlling the gain of the postfiltered speech or audio output signal by a scaling factor that is a ratio of σ₁(n) to σ₂(n).
  - 5. An improvement as defined in claim 4 wherein said quantized gain factor, quantized pitch and quantized pitch predictor parameters, and quantized linear predictive coefficient parameters are derived from said side information transmitted to said receiver.
  - 6. An improvement as defined in claim 3 wherein postfiltering is accomplished by using a transfer function for said long-delay postfilter of the form ##EQU8## where C_g is an adaptive scaling factor, p is the quantized value QP of the pitch parameter P, and the factors γ and λ are determined according to the following formulas
      γ = C_z f(x), λ = C_p f(x), 0 < C_z, C_p < 1
    where C_z and C_p are fixed scaling factors, ##EQU9## U_th is an unvoiced threshold value, and x is a voicing indicator parameter that is a function of coefficients b₁, b₂ and b₃, where b₁, b₂, b₃ are coefficients of said quantized pitch predictor QPP given by P₁(z) = 1 - b₁ z^(-p+1) - b₂ z^(-p) - b₃ z^(-p-1), where z is the inverse of the input delay operator z^-1 used in the z transform representation of transfer functions.
  - 7. An improvement as defined in claim 6 wherein postfiltering is accomplished by using a transfer function for said short-delay postfilter of the form ##EQU10## where α and β are bandwidth expansion coefficients.
  - 8. An improvement as defined in claim 7 wherein postfiltering further includes in cascade first-order filtering with a transfer function
      1 - μ z^-1, μ < 1
    where μ is a coefficient.
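
Two of the receiver-side steps in claims 4 and 8 are simple enough to sketch directly: the first-order filter 1 - μ z^-1 and the gain control that rescales the postfiltered signal by σ₁/σ₂. The Python below is a hedged illustration. The claims do not say how the square root of the power is estimated, so a per-block RMS is assumed here; the function names, the value μ = 0.5, and the sample data are likewise hypothetical.

```python
import math


def first_order_filter(x, mu, prev=0.0):
    """y[n] = x[n] - mu * x[n-1], i.e. the transfer function 1 - mu z^-1."""
    y = []
    for sample in x:
        y.append(sample - mu * prev)
        prev = sample
    return y


def rms(x):
    """Square root of the mean power of a block (assumed estimator)."""
    return math.sqrt(sum(s * s for s in x) / len(x))


def agc(decoded, postfiltered, eps=1e-12):
    """Scale the postfiltered block by sigma1 / sigma2 so its level matches the input."""
    sigma1 = rms(decoded)        # level before postfiltering
    sigma2 = rms(postfiltered)   # level after postfiltering
    scale = sigma1 / max(sigma2, eps)   # eps guards against division by zero
    return [scale * s for s in postfiltered]


decoded = [1.0, 0.5, -0.5, -1.0, 0.0, 1.0, 0.5, -0.5]   # toy decoded block
shaped = first_order_filter(decoded, mu=0.5)
leveled = agc(decoded, shaped)
```

After `agc`, the block `leveled` has the same RMS level as `decoded`, which is the purpose of the ratio of σ₁(n) to σ₂(n) in claim 4. A sample-by-sample estimator would follow the same pattern with a running power average.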

9. A postfiltering method for enhancing digitally processed speech or audio signals comprising the steps of
  buffering said speech or audio signals into frames of vectors, each vector having K successive samples,
  performing analysis of said buffered frames of speech or audio signals in predetermined blocks to compute linear predictive coefficients, pitch and pitch predictor parameters, and
  filtering each vector with long-delay and short-delay postfiltering in cascade, said long-delay postfiltering being controlled by said pitch and pitch predictor parameters and said short-delay postfiltering being controlled by said linear predictive coefficient parameters, wherein postfiltering is accomplished by using a transfer function for said short-delay postfilter of the form ##EQU11## where z is the inverse of the unit delay operator z^-1 used in the z transform representation of transfer functions, and α and β are fixed scaling factors.
  - 10. A postfiltering method as defined in claim 9 including automatic gain control of the postfiltered digitally encoded speech or audio signal provided by estimating the square root of the power of said postfiltered digitally encoded speech or audio signal to obtain a value σ₂(n) of said postfiltered speech signal and estimating the square root of the power of a postfiltering input speech or audio signal to obtain a value σ₁(n) of decoded input speech or audio signal before postfiltering, and controlling the gain of the postfiltered speech or audio signal by a scaling factor that is a ratio of σ₁(n) to σ₂(n).
  - 11. A postfiltering method as defined in claim 10 wherein postfiltering is accomplished by using a transfer function for said long-delay postfilter of the form ##EQU12## where C_g is an adaptive scaling factor, p is the quantized value of the pitch parameter QP and the factors γ and λ are adaptive bandwidth expansion parameters determined according to the following formulas
      γ = C_z f(x), λ = C_p f(x), 0 < C_z, C_p < 1
    where C_z and C_p are fixed scaling factors and ##EQU13## U_th is an unvoiced threshold value, and x is a voicing indicator that is a function of coefficients b₁, b₂, b₃ where b₁, b₂, b₃ are coefficients of said quantized pitch predictor QPP given by P₁(z) = 1 - b₁ z^(-p+1) - b₂ z^(-p) - b₃ z^(-p-1), where z is the inverse of the input delay operator z^-1 used in the z transform representation of transfer functions.
  - 12. A postfiltering method as defined in claim 11 wherein postfiltering further includes in cascade first-order filtering with a transfer function
      1 - μ z^-1, μ < 1
    where μ is a coefficient.
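
The three-tap pitch predictor polynomial that appears in claims 6 and 11, P₁(z) = 1 - b₁ z^(-p+1) - b₂ z^(-p) - b₃ z^(-p-1), corresponds to a time-domain difference equation that can be sketched in a few lines of Python. The function name, the choice to treat samples before the start of the block as zero, and the test signal are assumptions made for illustration; the mapping from b₁, b₂, b₃ to the voicing indicator x is not spelled out in the claims and is not attempted here.

```python
def pitch_prediction_error(s, p, b1, b2, b3):
    """Apply P1(z) = 1 - b1 z^(-p+1) - b2 z^(-p) - b3 z^(-p-1) to the sequence s.

    e[n] = s[n] - b1*s[n-p+1] - b2*s[n-p] - b3*s[n-p-1],
    with samples before the start of s taken as zero (an assumption).
    """
    def past(n):
        return s[n] if n >= 0 else 0.0

    return [s[n] - b1 * past(n - p + 1) - b2 * past(n - p) - b3 * past(n - p - 1)
            for n in range(len(s))]


# Toy periodic signal: one pulse every `period` samples.
period = 5
s = [1.0 if n % period == 0 else 0.0 for n in range(20)]

# With the lag equal to the true period and only the centre tap active,
# every pulse after the first is predicted exactly and cancels.
e = pitch_prediction_error(s, p=period, b1=0.0, b2=1.0, b3=0.0)
```

For a perfectly periodic input the residual `e` keeps only the first pulse, which shows why large tap values go together with strongly voiced, periodic speech.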

Current Assignee: Voicecraft Incorporated
Original Assignee: Voicecraft Incorporated
Inventors: Chen, Juin-Hwey; Gersho, Allen
Primary Examiner(s): NOT, DEFINED
Assistant Examiner(s): Merecki, John A.
Application Number: US07/035,615
Time in Patent Office: 1,310 Days
Field of Search: 381/29-32, 381/36-41, 381/51, 375/122, 375/25-34
US Class Current: 704/222
CPC Class Codes:
G10L 19/06   Determination or coding of ...
G10L 19/083   the excitation function bei...
G10L 19/26   Pre-filtering or post-filte...
G10L 2019/0011   Long term prediction filter...
G10L 2019/0013   Codebook search algorithms
G10L 2019/0014   Selection criteria for dist...
