Packet Loss Concealment for Speech Coding

US 20140156267A1
Filed: 02/07/2014
Published: 06/05/2014
Est. Priority Date: 12/26/2006
Status: Active Grant

Abstract

A speech coding method reduces error propagation due to voice packet loss by limiting or reducing a pitch gain only for the first subframe or the first two subframes within a speech frame. The method is used for a voiced speech class. A pitch cycle length is compared to a subframe size to decide whether to reduce the pitch gain for the first subframe or the first two subframes within the frame. A frame is classified as strongly voiced by checking whether the pitch lags are stable and the pitch gains are high enough within the frame; for a strongly voiced frame, the pitch lags and the pitch gains can be encoded more efficiently than for other speech classes.

17 Claims

1. A method of improving packet loss concealment for speech coding while still profiting from a pitch prediction or Long-Term Prediction (LTP), the method comprising:
- classifying a plurality of speech frames into a plurality of classes, and wherein at least for one of the classes, the following steps are included;
  
  comparing a pitch cycle length with a subframe size within a speech frame when the subframe size is fixed or deciding a first subframe size based on a pitch cycle length within a speech frame when the first subframe size is variable;
  
  having an LTP excitation component;
  
  having a second excitation component;
  
  determining an initial energy of the LTP excitation component for every subframe within a frame of speech signal by using a regular method of minimizing a coding error or a weighted coding error at an encoder;
  
  reducing or limiting the energy of the LTP excitation component to be smaller than the initial energy of the LTP excitation component for the first subframe or the first two subframes within the frame based at least in part on the pitch cycle length compared to the subframe size;
  
  keeping the energy of the LTP excitation component to be equal to the initial energy of the LTP excitation component for any other subframe rather than the first subframe or the first two subframes within the frame;
  
  encoding the energy of the LTP excitation component for every subframe of the frame at the encoder; and
  
  forming an excitation by including the LTP excitation component and the second excitation component.
- Dependent claims (2, 3, 4, 5, 6, 7)
  - 2. The method of claim 1, wherein encoding the energy of the LTP excitation component comprises encoding a gain factor.
  - 3. The method of claim 2 further comprising:
    - limiting or reducing the value of the gain factor for the first subframe or the first two subframes to be smaller than 1; and
      
      compensating for coding quality loss due to the gain factor reduction by increasing coding bit rate of the second excitation component of the first subframe or the first two subframes to be larger than coding bit rate of the second excitation component of any other subframe within the frame.
  - 4. The method of claim 2, further comprising:
    - limiting or reducing the value of the gain factor for the first subframe or the first two subframes to be smaller than 1; and
      
      compensating for coding quality loss due to the gain factor reduction by adding one more stage of excitation component to the second excitation component for the first subframe or the first two subframes rather than the other subframes within the frame.
  - 5. The method of claim 1, wherein the initial energy of the LTP excitation component and the second excitation component are determined by using an analysis-by-synthesis approach.
  - 6. The method of claim 5, comprising a Code-Excited Linear Prediction (CELP) methodology.
  - 7. The method of claim 1, wherein the energy limitation or reduction of the LTP excitation component for the first subframe or the first two subframes within the frame is employed for voiced speech and is not employed for unvoiced speech.
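
The gain handling recited in claims 1, 2 and 7 can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the function names, the cap value `gain_cap=0.5`, and the rule "limit one subframe when the pitch lag fits inside a subframe, otherwise two" are assumptions made for the example. The claims state only that the decision is based at least in part on the pitch cycle length compared to the subframe size.

```python
from typing import List


def limit_ltp_gains(initial_gains: List[float],
                    pitch_lag: int,
                    subframe_size: int,
                    is_voiced: bool,
                    gain_cap: float = 0.5) -> List[float]:
    """Return the pitch (LTP) gains to encode for one frame.

    initial_gains: gains found per subframe by the regular
        error-minimizing search at the encoder.
    gain_cap: hypothetical upper limit (< 1) applied to the leading
        subframes; the real value is not given in the claims.
    """
    # Claim 7: the limitation is applied to voiced speech only.
    if not is_voiced:
        return list(initial_gains)

    # Assumed decision rule: a pitch cycle that fits inside one subframe
    # needs only the first subframe limited; a longer one needs two.
    n_limited = 1 if pitch_lag <= subframe_size else 2
    n_limited = min(n_limited, len(initial_gains))

    limited = []
    for index, gain in enumerate(initial_gains):
        if index < n_limited:
            limited.append(min(gain, gain_cap))  # reduce or limit
        else:
            limited.append(gain)                 # keep the initial gain
    return limited


def form_excitation(ltp_component: List[float],
                    second_component: List[float],
                    ltp_gain: float,
                    second_gain: float) -> List[float]:
    """Combine the LTP and second excitation components for a subframe."""
    return [ltp_gain * a + second_gain * c
            for a, c in zip(ltp_component, second_component)]


if __name__ == "__main__":
    gains = limit_ltp_gains([0.9, 0.8, 0.95, 0.85],
                            pitch_lag=40, subframe_size=64, is_voiced=True)
    print(gains)
```

Capping only the leading subframes weakens the dependence of the current frame on the excitation of the previous (possibly lost) frame, while later subframes keep the full benefit of long-term prediction.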

8. A method of efficiently encoding a voiced frame, the method comprising:
- classifying a plurality of speech frames into a plurality of classes, and wherein at least for one of the classes, the following steps are included;
  
  having a Long-Term Prediction (LTP) excitation component;
  
  having a second excitation component;
  
  encoding an energy of the LTP excitation component by encoding a pitch gain;
  
  checking whether a pitch track or pitch lags within the voiced frame are stable from one subframe to a next subframe;
  
  checking whether the voiced frame is strongly voiced by checking whether pitch gains within the voiced frame are high;
  
  encoding the pitch lags or the pitch gains efficiently by a differential coding from one subframe to a next subframe when the voiced frame is strongly voiced and the pitch lags are stable; and
  
  forming an excitation by including the LTP excitation component and the second excitation component.
- Dependent claims (9, 10)
  - 9. The method of claim 8, wherein the energy of the LTP excitation component and the second excitation component are determined by using an analysis-by-synthesis approach.
  - 10. The method of claim 8, comprising a Code-Excited Linear Prediction (CELP) methodology.
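
The strongly-voiced check and differential coding of claim 8 might look something like the sketch below. The thresholds `max_lag_jump` and `min_gain`, the dictionary payload layout, and all function names are hypothetical choices for illustration; the claim does not specify how "stable" or "high" are measured, and only the pitch lags are differentially coded here.

```python
from typing import Dict, List


def is_strongly_voiced(pitch_lags: List[int],
                       pitch_gains: List[float],
                       max_lag_jump: int = 2,
                       min_gain: float = 0.7) -> bool:
    """Check lag stability between subframes and that all gains are high.

    Both thresholds are assumed values, not taken from the patent.
    """
    lags_stable = all(abs(nxt - cur) <= max_lag_jump
                      for cur, nxt in zip(pitch_lags, pitch_lags[1:]))
    gains_high = all(gain >= min_gain for gain in pitch_gains)
    return lags_stable and gains_high


def encode_pitch_lags(pitch_lags: List[int],
                      pitch_gains: List[float]) -> Dict:
    """Use differential coding only for strongly voiced, stable frames."""
    if is_strongly_voiced(pitch_lags, pitch_gains):
        deltas = [nxt - cur for cur, nxt in zip(pitch_lags, pitch_lags[1:])]
        # Small deltas need fewer bits than absolute lags.
        return {"mode": "differential", "first": pitch_lags[0], "deltas": deltas}
    return {"mode": "absolute", "lags": list(pitch_lags)}


def decode_pitch_lags(payload: Dict) -> List[int]:
    """Rebuild the per-subframe pitch lags from either payload mode."""
    if payload["mode"] == "absolute":
        return list(payload["lags"])
    lags = [payload["first"]]
    for delta in payload["deltas"]:
        lags.append(lags[-1] + delta)
    return lags


if __name__ == "__main__":
    payload = encode_pitch_lags([57, 58, 58, 59], [0.9, 0.85, 0.8, 0.9])
    print(payload, decode_pitch_lags(payload))
```

Because each delta is bounded by the stability threshold, it can be represented with far fewer bits than a full lag value, which is where the coding efficiency for strongly voiced frames would come from.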

11. A non-transitory computer-readable medium having computer implementable instructions stored thereon for execution by a processor, wherein the instructions are executed to implement a method of improving packet loss concealment for speech coding while still profiting from a pitch prediction or Long-Term Prediction (LTP), the method comprising:
- classifying a plurality of speech frames into a plurality of classes, and wherein at least for one of the classes, the following steps are included;
  
  comparing a pitch cycle length with a subframe size within a speech frame when the subframe size is fixed or deciding a first subframe size based on a pitch cycle length within a speech frame when the first subframe size is variable;
  
  having an LTP excitation component;
  
  having a second excitation component;
  
  determining an initial energy of the LTP excitation component for every subframe within a frame of speech signal by using a regular method of minimizing a coding error or a weighted coding error at an encoder;
  
  reducing or limiting the energy of the LTP excitation component to be smaller than the initial energy of the LTP excitation component for the first subframe or the first two subframes within the frame based at least in part on the pitch cycle length compared to the subframe size;
  
  keeping the energy of the LTP excitation component to be equal to the initial energy of the LTP excitation component for any other subframe rather than the first subframe or the first two subframes within the frame;
  
  encoding the energy of the LTP excitation component for every subframe of the frame at the encoder; and
  
  forming an excitation by including the LTP excitation component and the second excitation component.
- Dependent claims (12, 13, 14, 15, 16, 17)
  - 12. The non-transitory computer-readable medium of claim 11, wherein encoding the energy of the LTP excitation component comprises encoding a gain factor.
  - 13. The non-transitory computer-readable medium of claim 12, wherein the method further comprises:
    - limiting or reducing the value of the gain factor for the first subframe or the first two subframes to be smaller than 1; and
      
      compensating for coding quality loss due to the gain factor reduction by increasing coding bit rate of the second excitation component of the first subframe or the first two subframes to be larger than coding bit rate of the second excitation component of any other subframe within the frame.
  - 14. The non-transitory computer-readable medium of claim 12, wherein the method further comprises:
    - limiting or reducing the value of the gain factor for the first subframe or the first two subframes to be smaller than 1; and
      
      compensating for coding quality loss due to the gain factor reduction by adding one more stage of excitation component to the second excitation component for the first subframe or the first two subframes rather than the other subframes within the frame.
  - 15. The non-transitory computer-readable medium of claim 11, wherein the initial energy of the LTP excitation component and the second excitation component are determined by using an analysis-by-synthesis approach.
  - 16. The non-transitory computer-readable medium of claim 15, comprising a Code-Excited Linear Prediction (CELP) methodology.
  - 17. The non-transitory computer-readable medium of claim 11, wherein the energy limitation or reduction of the LTP excitation component for the first subframe or the first two subframes within the frame is employed for voiced speech and is not employed for unvoiced speech.

Current Assignee: Huawei Technologies Co., Ltd. (Huawei Investment & Holding Co., Ltd.)
Original Assignee: Huawei Technologies Co., Ltd. (Huawei Investment & Holding Co., Ltd.)
Inventors: Gao, Yang
Granted Patent: US 9,336,790 B2
US Class Current: 704/207
CPC Class Codes:
G10L 19/005   Correction of errors induce...
G10L 19/083   the excitation function bei...
G10L 19/09   Long term prediction, i.e. ...
G10L 19/22   Mode decision, i.e. based o...
