System and method for mixed codebook excitation for speech coding

US 9,972,325 B2
Filed: 02/15/2013
Issued: 05/15/2018
Est. Priority Date: 02/17/2012
Status: Active Grant

Abstract

In accordance with an embodiment, a method of encoding an audio/speech signal includes determining a mixed codebook vector based on an incoming audio/speech signal, where the mixed codebook vector includes a sum of a first codebook entry from a first codebook and a second codebook entry from a second codebook. The method further includes generating an encoded audio signal based on the determined mixed codebook vector, and transmitting a coded excitation index of the determined mixed codebook vector.
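
The abstract's central object, a mixed codebook vector formed as the sum of one entry from each of two codebooks, can be illustrated with a minimal sketch. The toy codebooks, their contents, the subframe length of 8, and the function name `mixed_codebook_vector` are all hypothetical and are not taken from the patent.

```python
# Hypothetical toy codebooks (not from the patent); subframe length is 8.
PULSE_CODEBOOK = [
    [1, 0, 0, 0, -1, 0, 0, 0],   # signed, unit-magnitude pulses
    [0, 1, 0, 0, 0, 0, -1, 0],
]
NOISE_CODEBOOK = [
    [0.25, -0.5, 0.125, 0.0, -0.25, 0.5, -0.125, 0.0],
    [-0.125, 0.25, 0.5, -0.25, 0.0, 0.125, -0.5, 0.25],
]


def mixed_codebook_vector(pulse_entry, noise_entry):
    """Return the element-wise sum of one entry from each codebook."""
    if len(pulse_entry) != len(noise_entry):
        raise ValueError("entries must have the same subframe length")
    return [p + n for p, n in zip(pulse_entry, noise_entry)]


# The pair of indices (i, j) identifies the mixed vector; how the indices
# are actually coded for transmission is not modeled here.
i, j = 0, 0
vector = mixed_codebook_vector(PULSE_CODEBOOK[i], NOISE_CODEBOOK[j])
```

In this sketch only the two indices would need to be conveyed to a decoder that holds the same codebooks.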

23 Claims

1. A method of encoding an audio/speech signal, the method comprising:
- for each frame in an incoming audio/speech signal having a low bit rate, determining a mixed excitation and an adaptive codebook excitation based on the incoming audio/speech signal, the mixed excitation comprising a sum of a first excitation entry from a first codebook and a second excitation entry from a second codebook, wherein the first and second codebooks are both fixed but different codebooks, wherein the adaptive excitation comprises an entry from an adaptive codebook, wherein the first codebook comprises pulse-like entries, wherein the pulse-like entries comprise non-periodic, signed, and unit magnitude pulses specially designed for an Algebraic Code-Excited Linear Prediction (ACELP) speech coding algorithm, and the second codebook comprises noise-like entries, wherein determining the mixed excitation is performed in time domain;
  
  applying a first filter to the first excitation entry from the first codebook;
  
  applying a second filter to the second excitation entry from the second codebook, the second filter being different from the first filter;
  
  for each subframe in each frame in the incoming audio/speech signal, searching pulse-like entries in the first codebook, by using an Analysis-By-Synthesis searching approach, to find an entry that minimizes a weighted error between a synthesized speech and the incoming audio/speech signal, and coding an index of the entry to obtain at least one coded excitation index;
  
  generating an encoded audio signal based on the determined mixed excitation and the adaptive codebook excitation; and
  
  transmitting the at least one coded excitation index of the determined mixed excitation, wherein the determining and generating are performed using a hardware-based audio encoder.
- - 2. The method of claim 1, wherein determining the mixed excitation comprises:
    - computing first correlations between a filtered target vector and filtered entries in the first codebook, wherein the filtered target vector is based on the incoming audio signal;
      
      determining a first group of highest first correlations;
      
      computing second correlations between a filtered target vector and filtered entries in the second codebook;
      
      determining a second group of highest second correlations; and
      
      computing a first criterion function of combinations of the first and second groups, wherein the first criterion function comprises a function of one of the first group of highest first correlations, one of the second group of highest second correlations and an energy of corresponding entries from the first codebook and the second codebook.
  - 3. The method of claim 2, further comprising:
    - determining a third group of candidate correlations based on a highest computed first criterion functions; and
      
      selecting the mixed excitation based on applying a second criterion function to the third group, wherein the mixed excitation corresponds to codebook entries from the first codebook and the second codebook associated with a highest value of the second criterion function.
  - 4. The method of claim 3, wherein:
    - the first criterion function is
  - 5. The method of claim 2, further comprising selecting the mixed excitation based on a highest computed first criterion function.
  - 6. The method of claim 5, wherein the first criterion function is
  - 7. The method of claim 2, further comprising calculating energies of the corresponding entries from the first codebook and the second codebook.
  - 8. The method of claim 2, wherein the energy of corresponding entries from the first codebook and the second codebook are stored in memory.
  - 9. The method of claim 2, wherein the first group comprises more entries than the second group.
  - 10. The method of claim 1, wherein the first filter applies a first emphasis function to the first excitation entry, and wherein the second filter applies a second emphasis function to the second excitation entry.
  - 11. The method of claim 10, wherein:
    - the first filter comprises a low pass filtering function; and
      
      the second filter comprises a high pass filtering function.
  - 12. The method of claim 1, wherein the hardware-based audio encoder comprises a processor.
  - 13. The method of claim 1, wherein the hardware-based audio encoder comprises dedicated hardware.
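
Claims 10 and 11 describe two different filters, a low-pass function on the pulse-like entry and a high-pass function on the noise-like entry, applied before the entries are summed. The sketch below is one possible reading, assuming simple first-order FIR filters; the filter forms, the coefficients `alpha` and `beta`, and all function names are assumptions for illustration, since the claims do not specify them.

```python
def low_pass(x, alpha=0.5):
    """Assumed first-order low-pass: y[n] = x[n] + alpha * x[n-1]."""
    return [x[n] + alpha * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]


def high_pass(x, beta=0.5):
    """Assumed first-order high-pass: y[n] = x[n] - beta * x[n-1]."""
    return [x[n] - beta * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]


def filtered_mixed_excitation(pulse_entry, noise_entry, alpha=0.5, beta=0.5):
    """Filter each entry with its own filter, then sum in the time domain."""
    shaped_pulse = low_pass(pulse_entry, alpha)
    shaped_noise = high_pass(noise_entry, beta)
    return [p + n for p, n in zip(shaped_pulse, shaped_noise)]
```

With these assumed filters, a constant input is boosted by `low_pass` and attenuated by `high_pass`, which is the qualitative difference the two claims rely on.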

14. A system for encoding an audio/speech signal, the system comprising:
- a hardware-based audio coder configured to:
  
  for each frame in an incoming audio/speech signal having a low bit rate, determine a mixed excitation and an adaptive codebook excitation based on the incoming audio/speech signal, the mixed excitation comprising a sum of a first excitation entry from a pulse-like codebook and a second excitation entry from a noise-like codebook, wherein the pulse-like codebook and the noise-like codebook are both fixed but different codebooks, wherein the adaptive excitation comprises an entry from an adaptive codebook, wherein the pulse-like codebook comprises non-periodic, signed, and unit magnitude pulses specially designed for an Algebraic Code-Excited Linear Prediction (ACELP) speech coding algorithm, wherein the mixed excitation is configured to be determined in time domain;
  
  apply a first filter to the first excitation entry from the pulse-like codebook;
  
  apply a second filter to the second excitation entry from the noise-like codebook, the second filter being different from the first filter;
  
  for each subframe in each frame in the incoming audio/speech signal, search pulse-like entries in the pulse-like codebook, by using an Analysis-By-Synthesis searching approach, to find an entry that minimizes a weighted error between a synthesized speech and the incoming audio/speech signal, and coding an index of the entry to obtain at least one coded excitation index;
  
  generate an encoded audio/speech signal based on the determined mixed excitation and the adaptive codebook excitation; and
  
  transmit the at least one coded excitation index of the determined mixed excitation, wherein the hardware-based audio coder is a code excited linear prediction technique coder.
- - 15. The system of claim 14, wherein the hardware-based audio coder is further configured to:
    - compute first correlations between a filtered target vector and entries in the pulse-like codebook, wherein the filtered target vector is based on the incoming audio signal;
      
      determine a first group of highest first correlations;
      
      compute correlations between a filtered target vector and entries in the noise-like codebook;
      
      determine a second group of highest second correlations; and
      
      compute a first criterion function of combinations of first and second groups, wherein the first criterion function comprises a function of one of the first group of highest first correlations, one of the second group of highest second correlations and an energy of corresponding entries from the pulse-like codebook and the noise-like codebook.
  - 16. The system of claim 15, further comprising a memory configured to store values of the energy of corresponding entries from the pulse-like codebook and the noise-like codebook.
  - 17. The system of claim 15, wherein the hardware-based audio coder is further configured to select the mixed excitation based on a highest computed first criterion function.
  - 18. The system of claim 15, wherein the first criterion function is
  - 19. The system of claim 14, wherein the hardware-based audio coder comprises a processor.
  - 20. The system of claim 14, wherein the hardware-based audio coder comprises dedicated hardware.
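
The Analysis-By-Synthesis search recited in claims 1 and 14 selects the codebook entry whose synthesized contribution minimizes a weighted error against the target. A minimal sketch follows, assuming the weighted synthesis filter is represented by a short impulse response `h` and that a single optimal gain is fitted per entry; `h`, the helper names, and the gain handling are illustrative assumptions rather than details from the claims.

```python
def convolve_truncated(entry, h):
    """Zero-state filtering of `entry` by impulse response `h`,
    truncated to the subframe length."""
    out = []
    for n in range(len(entry)):
        acc = 0.0
        for k in range(min(n, len(h) - 1) + 1):
            acc += h[k] * entry[n - k]
        out.append(acc)
    return out


def abs_search(target, codebook, h):
    """Return (index, gain, error) of the entry minimizing the squared
    error between the target and the gain-scaled filtered entry."""
    best_index, best_gain, best_error = None, 0.0, float("inf")
    for index, entry in enumerate(codebook):
        y = convolve_truncated(entry, h)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # an all-zero contribution cannot reduce the error
        corr = sum(t * v for t, v in zip(target, y))
        gain = corr / energy  # least-squares optimal gain for this entry
        error = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if error < best_error:
            best_index, best_gain, best_error = index, gain, error
    return best_index, best_gain, best_error
```

For a fixed entry, minimizing this error is equivalent to maximizing the squared correlation divided by the energy, which is why CELP-style searches are usually expressed in terms of correlations and energies.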

21. A fast search method of a mixed codebook for encoding an audio/speech signal, the method comprising:
- determining a mixed excitation based on an incoming audio/speech signal, the mixed excitation comprising a sum of a first excitation entry from a first codebook and a second excitation entry from a second codebook, wherein the first codebook comprises pulse-like entries, wherein the pulse-like entries comprise pulses specially designed for an Algebraic Code-Excited Linear Prediction (ACELP) speech coding algorithm, and the second codebook comprises noise-like entries, wherein determining the mixed excitation is performed in time domain;
  
  computing first correlations between a filtered target vector and filtered entries in the first codebook, wherein the filtered target vector is based on the incoming audio signal;
  
  determining a first group of highest first correlations;
  
  computing correlations between a filtered target vector and filtered entries in the second codebook;
  
  determining a second group of highest second correlations;
  
  computing a first criterion function of combinations of the first and second groups, wherein the first criterion function comprises a function of one of the first group of highest first correlations, one of the second group of highest second correlations and an energy of corresponding entries from the first codebook and the second codebook;
  
  determining a third group of candidate correlations based on highest computed first criterion functions;
  
  selecting the mixed excitation based on applying a second criterion function to the third group, wherein the mixed excitation corresponds to codebook entries from the first codebook and the second codebook associated with a highest value of the second criterion function;
  
  coding an index of the entry from the first codebook of the selected mixed excitation to obtain at least one coded excitation index;
  
  generating an encoded audio signal based on the determined mixed excitation; and
  
  transmitting the at least one coded excitation index of the determined mixed excitation, wherein the determining and generating are performed using a hardware-based audio encoder.
- - 22. The method of claim 21, wherein:
    - the first criterion function is
  - 23. The method of claim 21, wherein the first codebook comprises a pulse-like codebook and the second codebook comprises a noise-like codebook.
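
The fast search of claim 21 can be sketched as a two-stage pre-selection. The criterion formulas themselves are not reproduced in this record, so the code assumes two common CELP-style choices: a first criterion (R1 + R2)^2 / (E1 + E2) that ignores the cross term between the two filtered entries, and a second criterion that uses the exact energy of their sum. Both formulas, the group sizes `n1`, `n2`, `n3`, and all names are assumptions, not the patent's definitions.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_indices(scores, count):
    """Indices of the `count` highest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:count]


def fast_mixed_search(target, filt_cb1, filt_cb2, n1=3, n2=2, n3=2):
    """Return (i, j) for already-filtered codebook entries.
    Criteria are assumed; see the note above."""
    r1 = [dot(target, y) for y in filt_cb1]   # first correlations
    r2 = [dot(target, y) for y in filt_cb2]   # second correlations
    e1 = [dot(y, y) for y in filt_cb1]        # energies; could be precomputed
    e2 = [dot(y, y) for y in filt_cb2]        # and stored in memory
    group1 = top_indices(r1, n1)
    group2 = top_indices(r2, n2)

    # Stage 1: assumed first criterion, cross term ignored.
    scored = []
    for i in group1:
        for j in group2:
            energy = e1[i] + e2[j]
            if energy > 0.0:
                scored.append(((r1[i] + r2[j]) ** 2 / energy, i, j))
    scored.sort(reverse=True)
    group3 = scored[:n3]

    # Stage 2: assumed second criterion with the exact energy of the sum.
    best = None
    for _, i, j in group3:
        energy = e1[i] + e2[j] + 2.0 * dot(filt_cb1[i], filt_cb2[j])
        if energy <= 0.0:
            continue
        q2 = (r1[i] + r2[j]) ** 2 / energy
        if best is None or q2 > best[0]:
            best = (q2, i, j)
    if best is None:
        raise ValueError("no valid candidate pair")
    return best[1], best[2]
```

Only `n3` pairs reach the more expensive second stage, instead of every combination of the two codebooks, which is where the saving in this sketch comes from.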

Current Assignee: Huawei Technologies Co., Ltd. (Huawei Investment & Holding Co., Ltd.)
Original Assignee: Huawei Technologies Co., Ltd. (Huawei Investment & Holding Co., Ltd.)
Inventors: Gao, Yang
Primary Examiner(s): Patel, Shreyans A
Application Number: US13/768,814
Publication Number: US 20130218578A1
Time in Patent Office: 1,915 Days
CPC Class Codes

G10L 19/00 Speech or audio signals ana...

G10L 19/12 the excitation function bei...
