Methods and systems for CELP-based speech coding with fine grain scalability

US 20020133335A1
Filed: 09/13/2001
Published: 09/19/2002
Est. Priority Date: 03/13/2001
Status: Active Grant

Abstract

Methods and systems for providing CELP-based speech coding with fine grain scalability include a parameter encoder that generates a basic bit-stream from LPC coefficients for a frame, pitch-related information for all the sub-frames obtained by searching an adaptive codebook, and first pulse-related information for even sub-frames obtained by searching a fixed codebook. The parameter encoder also generates enhancement bits, which follow the basic bit-stream, from second pulse-related information for odd sub-frames. The quality of synthesized speech improves in steps of one additional odd sub-frame pulse as the decoder receives more of the second pulse-related information in the enhancement bits.
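
The layering described in the abstract can be illustrated with a minimal Python sketch: a basic bit-stream followed by enhancement bits written one pulse at a time, so that the stream can be cut at any pulse boundary. The field widths (`PULSE_POS_BITS`, `PULSE_SIGN_BITS`), the `Pulse` type, and all function names are hypothetical; the source gives no bit allocation, and this is not the patented encoder itself.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical field widths; the source does not specify a bit allocation.
PULSE_POS_BITS = 4
PULSE_SIGN_BITS = 1
BITS_PER_PULSE = PULSE_POS_BITS + PULSE_SIGN_BITS


@dataclass
class Pulse:
    position: int  # slot index, assumed to fit in PULSE_POS_BITS
    sign: int      # 0 or 1


def to_bits(value: int, width: int) -> str:
    """Fixed-width binary string for a non-negative integer."""
    return format(value, f"0{width}b")


def pack_frame(basic_bits: str, odd_pulses: List[Pulse]) -> Tuple[str, int]:
    """Append enhancement bits after the basic bit-stream.

    All bits of one pulse are written before the next pulse starts.
    Returns the full bit string and the length of the basic part.
    """
    enhancement = "".join(
        to_bits(p.position, PULSE_POS_BITS) + to_bits(p.sign, PULSE_SIGN_BITS)
        for p in odd_pulses
    )
    return basic_bits + enhancement, len(basic_bits)


def truncate(stream: str, basic_len: int, n_pulses: int) -> str:
    """Keep the basic bit-stream plus the first n_pulses enhancement pulses."""
    return stream[: basic_len + n_pulses * BITS_PER_PULSE]


def unpack_pulses(stream: str, basic_len: int) -> List[Pulse]:
    """Decode every complete pulse found after the basic bit-stream."""
    tail = stream[basic_len:]
    pulses = []
    for i in range(0, len(tail) - BITS_PER_PULSE + 1, BITS_PER_PULSE):
        chunk = tail[i : i + BITS_PER_PULSE]
        pulses.append(
            Pulse(int(chunk[:PULSE_POS_BITS], 2), int(chunk[PULSE_POS_BITS:], 2))
        )
    return pulses
```

In this sketch a decoder that receives only part of the enhancement bits still recovers every pulse that arrived complete, and ignores a trailing partial pulse.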

19 Claims

1. A method of encoding a speech signal in a code excited linear prediction (CELP)-based speech processing system that includes an adaptive codebook and a fixed codebook, wherein the speech signal is divided into frames and each frame is further divided into sequential sub-frames, the method comprising:
- generating linear prediction coding (LPC) coefficients for a frame;
  
  generating pitch-related information by using the adaptive codebook, for each sub-frame of the frame;
  
  generating pulse-related information by using the fixed codebook, for a first sub-frame of the frame and for a second sub-frame of the frame;
  
  generating a basic bit-stream from the LPC coefficients, the pitch-related information, and the pulse-related information for the first sub-frame; and
  
  generating enhancement bits from the pulse-related information for the second sub-frame.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The method of claim 1, wherein the basic bit-stream provides a minimum quality when synthesized into speech, and the enhancement bits improve the quality of the synthesized speech.
  - 3. The method of claim 1, wherein the first sub-frame and the second sub-frame alternate in the sequential sub-frames.
  - 4. The method of claim 2, further comprising providing an even sub-frame as the first sub-frame, and an odd sub-frame as the second sub-frame.
  - 5. The method of claim 1, further comprising placing the enhancement bits after the basic bit-stream.
  - 6. The method of claim 5, wherein the generating of pulse-related information for the second sub-frame includes generating information for a plurality of pulses, and in the enhancement bits, placing all information for one pulse before information of another pulse.
  - 7. The method of claim 1, further comprising:
    - using the pulse-related information in addition to the pitch-related information for the first sub-frame, for generating pitch-related information and pulse-related information for a succeeding sub-frame; and
      
      using the pitch-related information without the pulse-related information for the second sub-frame, for generating pitch-related information and pulse-related information for a succeeding sub-frame.
  - 8. The method of claim 1, further comprising:
    - searching the adaptive codebook and the fixed codebook to minimize a difference between a synthesized speech and a target signal, for generating the pitch-related information and the pulse-related information; and
      
      linearly attenuating a magnitude of samples in the target signal for the second sub-frame, the samples being as many as an order of a synthesizer outputting the synthesized speech.
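
The target-signal step in claim 8 can be sketched as follows. The claim says only that as many samples as the synthesizer order are linearly attenuated; it does not say which samples or what the ramp endpoints are. This sketch assumes the final `order` samples of the sub-frame and a gain that falls in equal steps toward zero. The function name and the ramp shape are hypothetical.

```python
from typing import List


def attenuate_target_tail(target: List[float], order: int) -> List[float]:
    """Linearly scale down the last `order` samples of a target signal.

    Assumptions (not stated in the claims): the affected samples are the
    final ones in the sub-frame, and the gain steps down evenly through
    order/(order+1), ..., 1/(order+1).
    """
    if not 0 <= order <= len(target):
        raise ValueError("order must be between 0 and len(target)")
    out = list(target)
    start = len(out) - order
    for k in range(order):
        gain = (order - k) / (order + 1)
        out[start + k] *= gain
    return out
```

With `order` set to the LPC synthesis filter order, only the samples in the assumed ramp region change; earlier samples of the target pass through untouched.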

9. A method of synthesizing speech in a code excited linear prediction (CELP)-based speech processing system that includes an adaptive codebook and a fixed codebook, wherein a speech signal is divided into frames and each frame is further divided into sub-frames, the method comprising:
- receiving a basic bit-stream which includes linear prediction coding (LPC) coefficients for a frame, pitch-related information for all sub-frames of the frame, and first pulse-related information for a part of the sub-frames;
  
  receiving enhancement bits which include a part or a whole of second pulse-related information for a remainder of the sub-frames;
  
  generating an excitation by referring to the adaptive codebook and the fixed codebook based on the pitch-related information included in the basic bit-stream and the first pulse-related information included in the basic bit-stream, respectively;
  
  generating an excitation by referring to the adaptive codebook and the fixed codebook based on the pitch-related information included in the basic bit-stream and the part or the whole of the second pulse-related information included in the enhancement bits, respectively; and
  
  outputting synthesized speech according to the excitations and the LPC coefficients.
- Dependent claims (10, 11, 12):
  - 10. The method of claim 9, wherein an even sub-frame is the part of the sub-frames, and an odd sub-frame is the remainder of the sub-frames.
  - 11. The method of claim 9, wherein the second pulse-related information includes information for a plurality of pulses, and quality of the synthesized speech is improved each time information for one pulse is added to the enhancement bits received.
  - 12. The method of claim 9, further comprising:
    - feeding back the excitation generated from the first pulse-related information in addition to the pitch-related information, for generating an excitation for a succeeding sub-frame; and
      
      feeding back another excitation generated from the pitch-related information without the second pulse-related information, for generating an excitation for a succeeding sub-frame.

13. A speech processing system based on code excited linear prediction (CELP) for encoding a speech signal, wherein the speech signal is divided into frames and each frame is further divided into sub-frames, the system comprising:
- a generator of linear prediction coding (LPC) coefficients for a frame;
  
  a first portion including an adaptive codebook for generating pitch-related information for each sub-frame of the frame;
  
  a second portion including a fixed codebook for generating pulse-related information for each sub-frame of the frame, the pulse-related information including first information for a first kind of sub-frame and second information for a second kind of sub-frame; and
  
  a parameter encoder for generating a basic bit-stream from the LPC coefficients, the pitch-related information, and the first pulse-related information, and for generating enhancement bits from the second pulse-related information.
- Dependent claims (14, 15, 16):
  - 14. The system according to claim 13, further comprising a transmitter for transmitting the basic bit-stream and a part of the enhancement bits onto a channel, the part being determined based on traffic of the channel.
  - 15. The system according to claim 13, wherein the pitch-related information is reused in the first portion for a succeeding sub-frame, the first pulse-related information being reused in addition to the pitch-related information, the second pulse-related information not being reused.
  - 16. The system according to claim 13, further comprising:
    - an analysis-by-synthesis loop including a synthesizer for searching the adaptive codebook and the fixed codebook to minimize a difference between a synthesized speech and a target signal; and
      
      a target signal processor for linearly attenuating a magnitude of samples in the target signal provided to the analysis-by-synthesis loop for the second kind of sub-frame, the samples being as many as an order of the synthesizer.

17. A speech processing system based on code excited linear prediction (CELP) for synthesizing speech, wherein a speech signal is divided into frames and each frame is further divided into sub-frames, the system comprising:
- a parameter decoder for extracting linear prediction coding (LPC) coefficients for a frame, pitch-related information for all the sub-frames of the frame, and first pulse-related information for a part of the sub-frames, from a basic bit-stream received, and for extracting a part or a whole of second pulse-related information for a remainder of the sub-frames from enhancement bits received;
  
  a first portion including an adaptive codebook for generating an excitation based on the pitch-related information;
  
  a second portion including a fixed codebook for generating an excitation based on the first pulse-related information or based on the part or the whole of the second pulse-related information; and
  
  a synthesizer for outputting synthesized speech according to the excitations and the LPC coefficients.
- Dependent claims (18, 19):
  - 18. The system according to claim 17, wherein the second pulse-related information includes information for a plurality of pulses, and the parameter decoder extracts, from the enhancement bits received, information for each pulse and provides the second portion with the information for each pulse.
  - 19. The system according to claim 17, wherein the excitation generated from the pitch-related information is fed back to the first portion for a succeeding sub-frame, the excitation generated from the first pulse-related information being fed back in addition to the excitation from the pitch-related information, the excitation generated from the second pulse-related information not being fed back.
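
The feedback rule in the decoder claims (12 and 19) can be sketched in Python as below. This is a simplified model under stated assumptions, not the claimed system: the adaptive codebook is modelled as a buffer of past excitation read at a pitch lag, even-indexed sub-frames are taken to carry the first (basic) pulse-related information as in claim 10, and the LPC synthesis filter is omitted. `SubFrame`, `adaptive_vector`, and `decode_excitation` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SubFrame:
    lag: int                       # pitch lag in samples
    gain: float                    # adaptive codebook gain
    pulses: List[Tuple[int, float]] = field(default_factory=list)  # (position, amplitude)


def adaptive_vector(memory: List[float], lag: int, n: int) -> List[float]:
    """Read n samples from past excitation at the given lag, extending periodically."""
    if not 1 <= lag <= len(memory):
        raise ValueError("lag must be between 1 and len(memory)")
    buf = list(memory)
    for _ in range(n):
        buf.append(buf[-lag])
    return buf[len(memory):]


def decode_excitation(
    memory: List[float], subframes: List[SubFrame], n: int
) -> Tuple[List[float], List[float]]:
    """Return the excitation for all sub-frames and the final codebook memory."""
    memory = list(memory)
    output: List[float] = []
    for index, sf in enumerate(subframes):
        pitch = [sf.gain * v for v in adaptive_vector(memory, sf.lag, n)]
        fixed = [0.0] * n
        for pos, amp in sf.pulses:
            fixed[pos] += amp
        excitation = [p + f for p, f in zip(pitch, fixed)]
        output.extend(excitation)
        # Assumption: even-indexed sub-frames carry the basic-layer pulses.
        is_basic = index % 2 == 0
        # Enhancement-layer pulses are never fed back into the memory.
        feedback = excitation if is_basic else pitch
        memory = (memory + feedback)[-len(memory):]
    return output, memory
```

In this model the adaptive codebook memory ends up the same however many enhancement pulses were received, since only the pitch contribution of the odd sub-frames is fed back.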

Current Assignee: Industrial Technology Research Institute
Original Assignee: Industrial Technology Research Institute
Inventors: Chen, Fang-Chu
Granted Patent: US 6,996,522 B2
US Class Current: 704/219
CPC Class Codes: G10L 19/10 the excitation function bei...
