Speech restoration system and method for concealing packet losses

US 7,302,385 B2
Filed: 07/07/2003
Issued: 11/27/2007
Est. Priority Date: 07/07/2003
Status: Expired due to Fees

Abstract

Provided are a speech restoration system and method for concealing packet losses. The system includes a demultiplexer that demultiplexes an input bit stream and divides the input bit stream into several packets; a packet loss concealing unit that produces and outputs a linear spectrum pair (LSP) coefficient representing the vocal tract of voice and an excitation signal corresponding to a lost frame, when a packet loss occurs; and a speech restoring unit that synthesizes voice using the packets input from the demultiplexer, outputs the result as restored voice, and synthesizes voice corresponding to a lost packet using the LSP coefficient and the excitation signal input from the packet loss concealing unit and outputs the result as restored voice when the lost packet is detected, wherein the packet loss concealing unit repeats linear prediction coefficients (LPCs) of a last-received valid frame, produces a first excitation signal for the lost frame using a time scale modification (TSM) method, when the lost frame is voiceless, and produces a second excitation signal by re-estimating a gain parameter based on the first excitation signal, when the lost frame is voiced.
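
The control flow described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Frame` type, the `VOICING_THRESHOLD` value, and the `tsm` and `reestimate_gain` callables are hypothetical names introduced here, and the source does not state a numeric threshold for the voiced/voiceless decision.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Assumed value for illustration only; the source gives no threshold.
VOICING_THRESHOLD = 0.5


@dataclass
class Frame:
    lsp: List[float]         # LSP coefficients describing the vocal tract
    excitation: List[float]  # excitation samples for the frame
    ltp_gain: float          # long-period prediction gain of the frame


def is_voiced(last_valid: Frame) -> bool:
    """Classify the lost frame from the last valid frame's long-period gain."""
    return last_valid.ltp_gain >= VOICING_THRESHOLD


def conceal(last_valid: Frame,
            tsm: Callable[[List[float]], List[float]],
            reestimate_gain: Callable[[List[float]], List[float]]) -> Frame:
    """Build a substitute frame for one lost frame."""
    lsp = list(last_valid.lsp)            # repeat the last valid coefficients
    first = tsm(last_valid.excitation)    # first excitation signal (TSM)
    if is_voiced(last_valid):
        excitation = reestimate_gain(first)  # second excitation signal
    else:
        excitation = first
    return Frame(lsp=lsp, excitation=excitation, ltp_gain=last_valid.ltp_gain)


def restore(packets: List[Optional[Frame]],
            tsm: Callable[[List[float]], List[float]],
            reestimate_gain: Callable[[List[float]], List[float]]) -> List[Frame]:
    """Pass received frames through; conceal lost ones (None marks a loss)."""
    out: List[Frame] = []
    last_valid: Optional[Frame] = None
    for pkt in packets:
        if pkt is not None:
            last_valid = pkt
            out.append(pkt)
        elif last_valid is not None:
            # Always conceal from the last frame that was actually received.
            out.append(conceal(last_valid, tsm, reestimate_gain))
        # A loss before any valid frame has nothing to repeat and is skipped.
    return out
```

Passing the two signal-processing steps in as callables keeps the dispatch logic separate from any particular TSM or gain-estimation method.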

15 Claims

1. A speech restoration system for concealing packet losses, the system comprising:
- a demultiplexer that demultiplexes an input bit stream and divides the input bit stream into several packets;
  
  a packet loss concealing unit that produces and outputs a linear spectrum pair (LSP) coefficient representing the vocal tract of voice and an excitation signal corresponding to a lost frame, when a packet loss occurs; and
  
  a speech restoring unit that synthesizes voice using the packets input from the demultiplexer, outputs the result as restored voice, and synthesizes voice corresponding to a lost packet using the LSP coefficient and the excitation signal input from the packet loss concealing unit and outputs the result as restored voice when the lost packet is detected, wherein the packet loss concealing unit repeats linear prediction coefficients (LPCs) of a last-received valid frame, produces a first excitation signal for the lost frame using a time scale modification (TSM) method, and outputs the first excitation signal to the speech restoring unit, when the lost frame is voiceless, and produces a second excitation signal by re-estimating a gain parameter based on the first excitation signal and outputs the second excitation signal to the speech restoring unit, when the lost frame is voiced.
- Dependent claims (2, 3, 4, 5, 6, 7)
- - 2. The system of claim 1, wherein the packet loss concealing unit comprises:
    - an LSP concealing unit that produces and outputs a LSP coefficient so as to indicate the vocal tract of voice for the lost frame, based on the LSP coefficient of the last-received valid frame;
      
      a determination unit that determines whether voice is voiced or voiceless from a long-period prediction gain of the last-received valid frame, the voice indicated by a code train corresponding to the lost frame; and
      
      an excitation signal concealing unit that performs TSM on an excitation signal produced to replace the lost frame by repeating the LPCs of the last-received valid frame in order to produce the first excitation signal, when the lost frame is voiceless, and produces the second excitation signal by re-estimating a gain parameter based on the first excitation signal, when the lost frame is voiced.
  - 3. The system of claim 2, wherein the determination unit determines whether voice is voiced or voiceless from the long-period prediction gain of the last-received valid frame, the voice indicated by a code train corresponding to the lost frame.
  - 4. The system of claim 2, wherein the excitation signal concealing unit comprises:
    - a TSM unit that extracts a section having the highest similarity with an excitation signal from a previous excitation signal, and produces the first excitation signal by performing TSM on the extracted section, the excitation signal being produced with respect to the lost frame by repeating the LPCs of the last-received valid frame;
      
      a parameter re-estimator that estimates a codebook gain based on a mean square error between the first excitation signal and a feedback of the second excitation signal and produces the second excitation signal; and
      
      a switching unit that selectively outputs one of the first excitation signal input from the TSM unit and the second excitation signal input from the parameter re-estimator, in response to a voiced/voiceless sound determination signal input from the determination unit.
  - 5. The system of claim 4, wherein the TSM unit comprises:
    - a modification unit that extracts a section having the highest similarity with an excitation signal from a previous excitation signal, sequentially combining the section with the previous excitation signal in units of sub frames, using an overlap-add method, and produces a third excitation signal, the excitation signal being produced with respect to the lost frame by repeating the LPCs of the last-received valid frame; and
      
      a first estimating unit that synthesizes the third excitation signal using an LPC and produces the first excitation signal.
  - 6. The system of claim 5, wherein the modification unit comprises a dynamic buffer in which the excitation signal and the previous excitation signal are dynamically stored, the excitation signal being produced with respect to the lost frame by repeating the LPCs of the last-received valid frame.
  - 7. The system of claim 4, wherein the parameter re-estimator comprises:
    - an error calculator that calculates a mean square error between the first excitation signal input from the TSM unit and the feedback of the second excitation signal and produces a gain control signal for re-estimation of the gain parameter;
      
      a vector estimator that estimates the gain control signal, codebook gains of an adaptive codebook (ACB) vector and a fixed codebook (FCB) vector, combines the estimated ACB gain with the estimated FCB gain, and produces a fourth excitation signal; and
      
      a second estimating unit that synthesizes the fourth excitation signal using a LPC and produces the second excitation signal.
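
The similarity search and overlap-add steps named in claims 4 and 5 can be illustrated with a small sketch. It assumes normalized cross-correlation as the similarity measure and a linear cross-fade for the overlap-add, neither of which the claims specify; the function names `best_match_offset` and `overlap_add` are hypothetical. The subsequent LPC synthesis step is omitted.

```python
import math
from typing import List, Sequence


def best_match_offset(history: Sequence[float], template: Sequence[float]) -> int:
    """Return the start index in `history` of the section most similar to
    `template`, using normalized cross-correlation (an assumed measure)."""
    n = len(template)
    if n == 0 or len(history) < n:
        raise ValueError("history must be at least as long as a non-empty template")
    t_energy = sum(v * v for v in template)
    best_offset, best_score = 0, -math.inf
    for start in range(len(history) - n + 1):
        section = history[start:start + n]
        s_energy = sum(v * v for v in section)
        denom = math.sqrt(s_energy * t_energy)
        dot = sum(a * b for a, b in zip(section, template))
        score = dot / denom if denom > 0.0 else 0.0
        if score > best_score:  # ties keep the earliest offset
            best_offset, best_score = start, score
    return best_offset


def overlap_add(tail: Sequence[float], section: Sequence[float], overlap: int) -> List[float]:
    """Append `section` to `tail`, cross-fading linearly over `overlap` samples."""
    if overlap < 0 or overlap > min(len(tail), len(section)):
        raise ValueError("overlap must fit inside both signals")
    keep = len(tail) - overlap
    out = list(tail[:keep])
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight ramps toward the new section
        out.append((1.0 - w) * tail[keep + i] + w * section[i])
    out.extend(section[overlap:])
    return out
```

In a decoder, `history` would hold previously decoded excitation samples and `template` the most recent subframe; the matched section is then spliced onto the running signal one subframe at a time.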

8. A speech restoration method of concealing packet losses, the method comprising:
- demultiplexing an input bit stream and dividing the bit stream into several packets;
  
  checking whether a loss in the packets occurs;
  
  producing a LSP coefficient that represents the vocal tract of voice when packet loss occurs;
  
  producing a first excitation signal by performing TSM on an excitation signal produced with respect to a lost frame by repeating LPCs of a last-received valid frame when the lost frame of the packet is voiceless, and producing a second excitation signal by estimating a gain parameter based on the first excitation signal when the lost frame of the packet is voiced; and
  
  synthesizing voice corresponding to the lost frame using the LSP coefficient and the first or second excitation signal and outputting restored voice when packet loss occurs.
- Dependent claims (9, 10, 11, 12, 13, 14, 15)
- - 9. The method of claim 8, wherein the production and output of the LSP coefficient are performed using a LSP coefficient of a previously input available frame, the LSP representing the vocal tract with respect to the lost frame.
  - 10. The method of claim 8, wherein during the production of the first or second excitation signal, whether voice is voiced or voiceless is determined from a long-period prediction gain of the last-received valid frame, the voice indicated by a code train corresponding to the lost frame.
  - 11. The method of claim 8, wherein the production of the first or second excitation signal comprises:
    - producing the first excitation signal by performing TSM on an excitation signal produced with respect to the lost frame by repeating the LPCs of the last-received valid frame, when the lost frame is voiceless; and
      
      producing the second excitation signal by estimating the gain parameter based on the first excitation signal when the lost frame is voiced.
  - 12. The method of claim 11, wherein the production of the first excitation signal comprises:
    - producing a third excitation signal by extracting a section having the highest similarity with an excitation signal from a previous excitation signal, sequentially overlap-adding the section with the previous excitation signals, and producing a third excitation signal, the excitation signal being produced with respect to the lost frame by repeating the LPCs of the last-received valid frame; and
      
      producing the first excitation signal by synthesizing the third excitation signal using an LPC.
  - 13. The method of claim 12, wherein during the production of the third excitation signal, the excitation signal, which is produced with respect to the lost frame by repeating the LPCs of the last-received valid frame, and the previous excitation signal are dynamically stored.
  - 14. The method of claim 11, wherein the production of the second excitation signal comprises:
    - producing the first excitation signal by extracting a section having the highest similarity with an excitation signal from previous excitation signals, and performing TSM on the extracted section, the excitation signal being produced with respect to the lost frame by repeating the LPCs of the last-received valid frames; and
      
      producing the second excitation signal by estimating a codebook gain using a mean square error between the first excitation signal and a feedback of the second excitation signal.
  - 15. The method of claim 14, wherein the production of the second excitation signal comprises:
    - producing a gain control signal for re-estimation of a gain parameter by calculating a mean square error between the first excitation signal and a feedback of the second excitation signal;
      
      producing a fourth excitation signal by estimating the gain control signal and codebook gains of an ACB vector and a FCB vector and combining the estimated ACB gain with the estimated FCB gain; and
      
      producing the second excitation signal by synthesizing the fourth excitation signal using an LPC.
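
The gain re-estimation in claims 14 and 15 chooses codebook gains that minimize a mean square error against the first excitation signal. The claims describe this as a feedback loop; the sketch below substitutes a closed-form least-squares solution for the two gains, which is a swapped-in technique and an assumption, not the patent's stated procedure. The names `estimate_gains`, `combine`, and `mse` are hypothetical, and the final LPC synthesis step is omitted.

```python
from typing import List, Sequence, Tuple


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean square error between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def combine(acb: Sequence[float], fcb: Sequence[float],
            g_acb: float, g_fcb: float) -> List[float]:
    """Gain-scaled sum of the adaptive and fixed codebook vectors."""
    return [g_acb * a + g_fcb * f for a, f in zip(acb, fcb)]


def estimate_gains(target: Sequence[float], acb: Sequence[float],
                   fcb: Sequence[float]) -> Tuple[float, float]:
    """Return (g_acb, g_fcb) minimizing the MSE between `target` and
    g_acb * acb + g_fcb * fcb, by solving the 2x2 normal equations."""
    if not (len(target) == len(acb) == len(fcb)) or len(target) == 0:
        raise ValueError("signals must be non-empty and of equal length")
    aa, ff, af = _dot(acb, acb), _dot(fcb, fcb), _dot(acb, fcb)
    ta, tf = _dot(target, acb), _dot(target, fcb)
    det = aa * ff - af * af
    if abs(det) < 1e-12:
        # Vectors are (near) collinear or zero: fit the ACB vector alone.
        return (ta / aa if aa > 0.0 else 0.0), 0.0
    g_acb = (ta * ff - tf * af) / det
    g_fcb = (tf * aa - ta * af) / det
    return g_acb, g_fcb
```

Here `target` plays the role of the first excitation signal, and `combine(...)` with the estimated gains corresponds to the fourth excitation signal of claim 15.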

Current Assignee: Electronics and Telecommunications Research Institute
Original Assignee: Electronics and Telecommunications Research Institute
Inventors: Hwang, Dae Hwan; Youn, Dae Hee; Lee, Ki Seung; Park, Young Cheol; Sung, Ho Sang; Lee, Moon Keun
Primary Examiner(s): Hudspeth, David
Assistant Examiner(s): Jackson, Jakieda R.
Application Number: US10/615,268
Publication Number: US 20050010401A1
Time in Patent Office: 1,604 Days
Field of Search: 704/214, 704/219, 704/228
US Class Current: 704/219
CPC Class Codes:
G10L 19/005   Correction of errors induce...
G10L 19/04   using predictive techniques
G10L 25/93   Discriminating between voic...
