Scalable and embedded codec for speech and audio signals

US 20080052068A1
Filed: 08/10/2007
Published: 02/28/2008
Est. Priority Date: 09/23/1998
Status: Active Grant


2 Assignments


0 Petitions


Abstract

A system and method for processing audio and speech signals are disclosed that provide compatibility across a range of communication devices operating at different sampling frequencies and/or bit rates. The analyzer of the system divides the input signal into different portions, at least one of which carries information sufficient to provide intelligible reconstruction of the input signal. The analyzer also encodes separate information about other portions of the signal in an embedded manner, so that a smooth transition can be achieved from low bit-rate to high bit-rate applications. Accordingly, communication devices operating at different sampling rates and/or bit rates can extract the corresponding information from the output bit stream of the analyzer. In the present invention, embedded information generally relates to separate parameters of the input signal, or to additional resolution in the transmission of original signal parameters. Non-linear techniques for enhancing the overall performance of the system are also disclosed, as is a novel method of improving the quantization of signal parameters. In a specific embodiment, the input signal is processed in two or more modes depending on the state of the signal in a frame. When the signal is determined to be in a transition state, the encoder provides phase information about N sinusoids, which the decoder uses to improve the quality of the output signal at low bit rates.
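The abstract's "additional resolution in the transmission of original signal parameters" can be pictured with embedded scalar quantization. The sketch below is an illustration only, not the patented encoder: it assumes a uniform quantizer whose index is split into a base layer (most significant bits) and a refinement layer (least significant bits). The function names and the 8-bit/4-bit split are hypothetical. A low-rate decoder keeps only the base layer, while a high-rate decoder appends the refinement bits from the same bit stream.

```python
# Hypothetical sketch of embedded (layered) scalar quantization of one parameter.
# The fine quantizer index is split into base-layer MSBs and refinement-layer LSBs;
# the bit widths below are assumptions chosen for illustration.


def quantize_embedded(x, lo, hi, total_bits=8, base_bits=4):
    """Return (base, refine) layer indices for a value x in [lo, hi)."""
    levels = 1 << total_bits
    step = (hi - lo) / levels
    idx = min(levels - 1, max(0, int((x - lo) / step)))
    refine_bits = total_bits - base_bits
    base = idx >> refine_bits                  # coarse index: always transmitted
    refine = idx & ((1 << refine_bits) - 1)    # extra resolution: optional layer
    return base, refine


def dequantize_embedded(base, refine, lo, hi, total_bits=8, base_bits=4):
    """Reconstruct from the base layer alone (refine=None) or from both layers."""
    if refine is None:
        step = (hi - lo) / (1 << base_bits)
        return lo + (base + 0.5) * step        # midpoint of the coarse cell
    refine_bits = total_bits - base_bits
    idx = (base << refine_bits) | refine
    step = (hi - lo) / (1 << total_bits)
    return lo + (idx + 0.5) * step             # midpoint of the fine cell


if __name__ == "__main__":
    x = 0.3731
    base, refine = quantize_embedded(x, 0.0, 1.0)
    low_rate = dequantize_embedded(base, None, 0.0, 1.0)     # base layer only
    high_rate = dequantize_embedded(base, refine, 0.0, 1.0)  # base + refinement
    print(base, refine, low_rate, high_rate)
```

Because the coarse cell is exactly the union of the fine cells that share its most significant bits, the base-layer error stays within half the coarse step and the full-rate error within half the fine step. This is the "smooth transition" property in miniature.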


50 Claims

1. (canceled)

2. (canceled)

3. (canceled)

4. (canceled)

5. (canceled)

6. (canceled)

7. (canceled)

8. (canceled)

9. (canceled)

10. (canceled)

11. (canceled)

12. (canceled)

13. (canceled)

14. (canceled)

15. (canceled)

16. (canceled)

17. (canceled)

18. (canceled)

19. (canceled)

20. (canceled)

21. A system for embedded coding of audio signals comprising:
- (a) a frame extractor for dividing an input signal into a plurality of signal frames corresponding to successive time intervals;
- (b) means for providing parametric representations of the signal in each frame, said parametric representations being based on a signal model;
- (c) means for providing a first encoded data portion corresponding to a user-specified parametric representation, which first encoded data portion contains information sufficient to reconstruct a representation of the input signal;
- (d) means for providing one or more secondary encoded data portions of the user-selected parametric representation; and
- (e) means for providing an embedded output signal based at least on said first encoded data portion and said one or more secondary encoded data portions of the user-selected parametric representation.
Dependent claims (22, 23, 24, 25, 26, 27, 28):
  - 22. The system of claim 21 further comprising:
    - (f) means for providing representations of the signal in each frame, which are not based on a signal model.
  - 23. The system of claim 22 further comprising (g) means for selecting a specific one from the representations in (b) and (f) based on user-selected constraints.
  - 24. The system of claim 21 wherein said means for providing parametric representations of the signal in each frame comprises:
    - a pitch detector for computing a first estimate of the pitch of a signal in each frame;
    - means for determining parameters of sinusoids representing the signal in each frame; and
    - a spectrum envelope encoder for encoding the shape of the envelope of the signal in each frame.
  - 25. The system of claim 21 wherein said means for providing an embedded output signal comprises a bit stream assembler for providing an output bit stream containing user-specified information about parameters of at least one sinusoid in the spectrum of the input signal, and about parameters representing a spectrum envelope of the signal in each frame.
  - 26. The system of claim 21 further comprising means for decoding the embedded output signal.
  - 27. The system of claim 26 wherein said means for decoding operate at a sampling frequency different from a sampling frequency of the input signal.
  - 28. The system of claim 21 wherein said means for providing an embedded output signal comprises means for assembling data packets suitable for transmission over a packet-switched network.
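Claims 21 and 24 begin with a frame extractor and a pitch detector that computes a *first* estimate of the pitch in each frame. The sketch below shows those two front-end steps under stated assumptions. Frames are taken at a fixed length and hop, and the first pitch estimate uses plain time-domain autocorrelation. That is a standard textbook detector swapped in for illustration; the claims do not specify it, and the patent's own refined estimate comes from the non-linear processing of claims 44 to 50. All function names and parameter values are hypothetical.

```python
import math


def extract_frames(signal, frame_len, hop):
    """Split a sample sequence into frames of frame_len samples, one every hop samples."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]


def autocorr_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """First pitch estimate in Hz: lag of the largest autocorrelation within [fs/fmax, fs/fmin]."""
    n = len(frame)
    lag_min = max(1, int(fs / fmax))
    lag_max = min(int(fs / fmin), n - 1)
    best_lag, best_r = lag_min, -math.inf
    for lag in range(lag_min, lag_max + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag


if __name__ == "__main__":
    fs = 8000
    # 0.1 s of a harmonic test tone: 200 Hz fundamental plus two harmonics.
    x = [sum(math.sin(2 * math.pi * f * n / fs) for f in (200, 400, 600)) for n in range(800)]
    frames = extract_frames(x, frame_len=320, hop=160)  # 40 ms frames, 20 ms hop (assumed)
    print(len(frames), [autocorr_pitch(fr, fs) for fr in frames])
```

Unnormalized autocorrelation favors the shortest lag that matches the period, because longer lags sum fewer products. This is why the sketch returns the true 200 Hz rather than a sub-multiple for this clean test tone. Real speech usually needs the kind of refinement that claim 44 describes.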

29. A method for multistage vector quantization of signals comprising:
- (a) passing an input signal through a first stage of a multistage vector quantizer having a predetermined set of codebook vectors, each vector corresponding to a Voronoi cell, to obtain error vectors corresponding to differences between a codebook vector and an input signal vector falling within a Voronoi cell;
- (b) determining probability density functions (pdfs) for the error vectors in at least two Voronoi cells;
- (c) transforming error vectors using a transformation based on the pdfs determined for said at least two Voronoi cells; and
- (d) passing transformed error vectors through at least a second stage of the multistage vector quantizer to provide a quantized output signal.
Dependent claims (30, 31, 32, 33, 34, 35, 36, 37):
  - 30. The method of claim 29 further comprising the step of performing an inverse transformation on the quantized output signal to reconstruct a representation of the input signal.
  - 31. The method of claim 29 wherein in step (c) the transformation comprises scaling the sizes of said at least two Voronoi cells so as to approximately equalize these sizes.
  - 32. The method of claim 31 wherein the scaling factor for a Voronoi cell is determined as the inverse of the average Euclidean distance between the codebook vector for the Voronoi cell and a set of training vectors.
  - 33. The method of claim 29 wherein in step (c) the transformation comprises rotating the error vector by an angle which is determined by the Voronoi cell.
  - 34. The method of claim 33 wherein the rotation angle is determined as the angle between the codebook vector for the Voronoi cell and one of the coordinate axes of the cell.
  - 35. The method of claim 29 wherein in step (c) the transformation comprises both scaling and rotating the error vector by a given angle.
  - 36. The method of claim 29 wherein in step (c) a transformation for inner Voronoi cells is different from a transformation for outer Voronoi cells.
  - 37. The method of claim 29 wherein in step (c) the transformation is performed by tuning translation and rotation parameters so as to maximally align the boundaries of the scaled Voronoi regions with the slopes of the pdfs in each Voronoi region.
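The scaling variant of claims 29 to 32 can be sketched as a two-stage vector quantizer. In this sketch, each first-stage Voronoi cell gets a scale factor equal to the inverse of the mean Euclidean distance between its codevector and the training vectors that fall in it. The stage-1 error is multiplied by that factor before the shared second stage, and decoding divides it back out, which is the inverse transformation of claim 30. This is a minimal sketch, assuming toy 2-D codebooks, nearest-neighbor search, and no rotation (claims 33 to 35 are not modelled). All names and data are hypothetical.

```python
import math


def nearest(codebook, v):
    """Index of the codevector closest to v, i.e. the Voronoi cell containing v."""
    return min(range(len(codebook)), key=lambda i: math.dist(codebook[i], v))


def cell_scales(codebook, training):
    """Per-cell scale factor: inverse of the mean Euclidean distance between the
    cell's codevector and the training vectors falling in that cell."""
    sums = [0.0] * len(codebook)
    counts = [0] * len(codebook)
    for v in training:
        i = nearest(codebook, v)
        sums[i] += math.dist(codebook[i], v)
        counts[i] += 1
    return [c / s if s > 0 else 1.0 for c, s in zip(counts, sums)]


def msvq_encode(v, cb1, scales, cb2):
    """Stage 1 picks a cell; its error vector is scaled by that cell's factor; stage 2 quantizes it."""
    i = nearest(cb1, v)
    err = [(a - b) * scales[i] for a, b in zip(v, cb1[i])]
    return i, nearest(cb2, err)


def msvq_decode(i, j, cb1, scales, cb2):
    """Inverse transformation: unscale the stage-2 codevector and add the stage-1 codevector."""
    return [c + e / scales[i] for c, e in zip(cb1[i], cb2[j])]


if __name__ == "__main__":
    cb1 = [(0.0, 0.0), (10.0, 0.0)]
    # One tight cluster and one wide cluster, so the two cells need very different scales.
    training = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1),
                (12.0, 0.0), (8.0, 0.0), (10.0, 2.0), (10.0, -2.0)]
    scales = cell_scales(cb1, training)
    grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
    cb2 = [(a, b) for a in grid for b in grid]  # one normalized second-stage codebook
    for v in [(0.05, -0.1), (11.0, 0.9)]:
        i, j = msvq_encode(v, cb1, scales, cb2)
        print(v, "->", msvq_decode(i, j, cb1, scales, cb2))
```

Equalizing the cell sizes lets a single second-stage codebook serve both the tight and the wide cell. Without scaling, the second stage would have to cover the widest cell's spread and would waste resolution on the narrow one.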

38. A system for processing audio signals comprising:
- (a) a frame extractor for dividing an input audio signal into a plurality of signal frames corresponding to successive time intervals;
- (b) a frame mode classifier for determining if the signal in a frame is in a transition state;
- (c) a processor for extracting parameters of the signal in a frame receiving input from said classifier, wherein for frames the signal of which is determined to be in said transition state said extracted parameters include phase information; and
- (d) a multi-mode coder in which extracted parameters of the signal in a frame are processed in at least two distinct paths dependent on whether the frame signal is determined to be in a transition state.
Dependent claims (39, 40, 41, 42, 43):
  - 39. The system of claim 38 wherein said extracted parameters comprise gain, pitch and voicing parameters and parameters related to Linear Prediction Coefficients (LPCs):
    - $y(n; ω_{0}) = μ \sum_{k = 1}^{K} γ_{k} \exp(j n k ω_{0}) + \sum_{l = 1}^{L} \sum_{k = 1}^{K - l} γ_{k + l} γ_{k}^{*} \exp(j n l ω_{0})$
  - 40. The system of claim 38 wherein said frame mode classifier receives input from said processor for extracting parameters and outputs at least one state flag.
  - 41. The system of claim 40 wherein the multi-mode coder determines one of said at least two distinct processing paths on the basis of said at least one state flag.
  - 42. The system of claim 38 further comprising a decoder for decoding signals in at least two distinct processing paths.
  - 43. The system of claim 38 wherein said distinct processing paths include distinct bit allocation for frames determined to be in different states.
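Claims 38 and 40 to 43 describe a classifier that emits a state flag and a multi-mode coder that routes each frame down a different processing path, with distinct bit allocations (claim 43). Transition frames also carry phase information. The sketch below is a hypothetical illustration. The transition criterion (a frame-to-frame energy jump above 6 dB) and the 40-bit allocation tables are assumptions chosen for the example; the claims give neither.

```python
import math

# Hypothetical per-mode bit allocations for a 40-bit frame (not taken from the patent).
BIT_ALLOCATION = {
    "steady":     {"gain": 5, "pitch": 7, "voicing": 3, "lpc": 25, "phase": 0},
    "transition": {"gain": 5, "pitch": 7, "voicing": 3, "lpc": 10, "phase": 15},
}


def frame_energy_db(frame):
    """Mean-square energy of a frame in dB; the small floor avoids log10(0)."""
    return 10.0 * math.log10(sum(x * x for x in frame) / len(frame) + 1e-12)


def classify_frames(frames, threshold_db=6.0):
    """One state flag per frame: True when energy changes by more than threshold_db
    relative to the previous frame (an assumed transition criterion)."""
    flags, prev = [], None
    for frame in frames:
        e = frame_energy_db(frame)
        flags.append(prev is not None and abs(e - prev) > threshold_db)
        prev = e
    return flags


def select_paths(flags):
    """Route each frame to the steady or transition path via its bit allocation."""
    return [BIT_ALLOCATION["transition" if f else "steady"] for f in flags]


if __name__ == "__main__":
    # Two quiet frames followed by a sudden onset.
    frames = [[0.01] * 160, [0.01] * 160, [1.0] * 160, [1.0] * 160]
    flags = classify_frames(frames)
    for flag, alloc in zip(flags, select_paths(flags)):
        print(flag, alloc)
```

Keeping both tables at the same total keeps the bit rate fixed. Bits are simply moved from the spectral-envelope parameters to phase information on transition frames. A variable-rate design could instead grow the transition budget.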

44. A system for processing audio signals comprising:
- (a) a frame extractor for dividing an input signal into a plurality of signal frames corresponding to successive time intervals;
- (b) means for providing a parametric representation of the signal in each frame, said parametric representation being based on a signal model;
- (c) a non-linear processor for providing refined estimates of parameters of the parametric representation of the signal in each frame; and
- (d) means for encoding said refined parameter estimates.
Dependent claims (45, 46, 47, 48, 49, 50):
  - 45. The system of claim 44 wherein said refined estimates comprise an estimate of the pitch.
  - 46. The system of claim 44 wherein said refined estimates comprise an estimate of a voicing parameter for the input speech signal.
  - 47. The system of claim 44 wherein said refined estimates comprise an estimate of a pitch onset time for an input speech signal.
  - 48. The system of claim 44 wherein said non-linear processor computes the maximum of a correlation function of the input signal over a set of complex frequencies.
  - 49. The system of claim 45 wherein the computation is done iteratively.
  - 50. The system of claim 44 wherein a measure of voicing for the input signal is computed as
    $ρ(ω_{0}) = \frac{\sum_{m = 1}^{M} |Y_{m}|^{2} \, 0.5 \left[ 1 + \cos(2 π ω_{m} / ω_{0}) \right]}{\sum_{m = 1}^{M} |Y_{m}|^{2}}$
    where $Y_{m}$ are the complex amplitudes of the output of a nonlinear operation defined over the input signal $s(n)$ as
    $\begin{aligned} y(n) &= μ \sum_{k = 1}^{K} s_{k}(n) + \sum_{l = 1}^{L} \sum_{k = 1}^{K - l} s_{k + l}(n) s_{k}^{*}(n) \\ &= μ \sum_{k = 1}^{K} γ_{k} \exp(j n ω_{k}) + \sum_{l = 1}^{L} \sum_{k = 1}^{K - l} γ_{k + l} γ_{k}^{*} \exp[j n (ω_{k + l} - ω_{k})] \end{aligned}$
    where $γ_{k} = A_{k} \exp(j θ_{k})$ is the complex amplitude and $0 ≤ μ ≤ 1$ is a bias factor.
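The voicing measure of claim 50 can be evaluated directly once the components of $y(n)$ are known. The sketch below builds those components analytically from given sinusoid parameters ($A_k$, $θ_k$, $ω_k$) instead of measuring $Y_m$ from a real signal. It reads the pair index as $k + l$, so that the outer sum over $l$ has an effect. It then computes $ρ(ω_{0})$ as the $|Y_m|^2$-weighted average of $0.5[1 + \cos(2π ω_m / ω_0)]$. Merging terms that fall on the same frequency by rounding to 9 decimals, the defaults $μ = 0.5$ and $L = 3$, and all function names are assumptions of this illustration.

```python
import cmath
import math
from collections import defaultdict


def nonlinear_components(amps, phases, freqs, mu=0.5, L=3):
    """Components of y(n) for s_k(n) = gamma_k exp(j n w_k), gamma_k = A_k exp(j theta_k).

    Returns {frequency in rad/sample: complex amplitude}. Terms that land on the
    same frequency (after rounding to 9 decimals, an assumption) are summed.
    """
    gammas = [a * cmath.exp(1j * th) for a, th in zip(amps, phases)]
    K = len(gammas)
    comps = defaultdict(complex)
    for k in range(K):                        # mu * sum_k gamma_k exp(j n w_k)
        comps[round(freqs[k], 9)] += mu * gammas[k]
    for l in range(1, L + 1):                 # sum_l sum_k gamma_{k+l} gamma_k^* exp[j n (w_{k+l} - w_k)]
        for k in range(K - l):
            w = round(freqs[k + l] - freqs[k], 9)
            comps[w] += gammas[k + l] * gammas[k].conjugate()
    return dict(comps)


def voicing_measure(comps, w0):
    """rho(w0): |Y_m|^2-weighted average of 0.5 * [1 + cos(2 pi w_m / w0)]."""
    num = den = 0.0
    for w, y in comps.items():
        p = abs(y) ** 2
        num += p * 0.5 * (1.0 + math.cos(2.0 * math.pi * w / w0))
        den += p
    return num / den if den > 0 else 0.0


if __name__ == "__main__":
    w0 = 2 * math.pi * 200 / 8000              # 200 Hz pitch at 8 kHz sampling
    freqs = [k * w0 for k in range(2, 6)]      # harmonics 2..5 only: fundamental absent
    comps = nonlinear_components([1.0] * 4, [0.3, -1.2, 2.0, 0.7], freqs)
    print(round(w0, 9) in comps)               # pairwise products restore a term at w0
    print(voicing_measure(comps, w0), voicing_measure(comps, 1.3 * w0))
```

With exactly harmonic inputs, every component sits at an integer multiple of $ω_0$, so $ρ(ω_0)$ comes out at 1, while a mistuned candidate scores lower. The sketch also makes one effect of the pairwise products visible: they place energy at $ω_0$ even when the input lacks its fundamental. $ρ$ alone is not used here as a pitch search (claim 48), because it equals 1 at every sub-multiple $ω_0/2, ω_0/3, \ldots$ as well.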


Current Assignee
Chen Juin-Hwey, Craig Watkins, David Campana, Joseph Aguilar, Robert Dunn, Robert Mcaulay, Robert Zopf, Wang Wei, Xiaoquin Sun
Original Assignee
Chen Juin-Hwey, Craig Watkins, David Campana, Joseph Aguilar, Robert Dunn, Robert Mcaulay, Robert Zopf, Wang Wei, Xiaoquin Sun
Inventors
Wang, Wei, Dunn, Robert, Aguilar, Joseph, Zopf, Robert, Watkins, Craig, Campana, David, Chen, Juin-Hwey (Raymond), McAulay, Robert, Sun, Xiaoquin

Granted Patent

US 9,047,865 B2
US Class Current

704/230
CPC Class Codes

G10L 19/002   Dynamic bit allocation for ...

G10L 19/093   using sinusoidal excitation...

G10L 19/24   Variable rate codecs, e.g. ...
