System and Method for a High Performance Audio Codec

US 20080162150A1
Filed: 12/14/2007
Published: 07/03/2008
Est. Priority Date: 12/28/2006
Status: Abandoned Application

Abstract

A system for a high performance audio codec provides higher voice quality and higher recognition accuracy from an ASR engine at an increased data rate and computational power. Embodiments include those having a CELP-based codec, an ASR engine, a text comparator, an encoder, a decoder, an LPC computation and formant analysis module, a dual stage data rate determination module, a VQ of LSP coefficients module, a pitch synthesis and optimal pitch parameter search module, and an excitation codebook parameter search module. A method for a high performance audio codec includes three stages. An ASR engine yields transcribed text from each of an uncompressed reference signal and a decompressed signal that has passed through an encoder, and the transcribed text is compared with the original text to determine word error rates in an iterative process whereby both voice quality and recognition accuracy are optimized.

51 Claims

1. A system for high performance audio codec comprising:
- a CELP-based codec,
- an ASR engine; and
- a text comparator.
- Dependent claims (2 to 19):
  - 2. The system for high performance audio codec of claim 1 further comprising the ASR engine including features selected from the group transcription engine, speech analytics engine, voice biometrics engine, interactive voice response (IVR) engine, language learning engine, language translation engine.
  - 3. The system for high performance audio codec of claim 2 further comprising the ASR engine selected from the group embedded, network-based.
  - 4. The system for high performance audio codec of claim 3 further comprising the ASR engine selected from the group phonetic, large vocabulary continuous speech recognition (LVCSR).
  - 5. The system for high performance audio codec of claim 4 further comprising:
    - an encoder; and
    - a decoder.
  - 6. The system for high performance audio codec of claim 5, the encoder further comprising:
    - an LPC Computation and Formant Analysis Module,
    - a Dual Stage Data Rate Determination module,
    - a Vector Quantization (VQ) of LSP Coefficients Module which contains a VQ Codebook,
    - a Pitch Synthesis and Optimal Pitch Parameter Search Module; and
    - an Excitation Codebook Parameter Search Module which contains an Excitation Codebook.
  - 7. The system for high performance audio codec of claim 6 further comprising the CELP-based codec being a MASC codec.
  - 8. The system for high performance audio codec of claim 7 further comprising the MASC codec having n pairs of odd and even roots and (2n)th-order LPC filters wherein 2n equals n multiplied by two.
  - 9. The system for high performance audio codec of claim 8 further comprising the MASC codec having 10th-order LPC filters.
  - 10. The system for high performance audio codec of claim 9 wherein the MASC codec having 10th-order LPC filters generates five pairs of odd and even roots from LPC coefficients.
  - 11. The system for high performance audio codec of claim 10 further comprising a VQ of LSP coefficients module including a VQ codebook and wherein an optimal size and length of the VQ codebook is determined by cross-correlating the auto-correlation coefficients of the speech signal with a determined number of coefficients obtained from the LSP frequencies.
  - 12. The system for high performance audio codec of claim 11 wherein the optimal size and length of the VQ codebook thereby reduces transcription error from the ASR engine.
  - 13. The system for high performance audio codec of claim 12 wherein the optimal size and length of the VQ codebook thereby reduces transcription error from the ASR engine and also enhances voice quality in terms selected from the group PESQ, MOS.
  - 14. The system for high performance audio codec of claim 13 wherein a maximum number of LSP values in the VQ codebook is 2048.
  - 15. The system for high performance audio codec of claim 14 wherein a PCM REF is selected from the group narrow band, wide band.
  - 16. The system for high performance audio codec of claim 15 wherein the narrow band PCM REF is within the range 8 kHz to 11 kHz sampling frequency, inclusive.
  - 17. The system for high performance audio codec of claim 16 wherein the wide band PCM REF is at or above 16 kHz sampling frequency.
  - 18. The system for high performance audio codec of claim 17 wherein the PCM REF includes an audio sample byte size of at least 8 bits.
  - 19. The system for high performance audio codec of claim 18 wherein the PCM REF includes an audio sample byte size selected from the group 8-bit, 16-bit, 32-bit, 64-bit.
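
Claims 11 to 14 describe a VQ codebook of LSP values capped at 2048. As a minimal sketch of how such a codebook might be searched, the Python below assumes that each codebook entry is one LSP vector and that matching uses plain squared Euclidean distance; the claims state neither, so both are assumptions. The names `nearest_codeword`, `index_bits`, and the toy codebook are hypothetical and do not come from the application.

```python
MAX_CODEBOOK_ENTRIES = 2048  # upper bound taken from claim 14


def nearest_codeword(lsp_vector, codebook):
    """Return the index of the codebook entry closest to lsp_vector.

    Distance is squared Euclidean, which is an assumption of this sketch.
    """
    if not 0 < len(codebook) <= MAX_CODEBOOK_ENTRIES:
        raise ValueError("codebook must hold between 1 and 2048 entries")
    best_index, best_dist = 0, float("inf")
    for index, codeword in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(lsp_vector, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def index_bits(codebook_size):
    """Bits needed to transmit one index into a codebook of the given size."""
    return max(1, (codebook_size - 1).bit_length())


# Hypothetical two-dimensional toy codebook; a 10th-order filter would use
# ten LSP values per entry (five odd/even root pairs, per claim 10).
toy_codebook = [[0.1, 0.2], [0.5, 0.9], [1.0, 1.5]]
chosen = nearest_codeword([0.45, 1.0], toy_codebook)
```

Under this reading, a full 2048-entry codebook costs `index_bits(2048)` bits per transmitted index, which is why codebook size trades directly against data rate.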

20. A system for high performance audio codec including an encoder and a decoder and further comprising:
- an LPC computation and formant analysis module,
- a dual stage data rate determination module,
- an LPC to LSP conversion module,
- a VQ of LSP Coefficients module,
- an interpolation and LSP to LPC conversion module,
- a pitch synthesis and optimal pitch parameter search module,
- an excitation codebook parameter search module; and
- a data packing module.
- Dependent claims (21 to 41):
  - 21. The system for high performance audio Codec of claim 20 further comprising the excitation codebook parameter search module having an excitation codebook.
  - 22. The system for high performance audio Codec of claim 21 further comprising the encoder and decoder each having an LSP to LPC conversion module.
  - 23. The system for high performance audio codec of claim 22 further comprising the vector quantization of LSP coefficients module having a VQ codebook.
  - 24. The system for high performance audio codec of claim 23 wherein a maximum number of LSP values in the VQ codebook is 2048.
  - 25. The system for high performance audio codec of claim 24 further comprising the data packing module including a packing portion for the encoder and an unpacking portion for the decoder.
  - 26. The system for high performance audio codec of claim 25 further comprising a CELP-based codec.
  - 27. The system for high performance audio codec of claim 26 further comprising the CELP-based codec being a MASC codec.
  - 28. The system for high performance audio codec of claim 27 further comprising the MASC codec having 10th-order LPC filters.
  - 29. The system for high performance audio codec of claim 28 wherein the MASC codec having 10th-order LPC filters generates five pairs of odd and even roots from LPC coefficients.
  - 30. The system for high performance audio codec of claim 29 wherein an optimal size and length of the VQ codebook is determined by cross-correlating the auto-correlation coefficients of the speech signal with a determined number of coefficients obtained from the LPC filters.
  - 31. The system for high performance audio codec of claim 30 further comprising an ASR engine and wherein the optimal size and length of the VQ codebook thereby reduces transcription error from the ASR engine.
  - 32. The system for high performance audio codec of claim 31 further comprising the ASR engine including one or more features from the group transcription engine, speech analytics engine, voice biometrics engine, interactive voice response (IVR) engine, language learning engine, language translation engine.
  - 33. The system for high performance audio codec of claim 32 further comprising the ASR engine selected from the group embedded, network-based.
  - 34. The system for high performance audio codec of claim 33 further comprising the ASR engine selected from the group phonetic, large vocabulary continuous speech recognition (LVCSR).
  - 35. The system for high performance audio codec of claim 34 wherein the optimal size of the VQ codebook thereby reduces transcription error from the ASR engine and also enhances voice quality as measured in terms selected from the group PESQ, MOS.
  - 36. The system for high performance audio codec of claim 35 wherein a PCM REF is selected from the group narrow band, wide band.
  - 37. The system for high performance audio codec of claim 36 wherein the narrow band PCM REF is within the range 8 kHz to 11 kHz sampling frequency, inclusive.
  - 38. The system for high performance audio Codec of claim 37 wherein the wide band PCM REF is at or above 16 kHz sampling frequency.
  - 39. The system for high performance audio Codec of claim 38 wherein the PCM REF includes an audio sample byte size of at least 8-bit.
  - 40. The system for high performance audio Codec of claim 39 wherein the PCM REF includes an audio sample byte size selected from the group 8-bit, 16-bit, 32-bit, 64-bit.
  - 41. The system for high performance audio Codec of claim 40 wherein an optimal size of the excitation codebook is determined by minimizing a sensitivity-weighted mean square error between input speech and synthesized speech.
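
Claim 41 ties the excitation codebook size to a sensitivity-weighted mean square error between input and synthesized speech. The sketch below shows one way such a selection could be expressed, assuming the weighting can be represented as one non-negative weight per sample; the claim does not define the weighting, so that is a stand-in. The functions `weighted_mse` and `pick_codebook_size`, and the `synthesize` callable, are hypothetical names introduced for illustration.

```python
def weighted_mse(reference, synthesized, weights):
    """Weighted mean square error between two equal-length sample sequences."""
    if not (len(reference) == len(synthesized) == len(weights)):
        raise ValueError("all sequences must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    error = sum(w * (r - s) ** 2
                for r, s, w in zip(reference, synthesized, weights))
    return error / total_weight


def pick_codebook_size(reference, weights, synthesize, candidate_sizes):
    """Return the candidate size whose synthesized output has the lowest error.

    `synthesize(size)` is a hypothetical callable standing in for a full
    encode/decode pass with an excitation codebook of that size.
    """
    return min(candidate_sizes,
               key=lambda size: weighted_mse(reference, synthesize(size), weights))


# Toy data: pretend outputs for two candidate codebook sizes.
reference = [1.0, 2.0]
outputs = {64: [1.0, 3.0], 128: [1.0, 2.5]}
best = pick_codebook_size(reference, [1.0, 1.0], outputs.__getitem__, [64, 128])
```

In a real codec a larger codebook usually lowers the error, so a practical search would also need a rate or complexity constraint; the sketch leaves that out.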

42. A method for high performance audio Codec comprising the steps of:
- For Stage 1;
  - Input speech as an uncompressed reference signal is sent to an ASR Engine, bypassing the audio Codec, whereby the ASR engine yields transcribed text from the uncompressed reference signal,
  - The transcribed text from the uncompressed reference signal is also sent to the text comparator which compares the transcribed text from the uncompressed reference signal received from the ASR engine with the original text in order to determine a percent word error rate, % WER REF, with respect to the uncompressed reference signal,
- For Stage 2;
  - input speech is sent to an encoder of the audio Codec as an uncompressed reference signal,
  - The encoder yields compressed speech,
  - The compressed speech from the encoder is sent to a decoder yielding a decoded signal in the form of a decompressed reference signal,
  - The decompressed reference signal is sent to an ASR Engine yielding transcribed text from the decompressed reference signal,
  - The transcribed text from the decompressed reference signal is sent to a text comparator which compares the transcribed text from the decompressed reference signal received from the ASR engine with the original text in order to determine a percent word error rate, % WER DEC, with respect to the decompressed signal,
- For Stage 3;
  - a ΔWER is computed as a function of the % WER REF and the % WER DEC.
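
The three stages of claim 42 can be sketched in Python as follows. This is an illustration under stated assumptions, not the applicant's implementation: percent WER is computed with the conventional word-level edit distance (the claim does not say how the text comparator scores errors), and `asr`, `encode`, and `decode` are hypothetical callables standing in for the ASR engine and the codec. The default `delta_fn` uses the subtraction order given in claim 46.

```python
def word_error_rate(original_text, transcribed_text):
    """Percent WER: word-level edit distance over the number of original words."""
    ref = original_text.split()
    hyp = transcribed_text.split()
    if not ref:
        raise ValueError("original text must contain at least one word")
    # Row-by-row dynamic programming for Levenshtein distance over words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def three_stage_wer(original_text, pcm_ref, asr, encode, decode,
                    delta_fn=lambda wer_ref, wer_dec: wer_ref - wer_dec):
    """Run Stage 1 (bypass codec), Stage 2 (through codec), Stage 3 (delta)."""
    wer_ref = word_error_rate(original_text, asr(pcm_ref))   # Stage 1
    pcm_dec = decode(encode(pcm_ref))                        # Stage 2
    wer_dec = word_error_rate(original_text, asr(pcm_dec))
    return wer_ref, wer_dec, delta_fn(wer_ref, wer_dec)      # Stage 3
```

For a quick check without real audio, a string can stand in for the signal: pass `asr=lambda a: a` and a lossy `encode` that alters one word, and the returned `wer_dec` rises above `wer_ref`.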

44. The method for high performance audio Codec of claim 43 further comprising the uncompressed reference signal being a pulse code modulated reference signal, PCM REF.
- Dependent claims (45 to 51):
  - 45. The method for high performance audio Codec of claim 44 further comprising the decompressed reference signal being a pulse code modulated decompressed signal, PCM DEC.
  - 46. The method for high performance audio Codec of claim 45 further comprising the ΔWER computed as the function of the % WER REF and the % WER being an ADWER computed as an absolute difference, ΔWER_Abs, between the % WER REF and the % WER DEC wherein ΔWER_Abs equals the % WER DEC subtracted from the % WER REF.
  - 47. The method for high performance audio Codec of claim 46 further comprising the ΔWER computed as the function of the % WER REF and the % WER being a RDWER computed as a relative difference, ΔWER_Rel, wherein ΔWER_Rel equals the ΔWER_Abs divided by the % WER REF.
  - 48. The method for high performance audio Codec of claim 47 at Stage 2 the input speech is sent to an encoder of the audio Codec as an uncompressed reference signal further comprising PCM REF being passed through modules within the encoder selected from the group dual stage data rate determination module, vector quantization of LSP coefficients module, pitch synthesis and optimal pitch parameter search module, excitation codebook parameter search module.
  - 49. The method for high performance audio Codec of claim 48 at Stage 2 the input speech is sent to an encoder of the audio Codec as an uncompressed reference signal further comprising PCM REF being passed through the encoder, the encoder modules further comprising:
    - a data rate determination module,
    - a vector quantization of LSP coefficients module,
    - a pitch synthesis and optimal pitch parameter search module; and
    - an excitation codebook parameter search module.
  - 50. The method for high performance audio Codec of claim 48 wherein the vector quantization of LSP coefficients module contains a VQ codebook.
  - 51. The method for high performance audio Codec of claim 49 wherein the excitation codebook parameter search module contains an excitation codebook.
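
The two difference measures in claims 46 and 47 reduce to simple arithmetic. The following Python sketch follows the wording of those claims literally: the absolute difference is % WER REF minus % WER DEC, and the relative difference is that value divided by % WER REF. The function names `adwer` and `rdwer` are hypothetical, and the zero-reference guard is an addition of this sketch, since the claims do not address a % WER REF of zero.

```python
def adwer(wer_ref, wer_dec):
    """Absolute difference per claim 46: % WER DEC subtracted from % WER REF."""
    return wer_ref - wer_dec


def rdwer(wer_ref, wer_dec):
    """Relative difference per claim 47: the absolute difference over % WER REF."""
    if wer_ref == 0:
        raise ZeroDivisionError("RDWER is undefined when % WER REF is zero")
    return adwer(wer_ref, wer_dec) / wer_ref


# Example: reference audio scores 10.0 % WER, decoded audio scores 12.5 % WER.
delta_abs = adwer(10.0, 12.5)
delta_rel = rdwer(10.0, 12.5)
```

With this sign convention both values are negative whenever the codec raises the word error rate, so values closer to zero indicate less recognition loss.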


Current Assignee: Vianix Delaware LLC
Original Assignee: Vianix Delaware LLC
Inventors: Ramaswamy, Veeru
Application Number: US11/956,979
Publication Number: US 20080162150A1
US Class Current: 704/500
CPC Class Codes: G10L 19/0018 Speech coding using phoneti...
