Speech compression method and apparatus

US 8,639,503 B1
Filed: 01/03/2013
Issued: 01/28/2014
Est. Priority Date: 01/03/2003
Status: Active Grant

Abstract

A method for encoding speech includes processing an input speech signal using an encoder, resulting in a compressed encoder representation of the input speech signal. The method also includes, if a speech recognizer identifies, in the input speech signal, a corresponding dictionary speech element that approximates the input speech signal, determining, with an electronic device, a compressed recognizer representation of the corresponding dictionary speech element, calculating, with the electronic device, one or more differences between the compressed encoder representation and the compressed recognizer representation, and compiling, with the electronic device, compressed speech information that includes representations of the one or more differences. The encoder and the speech recognizer are implemented with the electronic device.
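
The flow the abstract describes can be sketched in Python under some loud assumptions. Parameters are plain lists of floats, the recognizer result is passed in as an optional dictionary index, and the names `CompressedSpeech`, `compile_compressed`, and `reconstruct` are hypothetical. The fallback branch for unrecognized speech follows claim 2 rather than the abstract. The decoder-side `reconstruct` is not recited in the source and is included only to show that the differences are enough to recover the encoder parameters.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class CompressedSpeech:
    # Index of the matched dictionary element, or None when nothing matched.
    element_id: Optional[int]
    # Differences when an element matched, otherwise the full encoder parameters.
    payload: List[float]


def compile_compressed(
    encoder_params: Sequence[float],
    dictionary: Dict[int, Sequence[float]],
    recognized_id: Optional[int],
) -> CompressedSpeech:
    """Compile compressed speech information for one segment (illustrative only)."""
    if recognized_id is None:
        # No dictionary element approximates the input: send the encoder output as-is.
        return CompressedSpeech(None, list(encoder_params))
    reference = dictionary[recognized_id]
    # Differences between the encoder and recognizer representations.
    diffs = [e - r for e, r in zip(encoder_params, reference)]
    return CompressedSpeech(recognized_id, diffs)


def reconstruct(
    info: CompressedSpeech, dictionary: Dict[int, Sequence[float]]
) -> List[float]:
    """Assumed decoder side: add the differences back onto the dictionary entry."""
    if info.element_id is None:
        return list(info.payload)
    return [r + d for r, d in zip(dictionary[info.element_id], info.payload)]
```

When the recognizer finds a close match, the differences are small and can be coded with fewer bits than the full parameter set, which is where the compression gain would come from.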

27 Claims

1. A method for encoding speech, the method comprising:
- processing an input speech signal using an encoder, resulting in a compressed encoder representation of the input speech signal; and
  
  if a speech recognizer identifies, in the input speech signal, a corresponding dictionary speech element that approximates the input speech signal,
    - determining, with an electronic device, a compressed recognizer representation of the corresponding dictionary speech element,
    - calculating, with the electronic device, one or more differences between the compressed encoder representation and the compressed recognizer representation, and
    - compiling, with the electronic device, compressed speech information that includes representations of the one or more differences,
  
  wherein the encoder and the speech recognizer are implemented with the electronic device.

2. A method for encoding an input speech signal, the method comprising:
- processing the input speech signal using a first encoder configured (i) to compress the input speech signal by removing natural redundant information in the input speech signal, and (ii) to generate an encoded representation of the input speech signal, wherein the first encoder is implemented with an electronic device;
  
  processing the input speech signal using a second encoder configured to compress the input speech signal by recognizing known speech elements in the input speech signal, the known speech elements stored in a memory associated with the second encoder, wherein the second encoder is implemented with the electronic device;
  
  when the second encoder identifies a known speech element that approximates the input speech signal,
    - determining, with the electronic device, an encoded representation of the known speech element,
    - calculating, with the electronic device, one or more differences between (i) the encoded representation of the input speech signal generated by the first encoder, and (ii) the encoded representation of the known speech element generated by the second encoder, and
    - compiling, with the electronic device, compressed speech information that includes (i) representations of the one or more differences, and (ii) an indication of the known speech element; and
  
  when the second encoder does not identify a corresponding known speech element, compiling, with the electronic device, the compressed speech information to include the encoded representation of the input speech signal generated by the first encoder.
  - 3. The method of claim 2, wherein:
    - processing the input speech signal using the first encoder comprises performing, with the electronic device, analysis-by-synthesis to generate the encoded representation of the input speech signal to include an indication of a first formant filter for the input speech signal; and
      
      processing the input speech signal using the second encoder comprises, when the second encoder identifies the known speech element that approximates the input speech signal, generating, with the electronic device, an indication of a second formant filter corresponding to the known speech element; and
      
      calculating the one or more differences includes calculating, with the electronic device, a set of formant filter parameter differences between the first formant filter and the second formant filter.
  - 4. The method of claim 3, wherein:
    - the first formant filter includes a first set of line spectral pairs (LSPs);
      
      the second formant filter includes a second set of LSPs; and
      
      calculating the set of formant filter parameter differences includes calculating, for each LSP in the first set of LSPs, an LSP difference between the LSP in the first set of LSPs and a corresponding LSP in the second set of LSPs.
  - 5. The method of claim 4, further comprising quantizing, with the electronic device, the set of formant filter parameter differences using a formant filter parameter difference codebook that includes multiple entries, each entry having a set of LSP differences.
  - 6. The method of claim 5, wherein the formant filter parameter difference codebook includes 512 or fewer entries, so that the set of formant filter parameter differences is quantizable using 9 or fewer bits.
  - 7. The method of claim 2, wherein:
    - processing the input speech signal using the first encoder comprises performing, with the electronic device, analysis-by-synthesis to generate the encoded representation of the input speech signal to include an indication of a first speech excitation for the input speech signal; and
      
      processing the input speech signal using the second encoder comprises, when the second encoder identifies the known speech element that approximates the input speech signal, generating, with the electronic device, an indication of a second speech excitation corresponding to the known speech element; and
      
      calculating the one or more differences comprises calculating, with the electronic device, excitation differences between the first speech excitation and the second speech excitation.
  - 8. The method of claim 7, wherein:
    - the first excitation includes a first set of excitation pulse locations;
      
      the first excitation includes a second set of excitation pulse locations; and
      
      calculating the excitation differences includes calculating, for each excitation pulse in the first set of excitation pulses, a location difference between a first location of the excitation pulse in the first set of excitation pulses and a second location of a corresponding excitation pulse in the second set of excitation pulse locations.
  - 9. The method of claim 8, further comprising encoding the location difference using two or fewer bits.
  - 10. The method of claim 2, further comprising:
    - starting a delay timer upon recognition by the second encoder of a potential known speech element onset, wherein the delay timer is implemented with the electronic device; and
      
      if the delay timer expires prior to the second encoder identifying a corresponding known speech element, compiling the compressed speech information to include the encoded representation of the input speech signal generated by the first encoder.
  - 11. The method of claim 2, further comprising:
    - when the second encoder identifies a corresponding known speech element, determining, with the electronic device, a duration difference between an input speech duration and a duration of the corresponding known speech element; and
      
      when the duration difference exceeds a maximum difference value, performing, with the electronic device, dynamic time warping so that the encoded representation of the known speech element generated by the second encoder corresponds in duration with the encoded representation of the input speech signal generated by the second encoder.
  - 12. The method of claim 2, further comprising transmitting, with the electronic device, the compressed speech information over a transmission channel.
  - 13. The method of claim 2, further comprising storing the compressed speech information.
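
The LSP differencing and codebook quantization of claims 4 to 6 can be illustrated with a short Python sketch. Everything specific here is an assumption: the function names are hypothetical, the codebook is a toy three-entry list, and nearest-neighbour search by squared error is swapped in because the claims do not say how an entry is selected. The bit-count helper only reproduces the arithmetic of claim 6, where 512 entries are indexable with 9 bits.

```python
from typing import List, Sequence


def lsp_differences(first: Sequence[float], second: Sequence[float]) -> List[float]:
    """Per-LSP difference between the first and second formant filters."""
    if len(first) != len(second):
        raise ValueError("both LSP sets must have the same length")
    return [a - b for a, b in zip(first, second)]


def quantize(diffs: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the closest codebook entry.

    Squared Euclidean distance is an assumption; the claims only require
    that each entry holds a set of LSP differences.
    """
    def squared_error(index: int) -> float:
        return sum((d - c) ** 2 for d, c in zip(diffs, codebook[index]))

    return min(range(len(codebook)), key=squared_error)


def index_bits(codebook_size: int) -> int:
    """Number of bits needed to index a codebook of the given size."""
    if codebook_size < 1:
        raise ValueError("codebook must have at least one entry")
    return (codebook_size - 1).bit_length()
```

For example, `index_bits(512)` evaluates to 9, matching the "512 or fewer entries ... 9 or fewer bits" relationship in claim 6.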

14. An apparatus, comprising:
- a first speech encoder configured (i) to compress an input speech signal by removing natural redundant information in the input speech signal, and (ii) to generate an encoded representation of the input speech signal;
  
  a memory to store known speech elements;
  
  a second speech encoder coupled to the memory, the second speech encoder configured to recognize, in the input speech signal, known speech elements from the memory, and, when a known speech element that approximates the input speech signal is identified, determine an encoded representation of the known speech element;
  
  a difference encoder configured to, when the second encoder identifies a known speech element that approximates the input speech signal,
    - calculate one or more differences between (i) the encoded representation of the input speech signal generated by the first encoder, and (ii) the encoded representation of the known speech element generated by the second encoder, and
    - compile compressed speech information that includes (i) representations of the one or more differences, and (ii) an indication of the known speech element; and
  
  a transmitter configured (i) to transmit, when the second speech encoder identifies the corresponding known speech element, the compressed speech information that includes representations of the one or more differences, and (ii) to transmit, when the second speech encoder does not identify a known speech element that approximates the input speech signal, the encoded representation of the input speech signal generated by the first speech encoder.
  - 15. The apparatus of claim 14, wherein:
    - the first speech encoder includes an analysis-by-synthesis encoder configured to determine an indication of a first formant filter for the input speech signal;
      
      the second speech encoder is configured to, when a known speech element that approximates the input speech signal is identified, determine an indication of a second formant filter for the corresponding known speech element; and
      
      the difference encoder is configured to calculate a set of formant filter parameter differences between the first formant filter and the second formant filter.
  - 16. The apparatus of claim 15, wherein:
    - the first speech encoder is configured to determine a first set of line spectral pairs (LSPs);
      
      the second speech recognizer is configured to determine a second set of LSPs; and
      
      the difference encoder is configured to calculate, for each LSP in the first set of LSPs, an LSP difference between the LSP in the first set of LSPs and a corresponding LSP in the second set of LSPs.
  - 17. The apparatus of claim 16, wherein the difference encoder is configured to quantize the set of formant filter parameter differences using a formant filter parameter difference codebook that includes multiple entries, each entry having a set of LSP differences.
  - 18. The apparatus of claim 14, wherein:
    - the first speech encoder includes an analysis-by-synthesis encoder configured to determine an indication of a first speech excitation for the input speech signal;
      
      the second speech encoder is configured to, when a known speech element that approximates the input speech signal is identified, determine an indication of a second speech excitation for the corresponding known speech element; and
      
      the difference encoder is configured to calculate excitation differences between the first speech excitation and the second speech excitation.
  - 19. The apparatus of claim 14, further comprising an electronic information storage device for storing the compressed speech information, wherein the channel transmitter is configured to store data to the electronic information storage device.
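
The excitation differences of claim 18, in the pulse-location form recited in claims 8 and 9, might be sketched as follows. The claims only say that a location difference is encoded "using two or fewer bits"; the signed range of -2 to +1 and the two's-complement mapping below are assumptions chosen for illustration, as are the function names.

```python
from typing import List, Sequence


def pulse_location_differences(
    first_locations: Sequence[int], second_locations: Sequence[int]
) -> List[int]:
    """Location difference for each pair of corresponding excitation pulses."""
    if len(first_locations) != len(second_locations):
        raise ValueError("both excitations must have the same number of pulses")
    return [a - b for a, b in zip(first_locations, second_locations)]


def encode_2bit(diff: int) -> int:
    """Pack a difference in the assumed range -2..1 into a 2-bit code."""
    if not -2 <= diff <= 1:
        raise ValueError("difference does not fit in two bits")
    return diff & 0b11


def decode_2bit(code: int) -> int:
    """Inverse of encode_2bit: codes 2 and 3 represent -2 and -1."""
    if not 0 <= code <= 3:
        raise ValueError("code must be a 2-bit value")
    return code - 4 if code >= 2 else code
```

A difference outside the assumed range raises an error in this sketch; a real encoder would need some policy for that case, which the claims leave open.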

20. A tangible, non-transitory computer readable storage medium storing instructions that, when executed by a processor, cause the processor to:
- process an input speech signal using a first encoder configured (i) to compress the input speech signal by removing natural redundant information in the input speech signal, and (ii) to generate an encoded representation of the input speech signal;
  
  process the input speech signal using a second encoder configured to compress the input speech signal by recognizing known speech elements in the input speech signal, the known speech elements stored in a memory associated with the second encoder;
  
  when the second encoder identifies a known speech element that approximates the input speech signal,
    - determine an encoded representation of the known speech element,
    - calculate one or more differences between (i) the encoded representation of the input speech signal generated by the first encoder, and (ii) the encoded representation of the known speech element generated by the second encoder, and
    - compile compressed speech information that includes (i) representations of the one or more differences, and (ii) an indication of the known speech element; and
  
  when the second encoder does not identify a corresponding known speech element, compile the compressed speech information to include the encoded representation of the input speech signal generated by the first encoder.
  - 21. The computer readable storage medium of claim 20, storing instructions that, when executed by a processor, cause the processor to:
    - process the input speech signal using the first encoder at least by performing analysis-by-synthesis to generate the encoded representation of the input speech signal to include an indication of a first formant filter for the input speech signal;
      
      process the input speech signal using the second encoder at least by, when the second encoder identifies the known speech element that approximates the input speech signal, generating an indication of a second formant filter corresponding to the known speech element; and
      
      calculate the one or more differences at least by calculating a set of formant filter parameter differences between the first formant filter and the second formant filter.
  - 22. The computer readable medium of claim 21, wherein:
    - the first formant filter includes a first set of line spectral pairs (LSPs);
      
      the second formant filter includes a second set of LSPs; and
      
      the computer readable medium stores instructions that, when executed by a processor, cause the processor to calculate the set of formant filter parameter differences at least by calculating, for each LSP in the first set of LSPs, an LSP difference between the LSP in the first set of LSPs and a corresponding LSP in the second set of LSPs.
  - 23. The computer readable medium of claim 22, storing instructions that, when executed by a processor, cause the processor to quantize the set of formant filter parameter differences using a formant filter parameter difference codebook that includes multiple entries, each entry having a set of LSP differences.
  - 24. The computer readable medium of claim 20, storing instructions that, when executed by a processor, cause the processor to:
    - process the input speech signal using the first encoder at least by performing analysis-by-synthesis to generate the encoded representation of the input speech signal to include an indication of a first speech excitation for the input speech signal; and
      
      process the input speech signal using the second encoder at least by, when the second encoder identifies the known speech element that approximates the input speech signal, generating an indication of a second speech excitation corresponding to the known speech element; and
      
      calculate the one or more differences at least by calculating excitation differences between the first speech excitation and the second speech excitation.
  - 25. The computer readable medium of claim 24, wherein:
    - the first excitation includes a first set of excitation pulse locations;
      
      the first excitation includes a second set of excitation pulse locations; and
      
      the computer readable medium stores instructions that, when executed by a processor, cause the processor to calculate the excitation differences at least by calculating, for each excitation pulse in the first set of excitation pulses, a location difference between a first location of the excitation pulse in the first set of excitation pulses and a second location of a corresponding excitation pulse in the second set of excitation pulse locations.
  - 26. The computer readable medium of claim 20, storing instructions that, when executed by a processor, cause the processor to:
    - start a delay timer upon recognition by the second encoder of a potential known speech element onset; and
      
      if the delay timer expires prior to the second encoder identifying a corresponding known speech element, compile the compressed speech information to include the encoded representation of the input speech signal generated by the first encoder.
  - 27. The computer readable medium of claim 20, storing instructions that, when executed by a processor, cause the processor to:
    - when the second encoder identifies a corresponding known speech element, determine a duration difference between an input speech duration and a duration of the corresponding known speech element; and
      
      when the duration difference exceeds a maximum difference value, perform dynamic time warping so that the encoded representation of the known speech element generated by the second encoder corresponds in duration with the encoded representation of the input speech signal generated by the second encoder.
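
The duration check and dynamic time warping of claims 11 and 27 can be illustrated with the textbook DTW recurrence. This is a sketch under stated assumptions, not the patented implementation: frames are reduced to single floats, the local cost is an absolute difference, durations are measured in frames, and `dtw_path` and `warp_to_input` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def dtw_path(a: Sequence[float], b: Sequence[float]) -> List[Tuple[int, int]]:
    """Classic DTW alignment between two non-empty sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    # Backtrack from the end, always stepping to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def warp_to_input(
    input_frames: Sequence[float],
    element_frames: Sequence[float],
    max_difference: int,
) -> List[float]:
    """Warp the known element to the input duration only when needed."""
    if abs(len(input_frames) - len(element_frames)) <= max_difference:
        return list(element_frames)
    warped = [0.0] * len(input_frames)
    for i, j in dtw_path(input_frames, element_frames):
        # If several element frames align to one input frame, the last one wins.
        warped[i] = element_frames[j]
    return warped
```

After warping, the element representation has one frame per input frame, so per-frame differences can be taken directly.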

Current Assignee: Marvell Asia Pte Limited (Marvell Technology Group Limited)
Original Assignee: Marvell International Limited (Marvell Technology Group Limited)
Inventors: Darroudi, Khosro; Mears, Brian R.
Primary Examiner(s): YEN, ERIC L
Application Number: US13/733,602
Time in Patent Office: 390 Days
Field of Search: 704/200, 704/230, 704/231
US Class Current: 704/230
CPC Class Codes:
G10L 15/02   Feature extraction for spee...
G10L 19/04   using predictive techniques
G10L 19/08   Determination or coding of ...
G10L 19/10   the excitation function bei...
