Speech compression method and apparatus

US 8,352,248 B2
Filed: 01/03/2003
Issued: 01/08/2013
Est. Priority Date: 01/03/2003
Status: Active Grant

Abstract

A system for encoding speech includes a speech encoder (106, FIG. 1), a speech recognizer (110), and a difference encoder (108). When the speech recognizer (110) recognizes a word, phoneme or feature within an input speech signal (122), the difference encoder (108) calculates the differences between speech parameters (140, 142) derived by the speech encoder (106) and speech parameters (146, 148) derived by the speech recognizer (110). The difference encoder (108) quantizes the differences (128), which replace corresponding encoder-derived parameters to be transmitted over a channel (130). In one embodiment, the difference encoder representation (128) of the speech parameters consumes fewer bits than the encoder-derived representation (124). Accordingly, the resulting bandwidth consumed by a single channel can be decreased.
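
The selection logic described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Frame` container, the `uses_differences` flag, and the use of plain lists of floats as parameter vectors are all hypothetical, and the quantization step the abstract mentions is left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frame:
    """Hypothetical container for one frame of compressed speech information."""
    uses_differences: bool   # True when the payload holds differences
    payload: List[float]     # differences, or the encoder parameters themselves


def compile_frame(encoder_params: List[float],
                  recognizer_params: Optional[List[float]]) -> Frame:
    """Choose what to send for one frame.

    recognizer_params is None when the recognizer found no dictionary
    speech element approximating the input (an assumed convention).
    """
    if recognizer_params is None:
        # No match: fall back to the encoder-derived representation.
        return Frame(uses_differences=False, payload=list(encoder_params))
    # Match: send only the element-wise differences (unquantized here).
    diffs = [e - r for e, r in zip(encoder_params, recognizer_params)]
    return Frame(uses_differences=True, payload=diffs)
```

When the recognizer's parameters are close to the encoder's, the differences are small, which is what would let a difference representation use fewer bits than the encoder-derived one.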

30 Claims

1. A method for encoding speech comprising:
- processing an input speech signal using an encoder, resulting in a compressed encoder representation of the input speech signal,
  
  if a speech recognizer identifies a corresponding dictionary speech element, which approximates the input speech signal,
  
  determining a compressed recognizer representation of the corresponding dictionary speech element,
  
  calculating one or more differences between the compressed encoder representation and the compressed recognizer representation,
  
  compiling compressed speech information that includes representations of the one or more differences; and
  
  the method further comprising, if the speech recognizer does not identify a corresponding dictionary speech element, compiling the compressed speech information to include the compressed encoder representation of the input speech signal, and not to include the one or more differences.
  - 2. The method of claim 1, wherein processing the input speech signal includes using an analysis-by-synthesis encoder to determine an encoder representation of a formant filter for the input speech signal, and the method further includes:
    - processing the input speech signal using a speech recognizer, which determines a recognizer representation of a formant filter for the corresponding dictionary speech element; and
      
      calculating the one or more differences includes calculating a set of formant filter parameter differences between the encoder representation of the formant filter and the recognizer representation of the formant filter.
  - 3. The method of claim 2, wherein:
    - determining the encoder representation of the formant filter includes determining a first set of line spectral pairs (LSPs);
      
      determining the recognizer representation of the formant filter includes determining a second set of LSPs; and
      
      calculating the set of formant filter parameter differences includes calculating, for each LSP in the first set of LSPs, an LSP difference between the LSP in the first set of LSPs and a corresponding LSP in the second set of LSPs.
  - 4. The method of claim 3, further comprising quantizing the set of formant filter parameter differences using a formant filter parameter difference codebook that includes multiple entries, each entry having a set of LSP differences.
  - 5. The method of claim 4, wherein the formant filter parameter difference codebook includes 512 or fewer entries, so that the set of formant filter parameter differences is quantizable using 9 or fewer bits.
  - 6. The method of claim 1, wherein processing the input speech signal includes using an analysis-by-synthesis encoder to determine an encoder version of speech excitation for the input speech signal, and the method further includes:
    - processing the input speech signal using a speech recognizer, which determines a recognizer version of speech excitation for the corresponding dictionary speech element; and
      
      calculating the one or more differences comprises calculating excitation differences between the encoder version of the speech excitation and the recognizer version of the speech excitation.
  - 7. The method of claim 6, wherein:
    - determining the encoder version of the speech excitation includes determining a first set of excitation pulse locations;
      
      determining the recognizer version of the speech excitation includes determining a second set of excitation pulse locations; and
      
      calculating the excitation differences includes calculating, for each excitation pulse in the first set of excitation pulses, a location difference between a first location of the excitation pulse in the first set of excitation pulses and a second location of a corresponding excitation pulse in the second set of excitation pulse locations.
  - 8. The method of claim 7, further comprising encoding the location difference using two or fewer bits.
  - 9. The method of claim 1, further comprising:
    - starting a delay timer upon recognition of a potential speech element onset; and
      
      if the delay timer expires prior to the speech recognizer identifying the corresponding dictionary speech element, compiling the compressed speech information to include the compressed encoder representation of the input speech signal, and not to include the one or more differences.
  - 10. The method of claim 1, further comprising:
    - if the speech recognizer identifies the corresponding dictionary speech element, determining a duration difference between an input speech duration and a dictionary speech element duration of the corresponding dictionary speech element; and
      
      if the duration difference exceeds a maximum difference value, performing dynamic time warping so that the compressed recognizer representation of the input speech signal corresponds in duration with the compressed encoder representation of the input speech signal.
  - 11. The method of claim 1, further comprising transmitting the compressed speech information over a transmission channel.
  - 12. The method of claim 1, further comprising storing the compressed speech information.
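
The delay-timer fallback of claim 9 might look something like the sketch below. It assumes a polling design that the claim does not specify: `poll_recognizer`, `timeout_s`, `clock`, and `poll_interval_s` are hypothetical names, and returning `None` is an assumed convention telling the caller to send the encoder representation instead of differences.

```python
import time
from typing import Callable, Optional


def await_recognition(poll_recognizer: Callable[[], Optional[str]],
                      timeout_s: float,
                      clock: Callable[[], float] = time.monotonic,
                      poll_interval_s: float = 0.0) -> Optional[str]:
    """Return a dictionary element id, or None if the delay timer expires.

    Call this when a potential speech element onset is detected; the
    timer starts on entry.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        element = poll_recognizer()
        if element is not None:
            return element              # recognized before the timer expired
        if poll_interval_s > 0:
            time.sleep(poll_interval_s)
    return None                         # expired: caller uses encoder output
```

Injecting the clock keeps the timer testable without real waiting; a real-time encoder would more likely tie the deadline to frame counts than to wall-clock time.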

13. An apparatus comprising:
- speech encoder means for processing an input speech signal, resulting in a compressed encoder representation of the input speech signal;
  
  speech recognizer means for processing the input speech signal; and
  
  difference encoder means, responsive to the speech recognizer means, for
  
  determining a compressed recognizer representation of a corresponding dictionary speech element that approximates the input speech signal when the speech recognizer means identifies the corresponding dictionary speech element,
  
  calculating one or more differences between the compressed encoder representation and the compressed recognizer representation, and
  
  compiling compressed speech information that includes representations of the one or more differences; and
  
  a transmitter to transmit the compressed speech information that includes representations of the one or more differences when the speech recognizer means identifies the corresponding dictionary speech element and to transmit the compressed encoder representation of the input speech signal when the speech recognizer means does not identify a dictionary speech element that approximates the input speech signal.
  - 14. The apparatus of claim 13, wherein:
    - the speech encoder means includes an analysis-by-synthesis encoder means, which determines an encoder representation of a formant filter for the input speech signal;
      
      the speech recognizer means determines a recognizer representation of a formant filter for the corresponding dictionary speech element; and
      
      the difference encoder means calculates a set of formant filter parameter differences between the encoder representation of the formant filter and the recognizer representation of the formant filter.
  - 15. The apparatus of claim 14, wherein:
    - the speech encoder means determines a first set of line spectral pairs (LSPs);
      
      the speech recognizer means determines a second set of LSPs; and
      
      the difference encoder means calculates, for each LSP in the first set of LSPs, an LSP difference between the LSP in the first set of LSPs and a corresponding LSP in the second set of LSPs.
  - 16. The apparatus of claim 15, wherein the difference encoder means further quantizes the set of formant filter parameter differences using a formant filter parameter difference codebook that includes multiple entries, each entry having a set of LSP differences.
  - 17. The apparatus of claim 13, wherein:
    - the speech encoder means includes an analysis-by-synthesis encoder means, which determines an encoder version of speech excitation for the input speech signal;
      
      the speech recognizer means determines a recognizer version of speech excitation for the corresponding dictionary speech element; and
      
      the difference encoder means calculates excitation differences between the encoder version of the speech excitation and the recognizer version of the speech excitation.
  - 18. The apparatus of claim 17, wherein:
    - the speech encoder means determines a first set of excitation pulse locations;
      
      the speech recognizer means determines a second set of excitation pulse locations; and
      
      the difference encoder means calculates, for each excitation pulse in the first set of excitation pulses, a location difference between a first location of the excitation pulse in the first set of excitation pulses and a second location of a corresponding excitation pulse in the second set of excitation pulse locations.
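
The pulse-location differences of claims 7, 8, and 18 could be coded as in the following sketch. The claims only say that a location difference is encoded using two or fewer bits; the signed range of -2 to +1, the two's-complement packing, and the `None` return that signals a fallback to the encoder representation are assumptions made for illustration.

```python
from typing import List, Optional

BITS = 2                                   # claim 8: two or fewer bits
LO = -(1 << (BITS - 1))                    # assumed signed range: -2 ...
HI = (1 << (BITS - 1)) - 1                 # ... to +1
MASK = (1 << BITS) - 1


def encode_pulse_differences(encoder_locs: List[int],
                             recognizer_locs: List[int]) -> Optional[List[int]]:
    """Return one 2-bit code per pulse, or None if a difference does not fit."""
    if len(encoder_locs) != len(recognizer_locs):
        return None
    codes = []
    for enc, rec in zip(encoder_locs, recognizer_locs):
        diff = enc - rec
        if not LO <= diff <= HI:
            return None                    # assumed fallback path
        codes.append(diff & MASK)          # two's-complement value in 0..3
    return codes


def decode_pulse_differences(codes: List[int],
                             recognizer_locs: List[int]) -> List[int]:
    """Invert encode_pulse_differences on the receiving side."""
    locations = []
    for code, rec in zip(codes, recognizer_locs):
        diff = code - (1 << BITS) if code > HI else code
        locations.append(rec + diff)
    return locations
```

For example, encoder pulses at `[5, 12, 20, 31]` against recognizer pulses at `[6, 12, 19, 33]` give differences of -1, 0, +1, and -2, all of which fit in two bits.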

19. An apparatus comprising:
- a speech encoder, which processes an input speech signal, resulting in a compressed encoder representation of the input speech signal;
  
  a speech recognizer, which processes the input speech signal; and
  
  a difference encoder, which
  
  determines a compressed recognizer representation of a corresponding dictionary speech element that approximates the input speech signal when the speech recognizer identifies the corresponding dictionary speech element,
  
  calculates one or more differences between the compressed encoder representation and the compressed recognizer representation, and
  
  compiles compressed speech information that includes representations of the one or more differences; and
  
  a transmitter, which transmits the compressed speech information that includes representations of the one or more differences when the speech recognizer identifies the corresponding dictionary speech element and transmits the compressed encoder representation of the input speech signal when the speech recognizer does not identify a dictionary speech element that approximates the input speech signal.
  - 20. The apparatus of claim 19, wherein:
    - the speech encoder includes an analysis-by-synthesis encoder, which determines an encoder representation of a formant filter for the input speech signal;
      
      the speech recognizer determines a recognizer representation of a formant filter for the corresponding dictionary speech element; and
      
      the difference encoder calculates a set of formant filter parameter differences between the encoder representation of the formant filter and the recognizer representation of the formant filter.
  - 21. The apparatus of claim 20, wherein:
    - the speech encoder determines a first set of line spectral pairs (LSPs);
      
      the speech recognizer determines a second set of LSPs; and
      
      the difference encoder calculates, for each LSP in the first set of LSPs, an LSP difference between the LSP in the first set of LSPs and a corresponding LSP in the second set of LSPs.
  - 22. The apparatus of claim 21, wherein the difference encoder further quantizes the set of formant filter parameter differences using a formant filter parameter difference codebook that includes multiple entries, each entry having a set of LSP differences.
  - 23. The apparatus of claim 19, wherein:
    - the speech encoder includes an analysis-by-synthesis encoder, which determines an encoder version of speech excitation for the input speech signal;
      
      the speech recognizer determines a recognizer version of speech excitation for the corresponding dictionary speech element; and
      
      the difference encoder calculates excitation differences between the encoder version of the speech excitation and the recognizer version of the speech excitation.
  - 24. The apparatus of claim 19, further comprising an electronic information storage device for storing the compressed speech information.
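
The LSP difference and codebook quantization steps (claims 3 to 5, 15 to 16, and 21 to 22) can be illustrated with the sketch below. The claims state only that a codebook of LSP-difference sets is used and that 512 or fewer entries allow 9 or fewer bits; the nearest-entry search by squared error and all function names here are assumptions.

```python
from typing import List, Sequence


def lsp_differences(encoder_lsps: Sequence[float],
                    recognizer_lsps: Sequence[float]) -> List[float]:
    """Difference between each encoder LSP and the corresponding recognizer LSP."""
    if len(encoder_lsps) != len(recognizer_lsps):
        raise ValueError("LSP sets must have the same length")
    return [e - r for e, r in zip(encoder_lsps, recognizer_lsps)]


def quantize(diffs: Sequence[float],
             codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codebook entry closest to diffs (assumed squared-error metric)."""
    def squared_error(entry: Sequence[float]) -> float:
        return sum((d - c) ** 2 for d, c in zip(diffs, entry))
    return min(range(len(codebook)), key=lambda i: squared_error(codebook[i]))


def index_bits(codebook_size: int) -> int:
    """Number of bits needed to address any entry of the codebook."""
    return max(1, (codebook_size - 1).bit_length())
```

Only the index is transmitted, so a 512-entry codebook costs `index_bits(512)`, that is 9, bits per set of LSP differences regardless of how many LSPs each entry holds.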

25. A system comprising:
- a communication channel operably connected to a first communication device and a second communication device;
  
  the first communication device, which includes a speech encoder, which processes an input speech signal, resulting in a compressed encoder representation of the input speech signal, a speech recognizer, and a difference encoder, which
  
  determines a compressed recognizer representation of a corresponding dictionary speech element that approximates the input speech signal when the speech recognizer identifies the corresponding dictionary speech element,
  
  calculates one or more differences between the compressed encoder representation and the compressed recognizer representation, and
  
  compiles compressed speech information that includes representations of the one or more differences;
  
  wherein the first communication device further includes a transmitter, which transmits the compressed speech information that includes representations of the one or more differences when the speech recognizer identifies the corresponding dictionary speech element and transmits the compressed encoder representation of the input speech signal when the speech recognizer does not identify a dictionary speech element that approximates the input speech signal; and
  
  wherein the system further comprises the second communication device, which constructs an output speech signal based on the compressed speech information, and information associated with the corresponding dictionary speech element, and the compressed encoder information.
  - 26. The system of claim 25, wherein the communication channel is a wireless communication channel, and the first device further includes a dipole antenna for sending the compressed speech information over the wireless communication channel.
  - 27. The system of claim 25, wherein the communication channel is a wired communication channel, and the first device further includes an interface for sending the compressed speech information over the wired communication channel.
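
On the receiving side, the second communication device of claim 25 has to combine the received differences with information about the dictionary speech element. A minimal sketch of that step follows. It assumes that both devices hold the same dictionary and that each frame identifies the matched element; `element_id` and `dictionary` are hypothetical names, and the claim does not say how the element is signalled.

```python
from typing import Dict, List, Optional


def reconstruct_params(payload: List[float],
                       element_id: Optional[str],
                       dictionary: Dict[str, List[float]]) -> List[float]:
    """Recover encoder-equivalent parameters on the receiving device.

    element_id is None for frames that carry the encoder representation
    directly; otherwise payload holds differences relative to the
    dictionary entry named by element_id.
    """
    if element_id is None:
        return list(payload)               # encoder representation as sent
    reference = dictionary[element_id]     # recognizer-side parameters
    return [r + d for r, d in zip(reference, payload)]
```

The recovered parameters would then drive an ordinary speech decoder to construct the output speech signal.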

28. A program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine to perform a method for encoding speech, the method comprising:
- processing an input speech signal using an encoder, resulting in a compressed encoder representation of the input speech signal;
  
  processing the input speech signal using a speech recognizer;
  
  if the speech recognizer identifies a corresponding dictionary speech element, which approximates the input speech signal,
  
  determining a compressed recognizer representation of the corresponding dictionary speech element;
  
  calculating one or more differences between the compressed encoder representation and the compressed recognizer representation; and
  
  compiling compressed speech information that includes representations of the one or more differences, and
  
  the method further comprising, if the speech recognizer does not identify a corresponding dictionary speech element, compiling the compressed speech information to include the compressed encoder representation of the input speech signal, and not to include the one or more differences.
  - 29. The program storage device of claim 28, wherein:
    - processing the input speech signal includes using an analysis-by-synthesis encoder to determine an encoder representation of a formant filter for the input speech signal;
      
      processing the input speech signal using a speech recognizer includes determining a recognizer representation of a formant filter for the corresponding dictionary speech element; and
      
      calculating the one or more differences comprises calculating a set of formant filter parameter differences between the encoder representation of the formant filter and the recognizer representation of the formant filter.
  - 30. The program storage device of claim 28, wherein:
    - processing the input speech signal using an analysis-by-synthesis encoder includes determining an encoder version of speech excitation for the input speech signal;
      
      processing the input speech signal using a speech recognizer includes determining a recognizer version of speech excitation for the corresponding dictionary speech element; and
      
      calculating the one or more differences comprises calculating excitation differences between the encoder version of the speech excitation and the recognizer version of the speech excitation.

Current Assignee: Marvell Asia Pte Limited (Marvell Technology Group Limited)
Original Assignee: Marvell International Limited (Marvell Technology Group Limited)
Inventors: Darroudi, Khosro; Mears, Brian R.
Primary Examiner(s): YEN, ERIC L
Application Number: US10/336,668
Publication Number: US 20040133422A1
Time in Patent Office: 3,658 Days
Field of Search: 704/200, 704/230, 704/231
US Class Current: 704/200
CPC Class Codes:
- G10L 15/02   Feature extraction for spee...
- G10L 19/04   using predictive techniques
- G10L 19/08   Determination or coding of ...
- G10L 19/10   the excitation function bei...
