Apparatus system and method for speech compression and decompression

US 6,138,089 A
Filed: 03/10/1999
Issued: 10/24/2000
Est. Priority Date: 03/10/1999
Status: Expired due to Term

Abstract

The invention provides a system, apparatus, and method for compressing a speech signal by decimating or removing somewhat redundant portions of the signal while retaining reference signal portions sufficient to reconstruct the signal without noticeable loss in quality, thereby permitting storage and transmission of high-quality speech with minimal storage volume or transmission bandwidth requirements. Speech pitch waveform decimation is used to reduce data and produce an encoded speech signal during compression, and time-based interpolative speech reconstruction is applied to the encoded signal to reconstruct the original speech signal. In one aspect, the invention provides a method for processing a speech signal that includes identifying portions of the speech signal representing individual speech pitches; generating an encoded speech signal from the speech pitches, the encoded speech signal retaining some of the plurality of pitches and omitting others; and generating a reconstructed speech signal by replacing each omitted pitch with an interpolated replacement pitch having signal waveform characteristics that are interpolated from a first retained reference pitch occurring temporally earlier than the pitch to be interpolated and from a second retained reference pitch occurring temporally later than the pitch to be interpolated. In another aspect, an apparatus is provided to perform the speech compression and reconstruction method. In another aspect, an Internet voice electronic mail system is provided that has minimal voice message storage and transmission requirements while retaining high-fidelity voice quality.
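
As an illustration of the decimation step described in the abstract, the following minimal Python sketch keeps one reference pitch out of every `n_omit + 1` and drops the rest. It assumes the signal has already been segmented into pitch periods. The function name `decimate_pitches`, the `(index, samples)` output layout, and the choice to also keep the final pitch are assumptions of this sketch, not details taken from the patent.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout: each retained pitch is stored with its position in the stream.
Reference = Tuple[int, List[float]]


def decimate_pitches(pitches: Sequence[Sequence[float]], n_omit: int) -> List[Reference]:
    """Retain one pitch period, then omit the next `n_omit`, and repeat."""
    if n_omit < 1:
        raise ValueError("at least one pitch must be omitted per retained pitch")
    step = n_omit + 1
    kept_indices = list(range(0, len(pitches), step))
    # Assumption of this sketch: also keep the last pitch so that every
    # omitted pitch has a later reference available for interpolation.
    if pitches and kept_indices[-1] != len(pitches) - 1:
        kept_indices.append(len(pitches) - 1)
    return [(i, list(pitches[i])) for i in kept_indices]
```

With nine pitch periods and `n_omit = 3`, this sketch retains the pitches at positions 0, 4, and 8, so three of the nine periods are stored.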

19 Claims

1. A method for processing a speech signal comprising steps of:
- identifying a plurality of portions of said speech signal representing individual speech pitches;
  
  generating an encoded speech signal from a plurality of said speech pitches, said encoded speech signal retaining ones of said plurality of pitches and omitting other ones of said plurality of pitches, at least one speech pitch being omitted for each speech pitch retained; and
  
  generating a reconstructed speech signal by replacing each said omitted pitch with an interpolated replacement pitch having signal waveform characteristics which are interpolated from a first retained reference pitch occurring temporally earlier to said pitch to be interpolated and from a second retained reference pitch occurring temporally later than said pitch to be interpolated.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The method in claim 1, wherein said step of generating a reconstructed speech signal comprises the steps of:
    - interpolating said replacement pitches to have signal values that are linear interpolations of the signal amplitude values of the temporally earlier and temporally later pitches at corresponding times relative to the start of the pitches.
  - 3. The method in claim 2, wherein the interpolated pitch signal amplitudes are interpolated according to the expression:
    - [equation EQU3, an image in the original patent, is not reproduced in this text] where Aⁱ_pnew,t is the computed desired amplitude of the new interpolated pitch for the sample corresponding to relative time t;
      
      A_pref1,t is the reference pitch amplitude of the first reference pitch at the corresponding relative time t measured relative to the origin of each pitch;
      
      n is the number of pitches that have been omitted and which are to be reconstructed, and i is an index of the particular pitch for which the weighted amplitude is being computed.
  - 4. The method in claim 1, wherein at least three out of four pitches are omitted and the reconstructed speech signal includes three pitches interpolated from the two surrounding reference pitches.
  - 5. The method in claim 1, wherein at least four out of five pitches are omitted and the reconstructed speech signal includes four pitches interpolated from the two surrounding reference pitches.
  - 6. The method in claim 1, wherein at least five out of six pitches are omitted and the reconstructed speech signal includes five pitches interpolated from the two surrounding reference pitches.
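
The linear interpolation recited in claims 2 and 3 can be sketched as follows. The expression from claim 3 is not reproduced in this text, so the weighting below is an assumption: the i-th of n omitted pitches is taken as `((n + 1 - i) * A_pref1,t + i * A_pref2,t) / (n + 1)`, which is ordinary linear interpolation between the two reference amplitudes at the same relative time t. The function name `interpolate_pitches` and the requirement that both references have the same number of samples are also assumptions of this sketch.

```python
from typing import List, Sequence


def interpolate_pitches(ref1: Sequence[float], ref2: Sequence[float], n: int) -> List[List[float]]:
    """Build n replacement pitches between an earlier and a later reference.

    Assumed weighting (not confirmed by the source text): the i-th pitch is
    ((n + 1 - i) * ref1[t] + i * ref2[t]) / (n + 1) at each relative time t.
    """
    if len(ref1) != len(ref2):
        raise ValueError("this sketch assumes equal-length reference pitches")
    replacements = []
    for i in range(1, n + 1):
        w2 = i / (n + 1)   # weight of the later reference grows with i
        w1 = 1.0 - w2      # weight of the earlier reference shrinks with i
        replacements.append([w1 * a + w2 * b for a, b in zip(ref1, ref2)])
    return replacements
```

For the three-out-of-four case of claim 4, `interpolate_pitches(ref1, ref2, 3)` yields three pitches weighted 3:1, 2:2, and 1:3 between the earlier and later references.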

7. A speech processor for processing a speech signal, said speech processor comprising:
- a plurality of delay circuits, each receiving said speech signal f(t) as an input and generating a different time delayed version of said speech signal f(t-Td_i) as an output;
  
  a plurality of correlator circuits, each said correlator circuit receiving said input speech signal f(t) and one of said time delayed speech signals f(t-Td_i) and generating a correlation value indicating the amount of correlation between said speech signal f(t) and said time delayed speech signal;
  
  a comparator circuit receiving said plurality of correlation values and generating an autocorrelation of said input signal with time delayed versions of said speech signal, one correlation value being received from each of said correlator circuits;
  
  a pitch detector receiving said autocorrelation signal and identifying a pitch length for at least a portion of said speech signal; and
  
  an encoder receiving said pitch length and said speech signal and generating an encoded version of said speech signal wherein speech pitches of said speech signal are retained or omitted on the basis of said pitch detector input.
- Dependent claims (8, 9, 10, 11, 12, 13, 14):
  - 8. The speech processor in claim 7, further comprising:
    - a noise detector circuit receiving said comparator output signal and generating an output signal that is used when said pitch detector does not detect a pitch, said noise detector analyzing a non-correlated portion of said speech signal and determining the part of said speech signal that should be included as a representation in the encoded signal.
  - 9. The speech processor in claim 7, further comprising:
    - a pitch counter circuit which compares the values of the auto-correlation function for a sequence of pitches and determines when the autocorrelation value crosses some predetermined threshold, a new reference pitch being inserted in said encoded signal when said value of said auto-correlation function drops below said threshold.
  - 10. The speech processor in claim 9, wherein said autocorrelation threshold is set in the range between about 0.7 and 0.9.
  - 11. The speech processor in claim 7, wherein said pitch detector comprises a vowel pitch detector and a consonant pitch detector;
    - said vowel pitch detector comprising means to receive said comparator output signal and calculating a vowel pitch length for high amplitude signals and large values of said autocorrelation function that are typical for vowel sounds;
      
      said consonant pitch detector comprising means to receive said comparator output signal and calculating a consonant pitch length for low amplitude signals and small values of the autocorrelation function that are typical for consonant sounds.
  - 12. The speech processor in claim 11, wherein said vowel pitch length is determined as the distance between two local maximums of the autocorrelation function which satisfy three conditions:
    - (i) the vowel pitch length L_v is between 50 and 200 msec, (ii) the adjusted pitches differ not more than about five percent (5%), and (iii) the local maximums of the autocorrelation function that marks the beginning and the end of each pitch are larger than any local maximums between them.
  - 13. The speech processor in claim 11, wherein said consonant pitch length is determined as the distance between two local maximums of the autocorrelation function which satisfy three conditions:
    - (i) the consonant pitch length L_v is between 50 and 200 msec, (ii) the adjusted consonant pitches differ not more than about five percent (5%), (iii) the local maximums of the autocorrelation function that marks the beginning and the end of each consonant pitch are larger than any local maximums between them, and (iv) the consonant pitch length is close, within some predetermined length difference, to last pitch length determined by the consonant pitch detector or to the first pitch length determined by the vowel pitch detector after the consonant's pitch length is determined.
  - 14. The speech processor in claim 7, further comprising:
    - a noise detector circuit receiving said comparator output signal and generating an output signal that is used when said pitch detector does not detect a pitch, said noise detector analyzing a non-correlated portion of said speech signal and determining the part of said speech signal that should be included as a representation in the encoded signal;
      
      a pitch counter circuit which compares the values of the auto-correlation function for a sequence of pitches and determines when the autocorrelation value crosses some predetermined threshold, a new reference pitch being inserted in said encoded signal when said value of said auto-correlation function drops below said threshold; and
      
      said pitch detector comprises a vowel pitch detector and a consonant pitch detector;
      
      said vowel pitch detector comprising means to receive said comparator output signal and calculating a vowel pitch length for high amplitude signals and large values of said autocorrelation function that are typical for vowel sounds;
      
      said vowel pitch length is determined as the distance between two local maximums of the autocorrelation function which satisfy three conditions;
      
      (i) the vowel pitch length L_v is between 50 and 200 msec, (ii) the adjusted pitches differ not more than about five percent, and (iii) the local maximums of the autocorrelation function that marks the beginning and the end of each pitch are larger than any local maximums between them;
      
      said consonant pitch detector comprising means to receive said comparator output signal and calculating a consonant pitch length for low amplitude signals and small values of the autocorrelation function that are typical for consonant sounds;
      
      said consonant pitch length is determined as the distance between two local maximums of the autocorrelation function which satisfy three conditions;
      
      (i) the consonant pitch length L_v is between 50 and 200 msec, (ii) the adjusted consonant pitches differ not more than about five percent, (iii) the local maximums of the autocorrelation function that marks the beginning and the end of each consonant pitch are larger than any local maximums between them, and (iv) the consonant pitch length is close, within some predetermined length difference, to last pitch length determined by the consonant pitch detector or to the first pitch length determined by the vowel pitch detector after the consonant's pitch length is determined.
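
The delay, correlator, and comparator arrangement of claim 7, together with the threshold test of claims 9 and 10, can be approximated in software roughly as below. This is a sketch under several assumptions: correlation is computed as a normalized dot product, the pitch length is taken as the delay with the single largest correlation (a simplification of the local-maximum conditions in claims 12 and 13), and the default threshold of 0.8 is simply a value inside the 0.7 to 0.9 range of claim 10. All function names are hypothetical.

```python
import math
from typing import Dict, Iterable, Sequence


def normalized_correlation(signal: Sequence[float], delay: int) -> float:
    """Correlate f(t) with f(t - delay); the result lies in [-1, 1]."""
    current = signal[delay:]
    delayed = signal[:len(signal) - delay]
    num = sum(x * y for x, y in zip(current, delayed))
    den = math.sqrt(sum(x * x for x in current) * sum(y * y for y in delayed))
    return num / den if den > 0 else 0.0


def autocorrelation_bank(signal: Sequence[float], delays: Iterable[int]) -> Dict[int, float]:
    """One correlation value per delay, like one correlator per delay circuit."""
    return {d: normalized_correlation(signal, d) for d in delays}


def estimate_pitch_length(signal: Sequence[float], delays: Iterable[int]) -> int:
    """Simplified pitch detector: pick the delay with the largest correlation."""
    bank = autocorrelation_bank(signal, delays)
    return max(bank, key=bank.get)


def needs_new_reference(correlation: float, threshold: float = 0.8) -> bool:
    """Pitch-counter test: request a new reference when correlation drops below threshold."""
    return correlation < threshold
```

Restricting `delays` to a plausible range matters in this sketch, because integer multiples of the true pitch length correlate almost as strongly as the pitch length itself.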

15. An electronic voice mail system for communicating an original speech signal message between a first computer and a second computer among a plurality of networked computers, said system said characterized in that:
- said first computer system includes a first speech processor operative to generate a compressed encoded speech signal;
  
  said second computer system includes a second speech processor operative to generate a decompressed reconstructed speech signal from said encoded signal;
  
  said first speech processor comprising;
  
  a plurality of delay circuits, each receiving said speech signal f(t) as an input and generating a different time delayed version of said speech signal f(t-Td_i) as an output;
  
  a plurality of correlator circuits, each said correlator circuit receiving said input speech signal f(t) and one of said time delayed speech signals f(t-Td_i) and generating a correlation value indicating the amount of correlation between said speech signal f(t) and said time delayed speech signal;
  
  a comparator circuit receiving said plurality of correlation values and generating an autocorrelation of said input signal with time delayed versions of said speech signal, one correlation value being received from each of said correlator circuits;
  
  a pitch detector receiving said autocorrelation signal and identifying a pitch length for at least a portion of said speech signal; and
  
  an encoder receiving said pitch length and said speech signal and generating an encoded version of said speech signal wherein speech pitches of said speech signal are retained or omitted on the basis of said pitch detector input; and
  
  said second speech processor comprising;
  
  a decoder receiving said encoded speech signal generated by said first speech processor, including receiving a plurality of reference pitches; and
  
  interpolation means for interpolating pitches occurring temporally between said reference pitches to generate a reconstructed version of said original speech signal.
- Dependent claims (18, 19):
  - 18. The voice transmission system in claim 15, wherein said first processor comprises a hardware processor including a plurality of specialized speech processing circuits.
  - 19. The voice transmission system in claim 15, wherein said first processor comprises a general purpose computer executing software or firmware to implement said signal delay processor, said signal correlator, said comparator, said pitch detector, and said encoder.
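
On the receiving side of claim 15, the decoder and interpolation means rebuild the full pitch sequence from the reference pitches alone. A minimal sketch of that step is given below. It assumes the encoded signal arrives as ordered `(index, samples)` pairs, that neighbouring references have equal length, and that the pitches between them are filled in by linear weighting. The name `decode_stream` and the pair layout are hypothetical, and nothing here reflects an actual message format from the patent.

```python
from typing import List, Sequence, Tuple

# Hypothetical encoded form: (position in original stream, samples of that pitch).
Reference = Tuple[int, Sequence[float]]


def decode_stream(references: Sequence[Reference]) -> List[List[float]]:
    """Rebuild every pitch period from ordered reference pitches."""
    if not references:
        return []
    output = [list(references[0][1])]
    for (i0, p0), (i1, p1) in zip(references, references[1:]):
        gap = i1 - i0
        if gap < 1 or len(p0) != len(p1):
            raise ValueError("references must be ordered and equal length in this sketch")
        # Fill the omitted positions between the two references.
        for k in range(1, gap):
            w = k / gap
            output.append([(1 - w) * a + w * b for a, b in zip(p0, p1)])
        output.append(list(p1))
    return output
```

Two references at positions 0 and 4 produce five pitches in total: the two references unchanged and three interpolated pitches between them.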

16. A voice transmission system for communicating an original speech signal message over a low-bandwidth communications channel between a transmitting location and a receiving location, said system said characterized in that:
- said transmitting location includes a first processor adapted to generate a compressed encoded speech signal;
  
  said first processor comprising;
  
  a signal delay processor receiving said original speech signal f(t) as an input and generating a plurality of different time delayed versions of said speech signal f(t-Td_i) as outputs;
  
  a signal correlator receiving said original speech signal f(t) and said time delayed speech signals f(t-Td_i), i=1, . . . , n and generating correlation values indicating the amount of correlation between said speech signal f(t) and said time delayed speech signals;
  
  a comparator receiving said correlation values and generating an autocorrelation result of said input signal with time delayed versions of said speech signal;
  
  a pitch detector receiving said autocorrelation signal and identifying a pitch length for at least a portion of said speech signal; and
  
  an encoder receiving said pitch length and said original speech signal and generating an encoded version of said speech signal wherein speech pitches of said speech signal are retained or omitted on the basis of said pitch detector input.
- Dependent claims (17):
  - 17. The voice transmission system in claim 16, wherein said receiving location includes a second processor operative to generate a decompressed reconstructed speech signal from said encoded signal;
    - and said second speech processor comprising;
      
      a decoder receiving said encoded speech signal generated by said first processor, including receiving at least one reference pitch; and
      
      an interpolator for interpolating speech pitches occurring temporally adjacent said at least one reference pitch to generate a reconstructed version of said original speech signal.

Current Assignee: Ip Ot Sub Ulc
Original Assignee: infolio, Inc.
Inventors: Guberman, Shelia
Primary Examiner(s): Hudspeth, David R.
Assistant Examiner(s): Chawan, Vijay B
Application Number: US09/265,914
Time in Patent Office: 594 Days
Field of Search: 704/207, 704/208, 704/500-504, 704/219, 704/228, 704/220
US Class Current: 704/207
CPC Class Codes: G10L 19/02 using spectral analysis, e....
