Method and apparatus for artificial bandwidth expansion in speech processing

US 20040138876A1
Filed: 01/10/2003
Published: 07/15/2004
Est. Priority Date: 01/10/2003
Status: Abandoned Application

Abstract

A method and device for improving the quality of speech signals transmitted using an audio bandwidth between 300 Hz and 3.4 kHz. After the received speech signal is divided into frames, zeros are inserted between samples to double the sampling frequency, which creates aliased frequency components above the original band. The level of these aliased frequency components is adjusted using an adaptive algorithm based on the classification of the speech frame. A sound is classified as a sibilant or a non-sibilant, and a non-sibilant sound can be further classified as a voiced sound or a stop consonant. The adjustment is based on parameters, such as the number of zero-crossings and the energy distribution, computed from the spectrum of the up-sampled speech signal between 300 Hz and 3.4 kHz. A new sound with a bandwidth between 300 Hz and 7.7 kHz is obtained by inverse Fourier transforming the spectrum of the adjusted, up-sampled sound.
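
The zero-insertion step described in the abstract can be illustrated with a minimal sketch. It assumes an 8 kHz input rate (consistent with the 300 Hz to 7.7 kHz output band, but not stated explicitly here), a 1 kHz test tone, and a 32-sample frame; the helper names `zero_insert` and `dft_magnitudes` are hypothetical and are not taken from the application. The sketch only shows that inserting zeros mirrors the narrowband spectrum into the upper band, which is the component the adaptive algorithm then adjusts.

```python
import cmath
import math


def zero_insert(frame):
    """Double the sampling rate by inserting a zero after every sample."""
    out = []
    for sample in frame:
        out.extend([sample, 0.0])
    return out


def dft_magnitudes(x):
    """Naive O(n^2) DFT magnitudes; adequate for a short illustrative frame."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]


# Assumed test signal: a 1 kHz tone sampled at 8 kHz, 32 samples long.
FS = 8000
N = 32
frame = [math.sin(2 * math.pi * 1000 * t / FS) for t in range(N)]

up = zero_insert(frame)              # 64 samples at 16 kHz
mags = dft_magnitudes(up)            # bin spacing: 16000 / 64 = 250 Hz
half = mags[: len(up) // 2 + 1]      # bins covering 0 Hz to 8 kHz

# The two strongest bins: the original tone and its mirror image.
top_two = sorted(sorted(range(len(half)), key=lambda k: half[k])[-2:])
peaks_hz = [k * 2 * FS // len(up) for k in top_two]
print(peaks_hz)  # [1000, 7000]
```

The 1 kHz tone reappears at 7 kHz, that is, at 8 kHz minus 1 kHz. In the same way, a 300 Hz to 3.4 kHz input band is mirrored into roughly 4.6 kHz to 7.7 kHz.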

32 Claims

1. A method of improving speech in a plurality of signal segments having speech signals in a time domain, said method characterized by
- upsampling the signal segments for providing upsampled segments in the time domain;
- converting the upsampled segments into a plurality of transformed segments having speech spectra in a frequency domain;
- classifying the speech signals into a plurality of classes based on at least one signal characteristic of the speech signals;
- modifying the speech spectra in the frequency domain based on the classes for providing modified transformed segments; and
- converting the modified transformed segments into speech data in the time domain.
- Dependent claims:
  - 2. The method of claim 1, wherein each signal segment comprises a plurality of signal samples, said method characterized in that said upsampling is carried out by inserting a value between adjacent signal samples in the signal segment.
  - 3. The method of claim 2, characterized in that the inserted value is zero.
  - 4. The method of claim 1, wherein the speech signals include a time waveform having a plurality of crossing points on a time axis, said method characterized in that said at least one characteristic of the speech signals is indicative of the number of crossing points in a signal segment.
  - 5. The method of claim 4, wherein each of the signal segments comprises a number of signal samples, said method characterized in that said at least one characteristic of the signal segments is indicative of a ratio of the number of crossing points in the signal segment and the number of signal samples in said signal segment.
  - 6. The method of claim 1, wherein said at least one signal characteristic of the speech signals is indicative of energy in the signal segments.
  - 7. The method of claim 1, characterized in that said at least one signal characteristic of the speech signals is indicative of a ratio of an energy of a second derivative of the speech signals and an energy in the speech signals.
  - 8. The method of claim 5, wherein the plurality of classes include a voiced sound and a stop consonant, said method characterized in that the speech signals are classified as the voiced sound if the ratio is smaller than a predetermined value and the speech signals are classified as the stop consonant if the ratio is greater than the predetermined value.
  - 9. The method of claim 5, wherein the plurality of classes include a sibilant class and a non-sibilant class, said method characterized in that the speech signals are classified as the sibilant class if the ratio is greater than a predetermined value, and the speech signals are classified as the non-sibilant class if the ratio is smaller than or equal to the predetermined value.
  - 10. The method of claim 9, wherein said at least one signal characteristic of the speech signals is indicative of a further ratio of an energy of a second derivative of the speech signals and an energy in the speech signals, said method further characterized in that the speech signals are classified as the sibilant class if the further ratio is also greater than a further predetermined value.
  - 11. The method of claim 9, wherein each of the speech spectra has a first spectral portion in a lower frequency range and a second spectral portion in a higher frequency range, said method characterized in that the second spectral portion is enhanced for providing the modified transformed segments if the speech signals are classified as the sibilant class.
  - 12. The method of claim 9, wherein each of the speech spectra has a first spectral portion in a lower frequency range and a second spectral portion in a higher frequency range, said method characterized in that the second spectral portion is attenuated for providing the modified transformed segments if the speech signals are classified as the non-sibilant class.
  - 13. The method of claim 1, wherein each of the speech spectra has a first spectral portion in a lower frequency range and a second spectral portion in a higher frequency range, said method further characterized by smoothing the second spectral portion by an averaging operation prior to converting the modified transformed segments into the speech data in the time domain.

14. A network device in a telecommunications network, wherein the network device is capable of receiving data indicative of speech; and partitioning the received data into a plurality of signal segments having speech signals in a time domain, said network device characterized by
- an upsampling module for upsampling the signal segments for providing upsampled segments in the time domain;
- a transform module for converting the upsampled segments into a plurality of transformed segments having speech spectra in a frequency domain;
- a classification algorithm for classifying the speech signals into a plurality of classes based on at least one signal characteristic of the speech signals; and
- an adjustment algorithm for modifying the speech spectra in the frequency domain based on the classes for providing modified transformed segments.
- Dependent claims:
  - 15. The device of claim 14, further characterized by an inverse transform module for converting the modified transformed segments into speech data in the time domain.
  - 16. The device of claim 14, wherein each of the signal segments comprises a number of signal samples for sampling a waveform having a plurality of crossing points on a time axis, said device characterized in that the classification algorithm is adapted to classify the speech signals based on a ratio of the number of crossing points and the number of signal samples in at least one signal segment.
  - 17. The device of claim 14, characterized in that the classification algorithm is adapted to classify the speech signals based on a ratio of an energy of a second derivative in the speech signal and an energy in at least one signal segment.
  - 18. The device of claim 17, wherein each of the signal segments comprises a number of signal samples for sampling a waveform having a plurality of crossing points on a time axis, said device further characterized in that the classification algorithm is adapted to classify the speech signals also based on a further ratio of the number of crossing points and the number of signal samples in said at least one signal segment.
  - 19. The device of claim 14, wherein the plurality of classes include a sibilant class and a non-sibilant class, and each of the speech spectra has a first spectral portion in a lower frequency range and a second spectral portion in a higher frequency range, said device characterized in that the adjustment algorithm is adapted to enhance the second spectral portion if the speech signals are classified as the sibilant class, and attenuate the second spectral portion if the speech signals are classified as the non-sibilant class.
  - 20. The device of claim 14, wherein each of the speech spectra has a first spectral portion in a lower frequency range and a second spectral portion in a higher frequency range, said device further characterized in that the adjustment algorithm is adapted to smooth the second spectral portion by an averaging operation.
  - 21. The device of claim 19, further characterized in that the adjustment algorithm is adapted to smooth the second spectral portion by an averaging operation.
  - 22. The device of claim 14, comprising a mobile terminal in the telecommunications network.
  - 23. The device of claim 14, comprising a base station in the telecommunications network.
  - 24. The device of claim 14, comprising a transcoder in the telecommunications network.

25. A sound classification algorithm for use in a speech decoder, wherein speech data in the speech decoder is partitioned into a plurality of signal segments having speech signals in a time domain and each signal segment includes a number of signal samples, and wherein the speech signals include a time waveform having a plurality of crossing points on a time axis, said classification algorithm characterized by classifying the speech signals into a plurality of classes based on a ratio of the number of crossing points and the number of signal samples in at least one signal segment.
- Dependent claims:
  - 26. The sound classification algorithm of claim 25, wherein the speech signals are classified into a sibilant class and a non-sibilant class, said classification algorithm characterized in that the speech signals are classified as the sibilant class if the ratio is greater than a predetermined value.
  - 27. The algorithm of claim 25, characterized in that said classifying is also based on a further ratio of an energy of a second derivative of the speech signal and an energy in said at least one signal segment.
  - 28. The sound classification algorithm of claim 27, wherein the speech signals are classified into a sibilant class and a non-sibilant class, said classification algorithm characterized in that the speech signals are classified as the sibilant class if the ratio is greater than a first predetermined value and the further ratio is greater than a second predetermined value.
  - 29. The sound classification algorithm of claim 28, characterized in that the first predetermined value is substantially equal to 0.6, and the second predetermined value is substantially equal to 8.
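
The two-feature sibilant test of claims 25 to 29 can be sketched as follows. The thresholds 0.6 and 8 come from claim 29. Everything else is an assumption made for illustration: the second derivative is approximated by the second difference `x[n+1] - 2*x[n] + x[n-1]`, the features are computed on time-domain samples, and the function names and the 8 kHz test tones are hypothetical.

```python
import math

ZC_THRESHOLD = 0.6   # "first predetermined value" (claim 29)
D2_THRESHOLD = 8.0   # "second predetermined value" (claim 29)


def zero_crossing_ratio(frame):
    """Number of sign changes divided by the number of samples."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    return crossings / len(frame)


def second_derivative_energy_ratio(frame):
    """Energy of the second difference divided by the frame energy.

    The second difference is an assumed stand-in for the second derivative.
    """
    energy = sum(s * s for s in frame)
    if energy == 0.0:
        return 0.0
    d2 = [
        frame[n + 1] - 2 * frame[n] + frame[n - 1]
        for n in range(1, len(frame) - 1)
    ]
    return sum(d * d for d in d2) / energy


def classify(frame):
    """Return 'sibilant' only if both ratios exceed their thresholds."""
    if len(frame) < 3:
        raise ValueError("frame needs at least three samples")
    if (zero_crossing_ratio(frame) > ZC_THRESHOLD
            and second_derivative_energy_ratio(frame) > D2_THRESHOLD):
        return "sibilant"
    return "non-sibilant"


def tone(freq_hz, fs=8000, n=160, phase=0.3):
    """Hypothetical test frame: a pure tone, 20 ms at 8 kHz by default."""
    return [math.sin(2 * math.pi * freq_hz * t / fs + phase) for t in range(n)]


print(classify(tone(3000)))  # high-frequency content
print(classify(tone(500)))   # low-frequency content
```

Under these assumptions the two thresholds are roughly consistent for a pure tone at an 8 kHz sampling rate: the zero-crossing ratio exceeds 0.6 above about 2.4 kHz, and the second-difference energy ratio exceeds 8 above about 2.5 kHz.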

30. A spectral adjustment algorithm for use in a speech decoder capable of receiving speech data, partitioning speech data into a plurality of signal segments having speech signals in the time domain, upsampling the signal segments for providing upsampled segments, and converting the upsampled segments into a plurality of transformed segments, each having a first speech spectral portion in a first frequency range and a second speech spectral portion in a second frequency range higher than the first frequency range, said adjustment algorithm characterized by enhancing the second speech spectral portion, if the speech signals are classified as a sibilant class, and attenuating the second speech spectral portion, if the speech signals are classified as a non-sibilant class.
- Dependent claims:
  - 31. The spectral adjustment algorithm of claim 30, further characterized by smoothing the second speech spectral portion by an averaging operation.
  - 32. The spectral adjustment algorithm of claim 30, wherein when the speech signals in at least two consecutive signal segments are classified as the sibilant class, said at least two consecutive signal segments including a leading segment and at least one following segment, said adjustment algorithm characterized by enhancing the second speech spectral portion in the leading segment by a first factor, and enhancing the second speech spectral portion in said at least one following segment by a second factor greater than the first factor.
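
The adjustment logic of claims 30 to 32 can be sketched as a small function on spectral magnitudes. The claims give no numeric gains or smoothing widths, so every constant below is an illustrative assumption, as are the function names and the choice of a three-bin centered moving average as the "averaging operation". A real implementation would apply the gains to complex spectral bins and preserve conjugate symmetry before the inverse transform.

```python
# All numeric values are illustrative assumptions, not taken from the claims.
ATTENUATION = 0.1      # gain for non-sibilant frames
LEADING_GAIN = 1.5     # "first factor": first sibilant frame in a run
FOLLOWING_GAIN = 2.0   # "second factor": must be greater than the first
SMOOTH_WIDTH = 3       # bins in the moving average


def smooth(values, width=SMOOTH_WIDTH):
    """Centered moving average; the window shrinks at the band edges."""
    half = width // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def adjust_upper_band(magnitudes, split, is_sibilant, previous_was_sibilant):
    """Scale and smooth the bins from index `split` upward.

    The lower band (bins below `split`) is returned unchanged.
    """
    if not is_sibilant:
        gain = ATTENUATION
    elif previous_was_sibilant:
        gain = FOLLOWING_GAIN
    else:
        gain = LEADING_GAIN
    lower = list(magnitudes[:split])
    upper = smooth([m * gain for m in magnitudes[split:]])
    return lower + upper


# Hypothetical 8-bin spectrum: bins 0-3 are the lower band, 4-7 the upper band.
spectrum = [1.0, 1.0, 1.0, 1.0, 4.0, 0.0, 4.0, 0.0]
leading = adjust_upper_band(spectrum, 4, True, False)
following = adjust_upper_band(spectrum, 4, True, True)
attenuated = adjust_upper_band(spectrum, 4, False, False)
```

With these assumed values, the upper band of a non-sibilant frame is suppressed, the first frame of a sibilant run is boosted moderately, and later frames in the same run are boosted more, which matches the ordering required by claim 32.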

Current Assignee: Nokia Corporation
Original Assignee: Nokia Corporation
Inventors: Kayhko, Kimmo; Kallio, Loura; Kajala, Matti; Valve, Paivi; Alku, Paavo
Application Number: US10/341,332
Publication Number: US 20040138876A1
US Class Current: 704/209
CPC Class Codes:
G10L 21/038 using band spreading techni...
G10L 25/93 Discriminating between voic...
