Methods and devices for source controlled variable bit-rate wideband speech coding
Abstract
Speech signal classification and encoding systems and methods are disclosed herein. The signal classification is done in three steps, each discriminating a specific signal class. First, a voice activity detector (VAD) discriminates between active and inactive speech frames. If an inactive speech frame is detected (background noise signal), the classification chain ends and the frame is encoded with comfort noise generation (CNG). If an active speech frame is detected, the frame is subjected to a second classifier dedicated to discriminating unvoiced frames. If this classifier classifies the frame as an unvoiced speech signal, the classification chain ends and the frame is encoded using a coding method optimized for unvoiced signals. Otherwise, the speech frame is passed through to the “stable voiced” classification module. If the frame is classified as a stable voiced frame, the frame is encoded using a coding method optimized for stable voiced signals. Otherwise, the frame is likely to contain a non-stationary speech segment such as a voiced onset or a rapidly evolving voiced speech signal. In this case, a general-purpose speech coder is used at a high bit rate to sustain good subjective quality.
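The three-step chain described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the names `Coding`, `Frame`, and `classify_frame` are hypothetical, and the three detectors are passed in by the caller because the abstract does not specify how they work.

```python
from enum import Enum
from typing import Callable, Sequence

Frame = Sequence[float]  # hypothetical frame type: the samples of one speech frame


class Coding(Enum):
    """Coding methods named in the abstract (labels are illustrative)."""
    CNG = "comfort noise generation"
    UNVOICED = "coding optimized for unvoiced signals"
    STABLE_VOICED = "coding optimized for stable voiced signals"
    GENERIC = "general-purpose coding at a high bit rate"


def classify_frame(
    frame: Frame,
    is_active: Callable[[Frame], bool],         # step 1: voice activity detector
    is_unvoiced: Callable[[Frame], bool],       # step 2: unvoiced classifier
    is_stable_voiced: Callable[[Frame], bool],  # step 3: stable voiced classifier
) -> Coding:
    """Run the chain; each step either ends it or hands the frame onward."""
    if not is_active(frame):
        return Coding.CNG            # inactive frame: chain ends here
    if is_unvoiced(frame):
        return Coding.UNVOICED       # unvoiced frame: chain ends here
    if is_stable_voiced(frame):
        return Coding.STABLE_VOICED
    # Likely a non-stationary segment (e.g. a voiced onset).
    return Coding.GENERIC
```

Because each step returns as soon as it identifies its class, the later classifiers are never evaluated for a frame that an earlier step has already claimed.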
85 Claims
1. A source-controlled Variable bit-rate Multi-mode WideBand (VMR-WB) codec comprising a unit operable with an Adaptive Multi-Rate wideband (AMR-WB) codec, wherein, in a VMR-WB encoding/AMR-WB decoding case, speech frames are encoded in an AMR-WB interoperable mode of a VMR-WB encoder using one of the bit rates corresponding to Interoperable-Full Rate (I-FR) for active speech frames, Interoperable-Half Rate (I-HR) at least for dim-and-burst signaling, Quarter Rate-Comfort Noise Generator (CNG-QR) to encode at least relevant background noise frames, and Eighth Rate-Comfort Noise Generator (CNG-ER) frames for background noise frames not encoded as CNG-QR frames, said unit responsive to a case that voice activity is not detected for using CNG-ER encoding, further responsive to a case that voice activity is detected, and responsive to a voiced versus unvoiced classification such that if a frame is classified as unvoiced, the frame is encoded with one of Unvoiced HR or Unvoiced QR encoding, further responsive to a frame not being classified as unvoiced for using a stable voiced classification, and if the frame is classified as stable voiced, encoding the frame using Voiced HR encoding, else assuming the frame to likely contain a non-stationary speech segment for using an appropriate FR encoding, whereas a frame with low energy, and not detected as at least a background or an unvoiced frame, is encoded using generic HR coding to reduce the average data rate;
an unvoiced classification decision being based on at least some of a voicing measure r̄_x, a spectral tilt e_t, an energy variation within a frame dE, and a relative frame energy E_rel, where decision thresholds are set based at least in part on an operating mode comprising a required average data rate.
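The rate selection recited in claim 1 can be sketched as a small decision function. This is an illustrative sketch only. `FrameDecisions` and `select_coding_type` are hypothetical names, the per-frame decisions are assumed to be computed elsewhere, and two points the claim leaves open are filled with labeled assumptions: how Unvoiced HR is chosen over Unvoiced QR (modeled as a caller-supplied flag), and where the low-energy check sits relative to the stable voiced check (placed after it here). The interoperable-mode rates (I-FR, I-HR, CNG-QR) are not modeled.

```python
from dataclasses import dataclass


@dataclass
class FrameDecisions:
    """Per-frame decisions, assumed to be computed elsewhere (hypothetical container)."""
    voice_active: bool
    unvoiced: bool
    stable_voiced: bool
    low_energy: bool
    # Assumption: the claim does not say how HR vs QR is chosen for unvoiced
    # frames, so the choice is left to the caller.
    unvoiced_quarter_rate: bool = False


def select_coding_type(d: FrameDecisions) -> str:
    """Map the decisions to one of the coding types named in claim 1."""
    if not d.voice_active:
        return "CNG-ER"        # no voice activity detected
    if d.unvoiced:
        return "Unvoiced QR" if d.unvoiced_quarter_rate else "Unvoiced HR"
    if d.stable_voiced:
        return "Voiced HR"
    if d.low_energy:
        # Assumption: the low-energy check is applied to frames that would
        # otherwise be coded at full rate.
        return "Generic HR"
    return "FR"                # likely a non-stationary speech segment
```

Returning plain labels keeps the sketch independent of any particular bit allocation; a real encoder would map each label to its frame format.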
2. A method for encoding a sampled speech signal comprising speech frames, the method comprising:
determining whether a current frame of the sampled speech signal is an active speech frame or an inactive speech frame;
if said current frame is an active speech frame, performing a classification procedure to determine whether the current frame is an unvoiced frame, said classification procedure comprising examining at least three of the following parameters in order to determine whether the current frame is an unvoiced frame:
a) a voicing measure (r_x, r̄_x);
b) a spectral tilt measure (e_tilt, e_t);
c) an energy variation within the current frame (dE);
d) a relative energy of the current frame (E_rel);
and when the current frame is classified as an unvoiced frame by said classification procedure, encoding the current frame using an unvoiced signal coding algorithm.
Dependent claims: 3-33.
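The unvoiced decision of claim 2, which examines at least three of the four parameters, can be sketched as below. This is a hedged illustration, not the claimed procedure itself. `UnvoicedThresholds` and `is_unvoiced` are hypothetical names. Three choices are assumptions that the claims do not state: that a frame counts as unvoiced when every examined parameter falls below its threshold, that the voicing measure, spectral tilt, and energy variation are the three always examined, and that the relative energy is the optional fourth. No threshold values are given because none appear in the claims; claim 1 only states that they depend on the operating mode.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UnvoicedThresholds:
    """Decision thresholds for one operating mode (values supplied by the caller)."""
    voicing: float                            # threshold for the voicing measure
    tilt: float                               # threshold for the spectral tilt
    energy_variation: float                   # threshold for dE
    relative_energy: Optional[float] = None   # None: E_rel is not examined


def is_unvoiced(
    voicing: float,
    tilt: float,
    energy_variation: float,
    relative_energy: float,
    th: UnvoicedThresholds,
) -> bool:
    """Return True when all examined parameters fall below their thresholds.

    Assumption: "below threshold" is the unvoiced direction for every
    parameter; the claims do not state the comparison directions.
    """
    decision = (
        voicing < th.voicing
        and tilt < th.tilt
        and energy_variation < th.energy_variation
    )
    if th.relative_energy is not None:
        # Fourth parameter, examined only when the mode supplies a threshold.
        decision = decision and relative_energy < th.relative_energy
    return decision
```

Keeping the thresholds in a separate object lets a caller hold one instance per operating mode and switch between them without changing the decision logic.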
34. A device for encoding a sampled speech signal comprising speech frames, the device comprising:
a voice activity detector for determining whether frames of the sampled speech signal are active speech frames or inactive speech frames;
a classification unit arranged to perform a classification procedure on active speech frames to determine whether said active speech frames are unvoiced frames, said classification procedure comprising examining at least three of the following parameters in order to determine whether a current frame is an unvoiced frame:
a) a voicing measure (r_x, r̄_x);
b) a spectral tilt measure (e_tilt, e_t);
c) an energy variation within the current frame (dE);
d) a relative energy of the current frame (E_rel);
said device being arranged to encode the current frame using an unvoiced signal coding algorithm when the classification unit classifies the current frame as an unvoiced frame.
Dependent claims: 35-65, 80, 81.
66. A device for encoding a sampled speech signal comprising speech frames, the device comprising:
means for determining whether a current frame of the sampled speech signal is an active speech frame or an inactive speech frame;
means, responsive to said current frame being an active speech frame, for performing a classification procedure to determine whether the current frame is an unvoiced frame, said classification procedure comprising examining at least three of the following parameters in order to determine whether the current frame is an unvoiced frame:
a) a voicing measure (r_x, r̄_x);
b) a spectral tilt measure (e_tilt, e_t);
c) an energy variation within the current frame (dE);
d) a relative energy of the current frame (E_rel);
and means for encoding the current frame using an unvoiced signal coding algorithm when the current frame is classified as an unvoiced frame by said classification procedure.
67. A speech encoder, responsive to a current frame being classified as an active speech frame, for encoding said current frame using an unvoiced signal coding algorithm,
wherein an active speech frame is further classified as an active unvoiced speech frame by examining at least three parameters selected from the set: a voicing measure (r_x, r̄_x), a spectral tilt measure (e_tilt, e_t), an energy variation within the current frame (dE), and a relative energy of the current frame (E_rel).
68. A program of machine-readable instructions, tangibly embodied on an information bearing medium and executable by a digital data processor, to perform actions directed toward encoding a sampled speech signal comprising speech frames, the actions comprising:
determining whether a current frame of the sampled speech signal is an active speech frame or an inactive speech frame;
performing a classification procedure on an active speech frame to determine whether the current frame is an unvoiced frame, said classification procedure comprising examining at least three of the following parameters in order to determine whether the current frame is an unvoiced frame:
a) a voicing measure (r_x, r̄_x);
b) a spectral tilt measure (e_tilt, e_t);
c) an energy variation within the current frame (dE);
d) a relative energy of the current frame (E_rel);
and encoding the current frame using an unvoiced signal coding algorithm when the current frame is classified as an unvoiced frame by said classification procedure.
Dependent claims: 69-79, 82-85.
Specification