Audio classification based on perceptual quality for low or medium bit rates

US 9,589,570 B2
Filed: 09/13/2013
Issued: 03/07/2017
Est. Priority Date: 09/18/2012
Status: Active Grant

Abstract

The quality of encoded signals can be improved by re-classifying AUDIO signals carrying non-speech data as VOICED signals when periodicity parameters of the signal satisfy one or more criteria. In some embodiments, only low or medium bit rate signals are considered for re-classification. The periodicity parameters can include any characteristic or set of characteristics indicative of periodicity. For example, the periodicity parameters may include pitch differences between subframes in the audio signal, a normalized pitch correlation for one or more subframes, an average normalized pitch correlation for the audio signal, or combinations thereof. Audio signals that are re-classified as VOICED signals may be encoded in the time-domain, while audio signals that remain classified as AUDIO signals may be encoded in the frequency-domain.
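
The decision flow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `SignalClass`, `PeriodicityParams`, `Thresholds`, `maybe_reclassify` and `coding_domain` are hypothetical, and every threshold value is a placeholder, because the text gives no numeric values. The four conditions checked are the ones listed in claim 1.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class SignalClass(Enum):
    AUDIO = "AUDIO"    # encoded in the frequency domain
    VOICED = "VOICED"  # encoded in the time domain


@dataclass(frozen=True)
class PeriodicityParams:
    pitch_diffs: Sequence[float]  # absolute pitch differences between sub-frames
    avg_voicing: float            # average normalized pitch correlation
    smoothed_voicing: float       # smoothed pitch correlation


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical placeholder values; the source does not state any numbers.
    max_pitch_diff: float = 3.0          # "first threshold"
    max_bit_rate: float = 24_000.0       # "second threshold", in bit/s
    min_avg_voicing: float = 0.5         # "third threshold"
    min_smoothed_voicing: float = 0.7    # "fourth threshold"


def maybe_reclassify(current: SignalClass,
                     params: PeriodicityParams,
                     bit_rate: float,
                     th: Thresholds = Thresholds()) -> SignalClass:
    """Return VOICED if an AUDIO frame meets all four conditions, else unchanged."""
    if current is not SignalClass.AUDIO:
        return current
    conditions_met = (
        all(d < th.max_pitch_diff for d in params.pitch_diffs)
        and bit_rate < th.max_bit_rate
        and params.avg_voicing > th.min_avg_voicing
        and params.smoothed_voicing > th.min_smoothed_voicing
    )
    return SignalClass.VOICED if conditions_met else SignalClass.AUDIO


def coding_domain(signal_class: SignalClass) -> str:
    """VOICED frames go to the time-domain coder, AUDIO frames to the frequency-domain coder."""
    return "time" if signal_class is SignalClass.VOICED else "frequency"
```

With these placeholder thresholds, a frame with small pitch differences and high correlations at 12 kbit/s would be re-classified and coded in the time domain, while the same frame at 48 kbit/s would stay AUDIO and be coded in the frequency domain.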

14 Claims

1. A method for encoding signals, the method comprising:
- receiving, by an audio encoder, a digital signal comprising audio data, wherein the audio data includes data of speech and non-speech sounds;
- classifying, by the audio encoder, the digital signal as an AUDIO signal based on the audio data in the digital signal;
- determining, by the audio encoder, whether classifying conditions are satisfied, wherein the classifying conditions include:
  - pitch differences between sub-frames in the digital signal are less than a first threshold,
  - a coding rate of the digital signal is below a second threshold,
  - an average normalized pitch correlation value for the sub-frames in the digital signal is greater than a third threshold, and
  - a smoothed pitch correlation obtained according to the average normalized pitch correlation value is greater than a fourth threshold,

  wherein each of the pitch differences is an absolute value of the difference between two pitch values corresponding to two sub-frames respectively;
- re-classifying, by the audio encoder, the digital signal as a VOICED signal when the classifying conditions are satisfied;
- encoding, by the audio encoder, the digital signal in the time-domain if the digital signal is classified as a VOICED signal; and
- encoding, by the audio encoder, the digital signal in the frequency-domain if the digital signal is classified as an AUDIO signal.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of claim 1, wherein the average normalized pitch correlation value for the sub-frames in the digital signal is obtained by:
    - determining a normalized pitch correlation value for each sub-frame in the digital signal; and
    - dividing the sum of all normalized pitch correlation values by the number of the sub-frames in the digital signal to obtain the average normalized pitch correlation value.
  - 3. The method of claim 1, wherein the digital signal carries non-speech data.
  - 4. The method of claim 1, wherein the digital signal carries music data.
  - 5. The method of claim 1, wherein the number of the sub-frames is 4, the pitch differences comprise the first pitch difference dpit1, the second pitch difference dpit2, and the third pitch difference dpit3, wherein the dpit1, the dpit2 and the dpit3 are calculated as follows:
    - dpit1 = |P₁ − P₂|,
      dpit2 = |P₂ − P₃|,
      dpit3 = |P₃ − P₄|,

      wherein P₁, P₂, P₃, and P₄ are four pitch values corresponding to the sub-frames respectively;

      and wherein, accordingly, the classifying condition that the pitch differences between the sub-frames in the digital signal are less than a threshold comprises:

      all of the dpit1, the dpit2 and the dpit3 are less than the first threshold.
  - 6. The method of claim 5, wherein P₁, P₂, P₃, and P₄ are the best pitch values found in a pitch range from a minimum pitch limit PIT_MIN to a maximum pitch limit PIT_MAX for each sub-frame.
  - 7. The method of claim 1, wherein the smoothed pitch correlation from a previous frame to a current frame is obtained by the following formula:
    - Voicing_sm = (3 · Voicing_sm + Voicing) / 4,

      wherein the Voicing_sm on the left side of the formula denotes the smoothed pitch correlation of the current frame, the Voicing_sm on the right side of the formula denotes the smoothed pitch correlation of the previous frame, and Voicing denotes the average normalized pitch correlation value for the sub-frames in the digital signal.
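
The arithmetic recited in claims 2, 5 and 7 is small enough to sketch directly. The function names below are hypothetical and chosen only for illustration; the formulas themselves are taken from the claim text (mean of per-sub-frame correlations, absolute differences of adjacent pitch values, and a 3:1 weighted smoothing of the previous and current values).

```python
from typing import List, Sequence


def average_voicing(subframe_correlations: Sequence[float]) -> float:
    """Claim 2: sum of the per-sub-frame normalized pitch correlations divided by their count."""
    if not subframe_correlations:
        raise ValueError("at least one sub-frame is required")
    return sum(subframe_correlations) / len(subframe_correlations)


def pitch_differences(pitches: Sequence[float]) -> List[float]:
    """Claim 5: absolute differences between pitch values of adjacent sub-frames.

    For four sub-frames this yields [dpit1, dpit2, dpit3].
    """
    return [abs(a - b) for a, b in zip(pitches, pitches[1:])]


def smooth_voicing(previous_smoothed: float, voicing: float) -> float:
    """Claim 7: Voicing_sm = (3 * Voicing_sm + Voicing) / 4."""
    return (3.0 * previous_smoothed + voicing) / 4.0
```

Because the smoothing weights the previous frame three times as heavily as the current one, a single strongly periodic frame moves the smoothed value only a quarter of the way toward the new correlation, so the fourth-threshold condition in claim 1 favors signals that stay periodic over several frames.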

8. An audio encoder comprising:
- at least one processor; and
- a computer readable storage medium storing programming for execution by the at least one processor, the programming including instructions to:
  - receive a digital signal comprising audio data, wherein the audio data includes data of speech and non-speech sounds;
  - classify the digital signal as an AUDIO signal based on the audio data in the digital signal;
  - determine whether classifying conditions are satisfied, wherein the classifying conditions include:
    - pitch differences between sub-frames in the digital signal are less than a first threshold,
    - a coding rate of the digital signal is below a second threshold,
    - an average normalized pitch correlation value for the sub-frames in the digital signal is greater than a third threshold, and
    - a smoothed pitch correlation obtained according to the average normalized pitch correlation value is greater than a fourth threshold;

    wherein each of the pitch differences is an absolute value of the difference between two pitch values corresponding to two sub-frames respectively;
  - re-classify the digital signal as a VOICED signal when the classifying conditions are satisfied;
  - encode the digital signal in the time-domain if the digital signal is classified as a VOICED signal; and
  - encode the digital signal in the frequency-domain if the digital signal is classified as an AUDIO signal.
- Dependent claims (9, 10, 11, 12, 13, 14):
  - 9. The audio encoder of claim 8, wherein the instructions to determine an average normalized pitch correlation value for the sub-frames in the digital signal include instructions to:
    - determine a normalized pitch correlation value for each sub-frame in the digital signal; and
    - divide the sum of all normalized pitch correlation values by the number of the sub-frames in the digital signal to obtain the average normalized pitch correlation value.
  - 10. The audio encoder of claim 8, wherein the digital signal carries non-speech data.
  - 11. The audio encoder of claim 8, wherein the digital signal carries music data.
  - 12. The audio encoder of claim 8, wherein the number of the sub-frames is 4, the pitch differences comprise the first pitch difference dpit1, the second pitch difference dpit2, and the third pitch difference dpit3, wherein the dpit1, the dpit2 and the dpit3 are calculated as follows:
    - dpit1 = |P₁ − P₂|,
      dpit2 = |P₂ − P₃|,
      dpit3 = |P₃ − P₄|,

      wherein P₁, P₂, P₃, and P₄ are four pitch values corresponding to the sub-frames respectively;

      and wherein, accordingly, the classifying condition that the pitch differences between sub-frames in the digital signal are less than a threshold comprises:

      all of the dpit1, the dpit2 and the dpit3 are less than the first threshold.
  - 13. The audio encoder of claim 12, wherein P₁, P₂, P₃, and P₄ are the best pitch values found in a pitch range from a minimum pitch limit PIT_MIN to a maximum pitch limit PIT_MAX for each sub-frame.
  - 14. The audio encoder of claim 8, wherein the smoothed pitch correlation from a previous frame to a current frame is obtained by the following formula:
    - Voicing_sm = (3 · Voicing_sm + Voicing) / 4,

      wherein the Voicing_sm on the left side of the formula denotes the smoothed pitch correlation of the current frame, the Voicing_sm on the right side of the formula denotes the smoothed pitch correlation of the previous frame, and Voicing denotes the average normalized pitch correlation value for the sub-frames in the digital signal.
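
Claims 6 and 13 refer to a "best pitch value" found between PIT_MIN and PIT_MAX for each sub-frame, and claims 2 and 9 to a normalized pitch correlation per sub-frame, but the claim text does not define either computation. The sketch below is therefore an assumption: it uses a commonly used definition (the cross-correlation between a sub-frame and the same span one pitch lag earlier, divided by the geometric mean of the two energies) and picks the lag that maximizes it. The function names and parameters are hypothetical, and an actual encoder may work on a perceptually weighted signal or use a different selection rule.

```python
import math
from typing import Sequence, Tuple


def normalized_pitch_correlation(signal: Sequence[float], start: int,
                                 length: int, lag: int) -> float:
    """Correlate signal[start:start+length] with the span `lag` samples earlier.

    The result lies in [-1, 1]; 0.0 is returned if either span has zero energy.
    """
    if lag <= 0 or start - lag < 0 or start + length > len(signal):
        raise ValueError("sub-frame or lagged span falls outside the signal")
    cross = energy_cur = energy_past = 0.0
    for n in range(start, start + length):
        cur, past = signal[n], signal[n - lag]
        cross += cur * past
        energy_cur += cur * cur
        energy_past += past * past
    denom = math.sqrt(energy_cur * energy_past)
    return cross / denom if denom > 0.0 else 0.0


def best_pitch(signal: Sequence[float], start: int, length: int,
               pit_min: int, pit_max: int) -> Tuple[int, float]:
    """Return (lag, correlation) for the lag in [pit_min, pit_max] with the highest correlation."""
    best_lag, best_corr = pit_min, -math.inf
    for lag in range(pit_min, pit_max + 1):
        corr = normalized_pitch_correlation(signal, start, length, lag)
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr
```

Running `best_pitch` once per sub-frame would give the four pitch values P₁ to P₄ and four correlation values, from which the pitch differences and the average normalized pitch correlation of the claims can be formed.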

Current Assignee: Huawei Technologies Co., Ltd. (Huawei Investment & Holding Co., Ltd.)
Original Assignee: Huawei Technologies Co., Ltd. (Huawei Investment & Holding Co., Ltd.)
Inventors: Gao, Yang
Primary Examiner(s): Sirjani, Fariba
Application Number: US14/027,052
Publication Number: US 20140081629A1
Time in Patent Office: 1,271 Days
Field of Search: 704/208
US Class Current: 1/1
CPC Class Codes:
G10L 19/002   Dynamic bit allocation for ...
G10L 19/20   using sound class specific ...
G10L 19/24   Variable rate codecs, e.g. ...
G10L 2025/937   Signal energy in various fr...
G10L 25/06   the extracted parameters be...
G10L 25/90   Pitch determination of spee...
G10L 25/93   Discriminating between voic...
