Audio Signal Classification Method and Apparatus

US 20160155456A1
Filed: 02/05/2016
Published: 06/02/2016
Est. Priority Date: 08/06/2013
Status: Active Grant

Abstract

An audio signal classification method and apparatus, where the method includes: determining, according to voice activity of a current audio frame, whether to obtain a frequency spectrum fluctuation of the current audio frame and store the frequency spectrum fluctuation in a frequency spectrum fluctuation memory; updating, according to whether the audio frame is percussive music or according to activity of a historical audio frame, the frequency spectrum fluctuations stored in the frequency spectrum fluctuation memory; and classifying the current audio frame as a speech frame or a music frame according to statistics of a part or all of effective data of the frequency spectrum fluctuations stored in the frequency spectrum fluctuation memory.
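
The three steps of the abstract (store, update, classify) can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `classify_frame`, the threshold values, the choice of the mean as the statistic, and the rule of capping stored values on percussive frames are all assumptions made for illustration. Treating a low average fluctuation as music follows the first condition listed in claim 7.

```python
from collections import deque
from statistics import fmean


def classify_frame(memory, flux, is_active, is_percussive,
                   music_threshold=5.0, percussive_cap=3.0):
    """One store / update / classify pass for a single frame (illustrative only)."""
    # Step 1: store the fluctuation value only when the frame has voice activity.
    if is_active:
        memory.append(flux)

    # Step 2: on a percussive frame, modify the stored values.
    # Capping them at percussive_cap is an assumed rule, not taken from the text.
    if is_percussive:
        for i in range(len(memory)):
            memory[i] = min(memory[i], percussive_cap)

    # Step 3: classify from a statistic (here the mean) of the stored values.
    if not memory:
        return None  # nothing stored yet, so no decision is made
    return "music" if fmean(memory) < music_threshold else "speech"
```

A caller would keep one `deque` per stream, for example `deque(maxlen=60)`, and pass it to every call so that the decision for each frame reflects recent history rather than the single frame alone.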

30 Claims

1. An audio signal classification method, comprising:
- determining, according to voice activity of a current audio frame, whether to obtain a current frequency spectrum fluctuation parameter of the current audio frame and store the current frequency spectrum fluctuation parameter, wherein a frequency spectrum fluctuation parameter denotes an energy fluctuation of a frequency spectrum of an audio signal;
  
  updating, according to whether the audio frame is percussive music, stored one or more frequency spectrum fluctuation parameters; and
  
  classifying the current audio frame as a speech frame or a music frame according to statistics of a part or all of effective data of the stored frequency spectrum fluctuation parameters.
  - 2. The method according to claim 1, wherein determining, according to voice activity of the current audio frame, whether to obtain the current frequency spectrum fluctuation parameter of the current audio frame and store the current frequency spectrum fluctuation parameter comprises storing the current frequency spectrum fluctuation parameter of the current audio frame when the current audio frame is an active frame.
  - 3. The method according to claim 1, wherein determining, according to voice activity of the current audio frame, whether to obtain the current frequency spectrum fluctuation parameter of the current audio frame and store the current frequency spectrum fluctuation parameter comprises storing the current frequency spectrum fluctuation parameter of the current audio frame when the current audio frame is an active frame, and the current audio frame does not belong to an energy attack.
  - 4. The method according to claim 1, wherein determining, according to voice activity of the current audio frame, whether to obtain the current frequency spectrum fluctuation parameter of the current audio frame and store the current frequency spectrum fluctuation parameter comprises storing the current frequency spectrum fluctuation parameter of the current audio frame when the current audio frame is an active frame, and none of multiple consecutive frames comprising the current audio frame and a historical frame of the current audio frame belongs to an energy attack.
  - 5. The method according to claim 1, wherein updating, according to whether the current audio frame is percussive music, stored one or more frequency spectrum fluctuation parameters comprises modifying values of the stored frequency spectrum fluctuation parameters when the current audio frame belongs to percussive music.
  - 6. The method according to claim 1, wherein classifying the current audio frame as the speech frame or the music frame according to statistics of the part or all of effective data of the stored frequency spectrum fluctuation parameters comprises:
    - obtaining an average value of the part or all of the effective data of the stored frequency spectrum fluctuation parameters; and
      
      classifying the current audio frame as the music frame when the obtained average value satisfies a music classification condition.
  - 7. The method according to claim 1, further comprising:
    - obtaining a frequency spectrum high-frequency-band peakiness parameter, a frequency spectrum correlation degree parameter, and a linear prediction residual energy tilt parameter of the current audio frame, wherein the frequency spectrum high-frequency-band peakiness parameter denotes a peakiness or an energy acutance, on a high frequency band, of a frequency spectrum of the current audio frame, wherein the frequency spectrum correlation degree parameter denotes stability, between adjacent frames, of a signal harmonic structure of the current audio frame, and wherein the linear prediction residual energy tilt parameter denotes an extent to which linear prediction residual energy of the audio signal changes as a linear prediction order increases;
      
      determining, according to the voice activity of the current audio frame, whether to store the frequency spectrum high-frequency-band peakiness parameter, the frequency spectrum correlation degree parameter, and the linear prediction residual energy tilt parameter, and wherein classifying the current audio frame as the speech frame or the music frame according to statistics of the part or all of effective data of the stored frequency spectrum fluctuation parameters comprises:
      
      obtaining an average value of the part or all of effective data of the stored frequency spectrum fluctuation parameters, an average value of a part or all of effective data of stored frequency spectrum high-frequency-band peakiness parameters, an average value of a part or all of effective data of stored frequency spectrum correlation degrees parameters, and a variance of a part or all of effective data of stored linear prediction residual energy tilt parameters separately; and
      
      classifying the current audio frame as the music frame when a music classifying condition comprising one of the following conditions is satisfied:
      
      the average value of the effective data of the stored frequency spectrum fluctuation parameters is less than a first threshold;
      
      the average value of the effective data of the stored frequency spectrum high-frequency-band peakiness parameters is greater than a second threshold;
      
      the average value of the effective data of the stored frequency spectrum correlation degree parameters is greater than a third threshold; and
      
      the variance of the effective data of the stored linear prediction residual energy tilt parameters is less than a fourth threshold.
  - 8. The method according to claim 7, wherein the music classifying condition further comprises a voicing_cnt, wherein the voicing_cnt is less than a fifth threshold, and wherein the voicing_cnt denotes a quantity of voicing parameters whose values are greater than a sixth threshold in a voicing historical buffer which is used to store a voicing parameter of the current audio frame when the voicing parameter of the current audio frame is needed to be obtained and stored.
  - 9. The method according to claim 1, wherein the stored one or more frequency spectrum fluctuation parameters are stored in a frequency spectrum fluctuation buffer when the current frequency spectrum fluctuation parameter is determined to be obtained and stored, and wherein the current frequency spectrum fluctuation parameter is stored to the frequency spectrum fluctuation buffer.
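
The multi-parameter decision of claims 7 and 8 can be sketched as follows. This is a hedged illustration, not the claimed method itself: the function names, the use of `None` to mark ineffective data, the threshold arguments `t1` to `t6`, and the way the voicing count is combined with the other conditions (a logical AND) are assumptions. The claim wording "comprising one of the following conditions" is read here as "any one condition holds".

```python
from statistics import fmean, pvariance


def effective(values):
    """Keep only effective entries; None marks ineffective data (an assumption)."""
    return [v for v in values if v is not None]


def is_music(flux, peakiness, correlation, tilt, voicing,
             t1, t2, t3, t4, t5, t6):
    """Return True when the sketched music condition is met."""
    flux, peakiness = effective(flux), effective(peakiness)
    correlation, tilt = effective(correlation), effective(tilt)
    if not (flux and peakiness and correlation and tilt):
        return False  # assumed behaviour when a buffer has no effective data

    any_condition = (
        fmean(flux) < t1             # low average spectrum fluctuation
        or fmean(peakiness) > t2     # high average high-band peakiness
        or fmean(correlation) > t3   # high average spectral correlation
        or pvariance(tilt) < t4      # low variance of residual energy tilt
    )
    # voicing_cnt: number of buffered voicing values above the sixth threshold.
    voicing_cnt = sum(1 for v in effective(voicing) if v > t6)
    return any_condition and voicing_cnt < t5
```

Population variance (`pvariance`) is used so that a buffer holding a single effective value still yields a result; whether the patent intends population or sample variance is not stated in the claims.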

10. An audio signal classification method, comprising:
- determining, according to voice activity of a current audio frame, whether to obtain a current frequency spectrum fluctuation parameter of the current audio frame and store the current frequency spectrum fluctuation parameter, wherein a frequency spectrum fluctuation parameter denotes an energy fluctuation of a frequency spectrum of an audio signal;
  
  updating, according to activity of a historical audio frame, stored one or more frequency spectrum fluctuation parameters; and
  
  classifying the current audio frame as a speech frame or a music frame according to statistics of a part or all of effective data of the stored frequency spectrum fluctuation parameters.
  - 11. The method according to claim 10, wherein determining, according to voice activity of the current audio frame, whether to obtain the current frequency spectrum fluctuation parameter of the current audio frame and store the current frequency spectrum fluctuation parameter comprises storing the current frequency spectrum fluctuation parameter of the current audio frame when the current audio frame is an active frame.
  - 12. The method according to claim 10, wherein determining, according to voice activity of the current audio frame, whether to obtain the current frequency spectrum fluctuation parameter of the current audio frame and store the current frequency spectrum fluctuation parameter comprises storing the current frequency spectrum fluctuation parameter of the current audio frame when the current audio frame is an active frame, and the current audio frame does not belong to an energy attack.
  - 13. The method according to claim 10, wherein determining, according to voice activity of the current audio frame, whether to obtain the current frequency spectrum fluctuation parameter of the current audio frame and store the current frequency spectrum fluctuation parameter comprises storing the current frequency spectrum fluctuation parameter of the current audio frame when the current audio frame is an active frame, and none of multiple consecutive frames comprising the current audio frame and a historical frame of the current audio frame belongs to an energy attack.
  - 14. The method according to claim 10, wherein updating, according to whether the current audio frame is percussive music, stored one or more frequency spectrum fluctuation parameters comprises modifying values of the stored frequency spectrum fluctuation parameters when the current audio frame belongs to percussive music.
  - 15. The method according to claim 10, wherein updating, according to activity of the historical audio frame, stored one or more frequency spectrum fluctuation parameters comprises modifying data of other stored frequency spectrum fluctuation parameters except the current frequency spectrum fluctuation parameter into ineffective data when the current audio frame is an active frame and a previous audio frame is an inactive frame.
  - 16. The method according to claim 10, wherein updating, according to activity of the historical audio frame, stored one or more frequency spectrum fluctuation parameters comprises modifying the current frequency spectrum fluctuation parameter into a second value when the current audio frame is the active frame and a historical classification result is a music signal and the current frequency spectrum fluctuation parameter is greater than the second value.
  - 17. The method according to claim 10, wherein classifying the current audio frame as the speech frame or the music frame according to statistics of the part or all of effective data of the stored frequency spectrum fluctuation parameters comprises:
    - obtaining an average value of the part or all of the effective data of the stored frequency spectrum fluctuation parameters; and
      
      classifying the current audio frame as the music frame when the obtained average value satisfies a music classification condition.
  - 18. The method according to claim 10, further comprising:
    - obtaining a frequency spectrum high-frequency-band peakiness parameter, a frequency spectrum correlation degree parameter, and a linear prediction residual energy tilt parameter of the current audio frame, wherein the frequency spectrum high-frequency-band peakiness parameter denotes a peakiness or an energy acutance, on a high frequency band, of a frequency spectrum of the current audio frame, wherein the frequency spectrum correlation degree parameter denotes stability, between adjacent frames, of a signal harmonic structure of the current audio frame, and wherein the linear prediction residual energy tilt parameter denotes an extent to which linear prediction residual energy of the audio signal changes as a linear prediction order increases;
      
      determining, according to the voice activity of the current audio frame, whether to store the frequency spectrum high-frequency-band peakiness parameter, the frequency spectrum correlation degree parameter, and the linear prediction residual energy tilt parameter, and wherein classifying the current audio frame as the speech frame or the music frame according to statistics of the part or all of effective data of the stored frequency spectrum fluctuation parameters comprises:
      
      obtaining an average value of the part or all of effective data of the stored frequency spectrum fluctuation parameters, an average value of a part or all of effective data of stored frequency spectrum high-frequency-band peakiness parameters, an average value of a part or all of effective data of stored frequency spectrum correlation degrees parameters, and a variance of a part or all of effective data of stored linear prediction residual energy tilt parameters separately; and
      
      classifying the current audio frame as the music frame when a music classifying condition comprising one of the following conditions is satisfied:
      
      the average value of the effective data of the stored frequency spectrum fluctuation parameters is less than a first threshold;
      
      the average value of the effective data of the stored frequency spectrum high-frequency-band peakiness parameters is greater than a second threshold;
      
      the average value of the effective data of the stored frequency spectrum correlation degree parameters is greater than a third threshold; and
      
      the variance of the effective data of the stored linear prediction residual energy tilt parameters is less than a fourth threshold.
  - 19. The method according to claim 18, wherein the music classifying condition further comprises a voicing_cnt, wherein the voicing_cnt is less than a fifth threshold, and wherein the voicing_cnt denotes a quantity of voicing parameters whose values are greater than a sixth threshold in a voicing historical buffer which is used to store a voicing parameter of the current audio frame when the voicing parameter of the current audio frame is needed to be obtained and stored.
  - 20. The method according to claim 10, wherein the stored one or more frequency spectrum fluctuation parameters are stored in a frequency spectrum fluctuation buffer when the current frequency spectrum fluctuation parameter is determined to be obtained and stored, and wherein the current frequency spectrum fluctuation parameter is stored to the frequency spectrum fluctuation buffer.
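
The two history-driven updates of claims 15 and 16 can be sketched in a few lines. The layout is assumed: the buffer is a plain list whose last entry is the current frame's value, `None` marks ineffective data, and both rules are applied in one function even though the claims state them separately. The name `update_on_history` is hypothetical.

```python
def update_on_history(buffer, is_active, prev_active, history_is_music, second_value):
    """Return the buffer after the history-based updates (illustrative only)."""
    if not is_active or not buffer:
        return buffer  # both rules apply only to an active current frame

    updated = list(buffer)
    if not prev_active:
        # Claim 15: activity has just resumed, so every stored value except
        # the current one is turned into ineffective data.
        updated[:-1] = [None] * (len(updated) - 1)
    if history_is_music and updated[-1] > second_value:
        # Claim 16: with a music history, limit the current value to second_value.
        updated[-1] = second_value
    return updated
```

For example, `update_on_history([7.0, 8.0, 9.0], True, False, False, 5.0)` leaves only the newest value effective, so later statistics are computed from data gathered after the pause.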

21. An audio signal classification apparatus configured to classify an input audio signal, comprising:
- a memory; and
  
  a processor coupled to the memory, wherein the processor is configured to determine, according to voice activity of a current audio frame, whether to obtain and store a current frequency spectrum fluctuation parameter of the current audio frame, wherein the current frequency spectrum fluctuation parameter denotes an energy fluctuation of a frequency spectrum of an audio signal, wherein the memory is configured to store one or more frequency spectrum fluctuation parameters when the processor outputs a result that the frequency spectrum fluctuation parameter needs to be stored;
  
  wherein the processor is further configured to:
  
  update, according to whether the audio frame is percussive music or activity of a historical audio frame, the frequency spectrum fluctuation parameters stored in the memory; and
  
  classify the current audio frame as a speech frame or a music frame according to statistics of a part or all of effective data of the frequency spectrum fluctuation parameters stored in the memory.
  - 22. The apparatus according to claim 21, wherein the processor is further configured to output a result that the current frequency spectrum fluctuation parameter of the current audio frame needs to be stored when the current audio frame is an active frame.
  - 23. The apparatus according to claim 21, wherein the processor is further configured to output a result that the current frequency spectrum fluctuation parameter of the current audio frame needs to be stored when the current audio frame is an active frame, and the current audio frame does not belong to an energy attack.
  - 24. The apparatus according to claim 21, wherein the processor is further configured to output a result that the current frequency spectrum fluctuation parameter of the current audio frame needs to be stored when the current audio frame is an active frame, and none of multiple consecutive frames comprising the current audio frame and a historical frame of the current audio frame belongs to an energy attack.
  - 25. The apparatus according to claim 21, wherein the processor is further configured to modify one or more values of frequency spectrum fluctuation parameters stored in the memory when the current audio frame belongs to percussive music.
  - 26. The apparatus according to claim 21, wherein the processor is further configured to:
    - modify data of other frequency spectrum fluctuation parameters stored in the memory except the current frequency spectrum fluctuation parameter into ineffective data when the current audio frame is an active frame, and a previous audio frame is an inactive frame;
      
      or modify the current frequency spectrum fluctuation parameter of the current audio frame into a second value when the current audio frame is the active frame, and a historical classification result is a music signal and the current frequency spectrum fluctuation parameter is greater than the second value.
  - 27. The apparatus according to claim 21, wherein the processor is further configured to:
    - obtain an average value of the part or all of the effective data of the frequency spectrum fluctuation parameters stored in the memory;
      
      compare the average value with a music classification condition; and
      
      classify the current audio frame as the music frame when the obtained average value satisfies the music classification condition.
  - 28. The apparatus according to claim 21, wherein the processor is further configured to:
    - obtain a frequency spectrum high-frequency-band peakiness parameter, a frequency spectrum correlation degree parameter, and a linear prediction residual energy tilt parameter of the current audio frame, wherein the frequency spectrum high-frequency-band peakiness parameter denotes a peakiness or an energy acutance, on a high frequency band, of a frequency spectrum of the current audio frame, wherein the frequency spectrum correlation degree parameter denotes stability, between adjacent frames, of a signal harmonic structure of the current audio frame, and wherein the linear prediction residual energy tilt parameter denotes an extent to which linear prediction residual energy of the audio signal changes as a linear prediction order increases;
      
      determine, according to the voice activity of the current audio frame, whether to store the frequency spectrum high-frequency-band peakiness parameter, the frequency spectrum correlation degree parameter, and the linear prediction residual energy tilt parameter in the memory;
      
      store the frequency spectrum high-frequency-band peakiness parameter, the frequency spectrum correlation degree parameter, and the linear prediction residual energy tilt parameter when the frequency spectrum high-frequency-band peakiness parameter, the frequency spectrum correlation degree parameter, and the linear prediction residual energy tilt parameter need to be stored;
      
      obtain statistics of a part or all of effective data of the stored one or more frequency spectrum fluctuations, statistics of a part or all of effective data of stored one or more frequency spectrum high-frequency-band peakiness parameters, statistics of a part or all of effective data of stored one or more frequency spectrum correlation degrees parameters, and statistics of a part or all of effective data of stored one or more linear prediction residual energy tilts parameters, and classify the audio frame as the speech frame or the music frame according to the statistics.
  - 29. The apparatus according to claim 28, wherein the processor is further configured to:
    - obtain an average value of the part or all of effective data of the stored frequency spectrum fluctuation parameters, an average value of the part or all of effective data of the stored frequency spectrum high-frequency-band peakiness parameters, an average value of the part or all of effective data of the stored frequency spectrum correlation degree parameters, and a variance of the part or all of effective data of the stored linear prediction residual energy tilt parameters separately; and
      
      classify the current audio frame as the music frame when a classifying condition comprising one of the following conditions is satisfied:
      
      the average value of the effective data of the stored frequency spectrum fluctuation parameters is less than a first threshold;
      
      the average value of the effective data of the stored frequency spectrum high-frequency-band peakiness parameters is greater than a second threshold;
      
      the average value of the effective data of the stored frequency spectrum correlation degree parameters is greater than a third threshold; and
      
      the variance of the effective data of the stored linear prediction residual energy tilt parameters is less than a fourth threshold.
  - 30. The apparatus according to claim 29, wherein the classifying condition further comprises a voicing_cnt, wherein the voicing_cnt is less than a fifth threshold, and wherein the voicing_cnt denotes a quantity of voicing parameters whose values are greater than a sixth threshold in a voicing historical buffer which is used to store a voicing parameter of the current audio frame when the voicing parameter of the current audio frame is needed to be obtained and stored.
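
The storage decision that the processor makes in claims 22 to 24 can be sketched as a small class that owns the memory. Everything specific here is an assumption: the class name `FluctuationStore`, the buffer capacity, and the number of consecutive frames checked for an energy attack are not given in the claims, and how a frame is detected as an energy attack is left to the caller.

```python
from collections import deque


class FluctuationStore:
    """Memory plus the 'needs to be stored' decision (hypothetical layout)."""

    def __init__(self, capacity=60, attack_window=3):
        # Stored frequency spectrum fluctuation parameters (the memory).
        self.values = deque(maxlen=capacity)
        # Attack flags of the current frame and the most recent historical frames.
        self.recent_attacks = deque(maxlen=attack_window)

    def needs_storing(self, is_active, is_attack):
        """True when the frame is active and no frame in the window is an attack."""
        self.recent_attacks.append(is_attack)
        return is_active and not any(self.recent_attacks)

    def offer(self, flux, is_active, is_attack):
        """Store flux when the decision says so; report whether it was stored."""
        stored = self.needs_storing(is_active, is_attack)
        if stored:
            self.values.append(flux)
        return stored
```

With `attack_window=1` the check reduces to the current frame only, which corresponds to claim 23; a larger window corresponds to the multi-frame variant in claim 24.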

Current Assignee: Huawei Technologies Co., Ltd. (Huawei Investment & Holding Co., Ltd.)
Original Assignee: Huawei Technologies Co., Ltd. (Huawei Investment & Holding Co., Ltd.)
Inventors: Wang, Zhe
Granted Patent: US 10,090,003 B2
CPC Class Codes

G10L 19/06   Determination or coding of ...

G10L 19/12   the excitation function bei...

G10L 2025/783   based on threshold decision

G10L 25/12   the extracted parameters be...

G10L 25/18   the extracted parameters be...

G10L 25/78   Detection of presence or ab...

G10L 25/81   for discriminating voice fr...
