Method and apparatus for classifying an audio signal based on frequency spectrum fluctuation

US 10,090,003 B2
Filed: 02/05/2016
Issued: 10/02/2018
Est. Priority Date: 08/06/2013
Status: Active Grant

Abstract

An audio signal classification method and apparatus. The method includes: determining, according to the voice activity of a current audio frame, whether to obtain a frequency spectrum fluctuation of the current audio frame and store it in a frequency spectrum fluctuation memory; updating the frequency spectrum fluctuations stored in the memory according to whether the audio frame is percussive music or according to the activity of a historical audio frame; and classifying the current audio frame as a speech frame or a music frame according to statistics of a part or all of the effective data of the frequency spectrum fluctuations stored in the memory.
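
The text here does not define how the frequency spectrum fluctuation is computed; claim 1 says only that it denotes an energy fluctuation of the frequency spectrum. The following is a minimal sketch under an assumption that is not stated in the text: the fluctuation is taken as the mean absolute difference of per-bin log energies between two consecutive frames. The function name `spectrum_fluctuation` and the `floor` parameter are hypothetical.

```python
import math


def spectrum_fluctuation(prev_energy, cur_energy, floor=1e-12):
    """Mean absolute log-energy difference between two frames' spectra.

    Assumption: this definition is illustrative only. The source says merely
    that the parameter denotes an energy fluctuation of the spectrum.
    """
    if len(prev_energy) != len(cur_energy) or not cur_energy:
        raise ValueError("frames must have the same non-zero number of bins")
    total = 0.0
    for p, c in zip(prev_energy, cur_energy):
        # The floor avoids log10(0) for empty bins.
        total += abs(math.log10(max(c, floor)) - math.log10(max(p, floor)))
    return total / len(cur_energy)
```

Under this assumed definition, a stationary spectrum yields a value near zero, while a spectrum whose energy distribution changes from frame to frame yields a larger value.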

13 Claims

1. An audio signal classification method, comprising:
- storing, based on at least one condition being met, data of a frequency spectrum fluctuation parameter of a current audio frame of an audio signal into a memory where data of frequency spectrum fluctuation parameters of a plurality of audio frames are stored, wherein the at least one condition comprises the current audio frame being an active frame, and wherein a frequency spectrum fluctuation parameter denotes an energy fluctuation of a frequency spectrum of the audio signal;
  
  determining whether the current audio frame is an active frame and a last audio frame preceding the current audio frame is an inactive frame;
  
  upon determining that the current audio frame is an active frame and the last audio frame preceding the current audio frame is an inactive frame, modifying data of frequency spectrum fluctuation parameters of audio frames preceding the current audio frame stored in the memory into ineffective data, wherein data of frequency spectrum fluctuation parameters in the memory not having been modified into ineffective data are effective data; and
  
  determining whether a current signal is percussive music, wherein the current signal comprises the current audio frame and a plurality of audio frames preceding the current audio frame;
  
  upon determining that the current signal is percussive music, modifying effective data of the current audio frame and a plurality of audio frames preceding the current audio frame into a value less than or equal to a music threshold;
  
  obtaining statistics of a part or all of the effective data in the memory; and
  
  classifying the current audio frame as a speech frame or a music frame according to the statistics.
- - 2. The method according to claim 1, wherein the at least one condition further comprises:
    - the current audio frame does not belong to an energy attack.
  - 3. The method according to claim 1, wherein the current audio frame and an audio frame preceding the current audio frame belong to a group of multiple consecutive frames, and wherein the at least one condition further comprises:
    - none of the multiple consecutive frames belongs to an energy attack.
  - 4. The method according to claim 1, wherein the step of obtaining obtains an average value of the part or all of the effective data in the memory;
    - and the step of classifying classifies the current audio frame as the music frame based on a condition that the obtained average value satisfies a music classification condition.
  - 5. The method according to claim 1, wherein the step of obtaining statistics comprises:
    - obtaining a first group of effective data comprising data of the frequency spectrum fluctuation parameter of the current frames and one or more effective data of frequency spectrum fluctuation parameter of one or more audio frames continuously prior to the current frame;
      
      obtaining a second group of effective data comprising data of the frequency spectrum fluctuation parameter of the current frames and one or more effective data of frequency spectrum fluctuation parameter of one or more audio frames continuously prior to the current frame;
      
      wherein, the quantity of data in the first group and the quantity of data in the second group are different;
      
      obtaining a first statistics according to the data in the first group and a second statistics according to the data in the second group;
      
      and wherein the step of classifying classifies the current audio frame as a music frame according to the first statistics and the second statistics.
  - 6. The method according to claim 1, wherein the step of determining whether the current signal is percussive music comprises:
    - when a relatively acute energy protrusion occurs in the current signal in both a short time and a long time, and the current signal has no obvious voiced sound characteristic, if the plurality of audio frames preceding the current audio frame are mainly music frames, determining the current signal is percussive music.
  - 7. The method according to claim 1, wherein the step of determining whether the current signal is percussive music comprises:
    - when none of subframes of the current signal has an obvious voiced sound characteristic and a relatively obvious increase also occurs in the time domain envelope of the current signal relative to a long-time average of the time domain envelope, determining that the current signal is also percussive music.
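
The memory handling recited in claims 1 and 4 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the class name, the buffer size, the threshold value, the use of `None` to mark ineffective data, the clamping of every effective entry on a percussive decision, and the rule "average at or below the threshold means music" are all hypothetical choices. The voice activity and percussive decisions are taken as inputs.

```python
from collections import deque


class FluctuationMemory:
    """Sketch of the fluctuation memory in claim 1 (all names are assumptions)."""

    def __init__(self, size=60, music_threshold=5.0):
        self.size = size
        self.music_threshold = music_threshold
        # Each entry is a float (effective data) or None (ineffective data).
        self.data = deque(maxlen=size)
        self.prev_active = False

    def update(self, flux, active, percussive=False):
        if active and not self.prev_active:
            # Inactive-to-active transition: data of preceding frames
            # becomes ineffective.
            self.data = deque([None] * len(self.data), maxlen=self.size)
        if active:
            # Storing condition: the current frame is an active frame.
            self.data.append(flux)
        if percussive:
            # Assumption: clamp every effective entry to the music threshold.
            thr = self.music_threshold
            self.data = deque(
                (None if v is None else min(v, thr) for v in self.data),
                maxlen=self.size,
            )
        self.prev_active = active

    def classify(self):
        effective = [v for v in self.data if v is not None]
        if not effective:
            return None  # no effective data, so no decision
        mean = sum(effective) / len(effective)
        # Assumption: a low average fluctuation indicates music.
        return "music" if mean <= self.music_threshold else "speech"
```

A caller would invoke `update` once per frame with that frame's fluctuation value and activity flag, then call `classify` to obtain the decision for the current frame.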

8. An audio signal classification apparatus configured to classify an input audio signal, comprising:
- a memory comprising instructions; and
  
  one or more processors in communication with the memory, wherein the one or more processors execute the instructions to:
  
  store, based on at least one condition being met, data of a frequency spectrum fluctuation parameter of a current audio frame of an audio signal into the memory where a plurality of frequency spectrum fluctuation parameters of a plurality of audio frames are stored, wherein the at least one condition comprises the current audio frame being an active frame, and wherein a frequency spectrum fluctuation parameter denotes an energy fluctuation of a frequency spectrum of the audio signal;
  
  determine whether the current audio frame is an active frame and a last audio frame preceding the current audio frame is an inactive frame;
  
  upon determining that the current audio frame is an active frame and the last audio frame preceding the current audio frame is an inactive frame, modify data of frequency spectrum fluctuation parameters of audio frames preceding the current audio frame stored in the memory into ineffective data, wherein data of frequency spectrum fluctuation parameters in the memory not having been modified into ineffective data are effective data; and
  
  determine whether a current signal is percussive music, wherein the current signal comprises the current audio frame and a plurality of audio frames preceding the current audio frame;
  
  upon determining that the current signal is percussive music, modify effective data of the current audio frame and a plurality of audio frames preceding the current audio frame into a value less than or equal to a music threshold;
  
  obtain statistics of a part or all of the effective data in the memory; and
  
  classify the current audio frame as a speech frame or a music frame according to the statistics.
- - 9. The apparatus according to claim 8, wherein the at least one condition further comprises:
    - the current audio frame does not belong to an energy attack.
  - 10. The apparatus according to claim 8, wherein the current audio frame and an audio frame preceding the current audio frame belong to a group of multiple consecutive frames, and wherein the at least one condition further comprises:
    - none of the multiple consecutive frames belongs to an energy attack.
  - 11. The apparatus according to claim 8, wherein, to obtain the statistics, the one or more processors are configured to:
    - obtain a first group of effective data comprising data of the frequency spectrum fluctuation parameter of the current frames and one or more effective data of frequency spectrum fluctuation parameter of one or more audio frames continuously prior to the current frame;
      
      obtain a second group of effective data comprising data of the frequency spectrum fluctuation parameter of the current frames and one or more effective data of frequency spectrum fluctuation parameter of one or more audio frames continuously prior to the current frame;
      
      wherein, the quantity of data in the first group and the quantity of data in the second group are different;
      
      obtain a first statistics according to the data in the first group and a second statistics according to the data in the second group; and
      
      wherein, to classify the current frame, the one or more processors are configured to classify the current audio frame as a speech frame according to the first statistics and the second statistics.
  - 12. The apparatus according to claim 8, wherein to determine whether a current signal is percussive music, the one or more processors are configured to:
    - when a relatively acute energy protrusion occurs in the current signal in both a short time and a long time, and the current signal has no obvious voiced sound characteristic, if the plurality of audio frames preceding the current audio frame are mainly music frames, determine the current signal is percussive music.
  - 13. The apparatus according to claim 8, wherein to determine whether a current signal is percussive music, the one or more processors are configured to:
    - when none of subframes of the current signal has an obvious voiced sound characteristic and a relatively obvious increase also occurs in the time domain envelope of the current signal relative to a long-time average of the time domain envelope, determine that the current signal is also percussive music.
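
Claims 5 and 11 recite two groups of effective data of different sizes, each ending at the current frame, with the classification based on a statistic from each group. A minimal sketch follows, assuming that the statistic is the mean, that the groups are a short and a long trailing window, and that music requires both means to fall below their thresholds. The function name, window lengths, and thresholds are hypothetical.

```python
def classify_two_windows(effective, short_len=10, long_len=40,
                         short_thr=4.0, long_thr=5.0):
    """Classify from two trailing windows of effective fluctuation data.

    `effective` holds consecutive effective values, oldest first, with the
    current frame last. All default values are illustrative assumptions.
    """
    if short_len == long_len:
        raise ValueError("the two groups must differ in size")
    if len(effective) < max(short_len, long_len):
        return None  # not enough data to form two groups of different size
    first_group = effective[-short_len:]
    second_group = effective[-long_len:]
    first_stat = sum(first_group) / len(first_group)
    second_stat = sum(second_group) / len(second_group)
    # Assumption: music only if both statistics are below their thresholds.
    if first_stat < short_thr and second_stat < long_thr:
        return "music"
    return "speech"
```

Using two window lengths lets a short window react quickly to a change in the signal while a long window supplies a more stable estimate; how the two statistics are actually combined is not specified in the claims.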

Current Assignee: Huawei Technologies Co., Ltd. (Huawei Investment & Holding Co., Ltd.)
Original Assignee: Huawei Technologies Co., Ltd. (Huawei Investment & Holding Co., Ltd.)
Inventors: Wang, Zhe
Primary Examiner(s): He, Jialong
Application Number: US15/017,075
Publication Number: US 20160155456A1
Time in Patent Office: 970 Days
CPC Class Codes

G10L 19/06   Determination or coding of ...

G10L 19/12   the excitation function bei...

G10L 2025/783   based on threshold decision

G10L 25/12   the extracted parameters be...

G10L 25/18   the extracted parameters be...

G10L 25/78   Detection of presence or ab...

G10L 25/81   for discriminating voice fr...
