Audio signal classification method and apparatus

US 10,529,361 B2
Filed: 08/22/2018
Issued: 01/07/2020
Est. Priority Date: 08/06/2013
Status: Active Grant

Abstract

An audio signal classification method and apparatus include: determining, according to the voice activity of a current audio frame, whether to obtain a frequency spectrum fluctuation of the current audio frame and store it in a frequency spectrum fluctuation memory; updating the frequency spectrum fluctuations stored in that memory according to whether the audio frame is percussive music or according to the activity of a historical audio frame; and classifying the current audio frame as a speech frame or a music frame according to statistics of a part or all of the effective data of the frequency spectrum fluctuations stored in the memory.
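
The abstract does not say how a frequency spectrum fluctuation is computed, only that (per claim 1) it denotes an energy fluctuation of the frequency spectrum. As a rough illustration, one common way to measure such a fluctuation is the mean absolute change in log band energy between two consecutive frames. The function name, the band-energy inputs, and the `eps` floor below are all assumptions made for this sketch, not the patented formula.

```python
import math


def spectral_fluctuation(prev_band_energies, cur_band_energies, eps=1e-12):
    """Hypothetical fluctuation measure: mean absolute change in log10
    band energy between the previous frame and the current frame."""
    if len(prev_band_energies) != len(cur_band_energies):
        raise ValueError("frames must have the same number of bands")
    if not cur_band_energies:
        raise ValueError("at least one band is required")
    total = 0.0
    for prev, cur in zip(prev_band_energies, cur_band_energies):
        # eps keeps log10 defined when a band carries no energy
        total += abs(math.log10(cur + eps) - math.log10(prev + eps))
    return total / len(cur_band_energies)
```

Under this assumed measure, a stationary spectrum yields a value near zero, and a spectrum whose band energies change quickly from frame to frame yields a larger value.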

43 Citations

24 Claims

1. An audio signal classification method, comprising:
- storing, based on at least one condition, data of a frequency spectrum fluctuation parameter of a current audio frame of an audio signal into a memory where data of frequency spectrum fluctuation parameters of a plurality of audio frames are stored, wherein the at least one condition is the current audio frame being an active frame, wherein the frequency spectrum fluctuation parameter denotes an energy fluctuation of a frequency spectrum of the audio signal;
  
  modifying data of frequency spectrum fluctuation parameters of audio frames preceding the current audio frame stored in the memory into ineffective data when the current audio frame is an active frame and an audio frame immediately preceding the current audio frame is an inactive frame, wherein data of frequency spectrum fluctuation parameters in the memory not having been modified into ineffective data are effective data;
  
  modifying the effective data stored in the memory into a value that is less than or equal to a music threshold when a current signal is percussive music, wherein the current signal comprises the current audio frame and a plurality of audio frames preceding the current audio frame;
  
  obtaining statistics of a part or all of the effective data stored in the memory; and
  
  classifying the current audio frame as a speech frame or a music frame according to the statistics of a part or all of the effective data stored in the memory.
- - 2. The method of claim 1, wherein the current audio frame and a historical frame of the current audio frame belong to a group of multiple consecutive frames, and wherein the at least one condition further comprises none of the group of multiple consecutive frames belongs to an energy attack.
  - 3. The method of claim 1, wherein classifying the current audio frame as the speech frame or the music frame according to statistics of the part or all of effective data comprises:
    - obtaining an average value of the part or all of the effective data of the frequency spectrum fluctuation parameters that are stored; and
      
      either classifying the current audio frame as the music frame based on a condition that the average value satisfies a music classification condition or classifying the current audio frame as the speech frame based on a condition that the average value satisfies a speech classification condition.
  - 4. The method of claim 1, wherein classifying the current audio frame as the speech frame or the music frame comprises:
    - obtaining a first group of the effective data comprising data of the frequency spectrum fluctuation parameter of the current audio frame and one or more effective data of frequency spectrum fluctuation parameters of one or more audio frames continuously prior to the current audio frame;
      
      obtaining a second group of the effective data comprising data of the frequency spectrum fluctuation parameter of the current audio frame and one or more effective data of frequency spectrum fluctuation parameters of one or more audio frames continuously prior to the current audio frame, wherein a quantity of data in the first group and a quantity of data in the second group are different;
      
      obtaining a first statistics according to the quantity of the data in the first group and a second statistics according to the quantity of the data in the second group; and
      
      classifying the current audio frame as the music frame or the speech frame according to the first statistics or the second statistics.
  - 5. The method of claim 1, wherein the current signal is determined as the percussive music when a relatively acute energy protrusion occurs in the current signal in both a short time period and a long time period, the current signal has no obvious voiced sound characteristic, and several historical frames before the current audio frame are mainly music frames.
  - 6. The method of claim 1, wherein the current signal is determined as the percussive music when none of subframes of the current signal has an obvious voiced sound characteristic and a relatively obvious increase also occurs in a time domain envelope of the current signal relative to a long-time average of the time domain envelope.
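
The steps of claim 1, combined with the average-value test of claim 3, can be sketched as a small stateful classifier. This is a minimal sketch under stated assumptions, not the patented implementation: the class name, the buffer capacity, the threshold value, the use of `None` to mark ineffective data, the clamping with `min`, and the rule "average at or below the music threshold means music" are all hypothetical choices. The claims leave the music and speech classification conditions unspecified.

```python
from collections import deque


class SpeechMusicClassifier:
    """Hypothetical sketch of the claim 1 flow with a claim 3 style average."""

    def __init__(self, capacity=60, music_threshold=5.0):
        self.memory = deque(maxlen=capacity)  # fluctuation history
        self.music_threshold = music_threshold
        self.prev_active = False

    def process(self, fluctuation, active, percussive=False):
        if active:
            if not self.prev_active:
                # Active frame right after an inactive one: mark the
                # data of all preceding frames as ineffective (None).
                for i in range(len(self.memory)):
                    self.memory[i] = None
            # Store only when the current frame is active.
            self.memory.append(fluctuation)
        self.prev_active = active

        if percussive:
            # Force effective data to be <= the music threshold.
            for i in range(len(self.memory)):
                if self.memory[i] is not None:
                    self.memory[i] = min(self.memory[i], self.music_threshold)

        effective = [v for v in self.memory if v is not None]
        if not effective:
            return None  # nothing to base a decision on yet
        average = sum(effective) / len(effective)
        # Assumed decision rule; the claims do not fix the conditions.
        return "music" if average <= self.music_threshold else "speech"
```

In this sketch an inactive frame is not stored, so the decision for it falls back on the existing history. Whether inactive frames should be classified at all is a design choice the claims do not settle.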

7. An audio signal classification apparatus configured to classify an input audio signal, comprising:
- a memory comprising instructions; and
  
  one or more processors in communication with the memory, wherein the one or more processors execute the instructions to:
  
  store, based on at least one condition, data of a frequency spectrum fluctuation parameter of a current audio frame of an audio signal into the memory where data of frequency spectrum fluctuation parameters of a plurality of audio frames are stored, wherein the at least one condition comprises the current audio frame is an active frame, the frequency spectrum fluctuation parameter denotes an energy fluctuation of a frequency spectrum of the audio signal;
  
  modify data of frequency spectrum fluctuation parameters of audio frames preceding the current audio frame stored in the memory into ineffective data when the current audio frame is an active frame and an audio frame immediately preceding the current audio frame is an inactive frame, wherein data of frequency spectrum fluctuation parameters in the memory not having been modified into ineffective data are effective data;
  
  modify the effective data stored in the memory into a value that is less than or equal to a music threshold when a current signal is percussive music, wherein the current signal comprises the current audio frame and a plurality of audio frames preceding the current audio frame;
  
  obtain statistics of a part or all of the effective data stored in the memory;
  
  classify the current audio frame as a speech frame or a music frame according to the statistics of a part or all of the effective data stored in the memory.
- - 8. The audio signal classification apparatus of claim 7, wherein the current audio frame and a historical frame of the current audio frame belong to a group of multiple consecutive frames, and wherein the at least one condition further comprises none of the group of multiple consecutive frames belongs to an energy attack.
  - 9. The audio signal classification apparatus of claim 7, wherein to classify the current audio frame as the speech frame or the music frame, the one or more processors are configured to:
    - obtain an average value of the part or all of the effective data of the frequency spectrum fluctuation parameters that are stored; and
      
      either classify the current audio frame as the music frame based on a condition that the average value satisfies a music classification condition or classify the current audio frame as the speech frame based on a condition that the average value satisfies a speech classification condition.
  - 10. The audio signal classification apparatus of claim 7, wherein to classify the current audio frame as a speech frame or a music frame, the one or more processors are configured to:
    - obtain a first group of the effective data comprising data of the frequency spectrum fluctuation parameter of the current audio frame and one or more effective data of frequency spectrum fluctuation parameters of one or more audio frames continuously prior to the current audio frame;
      
      obtain a second group of the effective data comprising data of the frequency spectrum fluctuation parameter of the current audio frame and one or more effective data of frequency spectrum fluctuation parameters of one or more audio frames continuously prior to the current audio frame, wherein a quantity of data in the first group and a quantity of data in the second group are different;
      
      obtain a first statistics according to the quantity of the data in the first group and a second statistics according to the quantity of the data in the second group; and
      
      classify the current audio frame as the music frame or the speech frame according to the first statistics or the second statistics.
  - 11. The audio signal classification apparatus of claim 7, wherein the current signal is determined as the percussive music when a relatively acute energy protrusion occurs in the current signal in both a short time period and a long time period, the current signal has no obvious voiced sound characteristic, and several historical frames before the current audio frame are mainly music frames.
  - 12. The audio signal classification apparatus of claim 7, wherein the current signal is determined as the percussive music when none of subframes of the current signal has an obvious voiced sound characteristic and a relatively obvious increase also occurs in a time domain envelope of the current signal relative to a long-time average of the time domain envelope.
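
Claim 10 (like claim 4) forms two groups of effective data, both ending at the current frame but of different sizes, and decides from a statistic of either group. The sketch below illustrates one possible reading. The helper names, the window lengths, both thresholds, the use of the mean as the statistic, and the "either window at or below its threshold means music" rule are assumptions made for illustration, and `None` is again a hypothetical marker for ineffective data.

```python
def trailing_effective(memory, count):
    """Collect up to `count` effective values, walking back from the
    current (last) frame and stopping at the first ineffective entry."""
    group = []
    for value in reversed(memory):
        if value is None or len(group) == count:
            break
        group.append(value)
    return group


def classify_two_windows(memory, short_len=10, long_len=30,
                         short_threshold=3.0, long_threshold=5.0):
    """Hypothetical two-window decision: a short group and a long group,
    each compared against its own assumed music threshold."""
    short = trailing_effective(memory, short_len)
    long_ = trailing_effective(memory, long_len)
    if not short:
        return None  # no effective data ends at the current frame
    short_mean = sum(short) / len(short)
    long_mean = sum(long_) / len(long_)
    if short_mean <= short_threshold or long_mean <= long_threshold:
        return "music"
    return "speech"
```

When fewer effective values are available than the short window length, both groups coincide in this sketch. A stricter reading of the claim, which requires different quantities, would defer the decision in that case.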

13. An audio signal classification method, comprising:
- storing, based on at least one condition, data of a frequency spectrum fluctuation parameter of a current audio frame of an audio signal into a memory where data of frequency spectrum fluctuation parameters of a plurality of audio frames are stored, wherein the at least one condition comprises the current audio frame is an active frame, the frequency spectrum fluctuation parameter denotes an energy fluctuation of a frequency spectrum of the audio signal;
  
  modifying data of frequency spectrum fluctuation parameters of audio frames preceding the current audio frame stored in the memory into ineffective data when the current audio frame is an active frame and an audio frame immediately preceding the current audio frame is an inactive frame;
  
  wherein data of the frequency spectrum fluctuation parameters with negative values is the ineffective data, and data of frequency spectrum fluctuation parameters with a non-negative value is effective data;
  
  modifying the effective data stored in the memory into a value that is less than or equal to a music threshold when a current signal is percussive music, wherein the current signal comprises the current audio frame and a plurality of audio frames preceding the current audio frame;
  
  obtaining statistics of a part or all of the effective data stored in the memory; and
  
  classifying the current audio frame as a speech frame or a music frame according to the statistics of a part or all of the effective data stored in the memory.
- - 14. The method of claim 13, wherein the current audio frame and a historical frame of the current audio frame belong to a group of multiple consecutive frames, and the at least one condition further comprises none of the group of multiple consecutive frames belongs to an energy attack.
  - 15. The method of claim 13, wherein classifying the current audio frame as the speech frame or the music frame according to statistics of the part or all of effective data comprises:
    - obtaining an average value of the part or all of the effective data of the frequency spectrum fluctuation parameters that are stored; and
      
      either classifying the current audio frame as the music frame based on a condition that the average value satisfies a music classification condition or classifying the current audio frame as the speech frame based on a condition that the average value satisfies a speech classification condition.
  - 16. The method of claim 13, wherein classifying the current audio frame as the speech frame or the music frame comprises:
    - obtaining a first group of the effective data comprising data of the frequency spectrum fluctuation parameter of the current audio frame and one or more effective data of frequency spectrum fluctuation parameters of one or more audio frames continuously prior to the current audio frame;
      
      obtaining a second group of the effective data comprising data of the frequency spectrum fluctuation parameter of the current audio frame and one or more effective data of frequency spectrum fluctuation parameters of one or more audio frames continuously prior to the current audio frame, wherein a quantity of data in the first group and a quantity of data in the second group are different;
      
      obtaining a first statistics according to the quantity of the data in the first group and a second statistics according to the quantity of the data in the second group; and
      
      classifying the current audio frame as the music frame or the speech frame according to the first statistics or the second statistics.
  - 17. The method of claim 13, wherein the current signal is determined as the percussive music when a relatively acute energy protrusion occurs in the current signal in both a short time period and a long time period, the current signal has no obvious voiced sound characteristic, and several historical frames before the current audio frame are mainly music frames.
  - 18. The method of claim 13, wherein the current signal is determined as the percussive music when none of subframes of the current signal has an obvious voiced sound characteristic and a relatively obvious increase also occurs in a time domain envelope of the current signal relative to a long-time average of the time domain envelope.
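
Claim 13 differs from claim 1 mainly in how ineffective data is represented: negative values are ineffective and non-negative values are effective. A minimal sketch of that convention follows. The sentinel value `-1.0` and the three helper names are assumptions, since the claim only requires that ineffective entries be negative.

```python
INEFFECTIVE = -1.0  # hypothetical sentinel; any negative value would do


def invalidate_history(memory):
    """Return a copy of the memory with every entry marked ineffective."""
    return [INEFFECTIVE] * len(memory)


def effective_values(memory):
    """Keep only non-negative entries, which claim 13 treats as effective."""
    return [v for v in memory if v >= 0.0]


def clamp_for_percussive(memory, music_threshold):
    """Limit effective entries to the music threshold and leave the
    negative (ineffective) entries untouched."""
    return [min(v, music_threshold) if v >= 0.0 else v for v in memory]
```

A numeric sentinel keeps the memory a plain array of numbers, which suits fixed-point or C-style codec implementations. It relies on genuine fluctuation values never being negative.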

19. An audio signal classification apparatus configured to classify an input audio signal, comprising:
- a memory comprising instructions; and
  
  one or more processors in communication with the memory, wherein the one or more processors execute the instructions to:
  
  store, based on at least one condition, data of a frequency spectrum fluctuation parameter of a current audio frame of an audio signal into a memory where data of frequency spectrum fluctuation parameters of a plurality of audio frames are stored, wherein the at least one condition comprises the current audio frame is an active frame, the frequency spectrum fluctuation parameter denotes an energy fluctuation of a frequency spectrum of the audio signal;
  
  modify data of frequency spectrum fluctuation parameters of audio frames preceding the current audio frame stored in the memory into ineffective data when the current audio frame is an active frame and an audio frame immediately preceding the current audio frame is an inactive frame;
  
  wherein data of the frequency spectrum fluctuation parameters with negative values is the ineffective data, and data of frequency spectrum fluctuation parameters with a non-negative value is effective data;
  
  modify the effective data stored in the memory into a value that is less than or equal to a music threshold when a current signal is percussive music, wherein the current signal comprises the current audio frame and a plurality of audio frames preceding the current audio frame;
  
  obtain statistics of a part or all of the effective data stored in the memory; and
  
  classify the current audio frame as a speech frame or a music frame according to the statistics of a part or all of the effective data stored in the memory.
- - 20. The audio signal classification apparatus of claim 19, wherein the current audio frame and a historical frame of the current audio frame belong to a group of multiple consecutive frames, and wherein the at least one condition further comprises none of the group of multiple consecutive frames belongs to an energy attack.
  - 21. The audio signal classification apparatus of claim 19, wherein to classify the current audio frame as the speech frame or the music frame, the one or more processors are configured to:
    - obtain an average value of the part or all of the effective data of the frequency spectrum fluctuation parameters that are stored; and
      
      either classify the current audio frame as the music frame based on a condition that the average value satisfies a music classification condition or classify the current audio frame as the speech frame based on a condition that the average value satisfies a speech classification condition.
  - 22. The audio signal classification apparatus of claim 19, wherein to classify the current audio frame as a speech frame or a music frame, the one or more processors are configured to:
    - obtain a first group of the effective data comprising data of the frequency spectrum fluctuation parameter of the current audio frame and one or more effective data of frequency spectrum fluctuation parameters of one or more audio frames continuously prior to the current audio frame;
      
      obtain a second group of the effective data comprising data of the frequency spectrum fluctuation parameter of the current audio frame and one or more effective data of frequency spectrum fluctuation parameters of one or more audio frames continuously prior to the current audio frame, wherein a quantity of data in the first group and a quantity of data in the second group are different;
      
      obtain a first statistics according to the quantity of the data in the first group and a second statistics according to the quantity of the data in the second group; and
      
      classify the current audio frame as the music frame or the speech frame according to the first statistics or the second statistics.
  - 23. The audio signal classification apparatus of claim 19, wherein the current signal is determined as the percussive music when a relatively acute energy protrusion occurs in the current signal in both a short time period and a long time period, the current signal has no obvious voiced sound characteristic, and several historical frames before the current audio frame are mainly music frames.
  - 24. The audio signal classification apparatus of claim 19, wherein the current signal is determined as the percussive music when none of subframes of the current signal has an obvious voiced sound characteristic and a relatively obvious increase also occurs in a time domain envelope of the current signal relative to a long-time average of the time domain envelope.
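
The percussive-music test of claim 24 (and of claims 6, 12, and 18) combines two conditions: no subframe shows an obvious voiced sound characteristic, and the time domain envelope rises clearly above its long-time average. The sketch below is one hypothetical way to express that test. The per-subframe voicing scores, the `voicing_limit` and `rise_ratio` thresholds, and the exponential smoothing of the long-time average are all assumptions, as the claims give no numeric values.

```python
def is_percussive(subframe_voicing, envelope, long_term_envelope,
                  voicing_limit=0.5, rise_ratio=4.0):
    """Hypothetical test: every subframe is unvoiced and the envelope
    clearly exceeds its long-time average."""
    unvoiced = all(v < voicing_limit for v in subframe_voicing)
    sharp_rise = envelope > rise_ratio * long_term_envelope
    return unvoiced and sharp_rise


def update_long_term(long_term_envelope, envelope, alpha=0.95):
    """Assumed exponential moving average for the long-time envelope."""
    return alpha * long_term_envelope + (1.0 - alpha) * envelope
```

A caller would typically evaluate `is_percussive` before folding the current envelope into the long-time average, so that a drum hit is compared against the level that preceded it.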

Current Assignee: Huawei Technologies Co., Ltd. (Huawei Investment & Holding Co., Ltd.)
Original Assignee: Huawei Technologies Co., Ltd. (Huawei Investment & Holding Co., Ltd.)
Inventors: Wang, Zhe
Primary Examiner(s): He, Jialong
Application Number: US16/108,668
Publication Number: US 20180366145A1
Time in Patent Office: 503 Days
CPC Class Codes:
G10L 19/06   Determination or coding of ...
G10L 19/12   the excitation function bei...
G10L 2025/783   based on threshold decision
G10L 25/12   the extracted parameters be...
G10L 25/18   the extracted parameters be...
G10L 25/78   Detection of presence or ab...
G10L 25/81   for discriminating voice fr...
