Audio signal classification method and apparatus
Abstract
An audio signal classification method and apparatus include: determining, according to the voice activity of a current audio frame, whether to obtain a frequency spectrum fluctuation of the current audio frame and store it in a frequency spectrum fluctuation memory; updating the frequency spectrum fluctuations stored in that memory according to whether the audio frame is percussive music or according to the activity of a historical audio frame; and classifying the current audio frame as a speech frame or a music frame according to statistics of a part or all of the effective data of the frequency spectrum fluctuations stored in the memory.
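The flow summarized in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the class name `FluctuationMemory`, the buffer size, the threshold value, the use of `None` to mark ineffective entries, the choice of the mean as the statistic, and the rule that a low mean indicates music are all hypothetical choices made for the example.

```python
from collections import deque
from statistics import mean


class FluctuationMemory:
    """Fixed-length history of spectrum fluctuation values (hypothetical helper)."""

    def __init__(self, size=60, music_threshold=5.0):
        # size and music_threshold are illustrative values, not taken from the patent.
        self.music_threshold = music_threshold
        self.buf = deque(maxlen=size)
        self.prev_active = False

    def update(self, flux, active, percussive=False):
        if active:
            if not self.prev_active:
                # Activity resumes after an inactive frame: entries of earlier
                # frames become ineffective (None) but keep their slots.
                for i in range(len(self.buf)):
                    self.buf[i] = None
            # Only active frames contribute a new value.
            self.buf.append(flux)
        if percussive:
            # Percussive music: cap every effective value at the music threshold.
            for i in range(len(self.buf)):
                if self.buf[i] is not None:
                    self.buf[i] = min(self.buf[i], self.music_threshold)
        self.prev_active = active

    def classify(self):
        # Statistic over the effective data only (here: the mean, an assumption).
        effective = [v for v in self.buf if v is not None]
        if not effective:
            return None  # nothing to decide on yet
        return "music" if mean(effective) <= self.music_threshold else "speech"
```

A caller would invoke `update` once per frame with that frame's fluctuation value and activity flag, then call `classify`. Keeping ineffective entries in place rather than deleting them preserves the frame positions in the history, which matters if a statistic is taken over only a recent part of the buffer.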
24 Claims
1. An audio signal classification method, comprising:
storing, based on at least one condition, data of a frequency spectrum fluctuation parameter of a current audio frame of an audio signal into a memory where data of frequency spectrum fluctuation parameters of a plurality of audio frames are stored, wherein the at least one condition is the current audio frame being an active frame, wherein the frequency spectrum fluctuation parameter denotes an energy fluctuation of a frequency spectrum of the audio signal;
modifying data of frequency spectrum fluctuation parameters of audio frames preceding the current audio frame stored in the memory into ineffective data when the current audio frame is an active frame and an audio frame immediately preceding the current audio frame is an inactive frame, wherein data of frequency spectrum fluctuation parameters in the memory not having been modified into ineffective data are effective data;
modifying the effective data stored in the memory into a value that is less than or equal to a music threshold when a current signal is percussive music, wherein the current signal comprises the current audio frame and a plurality of audio frames preceding the current audio frame;
obtaining statistics of a part or all of the effective data stored in the memory; and
classifying the current audio frame as a speech frame or a music frame according to the statistics of a part or all of the effective data stored in the memory.
Dependent claims: 2, 3, 4, 5, 6.
7. An audio signal classification apparatus configured to classify an input audio signal, comprising:
a memory comprising instructions; and
one or more processors in communication with the memory, wherein the one or more processors execute the instructions to:
store, based on at least one condition, data of a frequency spectrum fluctuation parameter of a current audio frame of an audio signal into the memory where data of frequency spectrum fluctuation parameters of a plurality of audio frames are stored, wherein the at least one condition comprises the current audio frame being an active frame, and wherein the frequency spectrum fluctuation parameter denotes an energy fluctuation of a frequency spectrum of the audio signal;
modify data of frequency spectrum fluctuation parameters of audio frames preceding the current audio frame stored in the memory into ineffective data when the current audio frame is an active frame and an audio frame immediately preceding the current audio frame is an inactive frame, wherein data of frequency spectrum fluctuation parameters in the memory not having been modified into ineffective data are effective data;
modify the effective data stored in the memory into a value that is less than or equal to a music threshold when a current signal is percussive music, wherein the current signal comprises the current audio frame and a plurality of audio frames preceding the current audio frame;
obtain statistics of a part or all of the effective data stored in the memory; and
classify the current audio frame as a speech frame or a music frame according to the statistics of a part or all of the effective data stored in the memory.
Dependent claims: 8, 9, 10, 11, 12.
13. An audio signal classification method, comprising:
storing, based on at least one condition, data of a frequency spectrum fluctuation parameter of a current audio frame of an audio signal into a memory where data of frequency spectrum fluctuation parameters of a plurality of audio frames are stored, wherein the at least one condition comprises the current audio frame being an active frame, and wherein the frequency spectrum fluctuation parameter denotes an energy fluctuation of a frequency spectrum of the audio signal;
modifying data of frequency spectrum fluctuation parameters of audio frames preceding the current audio frame stored in the memory into ineffective data when the current audio frame is an active frame and an audio frame immediately preceding the current audio frame is an inactive frame, wherein data of the frequency spectrum fluctuation parameters with negative values is the ineffective data, and data of frequency spectrum fluctuation parameters with a non-negative value is effective data;
modifying the effective data stored in the memory into a value that is less than or equal to a music threshold when a current signal is percussive music, wherein the current signal comprises the current audio frame and a plurality of audio frames preceding the current audio frame;
obtaining statistics of a part or all of the effective data stored in the memory; and
classifying the current audio frame as a speech frame or a music frame according to the statistics of a part or all of the effective data stored in the memory.
Dependent claims: 14, 15, 16, 17, 18.
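Claim 13 distinguishes effective from ineffective data by sign alone. A minimal sketch of that convention follows, assuming the fluctuation parameter is never negative when first computed, so that a negative number can serve as an in-band marker. The function names and the sentinel value `-1.0` are hypothetical and chosen only for illustration.

```python
INEFFECTIVE = -1.0  # assumed sentinel; the claim only requires a negative value


def invalidate(history):
    """Mark every stored value as ineffective, keeping the history length."""
    return [INEFFECTIVE for _ in history]


def effective(history):
    """Return only the effective (non-negative) values."""
    return [v for v in history if v >= 0.0]


def clamp_percussive(history, music_threshold):
    """Cap effective values at the music threshold; leave ineffective ones alone."""
    return [min(v, music_threshold) if v >= 0.0 else v for v in history]
```

Encoding validity in the sign avoids a separate flag array, at the cost of relying on the parameter never being negative in normal operation.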
19. An audio signal classification apparatus configured to classify an input audio signal, comprising:
a memory comprising instructions; and
one or more processors in communication with the memory, wherein the one or more processors execute the instructions to:
store, based on at least one condition, data of a frequency spectrum fluctuation parameter of a current audio frame of an audio signal into a memory where data of frequency spectrum fluctuation parameters of a plurality of audio frames are stored, wherein the at least one condition comprises the current audio frame being an active frame, and wherein the frequency spectrum fluctuation parameter denotes an energy fluctuation of a frequency spectrum of the audio signal;
modify data of frequency spectrum fluctuation parameters of audio frames preceding the current audio frame stored in the memory into ineffective data when the current audio frame is an active frame and an audio frame immediately preceding the current audio frame is an inactive frame, wherein data of the frequency spectrum fluctuation parameters with negative values is the ineffective data, and data of frequency spectrum fluctuation parameters with a non-negative value is effective data;
modify the effective data stored in the memory into a value that is less than or equal to a music threshold when a current signal is percussive music, wherein the current signal comprises the current audio frame and a plurality of audio frames preceding the current audio frame;
obtain statistics of a part or all of the effective data stored in the memory; and
classify the current audio frame as a speech frame or a music frame according to the statistics of a part or all of the effective data stored in the memory.
Dependent claims: 20, 21, 22, 23, 24.
Specification