Audio classifier that includes analog signal voice activity detection and digital signal voice activity detection
First Claim
1. An audio classifier comprising:
a first processor having hard-wired logic configured to receive an audio signal and detect audio activity from the audio signal, wherein the first processor is an analogue processor; and
a second processor having reconfigurable logic configured to classify the audio signal as a type of audio signal in response to the first processor detecting audio activity, wherein the second processor is a digital processor;
in which the second processor is a voice activity detector, in which the second processor is configured to classify the audio signal as either speech or not speech;
in which the second processor is configured to determine at least three features of the audio signal and classify the audio signal as either speech or not speech in accordance with the at least three features, in which the at least three features comprise:
short term energy;
tonal power ratio; and
spectral crest factor;
wherein the second processor is configured to compute the tonal power ratio and the crest factor using common computed quantities and is configured to classify the audio signal as speech only if each of the short term energy, the tonal power ratio, and the spectral crest factor exceeds a corresponding feature-specific predetermined threshold.
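The decision rule in the final clause is a strict conjunction: a frame is labelled speech only when all three features exceed their own thresholds. A minimal sketch of that rule follows. The class names, field names, and any threshold values are hypothetical; the claim gives no numeric thresholds and does not say how they are chosen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameFeatures:
    """Per-frame values of the three features named in the claim."""
    short_term_energy: float
    tonal_power_ratio: float
    spectral_crest_factor: float


@dataclass(frozen=True)
class Thresholds:
    """One predetermined threshold per feature (values are hypothetical)."""
    short_term_energy: float
    tonal_power_ratio: float
    spectral_crest_factor: float


def is_speech(features: FrameFeatures, thresholds: Thresholds) -> bool:
    """Return True only if every feature exceeds its own threshold."""
    return (
        features.short_term_energy > thresholds.short_term_energy
        and features.tonal_power_ratio > thresholds.tonal_power_ratio
        and features.spectral_crest_factor > thresholds.spectral_crest_factor
    )
```

Because the comparison is strict, a feature that merely equals its threshold yields "not speech" in this sketch; whether the claim's "exceeds" is meant that strictly is an interpretation, not something the text settles.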
Abstract
The disclosure relates to an audio classifier comprising: a first processor having hard-wired logic configured to receive an audio signal and detect audio activity from the audio signal; and a second processor having reconfigurable logic configured to classify the audio signal as a type of audio signal in response to the first processor detecting audio activity.
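The two-stage arrangement in the abstract can be illustrated in software, with the caveat that the first stage is described as hard-wired logic and would not actually run as code. The sketch below only models the control flow: a cheap activity check gates a costlier classifier, which is invoked only for frames where activity was detected. The function names, the peak-level check, and the "idle" label are all assumptions made for illustration.

```python
from typing import Callable, Iterable, Iterator, Sequence

Frame = Sequence[float]


def activity_detected(frame: Frame, level: float) -> bool:
    """Stand-in for the first stage: a cheap peak-level comparison."""
    return max((abs(s) for s in frame), default=0.0) > level


def classify_stream(
    frames: Iterable[Frame],
    classify: Callable[[Frame], str],
    level: float,
) -> Iterator[str]:
    """Run the second-stage classifier only on frames the first stage flags."""
    for frame in frames:
        if activity_detected(frame, level):
            yield classify(frame)  # second stage wakes up for this frame
        else:
            yield "idle"  # second stage is never invoked
```

In this model the classifier callback stands in for the reconfigurable second processor, so it can be swapped without touching the gating logic.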
14 Claims
1. An audio classifier comprising:
a first processor having hard-wired logic configured to receive an audio signal and detect audio activity from the audio signal, wherein the first processor is an analogue processor; and
a second processor having reconfigurable logic configured to classify the audio signal as a type of audio signal in response to the first processor detecting audio activity, wherein the second processor is a digital processor;
in which the second processor is a voice activity detector, in which the second processor is configured to classify the audio signal as either speech or not speech;
in which the second processor is configured to determine at least three features of the audio signal and classify the audio signal as either speech or not speech in accordance with the at least three features, in which the at least three features comprise:
short term energy;
tonal power ratio; and
spectral crest factor;
wherein the second processor is configured to compute the tonal power ratio and the crest factor using common computed quantities and is configured to classify the audio signal as speech only if each of the short term energy, the tonal power ratio, and the spectral crest factor exceeds a corresponding feature-specific predetermined threshold.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 14
9. An audio recognition system comprising:
the audio classifier having:
a first processor having hard-wired logic configured to receive an audio signal and detect audio activity from the audio signal, wherein the first processor is an analogue processor; and
a second processor having reconfigurable logic configured to classify the audio signal as a type of audio signal in response to the first processor detecting audio activity, wherein the second processor is a digital processor;
in which the second processor is a voice activity detector, in which the second processor is configured to classify the audio signal as either speech or not speech;
in which the second processor is configured to determine at least three features of the audio signal and classify the audio signal as either speech or not speech in accordance with the at least three features, in which the at least three features comprise:
short term energy;
tonal power ratio; and
crest factor; and
wherein the second processor is configured to compute the tonal power ratio and the crest factor using common computed quantities and is configured to classify the audio signal as speech only if each of the short term energy, the tonal power ratio, and the spectral crest factor exceeds a corresponding feature-specific predetermined threshold;
an audio recognition unit configured to determine one or more audio segments from the audio signal in response to the second processor classifying the audio as a particular type of audio signal.
Dependent claims: 10, 11, 12
13. An audio classifier comprising:
a first processor having hard-wired logic configured to receive an audio signal and detect audio activity from the audio signal, wherein the first processor is an analogue processor; and
a second processor having reconfigurable logic configured to classify the audio signal as a type of audio signal in response to the first processor detecting audio activity, wherein the second processor is a digital processor;
in which the second processor is a voice activity detector, in which the second processor is configured to classify the audio signal as either speech or not speech;
in which the second processor is configured to determine at least three features for each frame of the audio signal and classify the audio signal as either speech or not speech in response to the at least three features, wherein the at least three features include short-term energy, spectral crest factor, and tonal power ratio; and
wherein the second processor is configured to compute the tonal power ratio and the crest factor using common computed quantities and is configured to classify the audio signal as speech only if each of the short term energy, the tonal power ratio, and the spectral crest factor exceeds a corresponding feature-specific predetermined threshold;
wherein the common computed quantity used by the second processor to compute the tonal power ratio and the crest factor comprises Mt[n], where Mt[n] is the magnitude of the Fourier transform at frame t and frequency bin n.
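To make the idea of a shared quantity concrete, the sketch below computes the three features for one frame, deriving both spectral features from the same magnitudes Mt[n]. The claims do not give formulas, so the definitions here are assumptions: short-term energy is the mean squared sample, tonal power ratio is the power in spectral peaks above a floor divided by total power, and crest factor is peak power divided by total power. The function names and the `tonal_floor` parameter are hypothetical, and a naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Return |DFT| for bins 0..N/2 of one frame (naive O(N^2) transform)."""
    n_samples = len(frame)
    mags = []
    for n in range(n_samples // 2 + 1):
        acc = 0j
        for k, x in enumerate(frame):
            acc += x * cmath.exp(-2j * math.pi * n * k / n_samples)
        mags.append(abs(acc))
    return mags


def frame_features(
    frame: Sequence[float], tonal_floor: float
) -> Tuple[float, float, float]:
    """Return (short-term energy, tonal power ratio, spectral crest factor)."""
    energy = sum(x * x for x in frame) / len(frame)

    # Quantities shared by both spectral features: the magnitudes,
    # their squares, and the total power.
    mags = magnitude_spectrum(frame)
    power = [m * m for m in mags]
    total = sum(power)
    if total == 0.0:
        return energy, 0.0, 0.0  # silent frame: avoid dividing by zero

    # Assumed tonal criterion: a bin is tonal if it is a local maximum
    # of the power spectrum and its power exceeds tonal_floor.
    tonal = sum(
        power[n]
        for n in range(1, len(power) - 1)
        if power[n] > power[n - 1]
        and power[n] > power[n + 1]
        and power[n] > tonal_floor
    )
    return energy, tonal / total, max(power) / total
```

Under these assumed definitions, a frame holding a single pure tone concentrates nearly all power in one bin, so both ratios approach 1, while broadband noise spreads power across bins and pushes both ratios down. Reusing `power` and `total` means the spectrum is squared and summed once per frame rather than once per feature.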
Specification