Audio classifier that includes analog signal voice activity detection and digital signal voice activity detection

US 10,115,399 B2
Filed: 07/20/2016
Issued: 10/30/2018
Est. Priority Date: 07/20/2016
Status: Active Grant

2 Assignments

0 Petitions

Abstract

The disclosure relates to an audio classifier comprising: a first processor having hard-wired logic configured to receive an audio signal and detect audio activity from the audio signal; and a second processor having reconfigurable logic configured to classify the audio signal as a type of audio signal in response to the first processor detecting audio activity.
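
The abstract describes a two-stage arrangement: an always-on, low-cost activity detector that wakes a more capable classifier only when there is something to classify. The Python sketch below simulates both stages in software purely for illustration. In the claims the first stage is a hard-wired analogue circuit, so the mean-square energy test (claim 7 says only that the first processor determines an energy), `activity_threshold`, and the `classify` callback are assumptions, not the patent's design.

```python
from typing import Callable, List, Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Mean-square energy of one frame (software stand-in for the analogue energy detector)."""
    return sum(x * x for x in frame) / len(frame)


def two_stage_classify(
    frames: Sequence[Sequence[float]],
    activity_threshold: float,
    classify: Callable[[Sequence[float]], str],
) -> List[str]:
    """Label each frame, invoking the costly classifier only when the first stage detects activity."""
    labels: List[str] = []
    for frame in frames:
        if frame_energy(frame) > activity_threshold:
            labels.append(classify(frame))  # second stage is woken up for this frame
        else:
            labels.append("inactive")  # second stage stays idle
    return labels


# Example: only the middle frame is loud enough to reach the classifier.
frames = [[0.0] * 4, [0.5, -0.5, 0.5, -0.5], [0.01] * 4]
print(two_stage_classify(frames, 0.01, lambda f: "speech"))
# prints ['inactive', 'speech', 'inactive']
```

The power saving comes from the `else` branch: for quiet input the reconfigurable second stage never runs.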

14 Claims

1. An audio classifier comprising:
- a first processor having hard-wired logic configured to receive an audio signal and detect audio activity from the audio signal, wherein the first processor is an analogue processor; and
  
  a second processor having reconfigurable logic configured to classify the audio signal as a type of audio signal in response to the first processor detecting audio activity, wherein the second processor is a digital processor;
  
  in which the second processor is a voice activity detector, in which the second processor is configured to classify the audio signal as either speech or not speech;
  
  in which the second processor is configured to determine at least three features of the audio signal and classify the audio signal as either speech or not speech in accordance with the at least three features, in which the at least three features comprises;
  
  short term energy;
  
  tonal power ratio; and
  
  spectral crest factor;
  
  wherein the second processor is configured to compute the tonal power ratio and the crest factor using common computed quantities and is configured to classify the audio signal as speech only if each of the short term energy, the tonal power ratio, and the spectral crest factor exceeds a corresponding feature-specific predetermined threshold.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 14):
  - 2. The audio classifier of claim 1 in which the second processor is configured to perform the classification in conjunction with software or firmware.
  - 3. The audio classifier of claim 1 comprising an analogue-to-digital converter configured to digitize the analogue audio signal, in which the second processor is configured to classify a digitized audio signal.
  - 4. The audio classifier of claim 1 in which the at least three features further comprises a zero crossing rate.
  - 5. The audio classifier of claim 1 in which the second processor is configured to generate one or more metrics associated with the audio signal.
  - 6. The audio classifier of claim 5 in which the metrics include an average background level of the audio signal over an interval of time.
  - 7. The audio classifier of claim 1 in which the first processor is configured to determine an energy of the audio signal in order to detect audio activity.
  - 8. The audio classifier of claim 1 in which the first processor is configured to operate on an analogue audio signal.
  - 14. The audio classifier of claim 1 wherein the common computed quantity used by the second processor to compute the tonal power ratio and the crest factor comprises M_t[n], where M_t[n] is the magnitude of the Fourier transform at frame t and frequency bin n.
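
The claims name the three features but give no formulas. As a sketch only, assuming common textbook definitions (spectral crest factor as the largest spectral magnitude over the sum of magnitudes, tonal power ratio as the share of power held by local spectral peaks), both spectral features can be written in terms of the shared magnitude spectrum $M_t[n]$ of claim 14. The frame samples $x_t[k]$, frame length $N$, peak set $\mathcal{P}_t$ (interior bins only) and thresholds $\theta$ are assumed notation, not taken from the patent:

```latex
\begin{aligned}
M_t[n] &= \Bigl|\sum_{k=0}^{N-1} x_t[k]\,e^{-j2\pi kn/N}\Bigr|, \quad n = 0,\dots,N/2 \\
E_t &= \frac{1}{N}\sum_{k=0}^{N-1} x_t[k]^2 \\
\mathrm{SCF}_t &= \frac{\max_n M_t[n]}{\sum_n M_t[n]} \\
\mathrm{TPR}_t &= \frac{\sum_{n\in\mathcal{P}_t} M_t[n]^2}{\sum_n M_t[n]^2}, \quad
\mathcal{P}_t = \{\,n : 0 < n < N/2,\ M_t[n] > M_t[n-1],\ M_t[n] > M_t[n+1]\,\} \\
\text{speech}_t &\iff E_t > \theta_E \,\wedge\, \mathrm{TPR}_t > \theta_{\mathrm{TPR}} \,\wedge\, \mathrm{SCF}_t > \theta_{\mathrm{SCF}}
\end{aligned}
```

Under these definitions both spectral features are built from the same per-frame $M_t[n]$, which is one plausible reading of "common computed quantities"; the patent's own definitions may differ, for example by using a mean rather than a sum in the crest-factor denominator or a power floor on tonal peaks.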

9. An audio recognition system comprising:
- the audio classifier having;
  
  a first processor having hard-wired logic configured to receive an audio signal and detect audio activity from the audio signal, wherein the first processor is an analogue processor; and
  
  a second processor having reconfigurable logic configured to classify the audio signal as a type of audio signal in response to the first processor detecting audio activity, wherein the second processor is a digital processor;
  
  in which the second processor is a voice activity detector, in which the second processor is configured to classify the audio signal as either speech or not speech;
  
  in which the second processor is configured to determine at least three features of the audio signal and classify the audio signal as either speech or not speech in accordance with the at least three features, in which the at least three features comprises;
  
  short term energy;
  
  tonal power ratio; and
  
  crest factor; and
  
  wherein the second processor is configured to compute the tonal power ratio and the crest factor using common computed quantities and is configured to classify the audio signal as speech only if each of the short term energy, the tonal power ratio, and the spectral crest factor exceeds a corresponding feature-specific predetermined threshold;
  
  an audio recognition unit configured to determine one or more audio segments from the audio signal in response to the second processor classifying the audio as a particular type of audio signal.
- Dependent claims (10, 11, 12):
  - 10. The audio recognition system of claim 9 in which the audio recognition system is a voice recognition system and the audio recognition unit is a voice recognition unit configured to determine one or more words from the audio signal in response to the second processor classifying the audio signal as a voice signal.
  - 11. The audio recognition system of claim 9 in which the audio recognition system is a music recognition system and the audio recognition unit is a music recognition unit configured to recognize a piece of music from the audio signal in response to the second processor classifying the audio signal as music.
  - 12. A mobile computing device comprising the voice recognition system of claim 9.
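
In claim 9 the audio recognition unit acts only on audio segments that the second processor has classified. A minimal sketch, assuming per-frame boolean speech decisions and non-overlapping frames of `frame_len` samples, groups consecutive speech frames into segments and hands each one to a recognizer callback. The function names, `min_frames`, and `recognize` are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

R = TypeVar("R")


def speech_segments(is_speech: Sequence[bool], min_frames: int = 1) -> List[Tuple[int, int]]:
    """Group consecutive speech frames into half-open [start, end) frame-index ranges.

    Runs shorter than `min_frames` frames are dropped.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, flag in enumerate(is_speech):
        if flag and start is None:
            start = i  # a speech run begins
        elif not flag and start is not None:
            if i - start >= min_frames:
                segments.append((start, i))
            start = None  # the run has ended
    if start is not None and len(is_speech) - start >= min_frames:
        segments.append((start, len(is_speech)))  # run extends to the last frame
    return segments


def recognize_segments(
    samples: Sequence[float],
    is_speech: Sequence[bool],
    frame_len: int,
    recognize: Callable[[Sequence[float]], R],
    min_frames: int = 1,
) -> List[R]:
    """Pass the samples of each speech segment to the recognizer; non-speech audio is never sent."""
    return [
        recognize(samples[start * frame_len : end * frame_len])
        for start, end in speech_segments(is_speech, min_frames)
    ]
```

For example, `speech_segments([False, True, True, False, True])` yields `[(1, 3), (4, 5)]`; raising `min_frames` to 2 discards the single-frame run and leaves `[(1, 3)]`.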

13. An audio classifier comprising:
- a first processor having hard-wired logic configured to receive an audio signal and detect audio activity from the audio signal, wherein the first processor is an analogue processor; and
  
  a second processor having reconfigurable logic configured to classify the audio signal as a type of audio signal in response to the first processor detecting audio activity, wherein the second processor is a digital processor;
  
  in which the second processor is a voice activity detector, in which the second processor is configured to classify the audio signal as either speech or not speech;
  
  in which the second processor is configured to determine at least three features for each frame of the audio signal and classify the audio signal as either speech or not speech in response to the at least three features, wherein the at least three features include short-term energy, spectral crest factor, and tonal power ratio; and
  
  wherein the second processor is configured to compute the tonal power ratio and the crest factor using common computed quantities and is configured to classify the audio signal as speech only if each of the short term energy, the tonal power ratio, and the spectral crest factor exceeds a corresponding feature-specific predetermined threshold;
  
  wherein the common computed quantity used by the second processor to compute the tonal power ratio and the crest factor comprises M_t[n], where M_t[n] is the magnitude of the Fourier transform at frame t and frequency bin n.
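
Claim 13 computes the features for each frame and reuses $M_t[n]$ for both spectral features. The plain-Python sketch below computes the magnitude spectrum once per frame with a direct DFT and passes the same list to both spectral feature functions before applying the rule that every feature must exceed its own threshold. The feature formulas (crest factor as peak over sum, tonal bins as strict local maxima), the function names, and the threshold values are assumptions for illustration, not the patent's implementation.

```python
import cmath
import math
from typing import List, Sequence


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """M_t[n] for n = 0..N/2 via a direct O(N^2) DFT; real hardware would use an FFT."""
    n_fft = len(frame)
    return [
        abs(sum(frame[k] * cmath.exp(-2j * math.pi * k * n / n_fft) for k in range(n_fft)))
        for n in range(n_fft // 2 + 1)
    ]


def short_term_energy(frame: Sequence[float]) -> float:
    """Mean-square value of the time-domain frame."""
    return sum(x * x for x in frame) / len(frame)


def spectral_crest_factor(mag: Sequence[float]) -> float:
    """Largest spectral magnitude divided by the sum of all magnitudes (assumed definition)."""
    total = sum(mag)
    return max(mag) / total if total > 0 else 0.0


def tonal_power_ratio(mag: Sequence[float]) -> float:
    """Share of spectral power held by strict local-maximum bins (assumed tonal-peak rule)."""
    power = [m * m for m in mag]
    total = sum(power)
    if total == 0:
        return 0.0
    tonal = sum(
        power[n]
        for n in range(1, len(power) - 1)
        if power[n] > power[n - 1] and power[n] > power[n + 1]
    )
    return tonal / total


def is_speech_frame(
    frame: Sequence[float], thr_energy: float, thr_tpr: float, thr_scf: float
) -> bool:
    """Speech only if every feature exceeds its own feature-specific threshold."""
    mag = magnitude_spectrum(frame)  # computed once, shared by both spectral features
    return (
        short_term_energy(frame) > thr_energy
        and tonal_power_ratio(mag) > thr_tpr
        and spectral_crest_factor(mag) > thr_scf
    )


tone = [math.sin(2 * math.pi * 4 * k / 32) for k in range(32)]
print(is_speech_frame(tone, 0.1, 0.5, 0.5))        # True
print(is_speech_frame([1.0] * 32, 0.1, 0.5, 0.5))  # False: the DC bin is never counted as a tonal peak
```

The tone passes only because these illustrative thresholds are loose; the patent's predetermined threshold values are not given in the claims. Sharing `mag` means the costly transform runs once per frame regardless of how many spectral features read it.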

Current Assignee: Goodix Technology Co., Ltd. (Shenzhen Goodix Technology Co., Ltd.)
Original Assignee: NXP B.V. (NXP Semiconductors NV)
Inventors: Lepauloux, Ludovick Dominique Joel; Le Faucheur, Laurent
Primary Examiner(s): He, Jialong
Application Number: US15/215,259
Publication Number: US 20180025732A1
Time in Patent Office: 832 days
Field of Search: None
CPC Class Codes:
G10L 17/22   Interactive procedures; Man...
G10L 2025/937   Signal energy in various fr...
G10L 25/09   the extracted parameters be...
G10L 25/21   the extracted parameters be...
G10L 25/51   for comparison or discrimin...
G10L 25/81   for discriminating voice fr...
G10L 25/84   for discriminating voice fr...
