Method for siren detection based on audio samples

US 9,275,136 B1
Filed: 12/03/2013
Issued: 03/01/2016
Est. Priority Date: 12/03/2013
Status: Expired due to Fees

Abstract

The present disclosure provides methods and apparatuses that enable an apparatus to identify sounds from short samples of audio. The apparatus may capture an audio sample and create several audio signals of different lengths, each containing audio from the captured audio sample. The apparatus may process the several audio signals in an attempt to identify features of the audio signal that indicate an identification of the captured sound. Because shorter audio samples can be analyzed more quickly, the system may first process the shortest audio samples in order to quickly identify features of the audio signal. Because longer audio samples contain more information, the system may be able to more accurately identify features in the audio signal in longer audio samples. However, analyzing longer audio signals takes more buffered audio than identifying features in shorter signals. Therefore, the present system attempts to identify features in the shortest audio signals first.
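
The shortest-first strategy described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the choice of trailing slices of one buffer as windows, and the `score` callable are all assumptions, and a real scorer would evaluate features such as MFCCs.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def make_windows(buffer: Sequence[float], lengths: Sequence[int]) -> List[Sequence[float]]:
    """Return the most recent n samples for each requested length, shortest first.

    Assumption: every window is a trailing slice of the same capture buffer.
    Lengths that exceed the buffered audio (or are not positive) are skipped.
    """
    return [buffer[-n:] for n in sorted(lengths) if 0 < n <= len(buffer)]


def detect_shortest_first(
    buffer: Sequence[float],
    lengths: Sequence[int],
    score: Callable[[Sequence[float]], float],
    threshold: float,
) -> Optional[Tuple[int, float]]:
    """Score windows from shortest to longest and stop at the first detection.

    Returns (window_length, likelihood) for the first window whose likelihood
    exceeds the threshold, or None if no window does.
    """
    for window in make_windows(buffer, lengths):
        likelihood = score(window)  # hypothetical scorer returning a value in [0, 1]
        if likelihood > threshold:
            return len(window), likelihood
    return None
```

A longer window is scored only when every shorter window stayed at or below the threshold, which mirrors the fallback from the first to the second windowed audio sample in the independent claims.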

20 Claims

1. An apparatus comprising:
- an audio unit configured to receive an audio signal;
  
  a control unit configured to operate the apparatus; and
  
  a processing unit configured to:
  
  process the audio signal from the audio unit to create a plurality of windowed audio samples including at least a first windowed audio sample and a second windowed audio sample, wherein the first windowed audio sample and the second windowed audio sample each have a different length of time;
  
  determine a likelihood that the first windowed audio sample comprises a siren signal based on a detection of a group of features in the first windowed audio signal associated with a siren-classification profile, wherein the group of features comprises mel-frequency cepstrum coefficients (MFCCs) associated with a reference siren signal;
  
  based on the first windowed audio sample indicating a likelihood of a siren signal below a threshold, determine a likelihood that the second windowed audio sample includes a siren signal based on a detection of a group of features of the second windowed audio signal with the siren-classification profile, wherein the group of features comprises the mel-frequency cepstrum coefficients (MFCCs) associated with the reference siren signal; and
  
  alter control of the apparatus by the control system based on the likelihood of at least one of the first windowed audio sample and the second windowed audio sample including a siren signal being above the threshold.
  - 2. The apparatus according to claim 1, wherein the processor is further configured to determine the likelihood using a linear classifier analyzing the group of features of a respective audio signal and wherein each group of features further comprises a monotonicity estimation associated with the reference siren signal, and a spectral energy concentration estimation associated with the reference siren signal.
  - 3. The apparatus according to claim 1, wherein the audio unit is configured to periodically receive the audio signal.
  - 4. The apparatus of claim 1, wherein the processor is further configured to:
    - determine a fingerprint-based likelihood that the first windowed audio sample comprises a siren signal based on a comparison of the first windowed audio signal with a group of audio fingerprints, wherein the group of audio fingerprints comprises at least one audio fingerprint of a siren signal;
      
      based on the first windowed audio sample indicating a fingerprint-based likelihood of a siren signal below the threshold, determine a fingerprint-based likelihood that the second windowed audio sample comprises a siren signal based on a comparison of the second windowed audio signal with the group of audio fingerprints.
  - 5. The apparatus of claim 1, further comprising a communication unit, wherein the communication unit is configured to receive the siren-classification profile from a remote system.
  - 6. The apparatus of claim 1, further comprising an input device, wherein the input device is configured to receive an input, wherein the input comprises an override indication to provide an indication of a false siren detection.
  - 7. The apparatus of claim 6, wherein the processor is further configured to adjust the siren-classification profile based on the input device receiving the override indication.
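
Claim 2 combines the MFCCs with a monotonicity estimation and a spectral energy concentration estimation in a linear classifier. A minimal sketch of that scoring step follows, assuming the features have already been extracted. The logistic mapping from linear score to a 0 to 1 likelihood is an assumption (the claim does not specify one), and `weights` and `bias` are hypothetical stand-ins for values a siren-classification profile might carry.

```python
import math
from typing import Sequence


def siren_likelihood(
    mfccs: Sequence[float],
    monotonicity: float,
    energy_concentration: float,
    weights: Sequence[float],
    bias: float,
) -> float:
    """Linear classifier over the feature group, squashed to a likelihood.

    The feature vector is the MFCCs followed by the two scalar estimates.
    """
    features = list(mfccs) + [monotonicity, energy_concentration]
    if len(features) != len(weights):
        raise ValueError("weights must match the number of features")

    # Linear part: weighted sum of the features plus a bias term.
    score = sum(w * x for w, x in zip(weights, features)) + bias

    # Assumed logistic mapping, written in a numerically stable form so that
    # large negative scores do not overflow math.exp.
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)
```

The returned value can then be compared against the threshold recited in claim 1.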

8. A method comprising:
- receiving an audio signal with an audio unit;
  
  processing, with a processor, the audio signal from the audio unit to create a plurality of windowed audio samples including at least a first windowed audio sample and a second windowed audio sample, wherein the first windowed audio sample and the second windowed audio sample each have a different length of time;
  
  determining a likelihood that the first windowed audio sample comprises a siren signal based on the detection of a group of features of the first windowed audio signal, wherein the group of features comprises the mel-frequency cepstrum coefficients (MFCCs) associated with a reference siren signal;
  
  based on the first windowed audio sample indicating a likelihood of the first windowed audio sample including a siren signal below a threshold, determining a likelihood that the second windowed audio sample comprises a siren signal based on the detection of a group of features of the second windowed audio signal, wherein the group of features comprises the mel-frequency cepstrum coefficients (MFCCs) associated with the reference siren signal; and
  
  providing instructions to control an apparatus based on the likelihood of at least one of the first windowed audio sample and the second windowed audio sample including a siren signal being above the threshold.
  - 9. The method according to claim 8, wherein determining a likelihood comprises a linear classifier analyzing the group of features of a respective audio signal and wherein each group of features further comprises a monotonicity estimation associated with the reference siren signal, and a spectral energy concentration estimation associated with the reference siren signal.
  - 10. The method according to claim 8, wherein receiving an audio signal with an audio unit comprises periodically receiving the audio signal.
  - 11. The method of claim 8, further comprising:
    - determining a fingerprint-based likelihood that the first windowed audio sample comprises a siren signal based on a comparison of the first windowed audio signal with a group of audio fingerprints, wherein the group of audio fingerprints comprises at least one audio fingerprint of a siren signal;
      
      based on the first windowed audio sample indicating a fingerprint-based likelihood of a siren signal below the threshold, determining a fingerprint-based likelihood that the second windowed audio sample comprises a siren signal based on a comparison of the second windowed audio signal with the group of audio fingerprints.
  - 12. The method of claim 8, further comprising receiving the siren-classification profile from a remote system.
  - 13. The method of claim 8, further comprising receiving an input, wherein the input comprises an override indication to provide an indication of a false siren detection.
  - 14. The method of claim 13, further comprising adjusting the siren-classification profile based on the input device receiving the override indication.
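
Claims 13 and 14 describe receiving an override indication for a false siren detection and adjusting the siren-classification profile in response. The claims do not say how the profile is adjusted. One hypothetical option, sketched below, is to raise the detection threshold by a small step up to a cap. The `SirenProfile` class, its fields, and the step size are illustrative assumptions, not details from the patent.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SirenProfile:
    """Hypothetical profile holding only the detection threshold and its cap."""
    threshold: float = 0.5
    max_threshold: float = 0.95


def apply_override(profile: SirenProfile, step: float = 0.05) -> SirenProfile:
    """Return a stricter copy of the profile after a reported false detection.

    Assumption: a false detection means the threshold was too permissive, so
    it is raised by `step` without exceeding `max_threshold`.
    """
    new_threshold = min(profile.threshold + step, profile.max_threshold)
    return replace(profile, threshold=new_threshold)
```

Returning a new profile instead of mutating the old one keeps the previous settings available if the adjustment later needs to be reverted.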

15. An article of manufacture including a non-transitory computer-readable medium having stored thereon instructions that, when executed by a processor in a vehicle system, cause the vehicle system to perform operations comprising:
- receiving an audio signal;
  
  processing the audio signal to create a plurality of windowed audio samples including at least a first windowed audio sample and a second windowed audio sample, wherein the first windowed audio sample and the second windowed audio sample each have a different length of time;
  
  determining a likelihood that the first windowed audio sample comprises a siren signal based on the detection of a group of features of the first windowed audio signal, wherein the group of features comprises the mel-frequency cepstrum coefficients (MFCCs) associated with a reference siren signal;
  
  based on the first windowed audio sample indicating a low likelihood of the first windowed audio sample including a siren signal, determining a likelihood that the second windowed audio sample comprises a siren signal based on the detection of a group of features of the second windowed audio signal, wherein the group of features comprises the mel-frequency cepstrum coefficients (MFCCs) associated with a reference siren signal; and
  
  providing instructions to control an apparatus based on the likelihood of at least one of the first windowed audio sample and the second windowed audio sample including a siren signal being above the threshold.
  - 16. The article of manufacture according to claim 15, wherein determining a likelihood comprises a linear classifier analyzing the group of features of a respective audio signal and wherein each group of features further comprises a monotonicity estimation associated with the reference siren signal, and a spectral energy concentration estimation associated with the reference siren signal.
  - 17. The article of manufacture of claim 15, further comprising:
    - determining a fingerprint-based likelihood that the first windowed audio sample comprises a siren signal based on a comparison of the first windowed audio signal with a group of audio fingerprints, wherein the group of audio fingerprints comprises at least one audio fingerprint of a siren signal;
      
      based on the first windowed audio sample indicating a fingerprint-based likelihood of a siren signal below the threshold, determining a fingerprint-based likelihood that the second windowed audio sample comprises a siren signal based on a comparison of the second windowed audio signal with the group of audio fingerprints.
  - 18. The article of manufacture of claim 15, further comprising receiving the siren-classification profile from a remote system.
  - 19. The article of manufacture of claim 15, further comprising receiving an input, wherein the input comprises an override indication to provide an indication of a false siren detection.
  - 20. The article of manufacture of claim 19, further comprising adjusting the siren-classification profile based on the input device receiving an override indication.

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Sharifi, Matthew; Roblek, Dominick
Primary Examiner(s): Wu, Daniel
Application Number: US14/095,199
Time in Patent Office: 819 Days
Field of Search: 340/901-904, 381/86
US Class Current: 1/1
CPC Class Codes:
G06F 16/683   using metadata automaticall...
G08B 29/185   Signal analysis techniques ...
G08B 3/10   using electric transmission...
G10L 19/022   Blocking, i.e. grouping of ...
G10L 19/06   Determination or coding of ...
G10L 25/51   for comparison or discrimin...
H04R 29/00   Monitoring arrangements; Te...
