Audio processing techniques for semantic audio recognition and report generation

US 9,195,649 B2
Filed: 12/21/2012
Issued: 11/24/2015
Est. Priority Date: 12/21/2012
Status: Active Grant

10 Assignments

0 Petitions

Abstract

System, apparatus and method for determining semantic information from audio, where incoming audio is sampled and processed to extract audio features, including temporal, spectral, harmonic and rhythmic features. The extracted audio features are compared to stored audio templates that include ranges and/or values for certain features and are tagged for specific ranges and/or values. Extracted audio features that are most similar to one or more templates from the comparison are identified according to the tagged information. The tags are used to determine the semantic audio data that includes genre, instrumentation, style, acoustical dynamics, and emotive descriptor for the audio signal.
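
The abstract's first step, sampling audio and extracting features from it, can be illustrated with the time-domain quantities that claim 2 names (amplitude, power, zero crossing). The following is a minimal sketch, not the patented implementation. It assumes mono samples as floats in [-1, 1]. The function name `temporal_features` is hypothetical, and so are the specific definitions (peak amplitude, mean-square power, RMS, zero-crossing rate per adjacent sample pair). These are common textbook choices made here for illustration.

```python
import math
from typing import Dict, Sequence


def temporal_features(frame: Sequence[float]) -> Dict[str, float]:
    """Compute simple time-domain features for one frame of mono samples.

    Hypothetical helper: peak amplitude, mean-square power, RMS and
    zero-crossing rate are textbook definitions, not formulas from the patent.
    """
    n = len(frame)
    if n == 0:
        raise ValueError("frame must contain at least one sample")
    peak = max(abs(x) for x in frame)            # peak absolute amplitude
    power = sum(x * x for x in frame) / n        # mean-square power
    rms = math.sqrt(power)                       # root-mean-square amplitude
    # Count sign changes between consecutive samples (0.0 counts as positive).
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0.0) != (b >= 0.0)
    )
    zcr = crossings / (n - 1) if n > 1 else 0.0  # crossings per sample pair
    return {"peak": peak, "power": power, "rms": rms, "zcr": zcr}
```

For a frame that alternates sign every two samples, such as `[0.5, 0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5]`, this gives a peak of 0.5, a power of 0.25 and 3 crossings over 7 sample pairs.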

82 Citations

38 Claims

1. A method for forming an audio template for determining semantic audio information, comprising:
   - extracting a first audio feature from audio, the first audio feature including at least one of a temporal feature, a spectral feature, a harmonic feature, or a rhythmic feature;
   - extracting a second audio feature from the audio, the second audio feature including at least one of a temporal feature, a spectral feature, a harmonic feature, or a rhythmic feature, wherein the second audio feature is different from the first audio feature;
   - determining a first range for the first audio feature and a second range for the second audio feature; and
   - storing the first and second ranges to compare against other audio features from subsequent audio to generate tags signifying semantic audio information for the subsequent audio.

2. The method of claim 1, wherein the temporal features include at least one of amplitude, power, or zero crossing of at least some of the audio.

3. The method of claim 1, wherein the spectral features include at least one of a spectral centroid, a spectral rolloff, a spectral flux, a spectral flatness measure, a spectral crest factor, Mel-frequency cepstral coefficients, Daubechies wavelet coefficients, a spectral dissonance, a spectral irregularity, or a spectral inharmonicity of at least some of the audio.

4. The method of claim 1, wherein the harmonic features include at least one of a pitch, a tonality, a pitch class profile, harmonic changes, a main pitch class, an octave range of dominant pitch, a main tonal interval relation, or an overall pitch strength of at least some of the audio.

5. The method of claim 1, wherein the rhythmic features include at least one of a rhythmic structure, a beat period, a rhythmic fluctuation, or an average tempo for at least some of the audio.

6. The method of claim 1, further including transforming at least some of the audio from a time domain to a frequency domain.

7. The method of claim 1, wherein the tags are modifiable via a vocabulary library.
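
Claim 1's template-forming steps can be sketched as follows: extract two different features, determine a range for each, and store the ranges. The sketch assumes that a range is simply the minimum and maximum of a feature over example excerpts already known to carry a tag, optionally widened by a margin. It also assumes that templates are persisted as JSON. `form_template`, `store_templates`, the `margin` parameter and the JSON layout are all hypothetical choices, not details given in the claims.

```python
import json
from typing import Dict, List, Sequence


def form_template(
    tag: str,
    examples: Sequence[Dict[str, float]],
    margin: float = 0.0,
) -> Dict[str, object]:
    """Build a tagged template of per-feature [low, high] ranges.

    Hypothetical sketch: each range spans the min..max of that feature over
    the examples, widened on both sides by `margin` times the observed span.
    """
    if not examples:
        raise ValueError("need at least one example excerpt")
    names = sorted(examples[0])
    if len(names) < 2:
        raise ValueError("claim 1 calls for at least two different features")
    ranges = {}
    for name in names:
        values = [ex[name] for ex in examples]  # KeyError if an example lacks it
        lo, hi = min(values), max(values)
        pad = (hi - lo) * margin
        ranges[name] = {"low": lo - pad, "high": hi + pad}
    return {"tag": tag, "ranges": ranges}


def store_templates(templates: List[Dict[str, object]], path: str) -> None:
    """Persist templates as JSON so features of later audio can be compared."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(templates, fh, indent=2)
```

For example, two "rock" excerpts with zero-crossing rates of 0.10 and 0.14 yield a stored `zcr` range of 0.10 to 0.14. A nonzero `margin` loosens each range so that slightly atypical later audio can still match.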

8. A processor-based method for determining semantic audio information for audio, comprising:
   - extracting a first audio feature from the audio, the first audio feature including at least one of a rhythmic structure, a beat period, a rhythmic fluctuation, or an average tempo;
   - extracting a second audio feature from the audio, the second audio feature including at least one of a temporal feature, a spectral feature, a harmonic feature, or a rhythmic feature, wherein the second audio feature is different from the first audio feature;
   - comparing the first and second audio features to a plurality of stored audio feature ranges having tags associated therewith; and
   - determining the stored audio feature ranges having the closest matches to the first and second audio features, the tags associated with the audio feature ranges having the closest matches to be used to determine the semantic audio information for the audio.

9. The method of claim 8, wherein the temporal feature includes at least one of amplitude, power, or zero crossing of at least some of the audio.

10. The method of claim 8, wherein the spectral feature includes at least one of a spectral centroid, a spectral rolloff, a spectral flux, a spectral flatness measure, a spectral crest factor, Mel-frequency cepstral coefficients, Daubechies wavelet coefficients, a spectral dissonance, a spectral irregularity, or a spectral inharmonicity of at least some of the audio.

11. The method of claim 8, wherein the harmonic feature includes at least one of a pitch, a tonality, a pitch class profile, harmonic changes, a main pitch class, an octave range of dominant pitch, a main tonal interval relation, or an overall pitch strength of at least some of the audio.

12. The method of claim 8, wherein comparing the first and second audio features to the plurality of stored audio feature ranges includes using at least one of a k-Nearest neighbor, a Gaussian Mixture Model, tree-based vector quantization, a linear discriminate analysis, a Euclidean distance, or a binary classification.

13. The method of claim 8, wherein the semantic information includes at least one of a genre descriptor, an instrumentation descriptor, a style descriptor, an acoustical dynamics descriptor, or an emotive descriptor for the audio.
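
Claim 8 compares extracted features against stored, tagged ranges and keeps the closest matches. One simple way to score "closeness" to a range is assumed here rather than taken from the patent. A value inside the range scores zero, and a value outside it scores its width-normalized gap to the nearest bound. The scores are summed over the features a template shares with the input. `range_distance` and `closest_templates` are hypothetical names, and templates are plain `{"tag": ..., "ranges": {...}}` dictionaries.

```python
from typing import Any, Dict, List, Tuple

Template = Dict[str, Any]  # {"tag": str, "ranges": {name: {"low": x, "high": y}}}


def range_distance(value: float, low: float, high: float) -> float:
    """0 inside [low, high]; otherwise the gap to the nearest bound divided
    by the range width (floored at 1e-9 so degenerate ranges still work)."""
    width = max(high - low, 1e-9)
    if value < low:
        return (low - value) / width
    if value > high:
        return (value - high) / width
    return 0.0


def closest_templates(
    features: Dict[str, float], templates: List[Template], k: int = 1
) -> List[Tuple[str, float]]:
    """Return the k (tag, distance) pairs with the smallest summed distance,
    using only features that both the input and the template define."""
    scored = []
    for template in templates:
        ranges = template["ranges"]
        shared = [name for name in features if name in ranges]
        if not shared:
            continue  # nothing to compare against
        distance = sum(
            range_distance(features[n], ranges[n]["low"], ranges[n]["high"])
            for n in shared
        )
        scored.append((template["tag"], distance))
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]
```

Take a "dance" template whose tempo range is 118 to 132 BPM. An input at 124 BPM scores 0 on that feature, so the template's tag ranks ahead of templates whose ranges the input falls outside.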

14. An apparatus to form an audio template for determining semantic audio information, comprising:
   - a processor to:
     - extract a first audio feature from audio, the first audio feature including at least one of a temporal feature, a spectral feature, a harmonic feature, or a rhythmic feature;
     - extract a second audio feature from the audio, the second audio feature including at least one of a temporal feature, a spectral feature, a harmonic feature, or a rhythmic feature, and the second audio feature is different from the first audio feature; and
     - determine a first range for the first audio feature and a second range for the second audio feature; and
   - a storage to store the first and second ranges to compare against other audio features from subsequent audio to generate semantic audio information for the subsequent audio.

15. The apparatus of claim 14, wherein the temporal features include at least one of amplitude, power, or zero crossing of at least some of the audio.

16. The apparatus of claim 14, wherein the spectral features include at least one of a spectral centroid, a spectral rolloff, a spectral flux, a spectral flatness measure, a spectral crest factor, Mel-frequency cepstral coefficients, Daubechies wavelet coefficients, a spectral dissonance, a spectral irregularity, or a spectral inharmonicity of at least some of the audio.

17. The apparatus of claim 14, wherein the harmonic features include at least one of a pitch, a tonality, a pitch class profile, harmonic changes, a main pitch class, an octave range of dominant pitch, a main tonal interval relation, or an overall pitch strength of at least some of the audio.

18. The apparatus of claim 14, wherein the rhythmic features include at least one of a rhythmic structure, a beat period, a rhythmic fluctuation, or an average tempo for at least some of the audio.

19. The apparatus of claim 14, wherein the processor is further to transform at least some of the audio from a time domain to a frequency domain.
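
Claim 19 (like claims 6 and 25) recites transforming audio from the time domain to the frequency domain. Claim 16 lists spectral features such as the spectral centroid and the spectral flatness measure. A minimal sketch using a naive discrete Fourier transform follows. The function names are hypothetical, and the O(N²) DFT is for clarity only; a real system would use an FFT. The centroid and flatness formulas are the common textbook definitions, not ones specified in the patent.

```python
import cmath
import math
from typing import List, Sequence


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Naive DFT of a real frame; returns magnitudes for bins 0..N/2.

    O(N^2), so it is only suitable for illustration.
    """
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def spectral_centroid(mags: Sequence[float], sample_rate: float, n_fft: int) -> float:
    """Magnitude-weighted mean frequency in Hz (bin k sits at k * sr / N)."""
    total = sum(mags)
    if total == 0.0:
        return 0.0
    return sum(m * k * sample_rate / n_fft for k, m in enumerate(mags)) / total


def spectral_flatness(mags: Sequence[float], eps: float = 1e-12) -> float:
    """Geometric mean over arithmetic mean of the power spectrum.

    Near 0 for tonal frames, near 1 for flat (noise- or impulse-like) frames.
    """
    power = [m * m + eps for m in mags]  # eps keeps log() finite on empty bins
    log_mean = sum(math.log(p) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power))
```

Consider a 32-sample frame at 8 kHz that contains a sine at bin 4. Nearly all of its energy lands in the 1 kHz bin, so the centroid is close to 1000 Hz and the flatness is near 0. A single impulse has a flat spectrum, so its flatness is near 1.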

20. An article of manufacture comprising instructions that, when executed, cause a computing device to at least:
   - extract a first audio feature from audio, the first audio feature including at least one of a temporal feature, a spectral feature, a harmonic feature, or a rhythmic feature;
   - extract a second audio feature from the audio, the second audio feature including at least one of a temporal feature, a spectral feature, a harmonic feature, or a rhythmic feature, wherein the second audio feature is different from the first audio feature;
   - determine a first range for the first audio feature and a second range for the second audio feature; and
   - store the first range and the second range to compare against other audio features from subsequent audio to generate tags signifying semantic audio information for the subsequent audio.

21. The article of manufacture of claim 20, wherein the temporal features include at least one of amplitude, power, or zero crossing of at least some of the audio.

22. The article of manufacture of claim 20, wherein the spectral features include at least one of a spectral centroid, a spectral rolloff, a spectral flux, a spectral flatness measure, a spectral crest factor, Mel-frequency cepstral coefficients, Daubechies wavelet coefficients, a spectral dissonance, a spectral irregularity, or a spectral inharmonicity of at least some of the audio.

23. The article of manufacture of claim 20, wherein the harmonic features include at least one of a pitch, a tonality, a pitch class profile, harmonic changes, a main pitch class, an octave range of dominant pitch, a main tonal interval relation, or an overall pitch strength of at least some of the audio.

24. The article of manufacture of claim 20, wherein the rhythmic features include at least one of a rhythmic structure, a beat period, a rhythmic fluctuation, or an average tempo for at least some of the audio.

25. The article of manufacture of claim 20, further including instructions that, when executed, cause the computing device to transform at least some of the audio from a time domain to a frequency domain.

26. The article of manufacture of claim 20, wherein the tags are modifiable via a vocabulary library.
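
The harmonic features in claim 23 (and in claims 4, 11, 17, 30 and 36) include a pitch class profile and a main pitch class. The sketch below folds spectral peaks into a 12-bin chroma vector. It assumes the peaks are already available as `(frequency_hz, magnitude)` pairs, equal temperament with A4 = 440 Hz, and rounding to the nearest semitone. `pitch_class_profile` and `main_pitch_class` are hypothetical helpers, not the patent's method.

```python
import math
from typing import List, Sequence, Tuple

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class_profile(
    peaks: Sequence[Tuple[float, float]], a4_hz: float = 440.0
) -> List[float]:
    """Fold (frequency_hz, magnitude) pairs into a 12-bin chroma vector.

    Each frequency is rounded to the nearest equal-tempered MIDI note
    (69 = A4, 60 = C4), reduced modulo 12 so index 0 is C, and its
    magnitude is accumulated there. The result is normalized to sum to 1.
    """
    profile = [0.0] * 12
    for freq, mag in peaks:
        if freq <= 0.0:
            continue  # skip DC and invalid bins
        midi = round(69 + 12 * math.log2(freq / a4_hz))
        profile[midi % 12] += mag
    total = sum(profile)
    return [p / total for p in profile] if total > 0 else profile


def main_pitch_class(profile: Sequence[float]) -> str:
    """Name of the strongest pitch class in a 12-bin profile."""
    return PITCH_CLASSES[max(range(12), key=lambda i: profile[i])]
```

For a C major triad with a dominant root, such as C4 at 261.63 Hz with magnitude 1.0 and E4 and G4 at 0.5 each, the profile puts half its weight on C, and `main_pitch_class` reports "C". Octave folding is what lets A4 and A5 reinforce the same bin.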

27. An apparatus to determine semantic audio information from audio, comprising:
   - a processor to:
     - extract a first audio feature from the audio, the first audio feature including at least one of a rhythmic structure, a beat period, a rhythmic fluctuation, or an average tempo;
     - extract a second audio feature from the audio, the second audio feature including at least one of a temporal feature, a spectral feature, a harmonic feature, or a rhythmic feature, wherein the second audio feature is different from the first audio feature;
     - compare the first and second audio features to a plurality of stored audio feature ranges having tags associated therewith; and
     - determine the stored audio feature ranges matching the first and second audio features, the tags associated with the matching audio feature ranges to be used to determine the semantic audio information for the audio.

28. The apparatus of claim 27, wherein the temporal feature includes at least one of amplitude, power, or zero crossing of at least some of the audio.

29. The apparatus of claim 27, wherein the spectral feature includes at least one of a spectral centroid, a spectral rolloff, a spectral flux, a spectral flatness measure, a spectral crest factor, Mel-frequency cepstral coefficients, Daubechies wavelet coefficients, a spectral dissonance, a spectral irregularity, or a spectral inharmonicity of at least some of the audio.

30. The apparatus of claim 27, wherein the harmonic feature includes at least one of a pitch, a tonality, a pitch class profile, harmonic changes, a main pitch class, an octave range of dominant pitch, a main tonal interval relation, or an overall pitch strength of at least some of the audio.

31. The apparatus of claim 27, wherein comparing the first and second audio features to the plurality of stored audio feature ranges includes using at least one of a k-Nearest neighbor, a Gaussian Mixture Model, tree-based vector quantization, a linear discriminate analysis, a Euclidean distance, or a binary classification.

32. The apparatus of claim 27, wherein the semantic information includes at least one of a genre descriptor, an instrumentation descriptor, a style descriptor, an acoustical dynamics descriptor, or an emotive descriptor for the audio.
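
Claim 31 (like claims 12 and 37) allows the comparison to use a k-nearest-neighbor search with a Euclidean distance. A minimal sketch follows. It assumes stored examples are `(feature_vector, tag)` pairs and that a tie in the vote goes to the tag with the closest member. `knn_tag` is a hypothetical name, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (feature vector, tag)


def knn_tag(query: Sequence[float], examples: List[Example], k: int = 3) -> str:
    """Majority tag among the k stored examples nearest to `query`.

    Distance is Euclidean (math.dist raises ValueError on mismatched
    dimensions). Vote ties go to the tag whose member appears first in
    distance order, i.e. the closest one.
    """
    if not examples:
        raise ValueError("no stored examples to compare against")
    ranked = sorted(examples, key=lambda ex: math.dist(query, ex[0]))[:k]
    votes = Counter(tag for _, tag in ranked)
    best = max(votes.values())
    return next(tag for _, tag in ranked if votes[tag] == best)
```

Raw Euclidean distance is scale-sensitive. With vectors such as `(tempo_bpm, zcr)`, the tempo in BPM would dominate a zero-crossing rate near 0.1, so features are usually normalized (for example, z-scored) before such a comparison.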

33. An article of manufacture comprising instructions that, when executed, cause a computing device to at least:
   - extract a first audio feature from audio, the first audio feature including at least one of a rhythmic structure, a beat period, a rhythmic fluctuation, or an average tempo;
   - extract a second audio feature from the audio, the second audio feature including at least one of a temporal feature, a spectral feature, a harmonic feature, or a rhythmic feature, wherein the second audio feature is different from the first audio feature;
   - compare the first and second audio features to a plurality of stored audio feature ranges having tags associated therewith; and
   - determine the stored audio feature ranges matching the first and second audio features, the tags associated with the matching audio feature ranges to be used to determine semantic audio information for the audio.

34. The article of manufacture of claim 33, wherein the temporal feature includes at least one of amplitude, power, or zero crossing of at least some of the audio.

35. The article of manufacture of claim 33, wherein the spectral feature includes at least one of a spectral centroid, a spectral rolloff, a spectral flux, a spectral flatness measure, a spectral crest factor, Mel-frequency cepstral coefficients, Daubechies wavelet coefficients, a spectral dissonance, a spectral irregularity, or a spectral inharmonicity of at least some of the audio.

36. The article of manufacture of claim 33, wherein the harmonic feature includes at least one of a pitch, a tonality, a pitch class profile, harmonic changes, a main pitch class, an octave range of dominant pitch, a main tonal interval relation, or an overall pitch strength of at least some of the audio.

37. The article of manufacture of claim 33, further including instructions, that when executed, cause the computing device to compare the first and second audio features to the plurality of stored audio feature ranges using at least one of a k-Nearest neighbor, a Gaussian Mixture Model, tree-based vector quantization, a linear discriminate analysis, a Euclidean distance, or a binary classification.

38. The article of manufacture of claim 33, wherein the semantic information includes at least one of a genre descriptor, an instrumentation descriptor, a style descriptor, an acoustical dynamics descriptor, or an emotive descriptor for the audio.
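
Claims 8, 27 and 33 require the first feature to be rhythmic, for example a beat period or an average tempo. One common way to estimate both is assumed here and is not described in the claims: autocorrelate an onset-strength envelope and pick the strongest lag within a plausible tempo range. Several details are illustrative only. These are the name `estimate_tempo`, its 60 to 200 BPM default range, and the assumption that an onset envelope has already been computed at `frame_rate` frames per second.

```python
from typing import Sequence, Tuple


def estimate_tempo(
    envelope: Sequence[float],
    frame_rate: float,
    min_bpm: float = 60.0,
    max_bpm: float = 200.0,
) -> Tuple[float, float]:
    """Return (beat_period_seconds, tempo_bpm) for an onset-strength envelope.

    Hypothetical helper: the envelope is mean-removed, then autocorrelated
    over the lags that correspond to max_bpm..min_bpm; the best lag wins.
    """
    n = len(envelope)
    if n < 2:
        raise ValueError("envelope needs at least two frames")
    mean = sum(envelope) / n
    x = [v - mean for v in envelope]
    min_lag = max(1, round(frame_rate * 60.0 / max_bpm))      # fastest tempo
    max_lag = min(n - 1, round(frame_rate * 60.0 / min_bpm))  # slowest tempo
    if min_lag > max_lag:
        raise ValueError("envelope too short for the requested tempo range")
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Raw (unnormalized) autocorrelation: shorter lags sum more terms,
        # which mildly favors the beat period over its integer multiples.
        score = sum(x[i] * x[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    period = best_lag / frame_rate
    return period, 60.0 / period
```

For example, take an envelope sampled at 100 frames per second with a pulse every 50 frames. It yields a beat period of 0.5 s, i.e. an average tempo of 120 BPM. Real onset envelopes are noisier, and production systems typically add a tempo prior or peak picking on top of this basic idea.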

Current Assignee: The Nielsen Company LLC (MOOD Media Corporation)
Original Assignee: The Nielsen Company LLC (MOOD Media Corporation)
Inventors: Neuhauser, Alan; Stavropoulos, John
Primary Examiner(s): ABEBE, DANIEL DEMELASH
Application Number: US13/724,836
Publication Number: US 20140180673A1
Time in Patent Office: 1,068 Days
Field of Search: 704/217, 704/237
US Class Current: 1/1

CPC Class Codes

G06F 40/40   Processing or translation o...

G10H 1/40   Rhythm

G10H 2210/036   of musical genre, i.e. anal...

G10H 2210/066   for pitch analysis as part ...

G10H 2210/071   for rhythm pattern analysis...

G10H 2210/076   for extraction of timing, t...

G10L 15/1815   Semantic context, e.g. disa...

G10L 15/1822   Parsing for meaning underst...

G10L 19/018   Audio watermarking, i.e. em...