Audio Processing Techniques for Semantic Audio Recognition and Report Generation
Abstract
System, apparatus and method for determining semantic information from audio, where incoming audio is sampled and processed to extract audio features, including temporal, spectral, harmonic and rhythmic features. The extracted audio features are compared to stored audio templates that include ranges and/or values for certain features and are tagged for specific ranges and/or values. Extracted audio features that are most similar to one or more templates from the comparison are identified according to the tagged information. The tags are used to determine the semantic audio data that includes genre, instrumentation, style, acoustical dynamics, and emotive descriptor for the audio signal.
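As a rough illustration of the feature extraction the abstract describes, the sketch below computes one temporal feature (zero-crossing rate) and one spectral feature (spectral centroid) from a frame of samples. The choice of these two features, the function names, and the synthetic test tone are assumptions made for this example; the patent text here does not name specific features or algorithms.

```python
import cmath
import math


def zero_crossing_rate(samples):
    """Temporal feature: fraction of adjacent sample pairs whose sign differs."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


def spectral_centroid(samples, sample_rate):
    """Spectral feature: magnitude-weighted mean frequency in Hz.

    Uses a naive O(n^2) DFT so the sketch needs only the standard library.
    """
    n = len(samples)
    weighted_sum = 0.0
    magnitude_sum = 0.0
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        bin_value = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        magnitude = abs(bin_value)
        weighted_sum += (k * sample_rate / n) * magnitude
        magnitude_sum += magnitude
    return weighted_sum / magnitude_sum if magnitude_sum else 0.0


# Synthetic input (an assumption for the example): 64 samples of a 1 kHz sine
# sampled at 8 kHz, i.e. exactly eight full cycles.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE) for i in range(64)]

features = {
    "zero_crossing_rate": zero_crossing_rate(tone),
    "spectral_centroid_hz": spectral_centroid(tone, SAMPLE_RATE),
}
```

For a pure tone like this, the spectral centroid should land very close to the tone frequency. A practical system would more likely use an FFT over windowed frames than a naive DFT.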
20 Claims
1. A method for forming an audio template for determining semantic audio information, comprising the steps of:
- receiving a plurality of audio signals in a computer processing device;
- extracting a first audio feature from each of the received audio signals, said first audio feature comprising at least one of a temporal, spectral, harmonic and rhythmic feature;
- extracting a second audio feature from each of the received audio signals, said second feature comprising at least one of a temporal, spectral, harmonic and rhythmic feature, wherein said second audio feature is different from the first audio feature;
- determining a first range for the first audio features and a second range for the second audio feature;
- associating and storing the first and second ranges for comparison against other audio features from subsequent audio signals to generate tags signifying semantic audio information for the subsequent audio signals.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A processor-based method for determining semantic audio information from an audio signal, comprising the steps of:
- receiving the audio signal in a computer processing device;
- extracting a first audio feature from the received audio signal, said first audio feature comprising at least one of a temporal, spectral, harmonic and rhythmic feature;
- extracting a second audio feature from the received audio signal, said second feature comprising at least one of a temporal, spectral, harmonic and rhythmic feature, wherein said second audio feature is different from the first audio feature;
- processing the first and second audio features to compare the first and second audio features to a plurality of stored audio feature ranges having tags associated therewith; and
- determining the stored audio feature ranges having the most similar comparison to the first and second audio features, wherein the tags associated with the audio feature ranges having the closest comparison are used to determine semantic audio information for the audio signal.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A system for forming an audio template for determining semantic audio information, comprising:
- an input for receiving a plurality of audio signals in a computer processing device;
- a processor, operatively coupled to the input, said processor being configured to extract a first audio feature from each of the received audio signals, said first audio feature comprising at least one of a temporal, spectral, harmonic and rhythmic feature, said processor being further configured to extract a second audio feature from each of the received audio signals, said second feature comprising at least one of a temporal, spectral, harmonic and rhythmic feature, and wherein said second audio feature is different from the first audio feature; and
- a storage, operatively coupled to the processor, wherein the processor is configured to determine a first range for the first audio features and a second range for the second audio feature, and to associate and store in the storage the first and second ranges for comparison against other audio features from subsequent audio signals for generating semantic audio information for the subsequent audio signals.
Dependent claims: 16, 17, 18, 19, 20
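The template formation of claims 1 and 15 and the comparison of claim 8 can be sketched together as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `AudioTemplate` class, the min/max rule for determining a range, the out-of-range distance used as the similarity measure, the tag names, and all feature values are hypothetical and chosen only for the example. The claims themselves do not specify how ranges are determined or how similarity is scored.

```python
from dataclasses import dataclass


@dataclass
class AudioTemplate:
    tag: str      # semantic label stored with the ranges (hypothetical names)
    ranges: dict  # feature name -> (low, high)


def form_template(tag, feature_rows):
    """Determine a (min, max) range per feature across a set of signals.

    Assumption: the range is the observed minimum and maximum; the claims
    do not say how a range is derived.
    """
    names = feature_rows[0].keys()
    ranges = {
        name: (
            min(row[name] for row in feature_rows),
            max(row[name] for row in feature_rows),
        )
        for name in names
    }
    return AudioTemplate(tag=tag, ranges=ranges)


def distance_to_template(features, template):
    """Sum of range-normalised overshoots; 0.0 means inside every range.

    Assumption: this distance stands in for the unspecified similarity
    comparison of claim 8.
    """
    total = 0.0
    for name, (low, high) in template.ranges.items():
        value = features[name]
        width = (high - low) or 1.0  # avoid dividing by a zero-width range
        if value < low:
            total += (low - value) / width
        elif value > high:
            total += (value - high) / width
    return total


def tag_signal(features, templates):
    """Return the tag of the stored template closest to the given features."""
    closest = min(templates, key=lambda t: distance_to_template(features, t))
    return closest.tag


# Placeholder feature values for two groups of signals (illustrative only,
# not measured data).
calm_rows = [
    {"zero_crossing_rate": 0.04, "spectral_centroid_hz": 900.0},
    {"zero_crossing_rate": 0.06, "spectral_centroid_hz": 1200.0},
]
bright_rows = [
    {"zero_crossing_rate": 0.15, "spectral_centroid_hz": 2800.0},
    {"zero_crossing_rate": 0.21, "spectral_centroid_hz": 3500.0},
]
templates = [
    form_template("calm", calm_rows),
    form_template("bright", bright_rows),
]

# Features from a subsequent signal fall inside the "bright" ranges.
new_features = {"zero_crossing_rate": 0.18, "spectral_centroid_hz": 3000.0}
tag = tag_signal(new_features, templates)
```

Normalising each overshoot by the range width keeps features with large numeric scales, such as a centroid in hertz, from dominating features with small scales, such as a rate between 0 and 1.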
Specification