Audio processing techniques for semantic audio recognition and report generation
Abstract
System, apparatus and method for determining semantic information from audio, where incoming audio is sampled and processed to extract audio features, including temporal, spectral, harmonic and rhythmic features. The extracted audio features are compared to stored audio templates that include ranges and/or values for certain features and are tagged for specific ranges and/or values. Extracted audio features that are most similar to one or more templates from the comparison are identified according to the tagged information. The tags are used to determine the semantic audio data that includes genre, instrumentation, style, acoustical dynamics, and emotive descriptor for the audio signal.
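As an illustration of the feature-extraction step the abstract describes, the following minimal sketch computes one temporal feature (zero-crossing rate) and one spectral feature (spectral centroid) from a block of samples. The choice of these two features, the function names, and the naive DFT are assumptions made for illustration; the source does not prescribe a particular implementation.

```python
import cmath
import math


def zero_crossing_rate(samples):
    """Temporal feature: fraction of adjacent sample pairs that change sign."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


def spectral_centroid(samples, sample_rate):
    """Spectral feature: magnitude-weighted mean frequency in Hz.

    Uses a naive O(n^2) DFT so the sketch needs only the standard library.
    """
    n = len(samples)
    weighted = 0.0
    total = 0.0
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        magnitude = abs(coeff)
        weighted += (k * sample_rate / n) * magnitude
        total += magnitude
    return weighted / total if total else 0.0


def extract_features(samples, sample_rate):
    """Return a dict of named features (names are hypothetical)."""
    return {
        "zero_crossing_rate": zero_crossing_rate(samples),
        "spectral_centroid_hz": spectral_centroid(samples, sample_rate),
    }


# Example input: a 440 Hz sine tone, 400 samples at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * i / sample_rate) for i in range(400)]
features = extract_features(tone, sample_rate)
print(features)
```

For a pure tone the centroid sits at the tone's frequency; a practical system would likely use an FFT library and windowed frames instead of the naive transform shown here.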
38 Claims
1. A method for forming an audio template for determining semantic audio information, comprising:
extracting a first audio feature from audio, the first audio feature including at least one of a temporal feature, a spectral feature, a harmonic feature, or a rhythmic feature;
extracting a second audio feature from the audio, the second audio feature including at least one of a temporal feature, a spectral feature, a harmonic feature, or a rhythmic feature, wherein the second audio feature is different from the first audio feature;
determining a first range for the first audio feature and a second range for the second audio feature; and
storing the first and second ranges to compare against other audio features from subsequent audio to generate tags signifying semantic audio information for the subsequent audio.
Dependent claims: 2, 3, 4, 5, 6, 7
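The template-forming steps of claim 1 can be sketched as follows. The claim does not say how a range is determined, so this sketch assumes the range is the minimum and maximum of each feature over a set of example clips that share the same tags. `AudioTemplate`, `form_template`, the feature names, and the numeric values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AudioTemplate:
    """Stored template: tags plus a (low, high) range per feature name."""
    tags: dict
    ranges: dict = field(default_factory=dict)


def form_template(tags, feature_vectors, margin=0.0):
    """Build a template from feature dicts extracted from example clips.

    Assumption: each range is the observed min/max, optionally widened
    by `margin` times the observed spread on each side.
    """
    ranges = {}
    for name in feature_vectors[0]:
        values = [fv[name] for fv in feature_vectors]
        low, high = min(values), max(values)
        pad = margin * (high - low)
        ranges[name] = (low - pad, high + pad)
    return AudioTemplate(tags=dict(tags), ranges=ranges)


# Made-up feature values for three clips sharing one tag.
clips = [
    {"zero_crossing_rate": 0.10, "spectral_centroid_hz": 400.0},
    {"zero_crossing_rate": 0.14, "spectral_centroid_hz": 520.0},
    {"zero_crossing_rate": 0.12, "spectral_centroid_hz": 460.0},
]
template = form_template({"genre": "rock"}, clips)
print(template)
```

Keeping the ranges in a plain dict keyed by feature name makes the stored template easy to serialize and to compare against features extracted from later audio.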
8. A processor-based method for determining semantic audio information for audio, comprising:
extracting a first audio feature from the audio, the first audio feature including at least one of a rhythmic structure, a beat period, a rhythmic fluctuation, or an average tempo;
extracting a second audio feature from the audio, the second audio feature including at least one of a temporal feature, a spectral feature, a harmonic feature, or a rhythmic feature, wherein the second audio feature is different from the first audio feature;
comparing the first and second audio features to a plurality of stored audio feature ranges having tags associated therewith; and
determining the stored audio feature ranges having the closest matches to the first and second audio features, the tags associated with the audio feature ranges having the closest matches to be used to determine the semantic audio information for the audio.
Dependent claims: 9, 10, 11, 12, 13
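One way to read the "closest matches" step of claim 8 is as a nearest-template search. The sketch below assumes a distance of zero when a feature lies inside a stored range and, otherwise, the gap to the nearest bound divided by the range width; the claim itself does not define a distance, so this metric, the function names, and the sample templates are illustrative assumptions.

```python
def range_distance(value, low, high):
    """0.0 inside [low, high]; otherwise the gap scaled by the range width."""
    if low <= value <= high:
        return 0.0
    width = (high - low) or 1.0  # guard against zero-width ranges
    gap = low - value if value < low else value - high
    return gap / width


def closest_template(features, templates):
    """Return the template whose ranges are nearest to the features."""
    def total_distance(template):
        return sum(
            range_distance(features[name], low, high)
            for name, (low, high) in template["ranges"].items()
        )
    return min(templates, key=total_distance)


# Hypothetical stored templates and an incoming feature vector.
templates = [
    {"tags": {"genre": "ambient"},
     "ranges": {"tempo_bpm": (60.0, 90.0), "centroid_hz": (200.0, 800.0)}},
    {"tags": {"genre": "dance"},
     "ranges": {"tempo_bpm": (115.0, 135.0), "centroid_hz": (1500.0, 3500.0)}},
]
features = {"tempo_bpm": 128.0, "centroid_hz": 1400.0}
best = closest_template(features, templates)
print(best["tags"])
```

Scaling each gap by the range width keeps features with large numeric units, such as hertz, from dominating features with small ones.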
14. An apparatus to form an audio template for determining semantic audio information, comprising:
a processor to:
extract a first audio feature from audio, the first audio feature including at least one of a temporal feature, a spectral feature, a harmonic feature, or a rhythmic feature;
extract a second audio feature from the audio, the second audio feature including at least one of a temporal feature, a spectral feature, a harmonic feature, or a rhythmic feature, and the second audio feature is different from the first audio feature; and
determine a first range for the first audio feature and a second range for the second audio feature; and
a storage to store the first and second ranges to compare against other audio features from subsequent audio to generate semantic audio information for the subsequent audio.
Dependent claims: 15, 16, 17, 18, 19
20. An article of manufacture comprising instructions that, when executed, cause a computing device to at least:
extract a first audio feature from audio, the first audio feature including at least one of a temporal feature, a spectral feature, a harmonic feature, or a rhythmic feature;
extract a second audio feature from the audio, the second audio feature including at least one of a temporal feature, a spectral feature, a harmonic feature, or a rhythmic feature, wherein the second audio feature is different from the first audio feature;
determine a first range for the first audio feature and a second range for the second audio feature; and
store the first range and the second range to compare against other audio features from subsequent audio to generate tags signifying semantic audio information for the subsequent audio.
Dependent claims: 21, 22, 23, 24, 25, 26
27. An apparatus to determine semantic audio information from audio, comprising:
a processor to:
extract a first audio feature from the audio, the first audio feature including at least one of a rhythmic structure, a beat period, a rhythmic fluctuation, or an average tempo;
extract a second audio feature from the audio, the second audio feature including at least one of a temporal feature, a spectral feature, a harmonic feature, or a rhythmic feature, wherein the second audio feature is different from the first audio feature;
compare the first and second audio features to a plurality of stored audio feature ranges having tags associated therewith; and
determine the stored audio feature ranges matching the first and second audio features, the tags associated with the matching audio feature ranges to be used to determine the semantic audio information for the audio.
Dependent claims: 28, 29, 30, 31, 32
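Claim 27 speaks of stored ranges "matching" the extracted features. A minimal sketch of one possible reading follows: a template matches when every extracted feature falls inside its stored range, and the tags of all matching templates are merged into semantic information grouped by category. The category names follow the abstract (genre, instrumentation, emotive descriptor); the containment test, the data layout, and the sample values are assumptions.

```python
def matches(features, ranges):
    """True when every stored range contains the corresponding feature."""
    return all(
        low <= features[name] <= high
        for name, (low, high) in ranges.items()
    )


def semantic_info(features, templates):
    """Collect the tags of all matching templates, grouped by category."""
    info = {}
    for template in templates:
        if matches(features, template["ranges"]):
            for category, value in template["tags"].items():
                info.setdefault(category, []).append(value)
    return info


# Hypothetical stored templates with tags per semantic category.
templates = [
    {"tags": {"genre": "jazz", "instrumentation": "acoustic"},
     "ranges": {"tempo_bpm": (80.0, 140.0), "centroid_hz": (500.0, 2000.0)}},
    {"tags": {"emotive descriptor": "relaxed"},
     "ranges": {"tempo_bpm": (60.0, 110.0), "centroid_hz": (300.0, 1500.0)}},
    {"tags": {"genre": "metal"},
     "ranges": {"tempo_bpm": (150.0, 220.0), "centroid_hz": (2500.0, 6000.0)}},
]
features = {"tempo_bpm": 100.0, "centroid_hz": 1200.0}
info = semantic_info(features, templates)
print(info)
```

Because several templates can match at once, each category maps to a list of values and does not collapse to a single label.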
33. An article of manufacture comprising instructions that, when executed, cause a computing device to at least:
extract a first audio feature from audio, the first audio feature including at least one of a rhythmic structure, a beat period, a rhythmic fluctuation, or an average tempo;
extract a second audio feature from the audio, the second audio feature including at least one of a temporal feature, a spectral feature, a harmonic feature, or a rhythmic feature, wherein the second audio feature is different from the first audio feature;
compare the first and second audio features to a plurality of stored audio feature ranges having tags associated therewith; and
determine the stored audio feature ranges matching the first and second audio features, the tags associated with the matching audio feature ranges to be used to determine semantic audio information for the audio.
Dependent claims: 34, 35, 36, 37, 38
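The rhythmic features named in claims 8, 27 and 33 (beat period, rhythmic fluctuation, average tempo) can be illustrated with a short sketch that starts from beat timestamps. It assumes beats have already been detected by some other stage, takes the beat period as the median inter-beat interval, and treats rhythmic fluctuation as the standard deviation of those intervals. These definitions and the function name are assumptions, not taken from the source.

```python
import statistics


def rhythmic_features(beat_times):
    """Derive rhythmic features from beat timestamps given in seconds."""
    if len(beat_times) < 3:
        raise ValueError("need at least three beats to measure fluctuation")
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    return {
        # Typical spacing between beats, robust to a few outliers.
        "beat_period_s": statistics.median(intervals),
        # Beats per minute from the mean spacing.
        "average_tempo_bpm": 60.0 / statistics.mean(intervals),
        # Spread of the spacing; 0.0 for a perfectly steady beat.
        "rhythmic_fluctuation_s": statistics.pstdev(intervals),
    }


# Example: a perfectly steady beat every half second.
beats = [0.0, 0.5, 1.0, 1.5, 2.0]
rhythm = rhythmic_features(beats)
print(rhythm)
```

The resulting values can serve as the first audio feature that is compared against stored, tagged ranges.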
Specification