Audio processing techniques for semantic audio recognition and report generation
Abstract
System, apparatus and method for determining semantic information from audio, in which incoming audio is sampled and processed to extract audio features, including temporal, spectral, harmonic and rhythmic features. The extracted audio features are compared to stored audio templates that include ranges and/or values for certain features and are tagged for specific ranges and/or values. The extracted audio features that are most similar to one or more templates are identified according to the tagged information. The tags are then used to determine the semantic audio data, which includes genre, instrumentation, style, acoustical dynamics, and emotive descriptors for the audio signal.
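The comparison step described in the abstract can be illustrated with a minimal Python sketch. It assumes each template is a tag plus one (low, high) range per named feature, and it scores a template by the fraction of its ranges that contain the corresponding extracted feature value. The abstract does not specify the similarity measure, so this scoring rule, the `Template` class, the function names, and the feature names (`tempo_bpm`, `centroid_mean_hz`) are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Template:
    """Hypothetical stored template: a semantic tag plus a (low, high) range per feature."""
    tag: str
    ranges: dict[str, tuple[float, float]]


def match_score(features: dict[str, float], template: Template) -> float:
    """Fraction of the template's ranges that contain the corresponding extracted feature."""
    if not template.ranges:
        return 0.0
    hits = sum(
        1
        for name, (low, high) in template.ranges.items()
        if name in features and low <= features[name] <= high
    )
    return hits / len(template.ranges)


def best_tags(features: dict[str, float], templates: list[Template]) -> list[str]:
    """Tags of every template that ties for the highest match score."""
    if not templates:
        return []
    scored = [(match_score(features, t), t.tag) for t in templates]
    top = max(score for score, _ in scored)
    return [tag for score, tag in scored if score == top]


# Illustrative templates; the feature names and ranges are made up.
TEMPLATES = [
    Template("rock", {"tempo_bpm": (100.0, 160.0), "centroid_mean_hz": (1500.0, 3500.0)}),
    Template("ambient", {"tempo_bpm": (40.0, 90.0), "centroid_mean_hz": (300.0, 1200.0)}),
]

print(best_tags({"tempo_bpm": 128.0, "centroid_mean_hz": 2200.0}, TEMPLATES))  # ['rock']
```

Returning every tag that ties for the top score mirrors the abstract's wording that extracted features may be most similar to "one or more templates"; a weighted or distance-based score would be an equally plausible design.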
21 Claims
1. An apparatus for forming an audio template for determining semantic audio information, the apparatus comprising:
a processor to:
extract a plurality of audio features from audio, at least one of the plurality of audio features including at least one of a temporal feature, a spectral feature, a harmonic feature, or a rhythmic feature;
determine a range for each of the plurality of audio features; and
store a set of ranges of the plurality of audio features to compare against other audio features from subsequent audio to generate a tag for the set of ranges signifying semantic audio information for the subsequent audio, wherein the set of ranges includes more than one range and the tag is associated with an audio timbre range, a beat range, a loudness range and a spectral histogram range.
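Claim 1 does not say how each range is determined. One simple possibility, sketched here under that assumption, is to take the minimum and maximum of each feature over a set of example recordings that share a tag, optionally widened on both sides by a relative margin. The function name `form_template`, the `margin` parameter, the feature names, and the returned dictionary layout are hypothetical.

```python
def form_template(tag, examples, margin=0.0):
    """Build a hypothetical template from example feature dicts that share one tag.

    Each range spans the observed min..max of a feature, widened on both sides
    by `margin` times the observed spread.
    """
    if not examples:
        raise ValueError("need at least one example")
    shared = set(examples[0])
    for example in examples[1:]:
        shared &= set(example)  # keep only features present in every example
    ranges = {}
    for name in sorted(shared):
        values = [example[name] for example in examples]
        low, high = min(values), max(values)
        pad = (high - low) * margin
        ranges[name] = (low - pad, high + pad)
    if len(ranges) < 2:
        # Claim 1 requires the stored set of ranges to include more than one range.
        raise ValueError("a template needs more than one feature range")
    return {"tag": tag, "ranges": ranges}


examples = [
    {"loudness_db": -14.0, "tempo_bpm": 120.0},
    {"loudness_db": -10.0, "tempo_bpm": 128.0},
]
print(form_template("energetic", examples, margin=0.25))
# {'tag': 'energetic', 'ranges': {'loudness_db': (-15.0, -9.0), 'tempo_bpm': (118.0, 130.0)}}
```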
7. An apparatus for forming an audio template for determining semantic audio information, the apparatus comprising:
a processor to:
extract a plurality of audio features from audio, at least one of the plurality of audio features including at least one of a temporal feature, a spectral feature, a harmonic feature, or a rhythmic feature;
determine a range for each of the plurality of audio features; and
store a set of ranges of the plurality of audio features to compare against other audio features from subsequent audio to generate a tag for the set of ranges signifying semantic audio information for the subsequent audio, wherein the set of ranges includes more than one range and the tag is associated with timbre and includes a range for a mean of a spectral centroid, a range for a variance of the spectral centroid, and a range of a percentage of low/high energy frames.
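The timbre features named in claim 7 can be computed from short analysis frames roughly as follows. This sketch assumes the spectral centroid is the magnitude-weighted mean frequency of each frame's spectrum, and that a "low-energy" frame is one whose RMS energy is below the mean RMS of all frames, which is a common definition; the claim fixes neither choice. A naive DFT keeps the example dependency-free, so it is only suitable for short frames. All function names are hypothetical.

```python
import cmath
import math
from statistics import fmean, pvariance


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (O(N^2); fine for a short example frame)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency of one frame, in Hz."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0:
        return 0.0
    bin_hz = sample_rate / len(frame)
    return sum(k * bin_hz * m for k, m in enumerate(mags)) / total


def rms(frame):
    """Root-mean-square energy of one frame."""
    return math.sqrt(fmean(x * x for x in frame))


def timbre_features(frames, sample_rate):
    """Centroid mean/variance across frames and the percentage of low-energy frames."""
    centroids = [spectral_centroid(f, sample_rate) for f in frames]
    energies = [rms(f) for f in frames]
    mean_energy = fmean(energies)
    low = sum(1 for e in energies if e < mean_energy)  # assumed "low energy" definition
    return {
        "centroid_mean": fmean(centroids),
        "centroid_var": pvariance(centroids),
        "low_energy_pct": 100.0 * low / len(frames),
    }
```

In practice an FFT (for example `numpy.fft.rfft`) would replace the naive DFT; the feature definitions stay the same.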
12. An apparatus for forming an audio template for determining semantic audio information, the apparatus comprising:
a processor to:
extract a plurality of audio features from audio, at least one of the plurality of audio features including at least one of a temporal feature, a spectral feature, a harmonic feature, or a rhythmic feature;
determine a range for each of the plurality of audio features; and
store a set of ranges of the plurality of audio features to compare against other audio features from subsequent audio to generate a tag for the set of ranges signifying semantic audio information for the subsequent audio, wherein the set of ranges includes more than one range and the tag is associated with beat and includes a range for an amplitude of peaks in a beat histogram, a range for periods of peaks in the beat histogram, and a range for a ratio between a peak and a sum of all peaks in the beat histogram.
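The beat-histogram features in claim 12 can be approximated by autocorrelating an onset-strength envelope over lags that correspond to plausible tempi and keeping the local maxima as histogram peaks. This is a minimal sketch under several assumptions the claim does not state: the envelope (one value per frame, at `frame_rate` frames per second) is computed elsewhere, the 40 to 200 BPM search band is arbitrary, and peak amplitudes are normalized by the zero-lag autocorrelation. The function names are hypothetical.

```python
def autocorrelation(envelope, max_lag):
    """Unnormalized autocorrelation of the envelope for lags 0..max_lag."""
    n = len(envelope)
    return [
        sum(envelope[t] * envelope[t + lag] for t in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def beat_histogram(envelope, frame_rate, min_bpm=40.0, max_bpm=200.0):
    """Map each candidate lag (in frames) to its autocorrelation, normalized by lag 0."""
    min_lag = max(1, int(60.0 * frame_rate / max_bpm))
    max_lag = int(60.0 * frame_rate / min_bpm)
    ac = autocorrelation(envelope, max_lag)
    energy = ac[0] if ac[0] > 0 else 1.0
    return {lag: ac[lag] / energy for lag in range(min_lag, max_lag + 1)}


def histogram_peaks(hist):
    """(amplitude, lag) pairs for interior local maxima of the histogram."""
    lags = sorted(hist)
    return [
        (hist[lag], lag)
        for prev, lag, nxt in zip(lags, lags[1:], lags[2:])
        if hist[lag] > hist[prev] and hist[lag] >= hist[nxt]
    ]


def beat_features(envelope, frame_rate, top=2):
    """Peak amplitudes, peak periods (seconds) and strongest-peak-to-sum ratio."""
    peaks = sorted(histogram_peaks(beat_histogram(envelope, frame_rate)), reverse=True)
    total = sum(amp for amp, _ in peaks)
    chosen = peaks[:top]
    return {
        "peak_amplitudes": [amp for amp, _ in chosen],
        "peak_periods_s": [lag / frame_rate for _, lag in chosen],
        "peak_to_sum_ratio": chosen[0][0] / total if total > 0 else 0.0,
    }


# Impulse train: one onset every 50 frames at 100 frames/s (120 BPM).
envelope = [1.0 if t % 50 == 0 else 0.0 for t in range(1000)]
print(beat_features(envelope, frame_rate=100)["peak_periods_s"])  # [0.5, 1.0]
```

For this synthetic envelope the strongest peak sits at a 0.5 s period (120 BPM), with a weaker peak at the 1.0 s multiple; the peak-to-sum ratio is the feature the claim pairs with those amplitudes and periods.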
17. An apparatus for forming an audio template for determining semantic audio information, the apparatus comprising:
a processor to:
extract a plurality of audio features from audio, at least one of the plurality of audio features including at least one of a temporal feature, a spectral feature, a harmonic feature, or a rhythmic feature;
determine a range for each of the plurality of audio features; and
store a set of ranges of the plurality of audio features to compare against other audio features from subsequent audio to generate a tag for the set of ranges signifying semantic audio information for the subsequent audio, wherein the set of ranges includes more than one range and the tag is associated with pitch and includes a range for an amplitude of prominent peaks in a pitch histogram, and a range for periods of peaks in the pitch histogram, wherein the pitch histogram is on a full semitone scale or an octave-independent scale.
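Claim 17 distinguishes a pitch histogram on a full semitone scale (unfolded) from one on an octave-independent scale (folded). The sketch below assumes per-frame fundamental-frequency estimates are already available, with 0 marking unvoiced frames; it maps each estimate to the nearest MIDI note number and, when `folded=True`, folds notes into 12 pitch classes. Treating each peak's bin position and the interval between the two strongest peaks as the "period" features is an interpretation, not something the claim specifies, and the function names are hypothetical.

```python
import math
from collections import Counter


def midi_note(freq_hz):
    """Nearest MIDI note number (A4 = 440 Hz = 69)."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


def pitch_histogram(f0s, folded=False):
    """Normalized histogram over MIDI notes, or over 12 pitch classes if folded."""
    notes = [midi_note(f) for f in f0s if f > 0]  # 0 marks unvoiced frames
    if not notes:
        return {}
    counts = Counter(n % 12 if folded else n for n in notes)
    total = len(notes)
    return {bin_: count / total for bin_, count in counts.items()}


def pitch_features(f0s, folded=False, top=2):
    """Amplitudes and bins of the strongest peaks, plus the interval between the top two."""
    hist = pitch_histogram(f0s, folded)
    ranked = sorted(hist.items(), key=lambda item: (-item[1], item[0]))[:top]
    bins = [b for b, _ in ranked]
    return {
        "peak_amplitudes": [a for _, a in ranked],
        "peak_bins": bins,
        "top_interval": abs(bins[0] - bins[1]) if len(bins) >= 2 else None,
    }


f0s = [440.0, 440.0, 440.0, 880.0, 880.0, 261.63, 0.0]
print(pitch_features(f0s)["peak_bins"])               # [69, 81]
print(pitch_features(f0s, folded=True)["peak_bins"])  # [9, 0]
```

With `folded=True`, A4 (440 Hz) and A5 (880 Hz) land in the same pitch class 9, so their counts merge into one stronger peak, while the unfolded histogram keeps them 12 semitones apart.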
Specification