Methods and systems for performing signal analysis to identify content types
First Claim
1. A method of processing audio signals to identify content, the method comprising:
receiving digitized audio content;
decoding the audio content using a decoder;
segmenting frames of the decoded audio content by applying a windowing function to a given audio frame using a first window type having a time width approximately equal to a delay time of the decoder;
calculating an estimate of a power spectrum of a given frame;
applying a mel filter bank to the power spectrum of the given frame and providing resulting filter bank energies;
applying a DCT matrix to the resulting filter bank energies to generate a DCT output;
taking a log of the DCT output to generate a mel coefficient 1;
dynamically calculating a first threshold for the content; and
utilizing the mel coefficient 1 and the dynamically calculated first threshold to detect a near silence between content of different types and to identify the types of content separated by the near silence.
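The signal-processing steps recited in the claim (windowing, power spectrum estimate, mel filter bank, DCT, log) can be illustrated with a minimal sketch. It is not the patented implementation. The Hamming window, the eight triangular filters, the naive DFT, the `floor` guard, and every function name are assumptions made for illustration. The sketch follows the claimed order (DCT first, then log) and assumes that "mel coefficient 1" refers to the first DCT output, which is the sum of the non-negative filter bank energies, so its logarithm is defined.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def hamming(n):
    # Assumed window type; the claim only recites "a first window type".
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def power_spectrum(frame):
    """Periodogram estimate |X[k]|^2 / N for bins 0..N/2, using a naive DFT."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        spec.append(abs(acc) ** 2 / n)
    return spec

def mel_filter_bank(num_filters, n_fft, sample_rate):
    """Triangular filters with centers spaced evenly on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (num_filters + 1)) for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * f / sample_rate)) for f in edges_hz]
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):          # rising slope
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):          # falling slope, peak of 1.0 at mid
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank

def dct_matrix(num_coeffs, num_filters):
    """Unnormalized DCT-II matrix; row 0 is all ones."""
    return [[math.cos(math.pi * i * (j + 0.5) / num_filters) for j in range(num_filters)]
            for i in range(num_coeffs)]

def mel_coefficient_1(frame, sample_rate, num_filters=8, floor=1e-12):
    n = len(frame)
    windowed = [x * w for x, w in zip(frame, hamming(n))]
    spec = power_spectrum(windowed)
    bank = mel_filter_bank(num_filters, n, sample_rate)
    energies = [sum(h * p for h, p in zip(row, spec)) for row in bank]
    dct_out = [sum(d * e for d, e in zip(row, energies)) for row in dct_matrix(1, num_filters)]
    # Log is taken after the DCT, as recited; floor guards against log(0).
    return math.log(max(dct_out[0], floor))

# Example: a 1 kHz tone versus the same tone attenuated to near silence.
sr = 8000
loud = [math.sin(2.0 * math.pi * 1000.0 * i / sr) for i in range(256)]
quiet = [1e-4 * x for x in loud]
print(mel_coefficient_1(loud, sr), mel_coefficient_1(quiet, sr))
```

The attenuated frame yields a much lower coefficient than the loud frame, which is the property a near-silence detector would rely on. A practical implementation would normally use an FFT routine rather than the naive DFT shown here.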
Abstract
Systems and methods are configured to process audio signals to identify content-types. Audio content is received at an audio decoder which decodes the audio content. The decoded audio content is segmented into frames by applying a windowing function to a given audio frame using a window having a time width related to a delay time of the decoder. A power spectrum estimate of a given frame is determined. A mel filter bank is applied to the power spectrum of the frame. A DCT matrix is applied to filter bank energies to generate a DCT output. A log of the DCT output is used to generate a mel coefficient 1. A threshold for the content is dynamically determined. The mel coefficient 1 and the dynamically determined threshold are used to detect a near silence between content-types and to identify the content-types.
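The abstract does not state how the threshold is dynamically determined, so the following is only a sketch under a loudly stated assumption: the threshold is taken as the median of the per-frame mel coefficient 1 values minus a fixed margin, and a near silence is any run of at least `min_frames` consecutive frames below that threshold. The names `dynamic_threshold` and `near_silence_runs`, the margin, and the minimum run length are hypothetical. Identifying the content types on either side of the gap is not attempted here.

```python
from statistics import median

def dynamic_threshold(coeffs, margin=6.0):
    """Hypothetical rule: median level of the content minus a margin (log units)."""
    return median(coeffs) - margin

def near_silence_runs(coeffs, threshold, min_frames=3):
    """Return (start, end) frame index pairs (end exclusive) where the
    coefficient stays below the threshold for at least min_frames frames."""
    runs, start = [], None
    for i, c in enumerate(coeffs):
        if c < threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_frames:
                runs.append((start, i))
            start = None
    if start is not None and len(coeffs) - start >= min_frames:
        runs.append((start, len(coeffs)))
    return runs

# Example: two stretches of content separated by four very quiet frames.
coeffs = [3.0] * 10 + [-12.0] * 4 + [2.5] * 10
threshold = dynamic_threshold(coeffs)
print(threshold, near_silence_runs(coeffs, threshold))
```

Because the threshold is derived from the content itself, the same code adapts to louder or quieter programs without a fixed absolute level, which is one plausible reading of "dynamically determined".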
29 Claims
1. A method of processing audio signals to identify content, the method comprising:
receiving digitized audio content;
decoding the audio content using a decoder;
segmenting frames of the decoded audio content by applying a windowing function to a given audio frame using a first window type having a time width approximately equal to a delay time of the decoder;
calculating an estimate of a power spectrum of a given frame;
applying a mel filter bank to the power spectrum of the given frame and providing resulting filter bank energies;
applying a DCT matrix to the resulting filter bank energies to generate a DCT output;
taking a log of the DCT output to generate a mel coefficient 1;
dynamically calculating a first threshold for the content; and
utilizing the mel coefficient 1 and the dynamically calculated first threshold to detect a near silence between content of different types and to identify the types of content separated by the near silence.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
14. A content identification system, comprising:
an input circuit configured to receive bitstream audio channel content;
an audio decoder circuit coupled to the input circuit and configured to decode the bitstream audio channel content;
an analysis engine configured to:
segment frames of the decoded audio content by applying a windowing function to a given audio frame using a first window type having a time width approximately equal to a delay time of the decoder;
calculate an estimate of a power spectrum of a given frame;
apply a mel filter bank to the power spectrum of the given frame and providing resulting filter bank energies;
apply a DCT matrix to the resulting filter bank energies to generate a DCT output;
take a log of the DCT output to generate a mel coefficient 1;
dynamically calculate a first threshold for the content; and
utilize the mel coefficient 1 and the dynamically calculated first threshold to detect a near silence between content of different types and to identify the types of content separated by the near silence.
Dependent claims: 15, 16, 17, 18, 19, 20, 21, 22, 23, 24
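The segmentation step ties the window width to the decoder delay time. A minimal sketch of that relationship follows, assuming a decoder delay of 32 ms, a 48 kHz sample rate, a Hamming window, and a 50% hop. None of these values comes from the claims, and the function names are hypothetical.

```python
import math

def window_length_samples(decoder_delay_s, sample_rate):
    """Window width in samples, approximately equal to the decoder delay time."""
    return max(1, round(decoder_delay_s * sample_rate))

def hamming(n):
    # Assumed window type; the claims only recite "a first window type".
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def segment(samples, frame_len, hop):
    """Split samples into windowed frames; a trailing partial frame is dropped."""
    win = hamming(frame_len)
    return [[x * w for x, w in zip(samples[i:i + frame_len], win)]
            for i in range(0, len(samples) - frame_len + 1, hop)]

# Example: one second of decoded audio at 48 kHz with an assumed 32 ms decoder delay.
sample_rate = 48000
frame_len = window_length_samples(0.032, sample_rate)
samples = [0.0] * sample_rate
frames = segment(samples, frame_len, frame_len // 2)
print(frame_len, len(frames))
```

Each returned frame has exactly `frame_len` samples and can be passed directly to a power spectrum estimate.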
25. A non-transitory computer-readable storage medium storing computer-executable instructions that when executed by a processor perform operations comprising:
receiving digitized audio content;
decoding the audio content using a decoder;
segmenting frames of the decoded audio content by applying a windowing function to a given audio frame using a first window type having a first window time width;
calculating an estimate of a power spectrum of a given frame;
applying a mel filter bank to the power spectrum of the given frame and providing resulting filter bank energies;
applying a DCT matrix to the resulting filter bank energies to generate a DCT output;
taking a log of the DCT output to generate a mel coefficient 1;
dynamically calculating a first threshold for the content; and
utilizing the mel coefficient 1 and the dynamically calculated first threshold to detect a near silence between content of different types and to identify the types of content separated by the near silence.
Dependent claims: 26, 27, 28, 29
Specification