Systems and methods for recognizing sound and music signals in high noise and distortion
First Claim
1. A method performed by a computing device, the method comprising:
receiving a media sample;
converting, by a computing device, the media sample into frequency transform frames at periodic time intervals;
determining, by the computing device, an energy density representation of the frequency transform frames;
determining, by the computing device, time-frequency coordinates corresponding to local maxima of the energy density representation;
generating, by the computing device, fingerprints of the media sample based on determined time-frequency coordinates of the energy density representation; and
performing a content identification of the media sample using the fingerprint by comparing the fingerprint to stored fingerprints in memory of a database index.
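The front-end steps of the claim (frequency transform frames at periodic intervals, an energy density representation, local maxima, fingerprints) can be illustrated with a small sketch. This is only one possible reading, not the patented implementation: the Hann window, the frame and hop sizes, the neighbourhood radius, and the peak-pairing scheme used to form fingerprints are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import cmath
import math


def power_spectrogram(samples, frame_size=64, hop=32):
    """Squared DFT magnitudes of Hann-windowed frames taken every `hop` samples."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        windowed = [
            samples[start + n] * (0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size))
            for n in range(frame_size)
        ]
        frame = []
        for k in range(frame_size // 2 + 1):
            acc = sum(
                windowed[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)
            )
            frame.append(abs(acc) ** 2)  # energy density at (frame, bin k)
        frames.append(frame)
    return frames


def local_maxima(spec, radius=2, floor=1e-6):
    """(frame, bin) coordinates whose energy is not exceeded within `radius`."""
    peaks = []
    for t, frame in enumerate(spec):
        for f, value in enumerate(frame):
            if value <= floor:  # ignore numerically empty cells
                continue
            is_peak = True
            for dt in range(-radius, radius + 1):
                for df in range(-radius, radius + 1):
                    tt, ff = t + dt, f + df
                    if (dt or df) and 0 <= tt < len(spec) and 0 <= ff < len(frame):
                        if spec[tt][ff] > value:
                            is_peak = False
            if is_peak:
                peaks.append((t, f))
    return peaks


def fingerprints(peaks, fan_out=3, max_dt=10):
    """Pair each peak with up to `fan_out` later peaks (an assumed scheme).

    Returns ((f1, f2, dt), t1) tuples: a fingerprint plus its anchor time.
    """
    peaks = sorted(peaks)
    out = []
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt > max_dt:
                break
            if dt <= 0:
                continue
            out.append(((f1, f2, dt), t1))
            paired += 1
            if paired == fan_out:
                break
    return out
```

The direct DFT is used only to keep the sketch dependency-free; a practical system would use an FFT. The final claim step, comparing fingerprints against a database index, is not covered by this sketch.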
Abstract
A method for recognizing an audio sample locates an audio file that closely matches the audio sample from a database indexing a large set of original recordings. Each indexed audio file is represented in the database index by a set of landmark timepoints and associated fingerprints. Landmarks occur at reproducible locations within the file, while fingerprints represent features of the signal at or near the landmark timepoints. To perform recognition, landmarks and fingerprints are computed for the unknown sample and used to retrieve matching fingerprints from the database. For each file containing matching fingerprints, the landmarks are compared with landmarks of the sample at which the same fingerprints were computed. If a large number of corresponding landmarks are linearly related, i.e., if equivalent fingerprints of the sample and retrieved file have the same time evolution, then the file is identified with the sample.
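The matching step described in the abstract can be sketched as a vote over time offsets. The sketch below assumes the sample is played at the original speed, so "linearly related" landmarks reduce to a constant offset between file time and sample time; the index layout, the `min_votes` threshold, and the function names are hypothetical choices for illustration, not details taken from the patent.

```python
from collections import Counter, defaultdict


def build_index(files):
    """Map each fingerprint to the (file_id, landmark_time) pairs where it occurs.

    `files` is {file_id: [(fingerprint, landmark_time), ...]}.
    """
    index = defaultdict(list)
    for file_id, entries in files.items():
        for fingerprint, landmark_time in entries:
            index[fingerprint].append((file_id, landmark_time))
    return index


def identify(index, sample, min_votes=3):
    """Return (file_id, offset, votes) for the best-aligned file, or None.

    `sample` is [(fingerprint, landmark_time), ...] computed from the unknown
    audio. Every matching fingerprint votes for (file_id, file_time - sample_time);
    many votes for the same offset mean the landmarks share one time evolution.
    """
    votes = Counter()
    for fingerprint, t_sample in sample:
        for file_id, t_file in index.get(fingerprint, ()):
            votes[(file_id, t_file - t_sample)] += 1
    if not votes:
        return None
    (file_id, offset), count = votes.most_common(1)[0]
    return (file_id, offset, count) if count >= min_votes else None
```

A file that merely shares a few fingerprints at unrelated times spreads its votes over many offsets and stays below the threshold, while a true match concentrates them on a single offset.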
18 Claims
1. A method performed by a computing device, the method comprising:
receiving a media sample;
converting, by a computing device, the media sample into frequency transform frames at periodic time intervals;
determining, by the computing device, an energy density representation of the frequency transform frames;
determining, by the computing device, time-frequency coordinates corresponding to local maxima of the energy density representation;
generating, by the computing device, fingerprints of the media sample based on determined time-frequency coordinates of the energy density representation; and
performing a content identification of the media sample using the fingerprint by comparing the fingerprint to stored fingerprints in memory of a database index.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A system, comprising:
a computing device; and
a memory, the memory configured to store instructions that when executed by the computing device cause the system to perform functions comprising:
receiving a media sample;
converting, by the computing device, the media sample into frequency transform frames at periodic time intervals;
determining, by the computing device, an energy density representation of the frequency transform frames;
determining, by the computing device, time-frequency coordinates corresponding to local maxima of the energy density representation;
generating, by the computing device, fingerprints of the media sample based on determined time-frequency coordinates of the energy density representation; and
performing a content identification of the media sample using the fingerprint by comparing the fingerprint to stored fingerprints in memory of a database index.
Dependent claims: 12, 13, 14, 15
16. A non-transitory computer readable medium having stored therein instructions, that when executed by a computing device, cause the computing device to perform functions comprising:
receiving a media sample;
converting, by the computing device, the media sample into frequency transform frames at periodic time intervals;
determining, by the computing device, an energy density representation of the frequency transform frames;
determining, by the computing device, time-frequency coordinates corresponding to local maxima of the energy density representation; and
generating, by the computing device, fingerprints of the media sample based on determined time-frequency coordinates of the energy density representation.
Dependent claims: 17, 18
Specification