System and methods for recognizing sound and music signals in high noise and distortion
Abstract
A method for recognizing an audio sample locates an audio file that most closely matches the audio sample from a database indexing a large set of original recordings. Each indexed audio file is represented in the database index by a set of landmark timepoints and associated fingerprints. Landmarks occur at reproducible locations within the file, while fingerprints represent features of the signal at or near the landmark timepoints. To perform recognition, landmarks and fingerprints are computed for the unknown sample and used to retrieve matching fingerprints from the database. For each file containing matching fingerprints, the landmarks are compared with landmarks of the sample at which the same fingerprints were computed. If a large number of corresponding landmarks are linearly related, i.e., if equivalent fingerprints of the sample and retrieved file have the same time evolution, then the file is identified with the sample. The method can be used for any type of sound or music, and is particularly effective for audio signals subject to linear and nonlinear distortion such as background noise, compression artifacts, or transmission dropouts. The sample can be identified in a time proportional to the logarithm of the number of entries in the database; given sufficient computational power, recognition can be performed in nearly real time as the sound is being sampled.
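The matching step described in the abstract can be illustrated with a minimal sketch. It assumes the simplest linear relation, a slope of 1 (no time stretching), so that corresponding landmarks differ by a constant offset and a histogram of offsets reveals the match. The function name `recognize`, the `min_votes` threshold, the dictionary layout of the index, and the integer fingerprints are illustrative assumptions, not taken from the patent.

```python
from collections import Counter, defaultdict

def recognize(sample_pairs, index, min_votes=3):
    """Return (file_id, offset, votes) for the best match, or None.

    sample_pairs: list of (sample_landmark, fingerprint)
    index: {fingerprint: [(file_id, file_landmark), ...]}  (hypothetical layout)
    Assumes slope 1: file_landmark = sample_landmark + offset.
    """
    votes = defaultdict(Counter)  # file_id -> histogram of offsets
    for t_sample, fp in sample_pairs:
        for file_id, t_file in index.get(fp, ()):
            votes[file_id][t_file - t_sample] += 1

    best = None
    for file_id, histogram in votes.items():
        offset, count = histogram.most_common(1)[0]
        if count >= min_votes and (best is None or count > best[2]):
            best = (file_id, offset, count)
    return best

# Toy data: song_a contains the sample starting at landmark 10.
index = {
    111: [("song_a", 10), ("song_b", 5)],
    222: [("song_a", 12), ("song_b", 30)],
    333: [("song_a", 15)],
    444: [("song_a", 20)],
}
sample = [(0, 111), (2, 222), (5, 333)]
print(recognize(sample, index))  # ('song_a', 10, 3)
```

In this toy run, song_b shares two fingerprints with the sample, but their offsets disagree (5 and 28), so it never accumulates votes at a single offset. That is the sense in which equivalent fingerprints must have "the same time evolution".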
872 Citations
114 Claims
-
1. A method for recognizing a media entity from a media sample, comprising:
-
computing a set of sample fingerprints, each sample fingerprint characterizing a particular sample landmark within said media sample; obtaining a set of file fingerprints, each file fingerprint characterizing a particular file landmark within a media entity to be identified; generating correspondences between said sample landmarks and said obtained file landmarks, wherein corresponding landmarks have equivalent fingerprints; and identifying said media entity if a plurality of said corresponding landmarks are substantially linearly related. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 60)
-
-
55. A method for recognizing a media entity from a media sample, comprising:
-
receiving a set of sample fingerprints, each sample fingerprint characterizing a particular sample landmark within said media sample; obtaining a set of file fingerprints, each file fingerprint characterizing a particular file landmark within a media entity to be identified; generating correspondences between said sample landmarks and said obtained file landmarks, wherein corresponding landmarks have equivalent fingerprints; and identifying said media entity if a plurality of said corresponding landmarks are substantially linearly related.
-
-
56. A method for recognizing a media sample, comprising:
-
continually sampling into a sound buffer N seconds of said media sample; computing a set of sample fingerprints characterizing a segment of said media sample stored in said sound buffer, wherein said segment has one or more distinct landmarks occurring at reproducible locations of said media sample; storing said fingerprints in a rolling buffer; obtaining a set of matching fingerprints in a database index, each matching fingerprint characterizing at least one distinct landmark of a media file and is equivalent to at least one fingerprint in said rolling buffer; identifying at least one media file having a plurality of matching fingerprints; reporting presence of said at least one media file; and removing at least one sample fingerprint from said rolling buffer. - View Dependent Claims (57, 58, 59, 61)
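The rolling-buffer loop of claim 56 might be sketched as follows. This is a simplified illustration that only counts matching fingerprints per file, as the claim recites, and omits any landmark comparison. The class name `RollingMatcher`, the `capacity` and `min_matches` parameters, and the index layout are hypothetical.

```python
from collections import Counter, deque

class RollingMatcher:
    """Keeps the most recent sample fingerprints and reports matching files."""

    def __init__(self, index, capacity=8, min_matches=3):
        self.index = index                    # {fingerprint: set of file ids} (assumed layout)
        self.buffer = deque(maxlen=capacity)  # oldest fingerprints are dropped automatically
        self.min_matches = min_matches

    def push(self, fingerprints):
        """Add newly computed fingerprints; return files currently present."""
        self.buffer.extend(fingerprints)
        counts = Counter()
        for fp in self.buffer:
            for file_id in self.index.get(fp, ()):
                counts[file_id] += 1
        return sorted(f for f, c in counts.items() if c >= self.min_matches)
```

Using a `deque` with `maxlen` is one way to realize the final step of the claim, removing at least one sample fingerprint from the rolling buffer, without explicit bookkeeping.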
-
-
62. A program storage device accessible by a computer, tangibly embodying a program of instructions executable by said computer to perform method steps for recognizing a media entity from a media sample, said program of instructions comprising:
-
code for computing a set of sample fingerprints, each sample fingerprint characterizing a particular sample landmark within said media sample; code for obtaining a set of file fingerprints, each file fingerprint characterizing a particular file landmark within a media entity to be identified; code for generating correspondences between said sample landmarks and said obtained file landmarks, wherein corresponding landmarks have equivalent fingerprints; and code for identifying said media entity if a plurality of said corresponding landmarks are substantially linearly related.
-
-
63. A system for recognizing a media entity from a media sample, comprising:
-
a landmarking and fingerprinting object for computing a set of particular sample landmarks within said media sample and a set of sample fingerprints, each sample fingerprint characterizing one of said particular sample landmarks; a database index containing file landmarks and corresponding file fingerprints for at least one media entity to be identified; and an analysis object for: locating a set of matching fingerprints in said database index, wherein said matching fingerprints are equivalent to said sample fingerprints; generating correspondences between said sample landmarks and said file landmarks, wherein corresponding landmarks have equivalent fingerprints; and identifying at least one media entity for which a plurality of said corresponding landmarks are substantially linearly related.
-
-
64. A computer-implemented method for recognizing an audio sample, comprising:
-
creating a database index of at least one audio file in a database, comprising: computing landmarks and fingerprints for each audio file, wherein each landmark occurs at a particular location within said audio file and is associated with a fingerprint; associating, for each audio file, said landmarks and fingerprints with an identifier; and storing said fingerprints, said landmarks, and said identifier in a memory. - View Dependent Claims (65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91)
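One possible shape for such an index is a list of (fingerprint, landmark, identifier) triples sorted by fingerprint, which permits binary-search lookup in time logarithmic in the number of entries. The sketch below is an assumption about layout, not the patent's stated implementation; `build_index` and `lookup` are hypothetical names.

```python
from bisect import bisect_left, bisect_right

def build_index(files):
    """files: {identifier: [(landmark, fingerprint), ...]}

    Returns (keys, entries), where entries are (fingerprint, landmark,
    identifier) triples sorted by fingerprint and keys is the parallel
    list of fingerprints used for binary search.
    """
    entries = sorted(
        (fp, landmark, identifier)
        for identifier, pairs in files.items()
        for landmark, fp in pairs
    )
    keys = [entry[0] for entry in entries]
    return keys, entries

def lookup(index, fingerprint):
    """Return every (landmark, identifier) stored under the fingerprint."""
    keys, entries = index
    lo = bisect_left(keys, fingerprint)
    hi = bisect_right(keys, fingerprint)
    return [(landmark, identifier) for _, landmark, identifier in entries[lo:hi]]
```

For example, after `idx = build_index({"a": [(10, 111), (12, 222)], "b": [(5, 111)]})`, the call `lookup(idx, 111)` returns the landmarks of both files that share fingerprint 111.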
-
-
92. A method for recognizing a media entity from a media sample, comprising:
-
generating correspondences between landmarks of said media sample and corresponding landmarks of a media entity to be identified, wherein said landmarks of said media sample and said corresponding landmarks of said media entity have equivalent fingerprints; and identifying said media sample and said media entity if a plurality of said correspondences have a linear relationship defined by
$\mathrm{landmark}^{*}_{n} = m \cdot \mathrm{landmark}_{n} + \mathrm{offset}$, where $\mathrm{landmark}_{n}$ is a sample landmark, $\mathrm{landmark}^{*}_{n}$ is a file landmark that corresponds to $\mathrm{landmark}_{n}$, and $m$ represents slope.
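When the slope is not known in advance (for example, if the sample was played faster or slower than the original), both the slope and the offset of the linear relationship in claim 92 have to be estimated from the correspondences. The sketch below uses an exhaustive pairwise line search with inlier counting. This is a swapped-in illustrative technique chosen for brevity, not a method stated in the claim, and `best_line` and `tol` are hypothetical names.

```python
from itertools import combinations

def best_line(correspondences, tol=0.05):
    """Find the line file = m * sample + offset supported by the most pairs.

    correspondences: list of (sample_landmark, file_landmark)
    Returns (inlier_count, m, offset); m and offset are None if no line
    could be formed. Every pair of correspondences proposes a candidate
    line, so this is cubic in the input size and only suits small inputs.
    """
    best = (0, None, None)
    for (x1, y1), (x2, y2) in combinations(correspondences, 2):
        if x1 == x2:
            continue  # vertical line, slope undefined
        m = (y2 - y1) / (x2 - x1)
        offset = y1 - m * x1
        inliers = sum(
            1 for x, y in correspondences if abs(y - (m * x + offset)) <= tol
        )
        if inliers > best[0]:
            best = (inliers, m, offset)
    return best
```

A file would then be reported as a match only if the inlier count reaches some threshold, corresponding to "a plurality of said correspondences" in the claim.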
-
-
93. A method for recognizing a media sample, comprising:
identifying media files that have file landmarks that are substantially linearly related to sample landmarks of said media sample;
wherein said file landmarks and said sample landmarks have equivalent fingerprints; and
wherein said file landmarks and said sample landmarks have a linear correspondence defined by
$\mathrm{landmark}^{*}_{n} = m \cdot \mathrm{landmark}_{n} + \mathrm{offset}$, where $\mathrm{landmark}_{n}$ is a sample landmark, $\mathrm{landmark}^{*}_{n}$ is a file landmark that corresponds to $\mathrm{landmark}_{n}$, and $m$ represents slope.
-
94. A method for comparing an audio sample and an audio entity, comprising:
-
for each of at least one audio entity to be identified, computing a plurality of entity fingerprints representing said audio entity;
wherein each entity fingerprint characterizes one or more features of said audio entity at or near an entity landmark in at least one dimension including time; computing a plurality of sample fingerprints representing said audio sample, wherein said sample fingerprints are invariant to time stretching of said audio sample; and identifying a matching audio entity that has at least a threshold number of said file fingerprints that are equivalent to said sample fingerprints. - View Dependent Claims (95, 96)
-
97. A method of characterizing an audio sample, comprising computing at least one fingerprint from a spectrogram of said audio sample, wherein said spectrogram comprises an anchor salient point and linked salient points, and wherein said fingerprint is computed from frequency coordinates of said anchor salient point and at least one linked salient point.
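As a rough illustration of claim 97, the sketch below pairs each anchor salient point with a few later salient points and packs their two frequency coordinates into one integer. Peak detection on the spectrogram is assumed to have been done already. The function name `fingerprints_from_peaks`, the parameters `fan_out`, `max_dt`, and `freq_bits`, and the bit-packing scheme are all illustrative assumptions.

```python
def fingerprints_from_peaks(peaks, fan_out=3, max_dt=50, freq_bits=10):
    """Build (landmark, fingerprint) pairs from spectrogram salient points.

    peaks: list of (time_frame, frequency_bin) with non-negative integers.
    Each peak acts as an anchor and is linked to at most `fan_out` later
    peaks that lie within `max_dt` frames. The fingerprint packs the anchor
    and linked frequency bins into a single integer; the landmark is the
    anchor's time coordinate.
    """
    peaks = sorted(peaks)
    mask = (1 << freq_bits) - 1
    pairs = []
    for i, (t_anchor, f_anchor) in enumerate(peaks):
        linked = [f for t, f in peaks[i + 1:] if 0 < t - t_anchor <= max_dt]
        for f_linked in linked[:fan_out]:
            fingerprint = ((f_anchor & mask) << freq_bits) | (f_linked & mask)
            pairs.append((t_anchor, fingerprint))
    return pairs
```

Because the fingerprint here uses only frequency coordinates, shifting every peak by the same amount of time changes the landmarks but leaves the fingerprints unchanged.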
-
106. A method for comparing an audio sample and an audio entity, comprising:
-
for each of at least one audio entity to be identified, computing a plurality of entity landmark/fingerprint pairs representing said audio entity, wherein each landmark occurs at a particular location within said audio entity in at least one dimension including time, and wherein each fingerprint characterizes one or more features of said audio entity at or near said particular location; computing a plurality of sample landmark/fingerprint pairs representing said audio sample by obtaining time and frequency coordinates of at least one salient point of a spectrogram of said audio sample, wherein each salient point serves as an anchor point defining a sample landmark; and generating at least one multidimensional sample landmark/fingerprint pair from said at least one salient point, wherein sample landmarks of said audio sample are taken to be time coordinates and wherein corresponding sample fingerprints are computed from at least one of the remaining coordinates; and identifying a winning audio entity that has at least a threshold number of said file fingerprints that are equivalent to said sample fingerprints. - View Dependent Claims (107, 108, 109, 110, 111, 112, 113, 114)
-
Specification