Generating and matching hashes of multimedia content
Abstract
Hashes are short summaries or signatures of data files which can be used to identify the file. Hashing multimedia content (audio, video, images) is difficult because the hash of original content and processed (e.g. compressed) content may differ significantly. The disclosed method generates robust hashes for multimedia content, for example, audio clips. The audio clip is divided (12) into successive (preferably overlapping) frames. For each frame, the frequency spectrum is divided (15) into bands. A robust property of each band (e.g. energy) is computed (16) and represented (17) by a respective hash bit. An audio clip is thus represented by a concatenation of binary hash words, one for each frame. To identify a possibly compressed audio signal, a block of hash words derived therefrom is matched by a computer (20) with a large database (21). Such matching strategies are also disclosed. In an advantageous embodiment, the extraction process also provides information (19) as to which of the hash bits are the least reliable. Flipping these bits considerably improves the speed and performance of the matching process.
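The extraction steps described in the abstract (framing, band division, one bit per band, one hash word per frame) can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the frame length, hop size, number of bands, equal-width band layout, naive DFT, and the rule that a bit is 1 when a band's energy exceeds the frame's mean band energy are all choices made here for brevity. The abstract states only that a robust property such as energy is represented by a hash bit. All function names are hypothetical.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split samples into successive frames; hop < frame_len gives overlap."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0 .. N/2 - 1 (adequate for small frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def band_energies(spectrum, num_bands):
    """Sum spectral power over disjoint, equal-width bands (assumed layout)."""
    width = len(spectrum) // num_bands
    return [sum(spectrum[b * width:(b + 1) * width]) for b in range(num_bands)]


def hash_word(energies):
    """Assumed bit rule: 1 if a band's energy exceeds the frame's mean band energy."""
    mean = sum(energies) / len(energies)
    word = 0
    for e in energies:
        word = (word << 1) | (1 if e > mean else 0)  # first band is the MSB
    return word


def hash_signal(samples, frame_len=64, hop=32, num_bands=8):
    """Concatenate one hash word per frame into a hash signal (list of ints)."""
    return [hash_word(band_energies(power_spectrum(f), num_bands))
            for f in frame_signal(samples, frame_len, hop)]


# Example: a pure tone whose energy falls entirely in the second of eight bands.
tone = [math.sin(2 * math.pi * 6 * t / 64) for t in range(256)]
words = hash_signal(tone)
```

With a hop of half the frame length, adjacent frames overlap by 50 percent, which corresponds to the "preferably overlapping" frames mentioned in the abstract. A practical extractor would use an FFT rather than the quadratic-time DFT shown here.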
28 Claims
1. A method to identify content in an information signal, the method comprising:
dividing the information signal into frames;
dividing each frame of the information signal into disjoint bands;
calculating a property of the information signal in each of said bands to compute a hash word for each frame;
generating a hash signal by concatenating successive hash words; and
obtaining a match of the information signal with a known content item, based on the hash signal.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11

12. A method to identify content in an information signal, the method comprising:
dividing the information signal into blocks;
extracting for each block a feature of the information signal within a block;
comparing the value of the extracted feature with a threshold;
generating for each block a hash bit based on the outcome of the comparing;
determining for each hash bit reliability information based on the difference between a value associated with the extracted feature and the threshold; and
combining hash bits generated for the blocks and the reliability information for each hash bit into a hash value, the hash value having reliable hash bits and unreliable hash bits.
Dependent claims: 13
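The thresholding and reliability steps of claim 12 can be illustrated with a short sketch. It assumes that the per-block feature values have already been extracted, that the threshold is 0, and that a fixed number of bits with the smallest distance to the threshold are flagged as unreliable. The function name, the `num_unreliable` parameter, and the returned pair format are hypothetical and are not taken from the claim.

```python
def hash_bits_with_reliability(features, threshold=0.0, num_unreliable=2):
    """Return (bits, unreliable_positions) for a list of per-block feature values.

    A bit is 1 when the feature exceeds the threshold. The distance between the
    feature and the threshold serves as reliability information: the closer a
    feature lies to the threshold, the more easily processing flips its bit.
    """
    bits = [1 if f > threshold else 0 for f in features]
    margins = [abs(f - threshold) for f in features]
    # Rank positions from least to most reliable (smallest margin first).
    order = sorted(range(len(features)), key=lambda i: margins[i])
    unreliable = sorted(order[:num_unreliable])
    return bits, unreliable


# Example: the features at positions 1 and 4 lie closest to the threshold.
features = [0.9, -0.05, 0.4, -0.7, 0.02]
bits, unreliable = hash_bits_with_reliability(features)
```

Flagging a fixed count of bits is only one option; an absolute margin cutoff would serve the same purpose.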

14. A method to identify multimedia content, the method comprising:
receiving an input block of hash words, the input block representing at least a part of an information signal;
interrogating a look-up table with a selected hash word from the input block to obtain a found hash word;
comparing the input block and a stored block of hash words in which the found hash word has the same position as the selected hash word; and
selectively identifying the stored block of hash words as a matching reference signal based on the outcome of the comparing.
Dependent claims: 15, 16, 17

18. A method to identify subject content, the method comprising:
receiving a hash value associated with an information signal, the hash value comprising one or more reliable hash bits and one or more unreliable hash bits;
interrogating a look up table with the one or more reliable bits to determine one or more matching stored hash values;
for each of the one or more matching stored hash values, calculating a bit error rate, the bit error rate representing a relationship between the one or more reliable bits and the corresponding bits of a matching stored hash value from the one or more matching stored hash values;
selecting a matching stored hash value from the one or more matching stored hash values, for which the bit error rate is minimal; and
identifying the matching stored hash value, for which the bit error rate is minimal, as the matching stored hash value associated with the subject content.
Dependent claims: 19, 20
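One possible reading of claim 18, combined with the bit flipping mentioned in the abstract, is sketched below. It assumes that a hash value is a short block of 8-bit words with a per-word reliability mask (1 marks a reliable bit), that stored blocks have the same length as the query, and that the look-up table is keyed on the first word of each stored block. The table is probed with every variant of the first query word obtained by flipping its unreliable bits, and the bit error rate is then computed over reliable bits only. All names and data layouts are hypothetical.

```python
from itertools import product


def variants(word, unreliable_positions):
    """Yield the word with every combination of its unreliable bits flipped."""
    for flips in product((0, 1), repeat=len(unreliable_positions)):
        v = word
        for flip, pos in zip(flips, unreliable_positions):
            if flip:
                v ^= 1 << pos
        yield v


def reliable_ber(words, masks, stored):
    """Bit error rate counted over reliable bits only (mask bit 1 = reliable)."""
    errors = sum(bin((w ^ s) & m).count("1")
                 for w, m, s in zip(words, masks, stored))
    total = sum(bin(m).count("1") for m in masks)
    return errors / total  # assumes at least one reliable bit


def best_match(words, masks, database, word_bits=8):
    """Return (ber, item_id) with minimal BER among look-up candidates, or None."""
    lut = {}
    for item_id, stored in database.items():
        lut.setdefault(stored[0], []).append(item_id)
    unreliable = [b for b in range(word_bits) if not (masks[0] >> b) & 1]
    candidates = {item for v in variants(words[0], unreliable)
                  for item in lut.get(v, [])}
    if not candidates:
        return None
    return min((reliable_ber(words, masks, database[c]), c) for c in candidates)


database = {
    "clip_a": [0b10110010, 0b01100111],
    "clip_b": [0b10110011, 0b11111111],
}
# Bit 0 of the first query word is flagged as unreliable.
match = best_match([0b10110011, 0b01100101], [0b11111110, 0b11111111], database)
```

Probing with flipped variants lets the look-up reach `clip_a` even though the first query word differs from the stored word in its unreliable bit.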

21. A machine-readable medium embodying instructions which, when executed by a machine, cause the machine to:
divide the information signal into frames;
divide each frame of the information signal into disjoint bands;
calculate a property of the information signal in each of said bands to compute a hash word for each frame;
generate a hash signal by concatenating successive hash words; and
obtain a match of the information signal with a known content item based on the hash signal.

22. An apparatus to identify content in an information signal, the method comprising:
a framing circuit to divide the information signal into frames;
a band division circuit to divide each frame of the information signal into disjoint bands;
a computing circuit to determine a property of the information signal in each of said bands to compute a hash word for each frame;
a hash signal generator to generate a hash signal by concatenating successive hash words; and
a matching circuit to obtain a match of the information signal with a known content item, based on the hash signal.
Dependent claims: 23, 24, 25, 26, 27

28. A system to identify content in an information signal, the system comprising:
means for dividing the information signal into frames;
means for dividing each frame of the information signal into disjoint bands;
means for calculating a property of the information signal in each of said bands to compute a hash word for each frame;
means for generating a hash signal by concatenating successive hash words; and
means for obtaining a match of the information signal with a known content item, based on the hash signal.
Specification