Method and system for generating a characteristic identifier for digital data and for detecting identical digital data
Abstract
A characteristic identifier for digital data is generated. Thereby, the information contained in a digital data set is reduced such that the resulting identifier is made comparable to another identifier made in the same manner. The generated identifiers are used for detecting identical digital data or to determine inexact copies of digital data. In one embodiment of the invention, the digital data is a digital audio signal and the characteristic identifier is called an audio signature. The comparison of identical audio data according to the invention can be carried out without a person actually listening to the audio data. The present invention can be used to establish automated processes to find potential unauthorized copies of audio data, e.g., music recordings, and therefore enables a better enforcement of copyrights in the audio industry.
24 Claims
1. A method for generating a characteristic identifier for digital data, comprising the steps of:
mapping frequency coordinates and peak values, of a predetermined number of prominent peaks occurring in an energy spectrum generated from the digital data, into a number of equivalence classes, wherein the prominent peaks are selected by using peak values of a plurality of peaks in the energy spectrum, and wherein the peak values for all equivalent frequency coordinates are summed into one of the equivalence classes; and
generating the characteristic identifier comprising results of the mapping.
generating the energy spectrum from the digital data;
selecting the predetermined number of prominent peaks from the energy spectrum by using peak values of the plurality of peaks in the energy spectrum to select the predetermined number of peaks having highest peak values; and
selecting frequency coordinates belonging to the prominent peaks.
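The three steps above (spectrum generation, selection of the highest peaks, and extraction of their frequency coordinates) can be illustrated with a minimal sketch. It is not the patented implementation: the function names `energy_spectrum` and `prominent_peaks`, the naive DFT, the local-maximum definition of a "peak", and the use of DFT bin indices as frequency coordinates are all assumptions made for illustration.

```python
import cmath
import math

def energy_spectrum(samples):
    """Return |X[k]|^2 for bins k = 0 .. n//2, using a naive O(n^2) DFT.

    Assumption: a plain DFT stands in for whatever transform produces
    the energy spectrum; it is only practical for short demo signals.
    """
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum

def prominent_peaks(spectrum, count):
    """Select the `count` local maxima with the highest peak values.

    Returns (bin_index, peak_value) pairs, strongest first. The bin
    index plays the role of the frequency coordinate (assumption).
    """
    peaks = [
        (k, spectrum[k])
        for k in range(1, len(spectrum) - 1)
        if spectrum[k] > spectrum[k - 1] and spectrum[k] > spectrum[k + 1]
    ]
    peaks.sort(key=lambda p: p[1], reverse=True)
    return peaks[:count]

# Toy signal: two sinusoids centred exactly on bins 5 and 12.
N = 64
signal = [
    math.sin(2 * math.pi * 5 * t / N) + 0.5 * math.sin(2 * math.pi * 12 * t / N)
    for t in range(N)
]
peaks = prominent_peaks(energy_spectrum(signal), count=2)
```

With this toy input the two selected bins are 5 and 12, the stronger sinusoid first. Converting a bin index to a frequency in hertz would additionally require the sample rate, which the sketch leaves out.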
4. The method of claim 3, further comprising the steps of:
transforming the frequency coordinates into an interval scale; and
quantizing the frequency coordinates.
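The claim text does not define the interval scale or the quantization step. As a hedged illustration only, the sketch below assumes a logarithmic musical-interval scale (steps per octave relative to a reference frequency) and rounding to the nearest step. `REFERENCE_HZ`, `STEPS_PER_OCTAVE`, and both function names are hypothetical and not taken from the source.

```python
import math

REFERENCE_HZ = 440.0    # hypothetical reference frequency, not from the claims
STEPS_PER_OCTAVE = 12   # hypothetical resolution: one step per semitone

def to_interval_scale(freq_hz, reference_hz=REFERENCE_HZ,
                      steps_per_octave=STEPS_PER_OCTAVE):
    """Map a frequency to a real-valued position on a logarithmic scale.

    Assumption: "interval scale" is read as a log-frequency scale, so
    equal frequency ratios map to equal distances.
    """
    return steps_per_octave * math.log2(freq_hz / reference_hz)

def quantize(position):
    """Quantize a scale position to the nearest integer step."""
    return round(position)

freqs = [440.0, 880.0, 261.63, 445.0]
steps = [quantize(to_interval_scale(f)) for f in freqs]
```

Under these assumptions, 440 Hz and 445 Hz land on the same quantized coordinate, which shows how quantization can absorb small frequency deviations between two copies of the same material.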
5. A method for generating a characteristic identifier for digital data, comprising the steps of:
generating an energy spectrum from the digital data;
selecting a predetermined number of prominent peaks from the energy spectrum, wherein the prominent peaks are selected by using peak values of a plurality of peaks in the energy spectrum;
selecting frequency coordinates belonging to the prominent peaks;
transforming the frequency coordinates into an interval scale;
quantizing the frequency coordinates;
applying an equivalence transformation to the quantized frequency coordinates and peak values of the prominent peaks, wherein a constant number of equivalence classes is used, and wherein the peak values for all equivalent frequency coordinates are summed into one of the equivalence classes; and
generating the characteristic identifier comprising results of the equivalence transformation.
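The equivalence transformation in the final steps of this claim can be sketched as follows. The claim only requires a constant number of equivalence classes and that peak values of equivalent coordinates are summed; the specific equivalence relation used here (congruence modulo the number of classes) and the name `fold_peaks` are assumptions for illustration.

```python
def fold_peaks(quantized_peaks, num_classes=12):
    """Sum the peak values of all coordinates in the same equivalence class.

    Assumption: two quantized coordinates are equivalent when they are
    congruent modulo `num_classes`. The result always has `num_classes`
    entries, regardless of how many peaks were supplied.
    """
    classes = [0.0] * num_classes
    for coord, value in quantized_peaks:
        # Python's % yields a non-negative result for a positive modulus,
        # so negative coordinates fold into the same range.
        classes[coord % num_classes] += value
    return classes

# (quantized coordinate, peak value) pairs; the values are made up.
peaks = [(0, 4.0), (12, 1.5), (-9, 2.0), (3, 0.5), (7, 1.0)]
identifier = fold_peaks(peaks)
```

Here coordinates 0 and 12 fall into class 0 and coordinates -9 and 3 fall into class 3, so their peak values are added together. The fixed-length list is what makes two identifiers directly comparable.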
determining whether the digital data is represented as a series expansion with respect to a complete set of elementary signals; and
if the digital data is not represented as a series expansion, carrying out a series expansion.
10. The method of claim 5, whereby the digital data is represented as a series expansion with respect to a complete set of elementary signals, and the energy spectrum is generated using information from the series expansion.
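When the data already exists as expansion coefficients, an energy spectrum can be read off the coefficients without transforming the signal again. The sketch below assumes the energy of each component is the squared magnitude of its coefficient, which holds for an orthonormal set of elementary signals; that property and the name `energy_from_coefficients` are assumptions, not statements from the claims.

```python
def energy_from_coefficients(coefficients):
    """Return the per-component energy |c_k|^2 of a series expansion.

    Assumption: the elementary signals are orthonormal, so each
    coefficient's squared magnitude is that component's energy.
    """
    return [abs(c) ** 2 for c in coefficients]

# Made-up complex and real coefficients for illustration.
coeffs = [3 + 4j, 0j, 1 - 1j, 2.0]
energies = energy_from_coefficients(coeffs)
```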
11. The method of claim 5, further comprising the step of:
storing the characteristic identifier into a database.
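The claims do not specify a database or schema. As one possible sketch, the example below stores identifiers in SQLite through Python's standard `sqlite3` module; the table name `signatures`, the JSON encoding, and the helper names are hypothetical.

```python
import json
import sqlite3

def store_identifier(conn, name, identifier):
    """Insert or replace one identifier, serialized as a JSON array."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS signatures ("
        "name TEXT PRIMARY KEY, identifier TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO signatures (name, identifier) VALUES (?, ?)",
        (name, json.dumps(identifier)),
    )
    conn.commit()

def load_identifier(conn, name):
    """Return the stored identifier as a list, or None if it is absent."""
    row = conn.execute(
        "SELECT identifier FROM signatures WHERE name = ?", (name,)
    ).fetchone()
    return json.loads(row[0]) if row else None

# In-memory database for demonstration only.
conn = sqlite3.connect(":memory:")
store_identifier(conn, "track-a", [5.5, 0.0, 2.5])
loaded = load_identifier(conn, "track-a")
```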
12. The method of claim 5, further comprising the steps of:
performing the steps of generating, selecting a predetermined number of prominent peaks, selecting frequency coordinates, transforming, quantizing, and applying for two digital data sets, thereby creating two resulting characteristic identifiers;
determining a distance between the two resulting characteristic identifiers; and
determining whether or not the digital data sets are identical.
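The comparison steps above leave the distance measure and the decision rule open. A minimal sketch, assuming a Euclidean distance between sum-normalized identifiers and a fixed threshold, might look like this; the normalization, the threshold value of 0.05, and all function names are illustrative assumptions.

```python
import math

def normalize(identifier):
    """Scale an identifier so its entries sum to 1 (assumption: this
    makes the comparison insensitive to overall signal level)."""
    total = sum(identifier)
    return [v / total for v in identifier] if total else list(identifier)

def distance(a, b):
    """Euclidean distance between two normalized identifiers."""
    if len(a) != len(b):
        raise ValueError("identifiers must have the same number of classes")
    na, nb = normalize(a), normalize(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(na, nb)))

def is_identical(a, b, threshold=0.05):
    """Treat two data sets as identical when the distance between their
    identifiers is at most `threshold` (hypothetical value)."""
    return distance(a, b) <= threshold

# Made-up identifiers: a reference, a louder copy of it, and unrelated data.
original = [5.5, 0.0, 2.5, 1.0]
louder_copy = [11.0, 0.0, 5.0, 2.0]
other = [0.5, 6.0, 0.5, 2.0]
```

With these values, `is_identical(original, louder_copy)` is true because normalization removes the level difference, while `is_identical(original, other)` is false. A practical threshold would have to be tuned on real data.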
13. The method of claim 12, further comprising the step of:
generating a report comprising a result of the step of determining whether or not the digital data sets are identical.
14. System for generating a characteristic identifier for digital data, the system comprising:
a signature generator comprising a data input module;
a spectrum module connected to the data input module, the spectrum module generating an energy spectrum from the digital data;
a peak selection module connected to the spectrum module, the peak selection module selecting a predetermined number of prominent peaks from the energy spectrum and selecting frequency coordinates belonging to the prominent peaks, wherein the prominent peaks are selected by using peak values of a plurality of peaks in the energy spectrum;
a peak quantization module connected to the peak selection module, the peak quantization module transforming the frequency coordinates into an interval scale and quantizing the frequency coordinates;
a peak folding module connected to the peak quantization module, the peak folding module applying an equivalence transformation to the quantized frequency coordinates and peak values of the prominent peaks, wherein a constant number of equivalence classes is used, and wherein the peak values for all equivalent frequency coordinates are summed into one of the equivalence classes, the peak folding module generating the characteristic identifier comprising results of the equivalence transformation; and
a data output module connected to the peak folding module.
a format check module connected to the data input module, and the format check module determining whether the digital data is represented as a series expansion with respect to a complete set of elementary signals; and
a series expansion module connected to the format check module, the series expansion module carrying out a series expansion if the digital data is not represented as a series expansion.
17. The system of claim 14, wherein the digital data is represented as a series expansion with respect to a complete set of elementary signals, and the spectrum module generates the energy spectrum using information from the series expansion.
18. The system of claim 14, further comprising:
a signature analyzer comprising:
a data input module; and
a computing and evaluation module connected to the data input module, the computing and evaluation module determining a distance between identifiers of digital data, and determining whether or not digital data sets are identical; and
a data output module connected to the computing and evaluation module.
19. The system of claim 18, further comprising a report generator connected to the signature generator, the report generator generating a report that comprises a result of the determination of whether or not digital data sets are identical.
20. The system of claim 18, further comprising a database connected to the signature generator.
21. A computer program product for generating a characteristic identifier for digital data, the computer program product directly loadable into an internal memory of a computer, comprising software code portions comprising:
a step to generate an energy spectrum from the digital data;
a step to select a predetermined number of prominent peaks from the energy spectrum, wherein the prominent peaks are selected by using peak values of a plurality of peaks in the energy spectrum;
a step to select frequency coordinates belonging to the prominent peaks;
a step to transform the frequency coordinates into an interval scale;
a step to quantize the frequency coordinates;
a step to apply an equivalence transformation to the quantized frequency coordinates and peak values of the prominent peaks, wherein a constant number of equivalence classes is used, and wherein the peak values for all equivalent frequency coordinates are summed into one of the equivalence classes; and
a step to generate the characteristic identifier comprising results of the equivalence transformation.
a step to perform the steps of generate, select a predetermined number of prominent peaks, select frequency coordinates, transform, quantize, and apply for two digital data sets, thereby creating two resulting characteristic identifiers;
a step to determine a distance between the two resulting characteristic identifiers; and
a step to determine whether or not the digital data sets are identical.
23. A computer system for generating a characteristic identifier for digital data, the computer system comprising an internal memory and an execution environment, the execution environment configured to:
generate an energy spectrum from the digital data;
select a predetermined number of prominent peaks from the energy spectrum, wherein the prominent peaks are selected by using peak values of a plurality of peaks in the energy spectrum;
select frequency coordinates belonging to the prominent peaks;
transform the frequency coordinates into an interval scale;
quantize the frequency coordinates;
apply an equivalence transformation to the quantized frequency coordinates and peak values of the prominent peaks, wherein a constant number of equivalence classes is used, and wherein the peak values for all equivalent frequency coordinates are summed into one of the equivalence classes; and
generate the characteristic identifier comprising results of the equivalence transformation.
perform the steps of generate, select a predetermined number of prominent peaks, select frequency coordinates, transform, quantize, and apply for two digital data sets, thereby creating two resulting characteristic identifiers;
determine a distance between the two resulting characteristic identifiers; and
determine whether or not the digital data sets are identical.
Specification