Method and apparatus for automatically recognizing input audio and/or video streams
Abstract
A method and system for the automatic identification of audio, video, multimedia, and/or data recordings based on immutable characteristics of these works. The invention does not require the insertion of identifying codes or signals into the recording. This allows the system to be used to identify existing recordings that have not been through a coding process at the time that they were generated. Instead, each work to be recognized is “played” into the system where it is subjected to an automatic signal analysis process that locates salient features and computes a statistical representation of these properties. These features are then stored as patterns for later recognition of live input signal streams. A different set of features is derived for each audio or video work to be identified and stored. During real-time monitoring of a signal stream, a similar automatic signal analysis process is carried out, and many features are computed for comparison with the patterns stored in a large feature database. For each particular pattern stored in the database, only the relevant characteristics are compared with the real-time feature set. Preferably, during analysis and generation of reference patterns, data are extracted from all time intervals of a recording. This allows a work to be recognized from a single sample taken from any part of the recording.
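As an illustration of the last point, the sketch below shows one way a short feature sequence could be located anywhere inside longer stored patterns. It is a minimal example under stated assumptions, not the patented method: the feature values, the mean squared distance measure, the `threshold` value, and the names `best_offset` and `recognize` are all hypothetical.

```python
def best_offset(reference, sample):
    """Slide the sample along the reference pattern.

    Returns (mean squared distance, offset) for the best alignment, or
    (inf, -1) when the reference is shorter than the sample.
    """
    best = (float("inf"), -1)
    for offset in range(len(reference) - len(sample) + 1):
        window = reference[offset:offset + len(sample)]
        dist = sum((r - s) ** 2 for r, s in zip(window, sample)) / len(sample)
        best = min(best, (dist, offset))
    return best


def recognize(library, sample, threshold=0.01):
    """Return (work id, offset) of the closest stored pattern, or None.

    library maps a work id to one feature sequence covering the whole work,
    so a sample taken from any part of the work can be aligned against it.
    The threshold is an assumed tuning value.
    """
    candidates = []
    for work_id, reference in library.items():
        dist, offset = best_offset(reference, sample)
        if offset >= 0:
            candidates.append((dist, work_id, offset))
    if not candidates:
        return None
    dist, work_id, offset = min(candidates)
    return (work_id, offset) if dist <= threshold else None
```

Because every time interval of each work is represented in its stored pattern, the search does not need to know where in the recording the sample was captured.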
19 Claims
1. Audio signal recognition server apparatus adapted to receive, from a capture device, feature data that corresponds to a captured audio sample that is less than an entire reference audio work, the recognition server apparatus comprising:
interface structure configured to receive the sample feature data from the capture device;
a memory storing a library comprising (i) a first plurality of reference feature data sets which correspond to a first recorded reference audio work, and (ii) a second plurality of reference feature data sets which correspond to a second recorded reference audio work, each recorded reference audio work being longer than the captured audio sample; and
server processing structure configured to:
receive a first reference input audio signal corresponding to the first recorded reference audio work;
separate the received first reference input audio signal into a first plurality of frequency bands which have different frequencies, a frequency bandwidth of a lower frequency band of the first plurality of frequency bands being narrower than a frequency bandwidth of a higher frequency band of the first plurality of frequency bands;
compute the first plurality of reference feature data sets, which correspond to spectrally distinct portions of the first plurality of frequency bands of the first received reference input audio signal, this computing comprising performing envelope extraction on the first plurality of frequency bands to provide low-bandwidth amplitude measurements of each of the first plurality of frequency bands to provide the first plurality of reference feature data sets;
store in the memory the first plurality of reference feature data sets which correspond to the first reference input audio signal;
receive a second reference input audio signal corresponding to the second recorded reference audio work;
separate the received second reference input audio signal into a second plurality of frequency bands which have different frequencies, a frequency bandwidth of a lower frequency band of the second plurality of frequency bands being narrower than a frequency bandwidth of a higher frequency band of the second plurality of frequency bands;
compute the second plurality of reference feature data sets, which correspond to spectrally distinct portions of the second plurality of frequency bands of the second received reference input audio signal, this computing comprising performing envelope extraction on the second plurality of frequency bands to provide low-bandwidth amplitude measurements of each of the second plurality of frequency bands to provide the second plurality of reference feature data sets;
store in the memory the second plurality of reference feature data sets which correspond to the second reference input audio signal;
compare the sample feature data received by said interface structure with the stored first and second pluralities of reference feature data sets; and
generate a recognition signal in response to the received sample feature data matching at least one reference feature data set of the stored first and second pluralities of reference feature data sets.
Dependent claims: 2, 3, 4, 5.

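The band separation and envelope extraction recited in claim 1 can be sketched as follows. This is only an illustration under assumptions that are not taken from the claim: the octave spacing (each band twice as wide as the one below it), the frame-based DFT used here in place of a filter bank and envelope detector, the frame length, and the names `octave_band_edges` and `band_envelopes` are all hypothetical.

```python
import cmath
import math


def octave_band_edges(f_low, n_bands):
    """Return n_bands + 1 band edges in Hz.

    Each band spans one octave, so every band is twice as wide as the band
    below it (lower bands are narrower than higher bands).
    """
    return [f_low * 2 ** k for k in range(n_bands + 1)]


def band_envelopes(samples, sample_rate, edges, frame_len=256):
    """Return one amplitude sequence per band, one value per frame.

    The output rate is sample_rate / frame_len, far below the audio rate,
    which is what makes each sequence a low-bandwidth amplitude measurement.
    """
    n_bands = len(edges) - 1
    envelopes = [[] for _ in range(n_bands)]
    bin_hz = sample_rate / frame_len
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Naive DFT magnitudes for the bins below the Nyquist frequency.
        mags = []
        for k in range(frame_len // 2):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(frame))
            mags.append(abs(acc) / frame_len)
        # Sum the magnitudes that fall inside each band.
        for b in range(n_bands):
            lo, hi = edges[b], edges[b + 1]
            total = sum(m for k, m in enumerate(mags) if lo <= k * bin_hz < hi)
            envelopes[b].append(total)
    return envelopes
```

A production system would more likely use an FFT or a bank of band-pass filters followed by rectification and low-pass filtering; the naive DFT is used here only to keep the sketch free of external dependencies.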
6. Audio signal recognition server apparatus adapted to receive, from a capture device, feature data that corresponds to a captured audio sample that is less than an entire reference audio work, the recognition server apparatus comprising:
interface structure configured to receive the sample feature data from the capture device;
a memory storing a library comprising (i) a first plurality of reference feature data sets which correspond to a first recorded reference audio work, and (ii) a second plurality of reference feature data sets which correspond to a second recorded reference audio work, each recorded audio work being longer than the captured audio sample, the first plurality of reference feature data sets corresponding to a first plurality of frequency bands which have different frequencies, a frequency bandwidth of a lower frequency band of the first plurality of frequency bands being narrower than a frequency bandwidth of a higher frequency band of the first plurality of frequency bands, the first plurality of reference feature data sets including features which correspond to spectrally distinct portions of the first plurality of frequency bands, the second plurality of reference feature data sets corresponding to a second plurality of frequency bands which have different frequencies, a frequency bandwidth of a lower frequency band of the second plurality of frequency bands being narrower than a frequency bandwidth of a higher frequency band of the second plurality of frequency bands, the second plurality of reference feature data sets including features which correspond to spectrally distinct portions of the second plurality of frequency bands; and
server processing structure configured to:
compare the sample feature data received by said interface structure with the stored first and second pluralities of reference feature data sets; and
generate a recognition signal in response to the received sample feature data matching at least one reference feature data set of the stored first and second pluralities of reference feature data sets.
Dependent claims: 7.

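The compare and generate steps of claim 6 can be illustrated with a small sketch. The claim does not say how a match is decided, so the normalized correlation score, the `min_corr` threshold, the dictionary returned as the recognition signal, and the names `normalized_correlation` and `recognition_signal` are all assumptions made for the example.

```python
import math


def normalized_correlation(a, b):
    """Pearson correlation of two equal-length sequences.

    Returns 0.0 when either sequence is constant, since the correlation is
    undefined in that case.
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0


def recognition_signal(library, sample, min_corr=0.9):
    """Return a dict describing the best match, or None if nothing matches.

    library maps a work id to a list of reference feature data sets, each
    the same length as the sample. A match with at least one stored set is
    enough to produce a signal.
    """
    best = None
    for work_id, feature_sets in library.items():
        for index, reference in enumerate(feature_sets):
            score = normalized_correlation(reference, sample)
            if score >= min_corr and (best is None or score > best["score"]):
                best = {"work": work_id, "set_index": index, "score": score}
    return best
```

Correlation is used in this sketch because it ignores overall gain, so a quieter or louder capture of the same passage still scores highly; other distance measures would fit the claim language equally well.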
8. Audio signal recognition server apparatus comprising:
interface structure configured to (i) receive a first reference input audio signal corresponding to a first recorded reference audio work, and (ii) receive a second reference input audio signal corresponding to a second recorded reference audio work;
a memory storing a library comprising (i) a first plurality of reference feature data sets which correspond to the first recorded reference audio work, and (ii) a second plurality of reference feature data sets which correspond to the second recorded reference audio work, each recorded reference audio work being longer than a received sample signal; and
server processing structure configured to:
separate the received first reference input audio signal into a first plurality of frequency bands which have different frequencies, a frequency bandwidth of a lower frequency band of the first plurality of frequency bands being narrower than a frequency bandwidth of a higher frequency band of the first plurality of frequency bands;
compute the first plurality of reference feature data sets, which correspond to spectrally distinct portions of the first plurality of frequency bands of the first received reference input audio signal;
store in the memory the first plurality of reference feature data sets which correspond to the first reference input audio signal;
separate the received second reference input audio signal into a second plurality of frequency bands which have different frequencies, a frequency bandwidth of a lower frequency band of the second plurality of frequency bands being narrower than a frequency bandwidth of a higher frequency band of the second plurality of frequency bands;
compute the second plurality of reference feature data sets, which correspond to spectrally distinct portions of the second plurality of frequency bands of the second received reference input audio signal; and
store in the memory the second plurality of reference feature data sets which correspond to the second reference input audio signal.
Dependent claims: 9.

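Claim 8 covers only the construction of the library. One possible in-memory layout is sketched below. Cutting the per-band amplitude sequences into overlapping fixed-length segments is an assumption made for the example (the claim requires only that each work has a plurality of feature data sets), and `ReferenceLibrary`, `add_work`, `set_len`, and `hop` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ReferenceLibrary:
    """Maps a work id to the list of feature data sets stored for that work."""
    works: dict = field(default_factory=dict)

    def add_work(self, work_id, band_envelopes, set_len, hop):
        """Segment the per-band sequences and store the segments.

        band_envelopes holds one amplitude sequence per frequency band, all
        of equal length. Each stored feature data set contains set_len
        consecutive values from every band; consecutive sets start hop
        values apart. Returns the number of sets stored.
        """
        n_frames = len(band_envelopes[0])
        sets = []
        for start in range(0, n_frames - set_len + 1, hop):
            sets.append([band[start:start + set_len] for band in band_envelopes])
        self.works[work_id] = sets
        return len(sets)
```

Calling `add_work` once per reference work yields the first and second pluralities of reference feature data sets, each keyed by its own work id.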
10. An audio signal recognition method adapted to receive, from a capture device, feature data that corresponds to a captured audio sample that is less than an entire reference audio work, the recognition server method comprising:
receiving, with an interface structure, the sample feature data from the capture device;
storing, in a memory, a library comprising (i) a first plurality of reference feature data sets which correspond to a first recorded reference audio work, and (ii) a second plurality of reference feature data sets which correspond to a second recorded reference audio work, each recorded reference audio work being longer than the captured audio sample; and
using a server processing structure to:
receive a first reference input audio signal corresponding to the first recorded reference audio work;
separate the received first reference input audio signal into a first plurality of frequency bands which have different frequencies, a frequency bandwidth of a lower frequency band of the first plurality of frequency bands being narrower than a frequency bandwidth of a higher frequency band of the first plurality of frequency bands;
compute the first plurality of reference feature data sets, which correspond to spectrally distinct portions of the first plurality of frequency bands of the first received reference input audio signal;
store in the memory the first plurality of reference feature data sets which correspond to the first reference input audio signal;
receive a second reference input audio signal corresponding to the second recorded reference audio work;
separate the received second reference input audio signal into a second plurality of frequency bands which have different frequencies, a frequency bandwidth of a lower frequency band of the second plurality of frequency bands being narrower than a frequency bandwidth of a higher frequency band of the second plurality of frequency bands;
compute the second plurality of reference feature data sets, which correspond to spectrally distinct portions of the second plurality of frequency bands of the second received reference input audio signal;
store in the memory the second plurality of reference feature data sets which correspond to the second reference input audio signal;
compare the sample feature data received by said interface structure with the stored first and second pluralities of reference feature data sets; and
generate a recognition signal in response to the received sample feature data matching at least one reference feature data set of the stored first and second pluralities of reference feature data sets.

11. An audio signal recognition server method adapted to receive, from a capture device, feature data that corresponds to a captured audio sample that is less than an entire reference audio work, the recognition server method comprising:
receiving, with an interface structure, the sample feature data from the capture device;
storing, in a memory, a library comprising (i) a first plurality of reference feature data sets which correspond to a first recorded reference audio work, and (ii) a second plurality of reference feature data sets which correspond to a second recorded reference audio work, each recorded reference audio work being longer than the captured audio sample, the first plurality of reference feature data sets corresponding to a first plurality of frequency bands which have different frequencies, a frequency bandwidth of a lower frequency band of the first plurality of frequency bands being narrower than a frequency bandwidth of a higher frequency band of the first plurality of frequency bands, the first plurality of reference feature data sets including features which correspond to spectrally distinct portions of the first plurality of frequency bands, the second plurality of reference feature data sets corresponding to a second plurality of frequency bands which have different frequencies, the second plurality of reference feature data sets including features which correspond to spectrally distinct portions of the second plurality of frequency bands, a frequency bandwidth of a lower frequency band of the second plurality of frequency bands being narrower than a frequency bandwidth of a higher frequency band of the second plurality of frequency bands; and
using a server processing structure to:
compare the sample feature data received by said interface structure with the stored first and second pluralities of reference feature data sets; and
generate a recognition signal in response to the received sample feature data matching at least one feature data set of the stored first and second pluralities of reference feature data sets.
Dependent claims: 12.

13. An audio signal recognition server method comprising:
receiving, with an interface structure, (i) a first reference input audio signal corresponding to a first recorded reference audio work, and (ii) a second reference input audio signal corresponding to a second recorded reference audio work;
storing, in a memory, a library comprising (i) a first plurality of reference feature data sets which correspond to the first recorded reference audio work, and (ii) a second plurality of reference feature data sets which correspond to the second recorded reference audio work, each recorded reference audio work being longer than a portion of a captured audio sample; and
using a server processing structure to:
separate the received first reference input audio signal into a first plurality of frequency bands which have different frequencies, a frequency bandwidth of a lower frequency band of the first plurality of frequency bands being narrower than a frequency bandwidth of a higher frequency band of the first plurality of frequency bands;
compute the first plurality of reference feature data sets, which correspond to spectrally distinct portions of the first plurality of frequency bands of the first received reference input audio signal;
store in the memory the first plurality of reference feature data sets which correspond to the first reference input audio signal;
separate the received second reference input audio signal into a second plurality of frequency bands which have different frequencies, a frequency bandwidth of a lower frequency band of the second plurality of frequency bands being narrower than a frequency bandwidth of a higher frequency band of the second plurality of frequency bands;
compute the second plurality of reference feature data sets, which correspond to spectrally distinct portions of the second plurality of frequency bands of the second received reference input audio signal; and
store in the memory the second plurality of reference feature data sets which correspond to the second reference input audio signal.
Dependent claims: 14.

15. At least one computer readable non-transitory medium for one or more audio signal recognition servers which are adapted to receive, from a capture device, feature data that corresponds to a captured audio sample that is less than an entire reference audio work, the at least one computer readable medium having instructions which, when read by one or more processing structures of the one or more recognition servers, cause the one or more processing structures to:
store, in a memory, a library comprising (i) a first plurality of reference feature data sets which correspond to a first recorded reference audio work, and (ii) a second plurality of reference feature data sets which correspond to a second recorded reference audio work, each recorded reference audio work being longer than the portion of the captured audio sample;
receive a first reference input audio signal corresponding to the first recorded reference audio work;
separate the received first reference input audio signal into a first plurality of frequency bands which have different frequencies, a frequency bandwidth of a lower frequency band of the first plurality of frequency bands being narrower than a frequency bandwidth of a higher frequency band of the first plurality of frequency bands;
compute the first plurality of reference feature data sets, which correspond to spectrally distinct portions of the first plurality of frequency bands of the first received reference input audio signal;
store in the memory the first plurality of reference feature data sets which correspond to the first reference input audio signal;
receive a second reference input audio signal corresponding to the second recorded reference audio work;
separate the received second reference input audio signal into a second plurality of frequency bands which have different frequencies, a frequency bandwidth of a lower frequency band of the second plurality of frequency bands being narrower than a frequency bandwidth of a higher frequency band of the second plurality of frequency bands;
compute the second plurality of reference feature data sets, which correspond to spectrally distinct portions of the second plurality of frequency bands of the second received reference input audio signal;
store in the memory the second plurality of reference feature data sets which correspond to the second reference input audio signal;
compare the sample feature data received from the capture device with the stored first and second pluralities of reference feature data sets; and
generate a recognition signal in response to the received sample feature data matching at least one feature data set of the stored first and second pluralities of reference feature data sets.

16. At least one computer readable non-transitory medium for one or more audio signal recognition servers which are adapted to receive, from a capture device, feature data that corresponds to a captured audio sample that is less than an entire reference audio work, the at least one computer readable medium having instructions which, when read by one or more processing structures of the one or more recognition servers, cause the one or more processing structures to:
store, in a memory, a library comprising (i) a first plurality of reference feature data sets which correspond to a first recorded reference audio work, and (ii) a second plurality of reference feature data sets which correspond to a second recorded reference audio work, each recorded reference audio work being longer than the captured audio sample, the first plurality of reference feature data sets corresponding to a first plurality of frequency bands which have different frequencies, a frequency bandwidth of a lower frequency band of the first plurality of frequency bands being narrower than a frequency bandwidth of a higher frequency band of the first plurality of frequency bands, the first plurality of reference feature data sets including features which correspond to spectrally distinct portions of the first plurality of frequency bands, the second plurality of reference feature data sets corresponding to a second plurality of frequency bands which have different frequencies, a frequency bandwidth of a lower frequency band of the second plurality of frequency bands being narrower than a frequency bandwidth of a higher frequency band of the second plurality of frequency bands, the second plurality of reference feature data sets including features which correspond to spectrally distinct portions of the second plurality of frequency bands;
compare the sample feature data received by said interface structure with the stored first and second pluralities of reference feature data sets; and
generate a recognition signal in response to the received sample feature data matching at least one feature data set of the stored first and second pluralities of reference feature data sets.
Dependent claims: 17.

18. At least one computer readable non-transitory medium for one or more audio signal recognition servers which are adapted to receive, from a capture device, feature data that corresponds to a captured audio sample that is less than an entire reference audio work, the at least one computer readable medium having instructions which, when read by one or more processing structures of the one or more recognition servers, cause the one or more processing structures to:
receive, at the one or more recognition servers, (i) a first reference input audio signal corresponding to a first recorded reference audio work, and (ii) a second reference input audio signal corresponding to a second recorded reference audio work;
storing, in a memory, a library comprising (i) a first plurality of reference feature data sets which correspond to the first recorded reference audio work, and (ii) a second plurality of reference feature data sets which correspond to the second recorded reference audio work, each recorded reference audio work being longer than a portion of the captured audio sample;
separate the received first reference input audio signal into a first plurality of frequency bands which have different frequencies, a frequency bandwidth of a lower frequency band of the first plurality of frequency bands being narrower than a frequency bandwidth of a higher frequency band of the first plurality of frequency bands;
compute the first plurality of reference feature data sets, which correspond to spectrally distinct portions of the first plurality of frequency bands of the first received reference input audio signal;
store in the memory the first plurality of reference feature data sets which correspond to the first reference input audio signal;
separate the received second reference input audio signal into a second plurality of frequency bands which have different frequencies, a frequency bandwidth of a lower frequency band of the second plurality of frequency bands being narrower than a frequency bandwidth of a higher frequency band of the second plurality of frequency bands;
compute the second plurality of reference feature data sets, which correspond to spectrally distinct portions of the second plurality of frequency bands of the second received reference input audio signal; and
store in the memory the second plurality of reference feature data sets which correspond to the second reference input audio signal.
Dependent claims: 19.

Specification