Audio identification system and method
Abstract
A method and system for direct audio capture and identification of the captured audio. A user may then be offered the opportunity to purchase recordings directly over the Internet or similar outlet. The system preferably includes one or more user-carried portable audio capture devices that employ a microphone, analog to digital converter, signal processor, and memory to store samples of ambient audio or audio features calculated from the audio. Users activate their capture devices when they hear a recording that they would like to identify or purchase. Later, the user may connect the capture device to a personal computer to transfer the audio samples or audio feature samples to an Internet site for identification. The Internet site preferably uses automatic pattern recognition techniques to identify the captured samples from a library of recordings offered for sale. The user can then verify that the sample is from the desired recording and place an order online. The pattern recognition process uses features of the audio itself and does not require the presence of artificial codes or watermarks. Audio to be identified can be from any source, including radio and television broadcasts or recordings that are played locally.
85 Claims
1. Apparatus for recognizing free-field audio signals, comprising:
a hand-held device having a microphone to capture free-field audio signals;
a local processor, coupleable to said hand-held device, to transmit audio signal features corresponding to the captured free-field audio signals to a recognition site;
one of said hand-held device and said local processor including circuitry which extracts a time series of spectrally distinct audio signal features from the captured free-field audio signals; and
a recognition processor and a recognition memory at the recognition site, said recognition memory storing data corresponding to a plurality of audio templates, said recognition processor correlating the audio signal features transmitted from said local processor with at least one of the audio templates stored in said recognition processor memory, said recognition processor providing a recognition signal based on the correlation.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 69
10. A hand-held device for capturing audio signals to be transmitted from a network computer to a recognition site, the recognition site having a processor which receives extracted feature signals that correspond to the captured audio signals and compares them to a plurality of stored song information, the hand-held device comprising:
a microphone receiving analog audio signals;
an A/D converter converting the received analog audio signals to digital audio signals;
a signal processor extracting spectrally distinct feature signals from the digital audio signals;
a memory storing the extracted feature signals; and
a terminal transmitting the stored extracted feature signals to the network computer.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 19, 20, 21
18. A local processor for an audio signal recognition system having a hand-held device and a recognition server, the hand-held device capturing audio signals and downloading them to the local processor, the recognition server (i) receiving from the local processor extracted feature signals that correspond to the captured audio signals and (ii) comparing received extracted feature signals to a plurality of stored song information, the local processor comprising:
an interface for receiving the captured audio signals from the hand-held device;
a processor for forming extracted feature signals corresponding to the received captured audio signals, the extracted feature signals corresponding to different frequency bands of the captured audio signals;
a memory for storing the extracted feature signals; and
an activation device which causes the stored extracted feature signals to be sent to the recognition server.
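As an illustration of the feature-forming element recited in claim 18, the following Python sketch splits digitized audio into fixed frames and sums the spectral power in a few equal-width frequency bands, giving one energy time series per band. The function name `band_energy_features`, the frame length, the band count, and the choice of band energy as the feature are assumptions made for this example; the claim does not specify them.

```python
import cmath
import math


def band_energy_features(samples, frame_len=64, num_bands=4):
    """Return one energy time series per frequency band.

    samples   : sequence of floats (the digitized audio)
    frame_len : samples per analysis frame (hypothetical value)
    num_bands : number of equal-width bands below Nyquist (hypothetical value)
    """
    half = frame_len // 2
    bins_per_band = half // num_bands  # leftover bins, if any, are ignored
    series = [[] for _ in range(num_bands)]
    # Walk the signal in non-overlapping frames; a trailing partial frame is dropped.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Naive DFT power for bins 0 .. half-1 (DC up to just below Nyquist).
        power = []
        for k in range(half):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(frame))
            power.append(abs(acc) ** 2)
        # Sum the bin powers that fall inside each band.
        for b in range(num_bands):
            lo = b * bins_per_band
            series[b].append(sum(power[lo:lo + bins_per_band]))
    return series


# A pure tone at DFT bin 20 of a 64-sample frame falls in band index 2 (bins 16-23).
tone = [math.sin(2 * math.pi * 20 * n / 64) for n in range(256)]
features = band_energy_features(tone)
```

Each inner list in `features` is the time series for one band, which is the kind of per-band signal that could then be stored and sent to a recognition server.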
22. A recognition server for an audio signal recognition system having a hand-held device and a local processor, the hand-held device capturing audio signals and transmitting to the local processor signals which correspond to the captured audio signals, the local processor transmitting extracted feature signals to the recognition server, the recognition server comprising:
an interface receiving the extracted feature signals from the local server;
a memory storing a plurality of feature signal sets, each set corresponding to an entire audio work; and
processing circuitry which (i) receives an input audio stream and separates the received audio stream into a plurality of different frequency bands;
(ii) forms a plurality of feature time series waveforms which correspond to spectrally distinct portions of the received input audio stream;
(iii) stores in the memory the plurality of feature signal sets which correspond to the feature time series waveforms,
(iv) compares the received feature signals with the stored feature signal sets, and
(v) provides a recognition signal when the received feature signals match at least one of the stored feature signal sets.
Dependent claims: 23, 24, 25, 26, 27, 29, 30
28. A hand-held music capture device, comprising:
a microphone which receives an arbitrary portion of an analog audio signal;
an analog-to-digital converter to convert the received portion of the audio signal into a digital signal;
a signal processor which receives a fixed-time-portion of the digital signal and signal processes same into a digital time series representing the voltage waveform of the captured audio signal;
a memory which stores the processed fixed-time portion of the digital signal that corresponds to less than a complete audio work; and
a terminal which is connectable to a computer device and transmits the stored portion of the digital signal to the computer device.
31. A portable device to capture and store samples of free-field audio signals and store these samples for later identification, comprising:
a microphone to receive an audio waveform;
an analog to digital converter to convert the received audio waveform into a digital time series;
a trigger to allow the user to manually initiate audio waveform reception;
a signal processor to extract and compress spectrally distinct features of the received audio waveform;
a memory to store the compressed spectrally distinct features; and
an interface to allow transfer of the stored features to recognition equipment.
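Claim 31 recites compressing the extracted features before storing them but does not name a compression scheme. One simple possibility, shown here purely as an assumption, is to quantize each non-negative feature value to a single byte on a logarithmic scale relative to the largest value. The function name `compress_features` and the 60 dB floor are hypothetical.

```python
import math


def compress_features(values, floor_db=-60.0):
    """Quantize non-negative feature values to one byte each on a log scale.

    The largest value maps to 255; values at or below floor_db (relative to
    the largest value) map to 0. This scheme is an illustrative assumption.
    """
    peak = max(values) if values else 0.0
    if peak <= 0.0:
        # Nothing above zero: emit one zero byte per input value.
        return bytes(len(values))
    span = -floor_db
    out = bytearray()
    for v in values:
        # Level in dB relative to the peak, clamped to [floor_db, 0].
        db = 10.0 * math.log10(v / peak) if v > 0.0 else floor_db
        db = max(floor_db, min(0.0, db))
        out.append(round(255 * (db - floor_db) / span))
    return bytes(out)


packed = compress_features([1.0, 0.01, 0.0])
```

With this scheme each feature sample occupies one byte in the device memory, at the cost of coarse amplitude resolution.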
32. A method for recognizing an input data stream, comprises the steps of:
receiving the input data stream with a hand held device;
with the hand held device, randomly selecting any one portion of the received data stream;
forming a first plurality of feature time series waveforms corresponding to spectrally distinct portions of the received data stream;
transmitting to a recognition site the first plurality of feature time series waveforms;
storing a second plurality of feature time series waveforms at the recognition site;
at the recognition site, correlating the first plurality of feature time series waveforms with the second plurality of feature time series waveforms; and
designating a recognition when a correlation probability value between the first plurality of feature time series waveforms and one of the second plurality of feature time series waveforms reaches a predetermined value.
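The correlating and designating steps of claim 32 can be sketched as follows. This minimal Python example slides a short set of captured per-band time series along each stored set, averages the per-band normalized correlation at every offset, and reports a match only when the best score reaches a preset threshold. The averaged correlation coefficient is used here as a stand-in for the claimed "correlation probability value"; the function names, the data layout, and the 0.9 threshold are assumptions for illustration.

```python
import math


def normalized_correlation(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom


def best_match(query, templates, threshold=0.9):
    """Find the stored work that best matches a captured excerpt.

    query     : list of per-band time series from the captured excerpt
    templates : dict mapping a work name to its list of per-band time series
    Returns (name, score) if the best score reaches threshold, else None.
    """
    best_name, best_score = None, -1.0
    q_len = len(query[0])
    for name, bands in templates.items():
        # Try every alignment of the excerpt within the stored work.
        for lag in range(len(bands[0]) - q_len + 1):
            score = sum(
                normalized_correlation(q, t[lag:lag + q_len])
                for q, t in zip(query, bands)
            ) / len(query)
            if score > best_score:
                best_name, best_score = name, score
    return (best_name, best_score) if best_score >= threshold else None


templates = {
    "work_a": [[1, 3, 2, 5, 4, 6, 1, 2], [2, 1, 4, 3, 6, 5, 2, 1]],
    "work_b": [[5, 5, 1, 1, 5, 5, 1, 1], [1, 2, 3, 4, 5, 6, 7, 8]],
}
# The excerpt is taken from "work_a" starting at offset 2.
excerpt = [[2, 5, 4, 6], [4, 3, 6, 5]]
result = best_match(excerpt, templates)
```

Here `result` names `work_a` with a score close to 1.0; raising the threshold above every attainable score makes the function return `None`, which corresponds to no recognition being designated.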
33. A method for recognizing free-field audio signals, comprising the steps of:
capturing free-field audio signals with a hand-held device having a microphone;
transmitting signals corresponding to the captured free-field audio signals to a local processor;
transmitting from the local processor to a recognition site, audio signal features which correspond to the signals transmitted from the hand-held device;
one of the hand-held device and the local processor extracting a time series of spectrally distinct audio signal features from the captured free-field audio signals;
storing data corresponding to a plurality of audio templates in a memory at the recognition site;
correlating the audio signal features transmitted from the local processor with at least one of the audio templates stored in the recognition site memory, using a recognition processor; and
providing a recognition signal based on the correlation.
Dependent claims: 34, 35, 36, 37, 38, 39, 40, 41
42. A method for a hand-held device to capture audio signals to be transmitted from a network computer to a recognition site, the recognition site having a processor which receives extracted feature signals that correspond to the captured audio signals and compares them to a plurality of stored song information, the method comprising the steps of:
receiving analog audio signals with a microphone;
A/D converting the received analog audio signals to digital audio signals;
extracting spectrally distinct feature signals from the digital audio signals with a signal processor;
storing the extracted feature signals in a memory; and
transmitting the stored extracted feature signals to the network computer through a terminal.
Dependent claims: 43, 44, 45, 46, 47, 48, 49
50. A local processor method in an audio signal recognition system having a handheld device and a recognition server, the hand-held device capturing audio signals and downloading them to the local processor, the recognition server (i) receiving from the local processor extracted feature signals that correspond to the captured audio signals and (ii) comparing received extracted feature signals to a plurality of stored song information, the method comprising the steps of:
receiving the captured audio signals from the hand-held device through an interface;
forming extracted feature signals corresponding to the received captured audio signals with a processor, the extracted feature signals corresponding to different frequency bands of the captured audio signals;
storing the extracted feature signals in a memory; and
causing the stored extracted feature signals to be sent to the recognition server.
Dependent claims: 51, 52, 53
54. A recognition server method in an audio signal recognition system having a hand-held device and a local processor, the hand-held device capturing audio signals and transmitting to the local processor signals which correspond to the captured audio signals, the local processor transmitting extracted feature signals to the recognition server, the method comprising the steps of:
receiving the extracted feature signals from the local server through an interface;
storing a plurality of feature signal sets in a memory, each set corresponding to an entire audio work; and
with processing circuitry (i) receiving an input audio stream and separates the received audio stream into a plurality of different frequency bands;
(ii) forming a plurality of feature time series waveforms which correspond to spectrally distinct portions of the received input audio stream;
(iii) storing in the memory the plurality of feature signal sets which correspond to the feature time series waveforms,
(iv) comparing the received feature signals with the stored feature signal sets, and
(v) providing a recognition signal when the received feature signals match at least one of the stored feature signal sets.
Dependent claims: 55, 56, 57, 58, 59
60. Computer readable storage media storing code which causes one or more processors to carry out a method for recognizing an input data stream, the code causing the one or more processors to perform the steps of:
receiving the input data stream with a hand held device;
with the hand held device, randomly selecting any one portion of the received data stream;
forming a first plurality of feature time series waveforms corresponding to spectrally distinct portions of the received data stream;
transmitting to a recognition site the first plurality of feature time series waveforms;
storing a second plurality of feature time series waveforms at the recognition site;
at the recognition site, correlating the first plurality of feature time series waveforms with the second plurality of feature time series waveforms; and
designating a recognition when a correlation probability value between the first plurality of feature time series waveforms and one of the second plurality of feature time series waveforms reaches a predetermined value.
61. Computer readable storage media storing code which causes one or more processors to carry out a method for recognizing free-field audio signals, the code causing the one or more processors to perform the steps of:
capturing free-field audio signals with a hand-held device having a microphone;
transmitting signals corresponding to the captured free-field audio signals to a local processor;
transmitting from the local processor to a recognition site, audio signal features which correspond to the signals transmitted from the hand-held device;
at least one of the hand-held device and the local processor extracting a time series of spectrally distinct audio signal features from the captured free-field audio signals;
storing data corresponding to a plurality of audio templates in a memory at the recognition site;
correlating the audio signal features transmitted from the local processor with at least one of the audio templates stored in the recognition site memory, using a recognition processor; and
providing a recognition signal based on the correlation.
Dependent claims: 62, 63, 64, 65, 66, 67, 68
70. Computer readable storage media storing code which causes a hand-held device to capture audio signals to be transmitted from a network computer to a recognition site, the recognition site having a processor which receives extracted feature signals that correspond to the captured audio signals and compares them to a plurality of stored song information, the code causing the hand-held device to perform the steps of:
receiving analog audio signals with a microphone;
A/D converting the received analog audio signals to digital audio signals;
extracting spectrally distinct feature signals from the digital audio signals with a signal processor;
storing the extracted feature signals in a memory; and
transmitting the stored extracted feature signals to the network computer through a terminal.
Dependent claims: 71, 72, 73, 74, 75
76. Computer readable storage media storing code which causes a local processor to transmit extracted feature signals to a recognition server, in an audio signal recognition system having a hand-held device and the recognition server, the hand-held device capturing audio signals and downloading them to the local processor, the recognition server (i) receiving from the local processor extracted feature signals that correspond to the captured audio signals and (ii) comparing received extracted feature signals to a plurality of stored song information, the code causing the local processor to perform the steps of:
receiving the captured audio signals from the hand-held device through an interface;
forming extracted feature signals corresponding to the received captured audio signals with a processor, the extracted feature signals corresponding to different frequency bands of the captured audio signals;
storing the extracted feature signals in a memory; and
causing the stored extracted feature signals to be sent to the recognition server.
Dependent claims: 77, 78, 79
80. Computer readable storage media storing code which causes a recognition server to recognize signals in an audio signal recognition system having a hand-held device and a local processor, the hand-held device capturing audio signals and transmitting to the local processor signals which correspond to the captured audio signals, the local processor transmitting extracted feature signals to the recognition server, the code causing the recognition server to perform the steps of:
receiving the extracted feature signals from the local server through an interface;
storing a plurality of feature signal sets in a memory, each set corresponding to an entire audio work; and
with processing circuitry (i) receiving an input audio stream and separates the received audio stream into a plurality of different frequency bands;
(ii) forming a plurality of feature time series waveforms which correspond to spectrally distinct portions of the received input audio stream;
(iii) storing in the memory the plurality of feature signal sets which correspond to the feature time series waveforms,
(iv) comparing the received feature signals with the stored feature signal sets, and
(v) providing a recognition signal when the received feature signals match at least one of the stored feature signal sets.
Dependent claims: 81, 82, 83, 85
84. A business method of recognizing free-field audio signals, comprising the steps of:
capturing free-field audio signals with a hand-held device having a microphone;
transmitting signals corresponding to the captured free-field audio signals to a local processor;
transmitting from the local processor to a recognition site, audio signal features which correspond to the signals transmitted from the hand-held device;
at least one of the hand-held device and the local processor extracting a time series of spectrally distinct audio signal features from the captured free-field audio signals;
storing data corresponding to a plurality of audio templates in a memory at the recognition site;
correlating the audio signal features transmitted from the local processor with at least one of the audio templates stored in the recognition site memory, using a recognition processor;
providing a recognition signal based on the correlation;
forwarding the recognition signal to a user at the local processor, together with instruction for the purchase of an audio work which corresponds to the at least one of the audio templates stored in the recognition site memory.
Specification