Audio identification system and method
Abstract
A method and system for direct audio capture and identification of the captured audio. A user may then be offered the opportunity to purchase recordings directly over the Internet or similar outlet. The system preferably includes one or more user-carried portable audio capture devices that employ a microphone, analog to digital converter, signal processor, and memory to store samples of ambient audio or audio features calculated from the audio. Users activate their capture devices when they hear a recording that they would like to identify or purchase. Later, the user may connect the capture device to a personal computer to transfer the audio samples or audio feature samples to an Internet site for identification. The Internet site preferably uses automatic pattern recognition techniques to identify the captured samples from a library of recordings offered for sale. The user can then verify that the sample is from the desired recording and place an order online. The pattern recognition process uses features of the audio itself and does not require the presence of artificial codes or watermarks. Audio to be identified can be from any source, including radio and television broadcasts or recordings that are played locally.
122 Claims
1. Apparatus for recognizing free-field audio signals, comprising:
a hand-held device having a microphone to capture free-field audio signals, the captured free-field audio signals corresponding to a portion of a recorded audio work that is less than an entire recorded audio work;
a local processor, coupleable to said hand-held device, to transmit audio signal features corresponding to the captured free-field audio signals to a recognition site;
one of said hand-held device and said local processor including circuitry which extracts a time series of spectrally distinct audio signal features from any portion of the captured free-field audio signals, the time series of spectrally distinct audio signal features corresponding to the portion of a recorded audio work that is less than the entire recorded audio work; and
a recognition processor and a recognition memory at the recognition site, said recognition memory storing data corresponding to a plurality of audio templates, said recognition processor correlating the audio signal features transmitted from said local processor with at least one of the audio templates stored in said recognition processor memory, said recognition processor providing a recognition signal based on the correlation.
Dependent claims: 2-9.
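Editorial sketch (not part of the claims): one way to picture the "time series of spectrally distinct audio signal features" of claim 1 is as the energy in a few frequency bands, computed frame by frame. The function name, the frame length, and the band edges below are assumptions chosen for illustration, and the naive DFT is only meant to keep the example dependency-free; the patent text above does not prescribe any of them.

```python
import cmath
import math

def band_energy_series(samples, frame_len=64,
                       bands=((1, 4), (4, 10), (10, 20), (20, 32))):
    """Return one list of per-band energies for each non-overlapping frame.

    `bands` holds hypothetical (low_bin, high_bin) DFT bin ranges; each range
    stands for one spectrally distinct feature.
    """
    series = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Naive DFT power spectrum for the lower half of the bins.
        power = []
        for k in range(frame_len // 2):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(frame))
            power.append(abs(acc) ** 2)
        # Sum the power inside each band to get one feature value per band.
        series.append([sum(power[lo:hi]) for lo, hi in bands])
    return series

# Usage: a pure tone centred on DFT bin 6 falls inside the second band (bins 4..9).
tone = [math.sin(2 * math.pi * 6 * n / 64) for n in range(256)]
features = band_energy_series(tone)
```

Each row of `features` is one time step and each column is one band, so reading down a column gives one feature waveform over time.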
10. A hand-held device for capturing audio signals which correspond to a portion of a recorded audio work that is less than the entire recorded audio work, and providing to a recognition processor extracted feature signals that correspond to the captured audio signals, the recognition processor comparing the received extracted feature signals to a plurality of stored audio works, the hand-held device comprising:
a microphone configured to receive analog audio signals which correspond to the portion of the recorded audio work that is less than the entire recorded audio work;
an A/D converter configured to convert the received analog audio signals to digital audio signals which correspond to the portion of the recorded audio work that is less than the entire recorded audio work;
a signal processor configured to extract a plurality of distinct feature signals from any random portion of the digital audio signals, the extracted feature signals corresponding to the portion of the recorded audio work that is less than the entire recorded audio work;
a memory configured to store the extracted feature signals; and
output circuitry configured to output the stored extracted feature signals from said hand-held device.
Dependent claims: 11-17.
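Editorial sketch (not part of the claims): the chain of claim 10, A/D conversion, feature extraction, storage, and output, can be modelled in a few lines. Everything here is hypothetical: the class name `CaptureDevice`, the 16-bit quantization, the bounded memory, and in particular the single mean-amplitude feature, which is only a stand-in for the plurality of distinct feature signals the claim recites.

```python
from collections import deque
import struct

def adc_16bit(analog):
    """Quantize analog values to signed 16-bit integers, clipping to [-1.0, 1.0]."""
    out = []
    for v in analog:
        v = max(-1.0, min(1.0, v))
        out.append(int(round(v * 32767)))
    return out

class CaptureDevice:
    """Hypothetical model of the device: A/D, feature extraction, memory, output."""

    def __init__(self, frame_len=4, capacity=8):
        self.frame_len = frame_len
        # Bounded memory: the oldest features are dropped once capacity is reached.
        self.memory = deque(maxlen=capacity)

    def capture(self, analog):
        digital = adc_16bit(analog)
        for start in range(0, len(digital) - self.frame_len + 1, self.frame_len):
            frame = digital[start:start + self.frame_len]
            # Stand-in feature: mean absolute amplitude of the frame.
            self.memory.append(sum(abs(s) for s in frame) // self.frame_len)

    def output(self):
        """Serialize the stored features as little-endian unsigned 16-bit values."""
        return struct.pack("<%dH" % len(self.memory), *self.memory)

device = CaptureDevice()
device.capture([0.0, 0.5, -0.5, 1.0, 2.0, -2.0, 0.25, 0.0])
payload = device.output()
```

Storing features rather than raw samples keeps the payload small: eight input samples become two 16-bit feature values here.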
18. A local processor for an audio signal recognition system having a hand-held device and a recognition server, the hand-held device capturing audio signals and downloading them to the local processor, the recognition server (i) receiving from the local processor extracted feature signals that correspond to the captured audio signals and (ii) comparing received extracted feature signals to a plurality of stored recorded entire audio works, the local processor comprising:
an interface for receiving the captured audio signals from the hand-held device, the captured audio signals corresponding to a portion of a recorded audio work that is less than the entire recorded audio work;
a processor for forming extracted feature signals corresponding to the received captured audio signals, the extracted feature signals corresponding to the portion of the recorded audio work that is less than the entire recorded audio work, the extracted feature signals also corresponding to different frequency bands of the captured audio signals;
a memory for storing the extracted feature signals; and
an activation device which causes the stored extracted feature signals to be sent to the recognition server.
Dependent claims: 19, 20.
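Editorial sketch (not part of the claims): claim 18 has the local processor store per-band feature signals and then send them to the recognition server. The claim does not specify a wire format, so the JSON schema below (`device_id`, `num_bands`, `frames`) and both function names are purely assumed, and no actual network call is made.

```python
import json

def build_upload(device_id, band_features):
    """Encode per-band feature frames as a JSON request body (hypothetical schema)."""
    message = {
        "device_id": device_id,
        "num_bands": len(band_features[0]) if band_features else 0,
        "frames": band_features,
    }
    return json.dumps(message, separators=(",", ":")).encode("utf-8")

def parse_upload(body):
    """Server-side decode with a basic shape check on every frame."""
    message = json.loads(body.decode("utf-8"))
    if any(len(frame) != message["num_bands"] for frame in message["frames"]):
        raise ValueError("frame width does not match num_bands")
    return message["device_id"], message["frames"]

# Usage: two frames of three band values each survive the round trip unchanged.
body = build_upload("demo-unit", [[3, 9, 1], [4, 8, 2]])
device_id, frames = parse_upload(body)
```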
21. A recognition server for an audio signal recognition system having a hand-held device and a local processor, the hand-held device capturing audio signals and transmitting to the local processor signals which correspond to the captured audio signals, the local processor transmitting extracted feature signals to the recognition server, the recognition server comprising:
an interface receiving the extracted feature signals from the local processor, the extracted feature signals corresponding to a portion of a recorded audio work that is less than the entire recorded audio work;
a memory storing a plurality of feature signal sets, each set corresponding to an entire recorded audio work; and
processing circuitry which
(i) receives an input audio stream and separates the received audio stream into a plurality of different frequency bands;
(ii) forms a plurality of feature time series waveforms which correspond to spectrally distinct portions of the received input audio stream;
(iii) stores in the memory the plurality of feature signal sets which correspond to the feature time series waveforms;
(iv) compares the extracted feature signals received by said interface with the stored feature signal sets; and
(v) provides a recognition signal when the received extracted feature signals match at least one of the stored feature signal sets.
Dependent claims: 22-25.
26. A method for recognizing an input audio data stream, comprising the steps of:
receiving the input audio data stream with a hand held device;
with the hand held device, randomly selecting any one portion of the received audio data stream, the one portion of the received audio data stream comprising a portion of a recorded audio work that is less than the entire recorded audio work;
forming a first plurality of feature time series waveforms corresponding to spectrally distinct portions of the received audio data stream;
transmitting to a recognition site the first plurality of feature time series waveforms;
storing a second plurality of feature time series waveforms at the recognition site, the second plurality of feature time series waveforms corresponding to the entire recorded audio work;
at the recognition site, correlating the first plurality of feature time series waveforms with the second plurality of feature time series waveforms; and
designating a recognition when a correlation probability value between the first plurality of feature time series waveforms and one of the second plurality of feature time series waveforms reaches a predetermined value.
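Editorial sketch (not part of the claims): the last step of claim 26 designates a recognition only when a correlation value reaches a predetermined value. The example below assumes the query and reference waveforms are already time-aligned and of equal length, and it uses the mean per-band Pearson correlation as a simple stand-in for the claimed "correlation probability value". The function names, the reference names, and the 0.9 threshold are all hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def recognize(query_bands, references, threshold=0.9):
    """Return (name, score); name is None when no score reaches `threshold`."""
    best_name, best_score = None, -1.0
    for name, ref_bands in references.items():
        # Average the correlation over all feature waveforms (one per band).
        score = sum(pearson(q, r) for q, r in zip(query_bands, ref_bands))
        score /= len(query_bands)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score

references = {
    "work_a": [[1, 2, 3, 4, 5], [5, 3, 4, 1, 2]],
    "work_b": [[5, 4, 3, 2, 1], [1, 1, 2, 2, 3]],
}
query = [[2, 4, 6, 8, 10], [10, 6, 8, 2, 4]]  # work_a at twice the level
match, score = recognize(query, references)
```

Because Pearson correlation ignores gain and offset, a louder or quieter capture of the same passage still scores close to 1, while an unrelated query falls below the threshold and yields no recognition.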
27. A method for recognizing free-field audio signals, comprising the steps of:
capturing free-field audio signals with a hand-held device having a microphone, the captured free-field audio signals corresponding to a portion of a recorded audio work that is less than an entire recorded audio work;
transmitting signals corresponding to the captured free-field audio signals to a local processor;
transmitting from the local processor to a recognition site, audio signal features which correspond to the signals transmitted from the hand-held device and correspond to the portion of a recorded audio work that is less than an entire recorded audio work;
one of the hand-held device and the local processor extracting a time series of spectrally distinct audio signal features from the captured free-field audio signals;
storing data corresponding to a plurality of audio templates in a memory at the recognition site, each audio template corresponding to substantially an entire recorded audio work;
correlating the audio signal features transmitted from the local processor with at least one of the audio templates stored in the recognition site memory, using a recognition processor; and
providing a recognition signal based on the correlation.
Dependent claims: 28-34.
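Editorial sketch (not part of the claims): claim 27 correlates features from a short captured portion against templates that each cover substantially an entire work, which implies the portion may sit anywhere inside the template. One simple way to handle that, assumed here for illustration only, is to slide the excerpt along the template and keep the best-scoring offset. The function names and the toy data are hypothetical.

```python
import math

def normalized_correlation(a, b):
    """Correlation of mean-removed sequences, in [-1, 1]; 0.0 if either is flat."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0

def best_alignment(excerpt, template):
    """Slide `excerpt` along `template`; return (offset, score) of the best match."""
    best_offset, best_score = -1, -2.0
    for offset in range(len(template) - len(excerpt) + 1):
        window = template[offset:offset + len(excerpt)]
        score = normalized_correlation(excerpt, window)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score

# One feature waveform for a whole work, and a quieter capture of frames 3..7.
template = [0, 1, 0, 2, 5, 3, 1, 0, 4, 4, 1, 0]
excerpt = [t * 0.5 + 1.0 for t in template[3:8]]
offset, score = best_alignment(excerpt, template)
```

The exhaustive search costs one correlation per candidate offset per template, so a practical system would presumably need something faster; that trade-off is outside what the claim text states.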
35. A method for a hand-held device to capture audio signals to be transmitted from a network computer to a recognition site, the recognition site having a processor which receives extracted feature signals that correspond to the captured audio signals and compares them to a plurality of stored audio information, each of the plurality of stored audio information corresponding to substantially an entire audio work, the method comprising the steps of:
receiving analog audio signals with a microphone;
A/D converting the received analog audio signals to digital audio signals, the digital audio signals corresponding to a portion of a recorded audio work that is less than the entire recorded audio work;
extracting spectrally distinct feature signals from the digital audio signals with a signal processor;
storing the extracted feature signals in a memory; and
transmitting the stored extracted feature signals to the network computer through a terminal.
Dependent claims: 36-42.
43. A local processor method in an audio signal recognition system having a hand-held device and a recognition server, the hand-held device capturing audio signals and downloading them to the local processor, the captured audio signals corresponding to a portion of a recorded audio work that is less than the entire audio work, the recognition server (i) receiving from the local processor extracted feature signals that correspond to the captured audio signals and (ii) comparing received extracted feature signals to a plurality of stored audio information, each of the plurality of stored audio information corresponding to substantially an entire recorded audio work, the method comprising the steps of:
receiving the captured audio signals from the hand-held device through an interface, the received audio signals corresponding to the portion of a recorded audio work that is less than the entire audio work;
forming extracted feature signals corresponding to the received audio signals with a processor, the extracted feature signals corresponding to different frequency bands of the captured audio signals;
storing the extracted feature signals in a memory; and
causing the stored extracted feature signals to be sent to the recognition server.
Dependent claims: 44, 45.
46. A recognition server method in an audio signal recognition system having a hand-held device and a local processor, the hand-held device capturing audio signals and transmitting to the local processor signals which correspond to the captured audio signals, the local processor transmitting extracted feature signals to the recognition server, the method comprising the steps of:
receiving the extracted feature signals from the local server through an interface, the received extracted feature signals corresponding to a portion of a recorded audio work that is less than an entire recorded audio work;
storing a plurality of feature signal sets in a memory, each set corresponding to an entire recorded audio work; and
with processing circuitry
(i) receiving an input audio stream and separating the received audio stream into a plurality of different frequency bands;
(ii) forming a plurality of feature time series waveforms which correspond to spectrally distinct portions of the received input audio stream;
(iii) storing in the memory the plurality of feature signal sets which correspond to the feature time series waveforms;
(iv) comparing the received feature signals, which correspond to the portion of a recorded audio work that is less than the entire audio work, with the stored feature signal sets, each of which corresponds to an entire audio work; and
(v) providing a recognition signal when the received feature signals match at least one of the stored feature signal sets.
Dependent claims: 47-50.
51. Computer readable storage media storing code which causes one or more processors to carry out a method for recognizing an input audio data stream, the code causing the one or more processors to perform the steps of:
receiving the input data stream with a hand held device;
with the hand held device, randomly selecting any one portion of the received audio data stream, the selected portion being less than an entire recorded audio work;
forming a first plurality of feature time series waveforms corresponding to spectrally distinct portions of the selected portion;
transmitting to a recognition site the first plurality of feature time series waveforms;
storing a second plurality of feature time series waveforms at the recognition site, each of the second plurality of time series waveforms corresponding to substantially an entire recorded audio work;
at the recognition site, correlating the first plurality of feature time series waveforms with the second plurality of feature time series waveforms; and
designating a recognition when a correlation probability value between the first plurality of feature time series waveforms and one of the second plurality of feature time series waveforms reaches a predetermined value.
52. Computer readable storage media storing code which causes one or more processors to carry out a method for recognizing free-field audio signals, the code causing the one or more processors to perform the steps of:
capturing free-field audio signals with a hand-held device having a microphone;
transmitting signals corresponding to the captured free-field audio signals to a local processor, the transmitted signals corresponding to a portion of a recorded audio work that is less than an entire recorded audio work;
transmitting from the local processor to a recognition site, audio signal features which correspond to the signals transmitted from the hand-held device, the transmitted audio signal features corresponding to the portion of a recorded audio work that is less than an entire recorded audio work;
at least one of the hand-held device and the local processor extracting a time series of spectrally distinct audio signal features from the captured free-field audio signals;
storing data corresponding to a plurality of audio templates in a memory at the recognition site, each of the audio templates corresponding to substantially an entire recorded audio work;
correlating the audio signal features transmitted from the local processor with at least one of the audio templates stored in the recognition site memory, using a recognition processor; and
providing a recognition signal based on the correlation.
Dependent claims: 53-58.
59. Computer readable storage media storing code which causes a hand-held device to capture audio signals to be transmitted from a network computer to a recognition site, the recognition site having a processor which receives extracted feature signals that correspond to the captured audio signals and compares them to a plurality of stored entire recorded audio works, the code causing the hand-held device to perform the steps of:
receiving analog audio signals with a microphone;
A/D converting the received analog audio signals to digital audio signals;
extracting spectrally distinct feature signals from the digital audio signals with a signal processor, the extracted feature signals corresponding to a portion of a recorded audio work that is less than an entire audio work;
storing the extracted feature signals in a memory; and
transmitting the stored extracted feature signals to the network computer through a terminal.
Dependent claims: 60-64.
65. Computer readable storage media storing code which causes a local processor to transmit extracted feature signals to a recognition server, in an audio signal recognition system having a hand-held device and the recognition server, the hand-held device capturing audio signals and downloading them to the local processor, the recognition server (i) receiving from the local processor extracted feature signals that correspond to the captured audio signals and (ii) comparing received extracted feature signals to a plurality of stored recorded audio works, each stored recorded audio work comprising an entire recorded audio work, the code causing the local processor to perform the steps of:
receiving the captured audio signals from the hand-held device through an interface;
forming extracted feature signals corresponding to the received captured audio signals with a processor, the extracted feature signals corresponding to different frequency bands of the captured audio signals, the extracted feature signals corresponding to a portion of a recorded audio work that is less than an entire recorded audio work;
storing the extracted feature signals in a memory; and
causing the stored extracted feature signals to be sent to the recognition server.
Dependent claims: 66-68.
69. Computer readable storage media storing code which causes a recognition server to recognize signals in an audio signal recognition system having a hand-held device and a local processor, the hand-held device capturing audio signals and transmitting to the local processor signals which correspond to the captured audio signals, the local processor transmitting extracted feature signals to the recognition server, the code causing the recognition server to perform the steps of:
receiving the extracted feature signals from the local processor through an interface, the extracted feature signals corresponding to a portion of a recorded audio work that is less than an entire recorded audio work;
storing a plurality of feature signal sets in a memory, each set corresponding to an entire recorded audio work; and
with processing circuitry
(i) receiving an input audio stream and separating the received audio stream into a plurality of different frequency bands;
(ii) forming a plurality of feature time series waveforms which correspond to spectrally distinct portions of the received input audio stream;
(iii) storing in the memory the plurality of feature signal sets which correspond to the feature time series waveforms, each of the plurality of feature signal sets corresponding to an entire recorded audio work;
(iv) comparing the received extracted feature signals with the stored feature signal sets; and
(v) providing a recognition signal when the received extracted feature signals match at least one of the stored feature signal sets.
Dependent claims: 70-72.
73. Apparatus for recognizing free-field audio signals, comprising:
a hand-held device having a microphone to capture free-field audio signals;
a local transmitter, integral to said hand-held device, to transmit a signal corresponding to the captured free-field audio signals to a recognition site, the transmitted signal corresponding to a portion of a recorded audio work that is less than an entire recorded audio work;
said local transmitter further comprising an analog-to-digital converter to convert the free-field audio signal to a digital format; and
a recognition processor and a recognition memory at the recognition site, said recognition memory storing data corresponding to a plurality of audio templates, each audio template corresponding to substantially an entire recorded audio work, said recognition processor comparing the signal transmitted from said local transmitter with at least one of the audio templates stored in said recognition processor memory, said recognition processor providing a recognition signal based on the comparison.
Dependent claims: 74-88.
89. A method of identifying information associated with an audio signal comprising the steps of:
establishing a connection between a hand-held device and a recognition site;
transmitting a sample signal corresponding to the audio signal over said connection, the sample signal corresponding to a portion of a recorded audio work that is less than an entire recorded audio work;
creating a unique audio template from said sample signal by applying a predetermined algorithm whereby said unique audio template is smaller than said sample signal;
comparing said unique audio template with a plurality of audio signatures stored on said recognition site, said plurality of audio signatures being created by application of said predetermined algorithm to a plurality of predetermined source signals, each of said plurality of audio signatures corresponding to substantially an entire recorded audio work;
determining the identifying information associated with the audio signal based on the comparison of said unique audio template with said plurality of audio signatures; and
transmitting the identifying information to said hand-held device over said connection.
Dependent claims: 90-105.
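Editorial sketch (not part of the claims): claim 89 applies the same predetermined algorithm to the sample and to the source works, yields a template smaller than the sample, and looks up identifying information by comparison. The claim does not name the algorithm. The one below, one bit per frame transition marking whether energy rose, with windowed Hamming-distance comparison, is an assumed stand-in, and it further assumes the sample starts on a frame boundary of the source work. All names and the toy catalog entries are hypothetical.

```python
def signature(samples, frame_len=4):
    """Bit list: 1 where a frame's energy exceeds the previous frame's, else 0."""
    energies = [sum(s * s for s in samples[i:i + frame_len])
                for i in range(0, len(samples) - frame_len + 1, frame_len)]
    return [1 if later > earlier else 0
            for earlier, later in zip(energies, energies[1:])]

def min_hamming(query_bits, work_bits):
    """Smallest Hamming distance of the query against any window of the work."""
    n = len(query_bits)
    if n == 0 or n > len(work_bits):
        return None  # not comparable
    return min(sum(q != w for q, w in zip(query_bits, work_bits[off:off + n]))
               for off in range(len(work_bits) - n + 1))

def identify(sample, catalog, max_distance=0):
    """Return the identifying information of the closest signature, or None."""
    query = signature(sample)
    best_info, best_distance = None, None
    for info, work_bits in catalog.items():
        distance = min_hamming(query, work_bits)
        if distance is None or distance > max_distance:
            continue
        if best_distance is None or distance < best_distance:
            best_info, best_distance = info, distance
    return best_info

work_a = [0] * 4 + [1] * 4 + [3] * 4 + [2] * 4 + [0] * 4 + [4] * 4
work_b = [5] * 4 + [4] * 4 + [3] * 4 + [2] * 4 + [1] * 4 + [0] * 4
# Signatures of whole works are computed once, with the same algorithm.
catalog = {"Work A": signature(work_a), "Work B": signature(work_b)}
sample = work_a[4:20]  # a portion that is less than the entire work
result = identify(sample, catalog)
```

Here a 16-value sample reduces to a 3-bit template, and because only the direction of energy change is kept, a quieter copy of the same portion produces the same template.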
106. A hand held device for the transmission of a signal corresponding to a free-field audio signal to a recognition site comprising a recognition processor and a recognition memory, the recognition memory adapted to store data corresponding to a plurality of audio templates where each audio template corresponds to substantially an entire recorded audio work, and the recognition processor adapted to compare the signal to at least one of the audio templates, said hand held device comprising:
a receiving means for receipt of the free-field signal;
an analog to digital converter to convert the free-field audio signal to a digital format, the captured free-field audio signal corresponding to a portion of a recorded audio work that is less than the entire recorded audio work; and
a transmitter, integral to said hand-held device, to transmit said signal corresponding to the captured free-field audio signals to the recognition site.
Dependent claims: 107-112.
113. A recognition site adapted to process signals corresponding to free field audio signals transmitted from a hand-held device comprising:
a receiving means for receipt of a signal from the hand-held device, the received signal corresponding to a portion of a recorded audio work that is less than an entire audio work;
a memory means for storing a plurality of audio templates, each audio template corresponding to substantially an entire recorded audio work;
a processing means for comparison of said received signal to at least one audio template; and
a signal generation means for transmission of a signal to the hand held device corresponding to the comparison performed by said processing means.
Dependent claims: 114-122.
Specification