Method and apparatus for automatically recognizing input audio and/or video streams

US 9,715,626 B2
Filed: 12/02/2010
Issued: 07/25/2017
Est. Priority Date: 09/21/1999
Status: Expired due to Fees

Abstract

A method and system for the automatic identification of audio, video, multimedia, and/or data recordings based on immutable characteristics of these works. The invention does not require the insertion of identifying codes or signals into the recording. This allows the system to be used to identify existing recordings that have not been through a coding process at the time that they were generated. Instead, each work to be recognized is “played” into the system where it is subjected to an automatic signal analysis process that locates salient features and computes a statistical representation of these properties. These features are then stored as patterns for later recognition of live input signal streams. A different set of features is derived for each audio or video work to be identified and stored. During real-time monitoring of a signal stream, a similar automatic signal analysis process is carried out, and many features are computed for comparison with the patterns stored in a large feature database. For each particular pattern stored in the database, only the relevant characteristics are compared with the real-time feature set. Preferably, during analysis and generation of reference patterns, data are extracted from all time intervals of a recording. This allows a work to be recognized from a single sample taken from any part of the recording.
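
The last point of the abstract, extracting reference data from every time interval so that a sample from any part of a recording can be matched, can be illustrated with a minimal sketch. It assumes a work has already been reduced to a sequence of feature values. The function name `reference_patterns` and the `window` and `hop` parameters are hypothetical and do not come from the patent.

```python
def reference_patterns(features, window, hop):
    """Cut a feature sequence into overlapping windows that span the whole work.

    `features`, `window`, and `hop` are illustrative names, not patent terms.
    """
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    last_start = len(features) - window
    # range() is empty when the work is shorter than one window.
    return [tuple(features[i:i + window]) for i in range(0, last_start + 1, hop)]


# A toy "work" of ten feature values, windowed four at a time with a hop of two.
work = list(range(10))
patterns = reference_patterns(work, window=4, hop=2)

# A short sample taken from the middle of the work matches one stored pattern.
sample = tuple(work[4:8])
print(sample in patterns)
```

A sample whose start does not fall on a hop boundary would not match exactly in this sketch; a practical system would presumably compare with a tolerance rather than test for equality.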

19 Claims

1. Audio signal recognition server apparatus adapted to receive, from a capture device, feature data that corresponds to a captured audio sample that is less than an entire reference audio work, the recognition server apparatus comprising:
- interface structure configured to receive the sample feature data from the capture device;
  
  a memory storing a library comprising (i) a first plurality of reference feature data sets which correspond to a first recorded reference audio work, and (ii) a second plurality of reference feature data sets which correspond to a second recorded reference audio work, each recorded reference audio work being longer than the captured audio sample; and
  
  server processing structure configured to:
  
  receive a first reference input audio signal corresponding to the first recorded reference audio work;
  
  separate the received first reference input audio signal into a first plurality of frequency bands which have different frequencies, a frequency bandwidth of a lower frequency band of the first plurality of frequency bands being narrower than a frequency bandwidth of a higher frequency band of the first plurality of frequency bands;
  
  compute the first plurality of reference feature data sets, which correspond to spectrally distinct portions of the first plurality of frequency bands of the first received reference input audio signal, this computing comprising performing envelope extraction on the first plurality of frequency bands to provide low-bandwidth amplitude measurements of each of the first plurality of frequency bands to provide the first plurality of reference feature data sets;
  
  store in the memory the first plurality of reference feature data sets which correspond to the first reference input audio signal;
  
  receive a second reference input audio signal corresponding to the second recorded reference audio work;
  
  separate the received second reference input audio signal into a second plurality of frequency bands which have different frequencies, a frequency bandwidth of a lower frequency band of the second plurality of frequency bands being narrower than a frequency bandwidth of a higher frequency band of the second plurality of frequency bands;
  
  compute the second plurality of reference feature data sets, which correspond to spectrally distinct portions of the second plurality of frequency bands of the second received reference input audio signal, this computing comprising performing envelope extraction on the second plurality of frequency bands to provide low-bandwidth amplitude measurements of each of the second plurality of frequency bands to provide the second plurality of reference feature data sets;
  
  store in the memory the second plurality of reference feature data sets which correspond to the second reference input audio signal;
  
  compare the sample feature data received by said interface structure with the stored first and second pluralities of reference feature data sets; and
  
  generate a recognition signal in response to the received sample feature data matching at least one reference feature data set of the stored first and second pluralities of reference feature data sets.
  - 2. Apparatus according to claim 1, further comprising the capture device.
  - 3. Apparatus according to claim 2, wherein the capture device comprises a non-portable device.
  - 4. Apparatus according to claim 1, wherein said interface structure is configured to receive the sample feature data at least in part via the Internet.
  - 5. Apparatus according to claim 1, wherein the recognition signal identifies a video work.
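
The band separation and envelope extraction recited in claim 1 can be sketched as follows. The sketch assumes octave-spaced bands (one simple layout in which each lower band is narrower than the band above it) and substitutes frame-wise DFT magnitudes for a bank of bandpass filters followed by envelope detectors; neither choice is stated in the claim. All function and parameter names are hypothetical.

```python
import cmath
import math


def octave_band_edges(f_low, n_bands):
    """Return (low, high) edges in Hz; each band is twice as wide as the one below."""
    return [(f_low * 2 ** i, f_low * 2 ** (i + 1)) for i in range(n_bands)]


def band_envelopes(samples, sample_rate, bands, frame_len):
    """Return one amplitude value per band per frame (a low-rate envelope).

    Assumption: a naive DFT over non-overlapping frames stands in for
    bandpass filtering plus envelope detection.
    """
    envelopes = [[] for _ in bands]
    bin_hz = sample_rate / frame_len
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Magnitudes of DFT bins 0 .. frame_len // 2.
        mags = []
        for k in range(frame_len // 2 + 1):
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n, x in enumerate(frame)
            )
            mags.append(abs(acc) / frame_len)
        # Sum the magnitudes of the bins that fall inside each band.
        for b, (lo, hi) in enumerate(bands):
            total = sum(m for k, m in enumerate(mags) if lo <= k * bin_hz < hi)
            envelopes[b].append(total)
    return envelopes


# Four octave bands starting at 250 Hz: widths 250, 500, 1000, 2000 Hz.
bands = octave_band_edges(250, 4)

# A 1500 Hz test tone sampled at 8 kHz should show up only in the 1-2 kHz band.
rate = 8000
tone = [math.sin(2 * math.pi * 1500 * n / rate) for n in range(256)]
env = band_envelopes(tone, rate, bands, frame_len=64)
```

Each inner list in `env` holds one value per 64-sample frame, so the envelope is sampled at 125 Hz rather than 8 kHz, which is the sense in which the measurements are low-bandwidth in this sketch.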

6. Audio signal recognition server apparatus adapted to receive, from a capture device, feature data that corresponds to a captured audio sample that is less than an entire reference audio work, the recognition server apparatus comprising:
- interface structure configured to receive the sample feature data from the capture device;
  
  a memory storing a library comprising (i) a first plurality of reference feature data sets which correspond to a first recorded reference audio work, and (ii) a second plurality of reference feature data sets which correspond to a second recorded reference audio work, each recorded audio work being longer than the captured audio sample, the first plurality of reference feature data sets corresponding to a first plurality of frequency bands which have different frequencies, a frequency bandwidth of a lower frequency band of the first plurality of frequency bands being narrower than a frequency bandwidth of a higher frequency band of the first plurality of frequency bands, the first plurality of reference feature data sets including features which correspond to spectrally distinct portions of the first plurality of frequency bands, the second plurality of reference feature data sets corresponding to a second plurality of frequency bands which have different frequencies, a frequency bandwidth of a lower frequency band of the second plurality of frequency bands being narrower than a frequency bandwidth of a higher frequency band of the second plurality of frequency bands, the second plurality of reference feature data sets including features which correspond to spectrally distinct portions of the second plurality of frequency bands; and
  
  server processing structure configured to:
  
  compare the sample feature data received by said interface structure with the stored first and second pluralities of reference feature data sets; and
  
  generate a recognition signal in response to the received sample feature data matching at least one reference feature data set of the stored first and second pluralities of reference feature data sets.
  - 7. Apparatus according to claim 6, wherein the server processing structure is also configured to:
    - receive a first reference input audio signal corresponding to the first recorded reference audio work;
      
      separate the received first reference input audio signal into the first plurality of frequency bands which have different frequencies;
      
      compute the first plurality of reference feature data sets, which correspond to spectrally distinct portions of the first plurality of frequency bands of the first received reference input audio signal, this computing comprising performing envelope extraction on the first plurality of frequency bands to provide low-bandwidth amplitude measurements of each of the first plurality of frequency bands to provide the first plurality of reference feature data sets;
      
      store in the memory the first plurality of reference feature data sets which correspond to the first reference input audio signal;
      
      receive a second reference input audio signal corresponding to the second recorded reference audio work;
      
      separate the received second reference input audio signal into the second plurality of frequency bands which have different frequencies;
      
      compute the second plurality of reference feature data sets, which correspond to spectrally distinct portions of the second plurality of frequency bands of the second received reference input audio signal, this computing comprising performing envelope extraction on the second plurality of frequency bands to provide low-bandwidth amplitude measurements of each of the second plurality of frequency bands to provide the second plurality of reference feature data sets; and
      
      store in the memory the second plurality of reference feature data sets which correspond to the second reference input audio signal.
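
The comparison and recognition steps of claims 6 and 7 can be sketched as a nearest-match search over a stored library. This is only an illustration: the claims do not specify a distance measure or a threshold, so the Euclidean distance, the `threshold` parameter, and every name below are assumptions. The returned work identifier stands in for the recognition signal.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(sample, library, threshold):
    """Return the id of the work whose closest feature set is within threshold.

    `library` maps a work id to its plurality of reference feature data sets.
    Returns None when no stored set is close enough (no recognition signal).
    """
    best_id, best_dist = None, float("inf")
    for work_id, feature_sets in library.items():
        for ref in feature_sets:
            d = distance(sample, ref)
            if d < best_dist:
                best_id, best_dist = work_id, d
    return best_id if best_dist <= threshold else None


# Two reference works, each stored as several feature data sets (toy values).
library = {
    "work_a": [(0.1, 0.9, 0.3), (0.2, 0.8, 0.4)],
    "work_b": [(0.9, 0.1, 0.7), (0.8, 0.2, 0.6)],
}

match = recognize((0.21, 0.79, 0.41), library, threshold=0.1)
no_match = recognize((0.5, 0.5, 0.5), library, threshold=0.1)
```

Because every stored feature set of every work is searched, a sample that resembles any one set is enough to identify the work, which mirrors the "at least one reference feature data set" language of the claim.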

8. Audio signal recognition server apparatus comprising:
- interface structure configured to (i) receive a first reference input audio signal corresponding to a first recorded reference audio work, and (ii) receive a second reference input audio signal corresponding to a second recorded reference audio work;
  
  a memory storing a library comprising (i) a first plurality of reference feature data sets which correspond to the first recorded reference audio work, and (ii) a second plurality of reference feature data sets which correspond to the second recorded reference audio work, each recorded reference audio work being longer than a received sample signal; and
  
  server processing structure configured to:
  
  separate the received first reference input audio signal into a first plurality of frequency bands which have different frequencies, a frequency bandwidth of a lower frequency band of the first plurality of frequency bands being narrower than a frequency bandwidth of a higher frequency band of the first plurality of frequency bands;
  
  compute the first plurality of reference feature data sets, which correspond to spectrally distinct portions of the first plurality of frequency bands of the first received reference input audio signal;
  
  store in the memory the first plurality of reference feature data sets which correspond to the first reference input audio signal;
  
  separate the received second reference input audio signal into a second plurality of frequency bands which have different frequencies, a frequency bandwidth of a lower frequency band of the second plurality of frequency bands being narrower than a frequency bandwidth of a higher frequency band of the second plurality of frequency bands;
  
  compute the second plurality of reference feature data sets, which correspond to spectrally distinct portions of the second plurality of frequency bands of the second received reference input audio signal; and
  
  store in the memory the second plurality of reference feature data sets which correspond to the second reference input audio signal.
  - 9. Apparatus according to claim 8, wherein the interface structure is configured to receive, from a capture device, feature data that corresponds to a portion of a captured audio sample that is less than an entire reference audio work, and wherein the server processing structure is also configured to:
    - compare the sample feature data received by said interface structure with the stored first and second pluralities of reference feature data sets; and
      
      generate a recognition signal in response to the sample feature data matching at least one feature data set of the stored first and second pluralities of reference feature data sets.

10. An audio signal recognition server method adapted to receive, from a capture device, feature data that corresponds to a captured audio sample that is less than an entire reference audio work, the recognition server method comprising:
- receiving, with an interface structure, the sample feature data from the capture device;
  
  storing, in a memory, a library comprising (i) a first plurality of reference feature data sets which correspond to a first recorded reference audio work, and (ii) a second plurality of reference feature data sets which correspond to a second recorded reference audio work, each recorded reference audio work being longer than the captured audio sample; and
  
  using a server processing structure to:
  
  receive a first reference input audio signal corresponding to the first recorded reference audio work;
  
  separate the received first reference input audio signal into a first plurality of frequency bands which have different frequencies, a frequency bandwidth of a lower frequency band of the first plurality of frequency bands being narrower than a frequency bandwidth of a higher frequency band of the first plurality of frequency bands;
  
  compute the first plurality of reference feature data sets, which correspond to spectrally distinct portions of the first plurality of frequency bands of the first received reference input audio signal;
  
  store in the memory the first plurality of reference feature data sets which correspond to the first reference input audio signal;
  
  receive a second reference input audio signal corresponding to the second recorded reference audio work;
  
  separate the received second reference input audio signal into a second plurality of frequency bands which have different frequencies, a frequency bandwidth of a lower frequency band of the second plurality of frequency bands being narrower than a frequency bandwidth of a higher frequency band of the second plurality of frequency bands;
  
  compute the second plurality of reference feature data sets, which correspond to spectrally distinct portions of the second plurality of frequency bands of the second received reference input audio signal;
  
  store in the memory the second plurality of reference feature data sets which correspond to the second reference input audio signal;
  
  compare the sample feature data received by said interface structure with the stored first and second pluralities of reference feature data sets; and
  
  generate a recognition signal in response to the received sample feature data matching at least one reference feature data set of the stored first and second pluralities of reference feature data sets.

11. An audio signal recognition server method adapted to receive, from a capture device, feature data that corresponds to a captured audio sample that is less than an entire reference audio work, the recognition server method comprising:
- receiving, with an interface structure, the sample feature data from the capture device;
  
  storing, in a memory, a library comprising (i) a first plurality of reference feature data sets which correspond to a first recorded reference audio work, and (ii) a second plurality of reference feature data sets which correspond to a second recorded reference audio work, each recorded reference audio work being longer than the captured audio sample, the first plurality of reference feature data sets corresponding to a first plurality of frequency bands which have different frequencies, a frequency bandwidth of a lower frequency band of the first plurality of frequency bands being narrower than a frequency bandwidth of a higher frequency band of the first plurality of frequency bands, the first plurality of reference feature data sets including features which correspond to spectrally distinct portions of the first plurality of frequency bands, the second plurality of reference feature data sets corresponding to a second plurality of frequency bands which have different frequencies, the second plurality of reference feature data sets including features which correspond to spectrally distinct portions of the second plurality of frequency bands, a frequency bandwidth of a lower frequency band of the second plurality of frequency bands being narrower than a frequency bandwidth of a higher frequency band of the second plurality of frequency bands; and
  
  using a server processing structure to:
  
  compare the sample feature data received by said interface structure with the stored first and second pluralities of reference feature data sets; and
  
  generate a recognition signal in response to the received sample feature data matching at least one feature data set of the stored first and second pluralities of reference feature data sets.
  - 12. A method according to claim 11, wherein the server processing structure is further used to:
    - receive a first reference input audio signal corresponding to the first recorded reference audio work;
      
      separate the received first reference input audio signal into the first plurality of frequency bands which have different frequencies;
      
      compute the first plurality of reference feature data sets, which correspond to spectrally distinct portions of the first plurality of frequency bands of the first received reference input audio signal, this computing comprising performing envelope extraction on the first plurality of frequency bands to provide low-bandwidth amplitude measurements of each of the first plurality of frequency bands to provide the first plurality of reference feature data sets;
      
      store in the memory the first plurality of reference feature data sets which correspond to the first reference input audio signal;
      
      receive a second reference input audio signal corresponding to the second recorded reference audio work;
      
      separate the received second reference input audio signal into the second plurality of frequency bands which have different frequencies;
      
      compute the second plurality of reference feature data sets, which correspond to spectrally distinct portions of the second plurality of frequency bands of the second received reference input audio signal, this computing comprising performing envelope extraction on the second plurality of frequency bands to provide low-bandwidth amplitude measurements of each of the second plurality of frequency bands to provide the second plurality of reference feature data sets; and
      
      store in the memory the second plurality of reference feature data sets which correspond to the second reference input audio signal.

13. An audio signal recognition server method comprising:
- receiving, with an interface structure, (i) a first reference input audio signal corresponding to a first recorded reference audio work, and (ii) a second reference input audio signal corresponding to a second recorded reference audio work;
  
  storing, in a memory, a library comprising (i) a first plurality of reference feature data sets which correspond to the first recorded reference audio work, and (ii) a second plurality of reference feature data sets which correspond to the second recorded reference audio work, each recorded reference audio work being longer than a portion of a captured audio sample; and
  
  using a server processing structure to:
  
  separate the received first reference input audio signal into a first plurality of frequency bands which have different frequencies, a frequency bandwidth of a lower frequency band of the first plurality of frequency bands being narrower than a frequency bandwidth of a higher frequency band of the first plurality of frequency bands;
  
  compute the first plurality of reference feature data sets, which correspond to spectrally distinct portions of the first plurality of frequency bands of the first received reference input audio signal;
  
  store in the memory the first plurality of reference feature data sets which correspond to the first reference input audio signal;
  
  separate the received second reference input audio signal into a second plurality of frequency bands which have different frequencies, a frequency bandwidth of a lower frequency band of the second plurality of frequency bands being narrower than a frequency bandwidth of a higher frequency band of the second plurality of frequency bands;
  
  compute the second plurality of reference feature data sets, which correspond to spectrally distinct portions of the second plurality of frequency bands of the second received reference input audio signal; and
  
  store in the memory the second plurality of reference feature data sets which correspond to the second reference input audio signal.
  - 14. A method according to claim 13, wherein the interface structure receives, from a capture device, feature data that corresponds to a captured audio sample that is less than an entire reference audio work, and wherein the server processing structure is further used to:
    - compare the sample feature data received by said interface structure with the stored first and second pluralities of reference feature data sets; and
      
      generate a recognition signal in response to the received sample feature data matching at least one feature data set of the stored first and second pluralities of reference feature data sets.

15. At least one computer readable non-transitory medium for one or more audio signal recognition servers which are adapted to receive, from a capture device, feature data that corresponds to a captured audio sample that is less than an entire reference audio work, the at least one computer readable medium having instructions which, when read by one or more processing structures of the one or more recognition servers, cause the one or more processing structures to:
- store, in a memory, a library comprising (i) a first plurality of reference feature data sets which correspond to a first recorded reference audio work, and (ii) a second plurality of reference feature data sets which correspond to a second recorded reference audio work, each recorded reference audio work being longer than the portion of the captured audio sample;
  
  receive a first reference input audio signal corresponding to the first recorded reference audio work;
  
  separate the received first reference input audio signal into a first plurality of frequency bands which have different frequencies, a frequency bandwidth of a lower frequency band of the first plurality of frequency bands being narrower than a frequency bandwidth of a higher frequency band of the first plurality of frequency bands;
  
  compute the first plurality of reference feature data sets, which correspond to spectrally distinct portions of the first plurality of frequency bands of the first received reference input audio signal;
  
  store in the memory the first plurality of reference feature data sets which correspond to the first reference input audio signal;
  
  receive a second reference input audio signal corresponding to the second recorded reference audio work;
  
  separate the received second reference input audio signal into a second plurality of frequency bands which have different frequencies, a frequency bandwidth of a lower frequency band of the second plurality of frequency bands being narrower than a frequency bandwidth of a higher frequency band of the second plurality of frequency bands;
  
  compute the second plurality of reference feature data sets, which correspond to spectrally distinct portions of the second plurality of frequency bands of the second received reference input audio signal;
  
  store in the memory the second plurality of reference feature data sets which correspond to the second reference input audio signal;
  
  compare the sample feature data received from the capture device with the stored first and second pluralities of reference feature data sets; and
  
  generate a recognition signal in response to the received sample feature data matching at least one feature data set of the stored first and second pluralities of reference feature data sets.

16. At least one computer readable non-transitory medium for one or more audio signal recognition servers which are adapted to receive, from a capture device, feature data that corresponds to a captured audio sample that is less than an entire reference audio work, the at least one computer readable medium having instructions which, when read by one or more processing structures of the one or more recognition servers, cause the one or more processing structures to:
- store, in a memory, a library comprising (i) a first plurality of reference feature data sets which correspond to a first recorded reference audio work, and (ii) a second plurality of reference feature data sets which correspond to a second recorded reference audio work, each recorded reference audio work being longer than the captured audio sample, the first plurality of reference feature data sets corresponding to a first plurality of frequency bands which have different frequencies, a frequency bandwidth of a lower frequency band of the first plurality of frequency bands being narrower than a frequency bandwidth of a higher frequency band of the first plurality of frequency bands, the first plurality of reference feature data sets including features which correspond to spectrally distinct portions of the first plurality of frequency bands, the second plurality of reference feature data sets corresponding to a second plurality of frequency bands which have different frequencies, a frequency bandwidth of a lower frequency band of the second plurality of frequency bands being narrower than a frequency bandwidth of a higher frequency band of the second plurality of frequency bands, the second plurality of reference feature data sets including features which correspond to spectrally distinct portions of the second plurality of frequency bands;
  
  compare the sample feature data received by said interface structure with the stored first and second pluralities of reference feature data sets; and
  
  generate a recognition signal in response to the received sample feature data matching at least one feature data set of the stored first and second pluralities of reference feature data sets.
  - 17. A computer readable non-transitory medium according to claim 16, wherein the at least one computer readable medium instructions cause the one or more processing structures to:
    - receive a first reference input audio signal corresponding to the first recorded reference audio work;
      
      separate the received first reference input audio signal into the first plurality of frequency bands which have different frequencies;
      
      compute the first plurality of reference feature data sets, which correspond to spectrally distinct portions of the first plurality of frequency bands of the first received reference input audio signal, this computing comprising performing envelope extraction on the first plurality of frequency bands to provide low-bandwidth amplitude measurements of each of the first plurality of frequency bands to provide the first plurality of reference feature data sets;
      
      store in the memory the first plurality of reference feature data sets which correspond to the first reference input audio signal;
      
      receive a second reference input audio signal corresponding to the second recorded reference audio work;
      
      separate the received second reference input audio signal into the second plurality of frequency bands which have different frequencies;
      
      compute the second plurality of reference feature data sets, which correspond to spectrally distinct portions of the second plurality of frequency bands of the second received reference input audio signal, this computing comprising performing envelope extraction on the second plurality of frequency bands to provide low-bandwidth amplitude measurements of each of the second plurality of frequency bands to provide the second plurality of reference feature data sets; and
      
      store in the memory the second plurality of reference feature data sets which correspond to the second reference input audio signal.

18. At least one computer readable non-transitory medium for one or more audio signal recognition servers which are adapted to receive, from a capture device, feature data that corresponds to a captured audio sample that is less than an entire reference audio work, the at least one computer readable medium having instructions which, when read by one or more processing structures of the one or more recognition servers, cause the one or more processing structures to:
- receive, at the one or more recognition servers, (i) a first reference input audio signal corresponding to a first recorded reference audio work, and (ii) a second reference input audio signal corresponding to a second recorded reference audio work;
  
  store, in a memory, a library comprising (i) a first plurality of reference feature data sets which correspond to the first recorded reference audio work, and (ii) a second plurality of reference feature data sets which correspond to the second recorded reference audio work, each recorded reference audio work being longer than a portion of the captured audio sample;
  
  separate the received first reference input audio signal into a first plurality of frequency bands which have different frequencies, a frequency bandwidth of a lower frequency band of the first plurality of frequency bands being narrower than a frequency bandwidth of a higher frequency band of the first plurality of frequency bands;
  
  compute the first plurality of reference feature data sets, which correspond to spectrally distinct portions of the first plurality of frequency bands of the first received reference input audio signal;
  
  store in the memory the first plurality of reference feature data sets which correspond to the first reference input audio signal;
  
  separate the received second reference input audio signal into a second plurality of frequency bands which have different frequencies, a frequency bandwidth of a lower frequency band of the second plurality of frequency bands being narrower than a frequency bandwidth of a higher frequency band of the second plurality of frequency bands;
  
  compute the second plurality of reference feature data sets, which correspond to spectrally distinct portions of the second plurality of frequency bands of the second received reference input audio signal; and
  
  store in the memory the second plurality of reference feature data sets which correspond to the second reference input audio signal.
- - 19. A computer readable non-transitory medium according to claim 18, wherein the at least one computer readable medium instructions cause the one or more processing structures to:
    - receive at the one or more recognition servers, from a capture device, feature data that corresponds to the captured audio sample that is less than an entire reference audio work;
      
      compare the sample feature data received by said interface structure with the stored first and second pluralities of reference feature data sets; and
      
      generate a recognition signal in response to the received sample feature data matching at least one feature data set of the stored first and second pluralities of reference feature data sets.
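
The compare and generate steps of claim 19 can be sketched as below. The claim does not specify a matching metric here, so the sliding normalized correlation, the 0.9 threshold, and the names `normalized_correlation` and `recognize` are assumptions made purely for illustration.

```python
import math


def normalized_correlation(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    return num / den if den > 0 else 0.0


def recognize(sample, library, threshold=0.9):
    """Hypothetical matcher: slide the short sample over each reference set.

    `library` maps a work name to its list of reference feature data sets,
    each longer than the sample. Returns (name, score) for the best window
    scoring at or above `threshold`, or None when nothing matches.
    """
    best = None
    for name, feature_sets in library.items():
        for ref in feature_sets:
            for offset in range(len(ref) - len(sample) + 1):
                window = ref[offset:offset + len(sample)]
                score = normalized_correlation(sample, window)
                if score >= threshold and (best is None or score > best[1]):
                    best = (name, score)
    return best
```

A non-`None` return value plays the role of the recognition signal. Because the sample is shorter than each reference work, every offset into the reference is tried.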

Current Assignee
Iceberg Industries LLC
Original Assignee
Iceberg Industries LLC
Inventors
Kenyon, Stephen C.; Simkins, Laura
Primary Examiner(s)
Kumar, Pankaj
Assistant Examiner(s)
Brown, Rueben M

Application Number

US12/958,883
Publication Number

US 20110078719A1
Time in Patent Office

2,427 Days
Field of Search

725/18-20, 704/221, 704/247, 704/243
US Class Current
CPC Class Codes

G06F 2218/16 by matching signal segments

G10L 25/48 specially adapted for parti...
