Method for automatic analysis of audio including music and speech

US 6,542,869 B1
Filed: 05/11/2000
Issued: 04/01/2003
Est. Priority Date: 05/11/2000
Status: Expired due to Term

First Claim

1. A method for identifying novelty points in a source audio signal comprising the steps of:

sampling the audio signal and dividing the audio signal into windowed portions with a plurality of samples taken from within each of the windowed portions;

parameterizing the windowed portions of the audio signal by applying a first function to each windowed portion to form a vector parameter for each window; and

embedding the parameters by applying a second function which provides a measurement of similarity between the parameters.

6 Assignments

0 Petitions

Abstract

A method for determining points of change or novelty in an audio signal measures the self similarity of components of the audio signal. For each time window in an audio signal, a formula is used to determine a vector parameterization value. The self-similarity as well as cross-similarity between each of the parameterization values is then determined for all past and future window regions. A significant point of novelty or change will have a high self-similarity in the past and future, and a low cross-similarity. The extent of the time difference between “past” and “future” can be varied to change the scale of the system so that, for example, individual musical notes can be found using a short time extent while longer events, such as musical themes or changing of speakers, can be identified by considering windows further into the past or future. The result is a measure of the degree of change, or how novel the source audio is at any time. The method can be used in a wide variety of applications, including segmenting or indexing for classification and retrieval, beat tracking, and summarizing of speech or music.
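As a rough illustration of the past/future idea in the abstract, the toy sketch below scores each frame by the average similarity within a past window and within a future window, minus the average similarity across the two. The dot-product similarity, the two-dimensional toy features, and the names `novelty_at` and `w` (the scale, in frames) are assumptions for illustration, not the method as patented.

```python
def dot(u, v):
    """Dot-product similarity between two feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def novelty_at(feats, i, w):
    """Mean within-past and within-future similarity minus mean cross-similarity,
    using up to w frames on each side of the boundary just before frame i."""
    past = feats[max(0, i - w):i]
    future = feats[i:i + w]
    if not past or not future:
        return 0.0

    def mean_sim(a, b):
        return sum(dot(x, y) for x in a for y in b) / (len(a) * len(b))

    self_sim = (mean_sim(past, past) + mean_sim(future, future)) / 2
    cross_sim = mean_sim(past, future)
    return self_sim - cross_sim


# Toy features: five frames of one "sound" followed by five frames of another.
feats = [[1.0, 0.0]] * 5 + [[0.0, 1.0]] * 5
scores = [novelty_at(feats, i, w=3) for i in range(len(feats))]
print(scores.index(max(scores)))  # 5: the first frame of the new sound
```

Raising `w` compares longer stretches of past and future, which is how the change of scale described in the abstract (individual notes versus themes or speaker changes) would show up in this sketch.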

351 Citations

36 Claims

1. A method for identifying novelty points in a source audio signal comprising the steps of:
- sampling the audio signal and dividing the audio signal into windowed portions with a plurality of samples taken from within each of the windowed portions;
  
  parameterizing the windowed portions of the audio signal by applying a first function to each windowed portion to form a vector parameter for each window; and
  
  embedding the parameters by applying a second function which provides a measurement of similarity between the parameters.
2. The method of claim 1, wherein the first function used in the parameterization step comprises a log magnitude of a Fast Fourier Transform (FFT).
3. The method of claim 1, wherein the first function used in the parameterization step comprises a Mel-Frequency Cepstral Coefficients (MFCC) analysis.
4. The method of claim 1, wherein the first function used in the parameterization step comprises a Moving Picture Experts Group (MPEG) audio standard.
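Claim 2's parameterization (the log magnitude of an FFT of each windowed portion) can be sketched in plain Python. The periodic Hann window, the frame size and hop, the `eps` guard against log(0), and the function names are assumptions; the direct DFT used here yields the same values as an FFT, only in O(n²) time.

```python
import cmath
import math


def frames(signal, size, hop):
    """Split a sampled signal into overlapping portions of `size` samples."""
    return [signal[s:s + size] for s in range(0, len(signal) - size + 1, hop)]


def log_magnitude_spectrum(frame, eps=1e-10):
    """Hann-window one frame and return log|DFT| for bins 0..n/2."""
    n = len(frame)
    # Periodic Hann window: tapers each frame toward zero at its edges.
    windowed = [x * 0.5 * (1 - math.cos(2 * math.pi * k / n)) for k, x in enumerate(frame)]
    spectrum = []
    for f in range(n // 2 + 1):
        # Direct DFT of bin f; an FFT computes the same value faster.
        acc = sum(x * cmath.exp(-2j * math.pi * f * k / n) for k, x in enumerate(windowed))
        spectrum.append(math.log(abs(acc) + eps))
    return spectrum


# Hypothetical test tone: 4 cycles per 64-sample frame, so energy sits in bin 4.
signal = [math.sin(2 * math.pi * 4 * t / 64) for t in range(256)]
feats = [log_magnitude_spectrum(fr) for fr in frames(signal, size=64, hop=32)]
print(len(feats), len(feats[0]))  # 7 33
```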
5. The method of claim 1, wherein the second function used in the embedding step comprises a Euclidean distance measurement:
6. The method of claim 1, wherein the second function used in the embedding step comprises a dot product:
7. The method of claim 1, wherein the second function used in the embedding step comprises a normalized dot product:
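The formulas that follow claims 5 to 7 are missing from this listing. A sketch using the standard textbook definitions of the three measures is below; the function names are assumptions. Note that Euclidean distance is a dissimilarity (0 for identical vectors), whereas the dot product and normalized dot product grow with similarity.

```python
import math


def euclidean(u, v):
    """Euclidean distance: 0 for identical vectors, larger as they differ."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dot(u, v):
    """Plain dot product: depends on both direction and vector magnitude."""
    return sum(a * b for a, b in zip(u, v))


def normalized_dot(u, v):
    """Cosine of the angle between u and v; invariant to scaling either vector."""
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0


def similarity_matrix(feats, measure):
    """Embed the frame parameters as S[i][j] = measure(v_i, v_j)."""
    return [[measure(u, v) for v in feats] for u in feats]


feats = [[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
S = similarity_matrix(feats, normalized_dot)
print(S[0][1], S[0][2])  # 1.0 0.0
```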
8. The method of claim 1, wherein the embedded parameters are provided in a form of a matrix S(i,j), wherein i identifies rows of the matrix and j identifies columns of the matrix, the method further comprising the step of:
- identifying a slanted domain matrix L(i,l) from the matrix S(i,j), wherein l is a lag value $l = i - j$.
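Claim 8's slanted domain re-indexes S by lag. A minimal sketch, assuming a symmetric S (true for the measures of claims 5 to 7) so that only non-negative lags need to be kept; `lag_domain` and `max_lag` are hypothetical names.

```python
def lag_domain(S, max_lag):
    """Re-index S(i, j) by lag l = i - j, so that L[i][l] = S[i][i - l].

    Only lags 0..max_lag are kept; cells where i - l < 0 are None."""
    n = len(S)
    return [[S[i][i - l] if 0 <= i - l < n else None for l in range(max_lag + 1)]
            for i in range(n)]


# Hypothetical S for a pattern that repeats every 3 frames.
S = [[1 if (i - j) % 3 == 0 else 0 for j in range(9)] for i in range(9)]
L = lag_domain(S, max_lag=4)
print([row[3] for row in L])  # [None, None, None, 1, 1, 1, 1, 1, 1]
```

In the lag domain the 3-frame repetition appears as a constant column at l = 3 instead of as a diagonal stripe.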
9. The method of claim 1, wherein the embedded parameters are provided in a form of a matrix S, the method further comprising the step of:
- correlating the matrix S with a matrix kernel C to determine a novelty score.
10. The method of claim 9, wherein the matrix kernel C comprises a 2×2 checkerboard kernel defined as follows:
- $\begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}$.
11. The method of claim 9, wherein the matrix kernel C comprises a coherence kernel defined as follows:
- $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$.
12. The method of claim 9, wherein the matrix kernel C comprises an anti-coherence kernel defined as follows:
- $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$.
13. The method of claim 9, wherein the matrix kernel C comprises a checkerboard kernel including four quadrants with ones (1s) in two opposing quadrants and negative ones (−1s) in two opposing quadrants.
14. The method of claim 13, wherein the matrix kernel C comprises the checkerboard kernel smoothed using a function which tapers toward zero at the edges of the matrix kernel C.
15. The method of claim 14, wherein the function comprises a radially-symmetrical Gaussian taper.
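Claims 13 to 15 can be sketched by multiplying the ±1 quadrant pattern by a radially symmetric Gaussian. The integer offsets m, n ∈ [−L/2, L/2] follow the summation limits of claim 17; the zero centre row and column this produces, the dictionary representation, and the `sigma` parameter are assumptions of this sketch.

```python
import math


def sign(x):
    return (x > 0) - (x < 0)


def tapered_checkerboard(half_width, sigma):
    """Checkerboard kernel C(m, n) for m, n in [-half_width, half_width].

    sign(m) * sign(n) is +1 in the past/past and future/future quadrants,
    -1 in the two cross quadrants, and 0 on the centre row and column;
    the Gaussian factor makes the weights taper toward zero at the edges."""
    return {
        (m, n): sign(m) * sign(n) * math.exp(-(m * m + n * n) / (2.0 * sigma ** 2))
        for m in range(-half_width, half_width + 1)
        for n in range(-half_width, half_width + 1)
    }


C = tapered_checkerboard(half_width=2, sigma=1.0)
print(round(C[(1, 1)], 3), round(C[(1, -1)], 3), round(C[(2, 2)], 3))  # 0.368 -0.368 0.018
```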
16. The method of claim 9, wherein the matrix kernel C comprises a checkerboard kernel including four quadrants with ones (1s) in two opposing quadrants and zeros (0s) in two opposing quadrants.
17. The method of claim 9, wherein the novelty score D(i), where i is a frame number, is determined as follows:
- $D(i) = \sum_{m=-L/2}^{L/2} \sum_{n=-L/2}^{L/2} C(m, n)\, S(i + m, i + n)$, wherein the matrix kernel C has a width of L and is centered at $m = 0, n = 0$.
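The novelty score of claim 17 can be computed directly from its double sum. This sketch assumes the untapered checkerboard C(m, n) = sign(m)·sign(n) and simply scores 0 wherever the kernel would run past the ends of S; both choices, and the name `novelty_score`, are illustrative.

```python
def sign(x):
    return (x > 0) - (x < 0)


def novelty_score(S, half_width):
    """D(i) = sum over m, n in [-L/2, L/2] of C(m, n) * S(i + m, i + n),
    with L/2 = half_width and C(m, n) = sign(m) * sign(n).
    Frames where the kernel would extend off the matrix are scored 0."""
    n_frames = len(S)
    scores = []
    for i in range(n_frames):
        if i - half_width < 0 or i + half_width >= n_frames:
            scores.append(0.0)
            continue
        total = 0.0
        for m in range(-half_width, half_width + 1):
            for n in range(-half_width, half_width + 1):
                total += sign(m) * sign(n) * S[i + m][i + n]
        scores.append(total)
    return scores


# Hypothetical S: frames 0-5 belong to one sound, frames 6-11 to another.
labels = ["A"] * 6 + ["B"] * 6
S = [[1.0 if a == b else 0.0 for b in labels] for a in labels]
print(novelty_score(S, half_width=2))
# [0.0, 0.0, 0.0, 0.0, 2.0, 8.0, 8.0, 2.0, 0.0, 0.0, 0.0, 0.0]
```

The two frames on either side of the A/B boundary tie for the highest score because this kernel's centre row and column are zero; inside either homogeneous segment the score is 0.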
18. The method of claim 9, further comprising the step of:
- thresholding the novelty score by determining points in the novelty score above a predetermined threshold.
19. The method of claim 18 further comprising the step of forming a binary tree structure from the points in the novelty score above the predetermined threshold by performing the steps of:
- identifying one of the points above the threshold which is highest above the threshold as a root of the binary tree and dividing remaining points from the novelty score above the predetermined threshold into first left and first right points relative to the root point;
  
  identifying a first left subsequent tree point which is highest above the threshold in the first left points and dividing remaining points from the first left points into second left and second right points relative to the left subsequent tree point; and
  
  identifying a first right subsequent tree point which is highest above the threshold in the first right points and dividing remaining points from the first right points into third left and third right points relative to the right subsequent tree point.
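The recursive split of claim 19 can be sketched as follows. It assumes the above-threshold points arrive as (frame, score) pairs in time order; `build_tree` and the (root, left, right) tuple layout are hypothetical.

```python
def build_tree(points):
    """Return (root, left_subtree, right_subtree) for a time-ordered list of
    (frame, score) points, or None for an empty list. The root is the point
    highest above the threshold; earlier points go left, later points right."""
    if not points:
        return None
    k = max(range(len(points)), key=lambda idx: points[idx][1])
    return (points[k], build_tree(points[:k]), build_tree(points[k + 1:]))


scores = [0.1, 0.9, 0.2, 0.5, 0.05, 0.7, 0.3]
threshold = 0.15
points = [(i, d) for i, d in enumerate(scores) if d > threshold]
tree = build_tree(points)
print(tree[0], tree[2][0])  # (1, 0.9) (5, 0.7)
```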
20. The method of claim 18 further comprising the steps of:
- defining segments in the audio signal as groups of points in the novelty score above the threshold with each point in the group adjacent to a point above the threshold; and
  
  audio gisting by averaging points in each of the segments and identifying a number of segments with points in the novelty score which are most similar.
21. The method of claim 18 further comprising the steps of:
- defining segments in the audio signal as groups of points in the novelty score above the threshold with each point in the group adjacent to a point above the threshold; and
  
  indexing the audio signal by defining numbers to identify locations of individual ones of the segments.
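Claims 20 to 25 share one definition of a segment: a run of adjacent frames whose novelty score exceeds the threshold. A minimal sketch of that grouping and of the numbering in claim 21 follows; the names `segments_above` and `index`, and the inclusive (start, end) frame pairs, are assumed representations.

```python
def segments_above(scores, threshold):
    """Group adjacent frames whose score exceeds the threshold.
    Returns (start_frame, end_frame) pairs with the end frame inclusive."""
    segments = []
    start = None
    for i, d in enumerate(scores):
        if d > threshold and start is None:
            start = i
        elif d <= threshold and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(scores) - 1))
    return segments


segments = segments_above([0, 3, 4, 0, 0, 5, 0, 2, 2], threshold=1)
index = dict(enumerate(segments))  # segment number -> location in frames
print(index)  # {0: (1, 2), 1: (5, 5), 2: (7, 8)}
```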
22. The method of claim 21 further comprising the step of:
- browsing the audio signal by playing a portion of the audio signal at the beginning of each segment.
23. The method of claim 18 further comprising the steps of:
- defining segments in the audio signal as groups of points in the novelty score above the threshold with each point in the group adjacent to a point above the threshold; and
  
  editing the audio signal by cutting portions of the audio signal which are not part of the segments.
24. The method of claim 18 further comprising the steps of:
- defining segments in the audio signal as groups of points in the novelty score above the threshold with each point in the group adjacent to a point above the threshold; and
  
  warping the audio signal so that segment boundaries occur at predetermined times in the audio signal.
25. The method of claim 18 further comprising the steps of:
- defining segments in the audio signal as groups of points in the novelty score above the threshold with each point in the group adjacent to a point above the threshold; and
  
  aligning portions of a video signal to the audio signal based on locations of the segments.
26. The method of claim 9, further comprising the steps of:
- defining a beat spectrum by summing points forming a diagonal line in the matrix S;
  
  correlating the novelty score with the beat spectrum; and
  
  identifying peaks in the correlated novelty score and beat spectrum to determine a tempo for music in the audio signal source.
27. The method of claim 1, wherein the embedded parameters are provided in a form of a matrix S, the method further comprising the steps of:
- correlating the matrix S with a coherence kernel to determine a first novelty score;
  
  correlating the matrix S with an anti-coherence kernel to determine a second novelty score; and
  
  determining a difference between the first novelty score and the second novelty score, wherein the coherence kernel is defined as follows:
  
  $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, and wherein the anti-coherence kernel is defined as follows:
  
  $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$.
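Because correlation is linear, the difference in claim 27 equals a single correlation with (coherence − anti-coherence), which is the 2×2 checkerboard kernel of claim 10. The sketch below checks this numerically; placing the 2×2 kernel on the frame pair (i, i + 1) and the name `kernel_score` are assumptions.

```python
COHERENCE = [[1, 0], [0, 1]]
ANTI_COHERENCE = [[0, 1], [1, 0]]
CHECKERBOARD = [[1, -1], [-1, 1]]


def kernel_score(S, K, i):
    """Correlate a 2x2 kernel K with S at the frame pair (i, i + 1)."""
    return sum(K[m][n] * S[i + m][i + n] for m in range(2) for n in range(2))


# Hypothetical S for frames A, A, B, B (1 = same sound, 0 = different).
S = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]
diff = [kernel_score(S, COHERENCE, i) - kernel_score(S, ANTI_COHERENCE, i) for i in range(3)]
check = [kernel_score(S, CHECKERBOARD, i) for i in range(3)]
print(diff, check)  # [0, 2, 0] [0, 2, 0]
```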
28. The method of claim 1, wherein the embedded parameters are provided in a form of a matrix S, the method further comprising the steps of:
- correlating the matrix S with a coherence kernel to determine a first novelty score;
  
  correlating the matrix S with an anti-coherence kernel to determine a second novelty score; and
  
  determining a difference between the first novelty score and the second novelty score, wherein the coherence and the anti-coherence kernels each comprise four quadrants with ones (1s) in two opposing quadrants and zeros (0s) in two opposing quadrants, wherein the ones (1s) in the coherence kernel are in opposing quadrants from the ones (1s) in the anti-coherence kernel.
29. The method of claim 1, wherein the embedded parameters are provided in a form of a matrix S, the method further comprising the step of:
- defining a beat spectrum by summing points forming a diagonal line in the matrix.
30. The method of claim 29, wherein the diagonal line is a main diagonal of the matrix S.
31. The method of claim 29, wherein the diagonal line is a sub-diagonal of the matrix S, wherein the sub-diagonal is parallel to a main diagonal.
32. The method of claim 29 further comprising the step of:
- identifying peaks in the beat spectrum to determine a tempo for music in the audio signal source.
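Claims 29 to 32 can be sketched by summing each diagonal of S and picking the strongest non-zero lag. This assumes a square, symmetric S, so only the main diagonal and the parallel diagonals at lags l ≥ 0 are summed; the function names are hypothetical. Converting the lag to beats per minute would also need the frame period, which is not given here.

```python
def diagonal_beat_spectrum(S, max_lag):
    """B(l) = sum over k of S[k][k + l]: the main diagonal (l = 0) and the
    diagonals parallel to it at offset l, each summed."""
    n = len(S)
    return [sum(S[k][k + l] for k in range(n - l)) for l in range(max_lag + 1)]


def dominant_lag(beat, min_lag=1):
    """Lag in frames of the largest beat-spectrum value, skipping lags < min_lag."""
    return max(range(min_lag, len(beat)), key=lambda l: beat[l])


# Hypothetical S for a pattern that repeats every 4 frames.
S = [[1 if (i - j) % 4 == 0 else 0 for j in range(16)] for i in range(16)]
B = diagonal_beat_spectrum(S, max_lag=6)
print(B, dominant_lag(B))  # [16, 0, 0, 0, 12, 0, 0] 4
```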
33. The method of claim 32 further comprising the steps of:
- warping the audio signal so that the tempo of the audio signal matches a tempo of a second audio signal.
34. The method of claim 1, wherein the embedded parameters are provided in a form of a matrix S(i,j), wherein i identifies rows of the matrix and j identifies columns of the matrix, the method further comprising the step of:
- auto-correlating the matrix S(i,j) to define a beat spectrum B(k,l), wherein k and l are predetermined integers, as follows:
  
  $B(k, l) = \sum_{i,j} S(i, j)\, S(i + k, j + l).$
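Claim 34's autocorrelation can be evaluated term by term. This sketch restricts k and l to non-negative values and sums only over the (i, j) where both S(i, j) and S(i + k, j + l) exist; those limits and the name `autocorrelation_beat_spectrum` are assumptions.

```python
def autocorrelation_beat_spectrum(S, max_k, max_l):
    """B(k, l) = sum over i, j of S(i, j) * S(i + k, j + l), for
    0 <= k <= max_k and 0 <= l <= max_l, over (i, j) where both entries exist."""
    n = len(S)
    B = {}
    for k in range(max_k + 1):
        for l in range(max_l + 1):
            B[(k, l)] = sum(S[i][j] * S[i + k][j + l]
                            for i in range(n - k) for j in range(n - l))
    return B


# Hypothetical S for a pattern that repeats every 3 frames.
S = [[1 if (i - j) % 3 == 0 else 0 for j in range(9)] for i in range(9)]
B = autocorrelation_beat_spectrum(S, max_k=4, max_l=0)
print([B[(k, 0)] for k in range(5)])  # [27, 0, 0, 18, 0]
```

Away from the origin, the large value at k = 3 marks the 3-frame repetition; this is the kind of peak claim 35 would use to estimate tempo.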
35. The method of claim 34 further comprising the step of:
- identifying peaks in the beat spectrum to determine a tempo for music in the audio signal source.
36. The method of claim 1, wherein in the step of embedding the parameters, the second function is applied to determine self-similarity between one of the parameters and itself, as well as cross similarity between two different ones of the parameters.

Current Assignee
Fuji Xerox Company Limited (Xerox Holdings Corp.), Xerox Corporation (Xerox Holdings Corp.)
Original Assignee
Fuji Xerox Company Limited (Xerox Holdings Corp.)
Inventors
Foote, Jonathan
Primary Examiner(s)
Dorvil, Richemond

Application Number

US09/569,230
Time in Patent Office

1,055 Days
Field of Search

704/500, 704/200.1, 704/270, 704/270.1, 704/200, 704/231, 704/233, 704/251, 704/253, 704/257, 704/210, 704/205, 704/207, 704/246, 704/250
US Class Current

704/500
CPC Class Codes

G06F 16/634   Query by example, e.g. quer...

G06F 16/683   using metadata automaticall...

G10H 1/00   Details of electrophonic mu...

G10H 2210/041   based on mfcc [mel -frequen...

G10H 2210/046   for differentiation between...

G10H 2210/061   for extraction of musical p...

G10H 2240/135   Library retrieval index, i....

G10H 2250/235   Fourier transform; Discrete...

G10L 15/04   Segmentation; Word boundary...

G10L 19/02   using spectral analysis, e....

G11B 27/28   by using information signal...
