MUSIC INFORMATION RETRIEVAL USING A 3D SEARCH ALGORITHM

US 20070131094A1
Filed: 11/09/2006
Published: 06/14/2007
Est. Priority Date: 11/09/2005
Status: Active Grant


Abstract

The present invention generally relates to the field of content-based music information retrieval systems, in particular to a method and a query-by-humming (QbH) database system (100′) for processing queries in the form of analog audio sequences which encompass recorded parts of sung, hummed or whistled tunes (102), recorded parts of a melody (300a) played on a musical instrument and/or a speaker's recorded voice (400) articulating at least one part of a song's lyrics to retrieve textual background information about a musical piece whose score is stored in an integrated database (103, 105) of said system after having analyzed and recognized said melody (300a).

According to one embodiment of the present invention, said method is characterized by the steps of recording (S1) said analog audio sequences (102, 300a, 400), extracting (S4a) and analyzing (S4b) various acoustic-phonetic speech characteristics of the speaker's voice and pronunciation from spoken parts (400) of a recorded song's lyrics (102″) and recognizing (S4c) syntax and semantics of said lyrics (102″). The method further comprises the steps of extracting (S2a), analyzing (S2b) and recognizing (S2c) musical key characteristics from the analog audio sequences (102, 300a, 400), which are given by the semitone numbers of the particular notes, the intervals and/or interval directions of the melody and the time values of the notes and pauses of which the rhythm of said melody is composed, the key, beat, tempo, volume, agogics, dynamics, phrasing, articulation, timbre and instrumentation of said melody, the harmonies of accompaniment chords and/or electronic sound effects generated by said musical instrument. The invention is characterized by the step of calculating (S3a) a similarity measure indicating the similarity of melody and lyrics of the recorded audio sequence (102, 300a) compared to melody and lyrics of various music files stored in said database (103, 105) by performing a Viterbi search algorithm on a three-dimensional search space, said search space having a first dimension (t) for the time, a second dimension (S) for an appropriate coding of the acoustic-phonetic speech characteristics and a third dimension (H) for an appropriate coding of the musical key characteristics, and generating (S3b) a ranked list (107) of said music files.
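The abstract names semitone numbers, intervals and interval directions among the musical key characteristics. As a minimal sketch of how such a coding might be derived, the following Python turns a note sequence into signed intervals and direction letters. The function names, the MIDI-style note numbers and the U/D/S letters are assumptions made for illustration; the source does not specify a concrete encoding.

```python
from typing import List


def melody_intervals(semitones: List[int]) -> List[int]:
    """Signed semitone distance between each pair of consecutive notes.

    N notes yield N-1 intervals.
    """
    return [b - a for a, b in zip(semitones, semitones[1:])]


def interval_directions(intervals: List[int]) -> str:
    """Map each interval to U (up), D (down) or S (same pitch).

    The three letters are an assumed alphabet, not taken from the source.
    """
    return "".join(
        "U" if iv > 0 else "D" if iv < 0 else "S" for iv in intervals
    )


# Hypothetical query: semitone numbers of five hummed notes.
notes = [60, 62, 64, 62, 62]
intervals = melody_intervals(notes)
print(intervals)                       # [2, 2, -2, 0]
print(interval_directions(intervals))  # UUDS
```

Working on intervals rather than absolute semitone numbers makes the coding independent of the key in which the query was sung, which is one plausible reason for listing intervals alongside semitone numbers.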

10 Claims

1. A method for the retrieval of music information based on audio input (102, 300a), the method comprising the following steps:
- pre-storing (S11a) a defined set of music sequences with associated information, entering (S11b) speech (400) and/or music information (102, 300a) and arranging (S11c) a coding representing said speech and music information, respectively, as a first (S) and a second dimension (H) of a three-dimensional search space, time (t) being the third dimension, and carrying out (S11d) a search in the three-dimensional search space in order to find the music sequence out of the set of music sequences matching best to the entered speech (400) and/or music information (102, 300a).
  - 2. A method according to claim 1, characterized in that each of the entered speech (400) and/or music information (102, 300a) is individually pre-processed (S12) before respectively being represented as a coding in the three-dimensional search space.
  - 3. A method according to any one of claims 1 or 2, comprising the following steps:
    - calculating (S9c) a similarity measure indicating the similarity of the entered speech and music information compared to melody and lyrics of the pre-stored music files in a database (103, 105), and generating (S9d) a ranked list (107) of said music files, the ranking of a music file depending on the respective similarity measure.
  - 4. A method according to claim 3, characterized by the steps of
    - encoding (S5a) a sung or hummed tune (102) and/or a played melody (300a) consisting of N notes and/or pauses, wherein N denotes an integer value greater than one, by a first character string, in the following referred to as “melody reference string” (REF),
    - retrieving (S5b) encoded previously analyzed melodies consisting of N notes and/or pauses whose scores are stored in a database (103 or 105) and encoded by a second character string from a number (M) of stored character strings, in the following referred to as “melody hypothesis strings” (HYPO₀, HYPO₁, . . . , HYPO_k, . . . , HYPO_M-1), whose elements are given as described above,
    - encoding (S9a) recognized phonemes from spoken parts (400) of a recorded song's lyrics (102″) consisting of P phonemes, wherein P denotes an integer value greater than one, by a first character string, in the following referred to as “speech reference string” (REF_s), and concatenating said speech reference string (REF_s) to said melody reference string (REF), thus yielding a combined reference string (REF_ms),
    - retrieving (S9b) phonemes of previously analyzed speech signals consisting of P phonemes, said phonemes being encoded by a second character string, from a number (Q) of pre-stored character strings, in the following referred to as “speech hypothesis strings” (HYPO_s0, HYPO_s1, HYPO_s2, . . . , HYPO_s,k, . . . , HYPO_s,Q-1), and concatenating said speech hypothesis strings to said melody hypothesis strings, thus yielding combined hypothesis strings (HYPO_ms0, HYPO_ms1, HYPO_ms2, . . . , HYPO_ms,k, . . . , HYPO_ms,M+Q−1), and
    - calculating (S9c) a similarity measure indicating the similarity between melody and lyrics of the recorded audio sequence (102) compared to melody and lyrics of a variety of music files stored in said database by using a single two-dimensional search space in form of an (N+P−1)×(N+P−1) alignment matrix (D_ms) having the character index i of the k-th combined hypothesis string (a_ms := (Interval₁, . . . , Interval_N-1, Phoneme₁, . . . , Phoneme_P)^T) as column coordinate and the character index j of the combined reference string (b_ms := (Interval₁, . . . , Interval_N-1, Phoneme₁, . . . , Phoneme_P)^T) as row coordinate.
  - 5. A method according to claim 4, wherein the step of calculating (S9c) said similarity measure is characterized by the following steps:
    - creating (S10a) an (N+P−1)×(N+P−1) alignment matrix (D_ms) by setting (S6a1) the character index i of the k-th hypothesis string (a_ms := (Interval₁, . . . , Interval_N-1, Phoneme₁, . . . , Phoneme_P)^T) as coordinate for the columns and the character index j of the reference string (b_ms := (Interval₁, . . . , Interval_N-1, Phoneme₁, . . . , Phoneme_P)^T) as coordinate for the rows of said matrix and filling (S6a2) the alignment matrix (D_ms) by calculating and setting each (i,j)-element of said matrix according to a filling scheme for filling accumulated cost factors (d_i,j = f(d_i-1,j, d_i,j-1, d_i-1,j-1, w(a_i, b_j))) into the cells of said alignment matrix (D_ms),
    - executing (S10b) an alignment function based on the Viterbi search algorithm to compare the combined reference string (REF_ms) with the combined hypothesis strings (HYPO_ms0, HYPO_ms1, HYPO_ms2, . . . , HYPO_ms,k, . . . , HYPO_ms,M+Q−1) of all stored melodies and lyrics, which returns a string of characters and/or a sequence of cost factors (w(a_i, b_j)) indicating how closely the characters of the combined reference string (REF_ms) match the characters of the k-th combined hypothesis string (HYPO_ms,k), and
    - executing (S10c) a backtracking algorithm which starts with the lowest cost factor in the last column of the alignment matrix (D_ms) and goes back through the alignment matrix towards the first row and the first column of said matrix along a tracking path derived by the alignment function.
  - 6. A method according to claim 5, characterized by the step of calculating (S7) the elements (d_i,j) of said alignment matrix (D_ms) according to the following filling scheme:
    - $d_{i,j} := \min \left\{ \begin{matrix} d_{i-1,j} + w(a_i, 0) & \forall\, i, j \in \{1, 2, \dots, N+P-1\} & \text{(case \#1)} \\ d_{i-1,j-1} + w(a_i, b_j) & \forall\, i, j \in \{1, 2, \dots, N+P-1\} & \text{(case \#2)} \\ d_{i,j-1} + w(0, b_j) & \forall\, i, j \in \{1, 2, \dots, N+P-1\} & \text{(case \#3)} \end{matrix} \right.$ with the initial conditions $d_{0,0} := 0$, $d_{i,0} := d_{i-1,0} + w(a_i, 0) \;\; \forall\, i \in \{1, 2, 3, \dots, N+P-1\}$, and $d_{0,j} := d_{0,j-1} + w(0, b_j) \;\; \forall\, j \in \{1, 2, 3, \dots, N+P-1\}$, wherein w(a_i, 0) is a cost factor associated with the deletion of the character a_i of the k-th hypothesis string (HYPO_ms,k) according to case #1, w(0, b_j) is a cost factor associated with the insertion of the character b_j into the combined reference string (REF_ms) according to case #3, and w(a_i, b_j) is a cost factor associated with the replacement of the element a_i of the k-th combined hypothesis string (HYPO_ms,k) by the element b_j of the combined reference string (REF_ms) according to case #2, wherein w(a_i, b_j) is set to zero if a_i = b_j and set to a value greater than zero if a_i ≠ b_j.
  - 10. A computer software program product implementing a method according to any one of claims 1 to 6 when running on a computing device.
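The filling scheme of claim 6 and the backtracking step of claim 5 can be illustrated with a short Python sketch. It is not the patented implementation: the unit cost of 1 for every deletion, insertion and replacement, the `GAP` placeholder standing for the "0" symbol, the function names and the phoneme labels `"HH"` and `"EY"` are all assumptions chosen for illustration. The matrix is stored as `d[i][j]` with i indexing the hypothesis string and j the reference string.

```python
from typing import List, Sequence, Tuple

GAP = None  # assumed stand-in for the empty symbol "0" in w(a_i, 0) and w(0, b_j)


def w(a, b) -> float:
    """Assumed cost factor: 0 for a match, 1 for any edit operation."""
    return 0.0 if a == b else 1.0


def fill_alignment_matrix(hypo: Sequence, ref: Sequence) -> List[List[float]]:
    """Fill accumulated costs d[i][j] following the three-case scheme."""
    n, m = len(hypo), len(ref)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]  # d[0][0] = 0
    for i in range(1, n + 1):                    # initial condition d[i][0]
        d[i][0] = d[i - 1][0] + w(hypo[i - 1], GAP)
    for j in range(1, m + 1):                    # initial condition d[0][j]
        d[0][j] = d[0][j - 1] + w(GAP, ref[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + w(hypo[i - 1], GAP),             # case #1: deletion
                d[i - 1][j - 1] + w(hypo[i - 1], ref[j - 1]),  # case #2: replacement
                d[i][j - 1] + w(GAP, ref[j - 1]),              # case #3: insertion
            )
    return d


def backtrack(d, hypo: Sequence, ref: Sequence) -> List[Tuple[int, int]]:
    """Trace back from the cheapest cell with i = len(hypo) to (0, 0)."""
    i = len(hypo)
    j = min(range(len(ref) + 1), key=lambda col: d[i][col])
    path = [(i, j)]
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + w(hypo[i - 1], ref[j - 1]):
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + w(hypo[i - 1], GAP):
            i -= 1
        else:
            j -= 1
        path.append((i, j))
    path.reverse()
    return path


# Combined strings: N-1 intervals followed by P phonemes (hypothetical values).
ref = [2, 2, -2, 0, "HH", "EY"]
hypo = [2, 2, -1, 0, "HH", "EY"]
d = fill_alignment_matrix(hypo, ref)
print(d[-1][-1])                # 1.0 (one replaced interval)
print(backtrack(d, hypo, ref))  # diagonal path from (0, 0) to (6, 6)
```

With equal-length strings that differ in one interval, the cheapest path stays on the diagonal and accumulates a single replacement cost. In practice the cost factors would likely be graded, for example by interval size or phoneme similarity, rather than fixed at 1.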

7. A music information retrieval system based on audio input (102, 300a), said system comprising:
- a database (103, 105) for pre-storing (S11a) a defined set of music sequences with associated information, means (101) for entering (S11b) speech (400) and/or music information (102, 300a), coding means (100′, 104″) for arranging (S11c) a coding representing said speech and music information respectively as a first (S) and a second dimension (H) of a three-dimensional search space, time (t) being the third dimension, characterized by matching means (106) for carrying out (S11d) a search in the three-dimensional search space in order to find the music sequence out of the set of music sequences matching best to the entered speech (400) and/or music information (102, 300a).
  - 8. A music information retrieval system according to claim 7, characterized in that said coding means (100′, 104″) comprises an automatic music recognition system (100′) for extracting (S2a), analyzing (S2b) and recognizing (S2c) musical key characteristics from the analog audio sequences (102, 300a), and an automatic speech recognition system (104″) for extracting (S4a) and analyzing (S4b) acoustic-phonetic speech characteristics of the speaker's voice and pronunciation from spoken parts (400) of the recorded song's lyrics (102″) and for recognizing (S4c) syntax and semantics of said lyrics (102″).
  - 9. A music information retrieval system according to any one of claims 7 or 8, characterized in that said matching means (106) comprises means for calculating (S3a) a similarity measure indicating the similarity of melody and lyrics of the entered audio sequence (102, 300a) compared to melody and lyrics of various music files stored in said database (103, 105) by performing a Viterbi search algorithm on the three-dimensional search space and for generating (S3b) a ranked list (107) of said music files.
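Claims 3 and 9 both end in a ranked list (107) of music files ordered by their similarity measure. The Python sketch below shows one way such a list could be produced. Note that it substitutes a plain unit-cost edit distance over combined interval/phoneme strings for the Viterbi search on the three-dimensional search space, so it illustrates the ranking step only. The song titles, string contents and function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def alignment_cost(hypo: Sequence, ref: Sequence) -> int:
    """Unit-cost edit distance, keeping only two rows of the matrix."""
    prev = list(range(len(ref) + 1))
    for i, a in enumerate(hypo, start=1):
        curr = [i]
        for j, b in enumerate(ref, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                prev[j - 1] + (a != b),  # replacement (0 if equal)
                curr[j - 1] + 1,         # insertion
            ))
        prev = curr
    return prev[-1]


def ranked_list(ref: Sequence, database: Dict[str, Sequence]) -> List[Tuple[str, int]]:
    """Return (title, cost) pairs, best match (lowest cost) first."""
    scored = [(title, alignment_cost(hypo, ref)) for title, hypo in database.items()]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical database of combined hypothesis strings.
database = {
    "song_a": [2, 2, -2, 0, "HH", "EY"],
    "song_b": [2, 2, -1, 0, "HH", "EY"],
    "song_c": [-3, 5, "AA"],
}
query = [2, 2, -2, 0, "HH", "EY"]
print(ranked_list(query, database))
# [('song_a', 0), ('song_b', 1), ('song_c', 6)]
```

A lower accumulated cost means a closer match, so the list is sorted in ascending order. A system that reports similarity rather than cost would invert or normalize the score before ranking.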

Current Assignee: Sony Deutschland GmbH (Sony Group Corp.)
Original Assignee: Sony Deutschland GmbH (Sony Group Corp.)
Inventors: Kemp, Thomas
Granted Patent: US 7,488,886 B2
US Class Current: 84/609
CPC Class Codes

G06F 16/632   Query formulation

G06F 16/634   Query by example, e.g. quer...

G06F 16/68   Retrieval characterised by ...

G06F 16/683   using metadata automaticall...

G06F 16/685   using automatically derived...

G10H 1/0008   Associated control or indic...

G10H 2210/031   Musical analysis, i.e. isol...

G10H 2240/131   Library retrieval, i.e. sea...

G10H 2240/135   Library retrieval index, i....

G10H 2240/141   Library retrieval matching,...

G10H 2250/005   Algorithms for electrophoni...

G10H 2250/021   Dynamic programming, e.g. V...

G10L 25/48   specially adapted for parti...

G10L 25/90   Pitch determination of spee...
