World Wide Web-based melody retrieval system with thresholds determined by using distribution of pitch and span of notes
First Claim
1. A World Wide Web-based melody retrieval method, comprising:
obtaining sequences of relative-pitch difference and relative-span ratio from sequences of pitch and span values of each musical composition in a database;
making a single histogram of the relative-pitch values of the database based on the distribution of all notes in the database;
making a single histogram of the relative-span values of the database based on the distribution of all notes in the database;
determining thresholds for relative-pitch so that each category, which will be defined by the thresholds, equally includes M1(=Sum1/Category_Num1) values, wherein M1 is the number of frequency of relative-pitch values which each category will contain, Sum1 is the total frequency of values of a histogram of relative-pitch, and Category_Num1 is the number of categories for relative-pitch values, and simultaneously determining thresholds for relative-span values so that each category equally includes M2(=Sum2/Category_Num2) values, wherein M2 is the number of frequency of relative-span values which each category will contain, Sum2 is the total frequency of values of a histogram of relative-span, and Category_Num2 is the number of categories;
converting the relative-pitch values and the relative-span values into coarse pitch values, which are approximate relative-values of pitch, and coarse span values, which are approximate relative-values of span, respectively, according to the previously obtained thresholds;
inputting a song by singing, humming, or whistling a melody into a microphone;
subjecting the inputted melody to an A/D conversion so as to produce a digital signal;
detecting a voiced sound from a sound signal in the digital signal produced by the A/D conversion;
defining a fundamental frequency of each frame from the detected voiced sound;
dividing off an onset time of the voiced sound as an onset time of each note;
determining a time difference in units of frame number as the span value of the note;
determining the maximum value among the fundamental frequencies of each note contained during the span of said each note as the highest pitch value;
calculating the relative pitch and span values of each note by using the determined highest pitch value and the determined span value of the note preceding the note for which the relative pitch and span values are being calculated;
transmitting the calculated relative pitch and relative span values to a melody retrieval system over the World Wide Web network;
converting the relative pitch values and the relative span values of the inputted melody into coarse pitch values and coarse span values, respectively, by using the previously obtained thresholds;
comparing the coarse pitch values and coarse span values of the inputted melody with the coarse pitch values and coarse span values, respectively, of each melody in the music database;
calculating a distance between the pitch and span of each melody in the music database and the inputted melody; and
displaying information indicative of musical compositions matching the inputted melody on a display of an input device.
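The threshold-determination and coarse-conversion steps of claim 1 can be illustrated with a minimal Python sketch. It assumes the thresholds are placed by walking the sorted histogram until each category has accumulated roughly M = Sum/Category_Num values, and that a value equal to a threshold falls in the lower category; the claim does not state either detail. The function names `equal_frequency_thresholds` and `to_coarse` are hypothetical.

```python
from bisect import bisect_left
from collections import Counter


def equal_frequency_thresholds(values, num_categories):
    """Choose thresholds so each category holds about Sum / Category_Num values."""
    histogram = Counter(values)            # single histogram over all notes
    total = sum(histogram.values())        # Sum
    per_category = total / num_categories  # M = Sum / Category_Num
    thresholds = []
    cumulative = 0
    for value in sorted(histogram):
        cumulative += histogram[value]
        # Add a boundary each time the running count reaches another multiple of M.
        while (len(thresholds) < num_categories - 1
               and cumulative >= per_category * (len(thresholds) + 1)):
            thresholds.append(value)
    return thresholds


def to_coarse(values, thresholds):
    """Map each relative value to a category index (assumed: ties go to the lower category)."""
    return [bisect_left(thresholds, v) for v in values]


# Hypothetical relative-pitch values pooled from every melody in a database.
relative_pitch = [-5, -3, -2, -2, -1, 0, 0, 1, 2, 2, 3, 5]
pitch_thresholds = equal_frequency_thresholds(relative_pitch, 3)
coarse_pitch = to_coarse(relative_pitch, pitch_thresholds)
```

The same two functions would apply unchanged to relative-span ratios, and the same thresholds would be reused to quantize the query melody, which is what keeps the database and the query in a common category space.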
Abstract
A World Wide Web-based melody retrieval system takes a sung melody as a query and retrieves the song's title or other information from a music database over a WWW network. The system comprises a method of obtaining search clues with the maximum quantity of information from pitch and span (dynamic threshold determination) and a method of effectively reducing the number of answer candidates (coarse-to-fine matching), thus increasing the matching accuracy. It is characterized in that a user can retrieve music, or media containing music, by singing.
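The abstract names coarse-to-fine matching without describing its stages, so the following Python sketch is only one plausible reading. It assumes a cheap first pass over quantized (coarse) sequences that prunes the candidate list, followed by a ranking pass over unquantized relative values. The distance measure (smallest sum of absolute differences over any equally long window), the `keep` parameter, and all function and field names are assumptions, not taken from the patent.

```python
def window_distance(query, target):
    """Smallest sum of absolute differences between the query and any
    equally long window of the target (an assumed distance measure)."""
    n = len(query)
    if len(target) < n:
        return float("inf")
    return min(
        sum(abs(q - t) for q, t in zip(query, target[i:i + n]))
        for i in range(len(target) - n + 1)
    )


def coarse_to_fine(query, database, keep=2):
    """Return titles ranked best-first after pruning on coarse values."""
    # Coarse stage: compare category indices and keep the closest candidates.
    by_coarse = sorted(
        database,
        key=lambda title: window_distance(query["coarse"], database[title]["coarse"]),
    )
    shortlist = by_coarse[:keep]
    # Fine stage: rank the survivors on the unquantized relative values.
    return sorted(
        shortlist,
        key=lambda title: window_distance(query["fine"], database[title]["fine"]),
    )


# Hypothetical three-song database holding pitch sequences only.
database = {
    "A": {"coarse": [0, 1, 2, 1], "fine": [-4, 0, 5, 1]},
    "B": {"coarse": [0, 1, 2, 1], "fine": [-2, 0, 3, 0]},
    "C": {"coarse": [2, 2, 0, 0], "fine": [4, 3, -5, -3]},
}
query = {"coarse": [1, 2, 1], "fine": [0, 3, 0]}
ranking = coarse_to_fine(query, database)
```

In this toy data, songs A and B are indistinguishable at the coarse level and both survive pruning, while the fine stage separates them.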
276 Citations
4 Claims
3. A World Wide Web-based melody retrieval system, comprising:
means for obtaining sequences of relative-pitch difference and relative-span ratio from sequences of pitch and span values of each musical composition in a database;
means for making a single histogram of the relative-pitch values of the database based on the distribution of all notes in the database;
means for making a single histogram of the relative-span values of the database based on the distribution of all notes in the database;
means for determining thresholds for relative-pitch so that each category, which will be defined by the thresholds, equally includes M1(=Sum1/Category_Num1) values, wherein M1 is the number of frequency of relative-pitch values which each category will contain, Sum1 is the total frequency of values of a histogram of relative-pitch, and Category_Num1 is the number of categories for relative-pitch values, and simultaneously determining thresholds for relative-span values so that each category equally includes M2(=Sum2/Category_Num2) values, wherein M2 is the number of frequency of relative-span values which each category will contain, Sum2 is the total frequency of values of a histogram of relative-span, and Category_Num2 is the number of categories;
means for converting the relative-pitch values and the relative-span values into coarse pitch values, which are approximate relative-values of pitch, and coarse span values, which are approximate relative-values of span, respectively, by using the previously obtained thresholds;
means for inputting a song by singing, humming, or whistling a melody into a microphone;
means for subjecting the inputted melody to an A/D conversion so as to produce a digital signal;
means for detecting a voiced sound from a sound signal in the digital signal produced by the A/D conversion;
means for defining a fundamental frequency of each frame from the detected voiced sound;
means for dividing off an onset time of the voiced sound as an onset time of each note;
means for determining a time difference in units of frame number as the span value of the note;
means for determining the maximum value among the fundamental frequencies of each note contained during the span of said each note as the highest pitch value;
means for calculating the relative pitch and span values of each note by using the determined highest pitch value and the determined span value of the note preceding the note for which the relative pitch and span values are being calculated;
means for transmitting the calculated relative pitch and relative span values to a song retrieval system over the World Wide Web network;
means for converting the relative pitch values and the relative span values of the inputted melody into coarse pitch values and coarse span values, respectively, according to the previously obtained thresholds;
means for comparing the coarse pitch values and coarse span values of the inputted melody with the coarse pitch values and coarse span values, respectively, of each melody in the music database;
means for calculating a distance between the pitch and span of each melody in the music database and the inputted melody; and
means for displaying information indicative of musical compositions matching the inputted melody on a display of an input device.
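The input-side elements of the claims (onset detection, span in frames, highest pitch per note, and values relative to the preceding note) can be sketched in Python as follows. Several details are assumptions that the claims leave open: unvoiced frames are represented by a fundamental frequency of 0, a note's span runs from its onset to the next onset (or to the end of the signal for the last note), and the relative-pitch difference is expressed in semitones. The function names are hypothetical.

```python
import math


def notes_from_f0(f0_frames):
    """Return (highest_pitch_hz, span_in_frames) for each note.

    A note is assumed to start at every unvoiced-to-voiced transition,
    where a frame value of 0 marks an unvoiced frame.
    """
    onsets = [
        i for i, f0 in enumerate(f0_frames)
        if f0 > 0 and (i == 0 or f0_frames[i - 1] == 0)
    ]
    notes = []
    for k, start in enumerate(onsets):
        # Assumed: the span ends at the next onset, or at the end of the signal.
        end = onsets[k + 1] if k + 1 < len(onsets) else len(f0_frames)
        notes.append((max(f0_frames[start:end]), end - start))
    return notes


def relative_values(notes):
    """Relative pitch (semitones, assumed unit) and span ratio versus the preceding note."""
    rel_pitch, rel_span = [], []
    for (prev_hz, prev_span), (hz, span) in zip(notes, notes[1:]):
        rel_pitch.append(12 * math.log2(hz / prev_hz))
        rel_span.append(span / prev_span)
    return rel_pitch, rel_span


# Hypothetical per-frame fundamental frequencies in Hz (0 = unvoiced).
f0 = [0, 220, 220, 0, 0, 440, 438, 440, 440, 0, 0, 0, 0, 220, 220]
notes = notes_from_f0(f0)
rel_pitch, rel_span = relative_values(notes)
```

Because only differences and ratios against the preceding note are kept, the resulting sequences do not depend on the key or tempo in which the user sings, which is why relative values rather than absolute ones are what would be transmitted to the retrieval system.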
Specification