Vibrato detection modules in a system for automatic transcription of sung or hummed melodies
Abstract
The technology disclosed relates to audio signal processing. It includes a series of modules that individually are useful to solve audio signal processing problems. Among the problems addressed are buzz removal, selecting a pitch candidate among pitch candidates based on local continuity of pitch and regional octave consistency, making small adjustments in pitch, ensuring that a selected pitch is consistent with harmonic peaks, determining whether a given frame or region of frames includes harmonic, voiced signal, extracting harmonics from voice signals and detecting vibrato. One environment in which these modules are useful is transcribing singing or humming into a symbolic melody. Another environment that would usefully employ some of these modules is speech processing. Some of the modules, such as buzz removal, are useful in many other environments as well.
19 Claims
1. A method of vibrato detection applied to a sequence of detected pitches, the method including:
processing electronically a sequence of detected pitches for frames and estimating a rate and a pitch depth of oscillations in the sequence;
comparing the estimated rate and pitch depth to a predetermined vibrato detection envelope and determining whether the sequence of detected pitches would be perceived as vibrato;
wherein the predetermined vibrato detection envelope maps combinations of a dominant pitch variation rate and a pitch depth at the dominant pitch on a perceptual basis to whether the combinations are likely to be perceived by a listener as vibrato;
repeatedly determining vibrato perception of successive sequences of frames and repeating the processing and comparing actions; and
outputting data regarding whether the successive sequences would be perceived as vibrato.
Dependent claims: 2, 3, 4, 5, 6.
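The estimating step of claim 1 can be illustrated with a minimal sketch. It assumes, without support from the claim text, that pitches are expressed in semitones (for example MIDI note numbers) and that frames arrive at a fixed rate; the function name `estimate_rate_and_depth` is hypothetical. A naive discrete Fourier transform stands in for whatever transform an implementation would actually use, and "depth" is taken here to mean the peak deviation of the dominant oscillation.

```python
import cmath
import math


def estimate_rate_and_depth(pitches, frame_rate):
    """Return (rate_hz, depth) of the dominant oscillation in a pitch sequence.

    Assumptions (not from the claims): pitches are in semitones, frame_rate is
    in frames per second, and depth is the peak deviation of the dominant
    sinusoidal component.
    """
    n = len(pitches)
    if n < 3:
        return 0.0, 0.0  # too short to contain a measurable oscillation

    # Remove the mean so the sustained note itself does not dominate.
    mean = sum(pitches) / n
    centered = [p - mean for p in pitches]

    best_bin, best_mag = 0, 0.0
    # Scan bins strictly between DC and Nyquist with a naive DFT.
    for k in range(1, (n + 1) // 2):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        mag = abs(acc)
        if mag > best_mag:
            best_bin, best_mag = k, mag

    rate_hz = best_bin * frame_rate / n   # bin index converted to Hz
    depth = 2.0 * best_mag / n            # amplitude of the dominant sinusoid
    return rate_hz, depth
```

Given one second of pitch values at 100 frames per second, the function returns the frequency of the strongest oscillation in Hz and its peak deviation in semitones. The frequency resolution is `frame_rate / n`, so longer sequences resolve the rate more finely.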
7. An electronic signal processing component for detecting vibrato in frames that represent an audio signal, the component including:
an input port adapted to receive a stream of data frames including detected pitches;
an FFT processor coupled to the input that processes sequences of data frames in the stream and estimates rate and pitch depth of oscillations in pitch;
a comparison processor including data representing an envelope of combinations of rates and pitch depths of oscillation that would be perceived by listeners as vibrato, the comparison processor coupled to the estimates of rate and pitch depth of oscillations in pitch and operative to compare the estimates to the data representing the envelope;
wherein the envelope of combinations maps combinations of a dominant pitch variation rate and a pitch depth at the dominant pitch on a perceptual basis to whether the combinations are likely to be perceived by a listener as vibrato;
an output port coupled to the comparison processor that outputs results of the comparisons.
Dependent claims: 8, 9, 10, 11, 12, 13, 14.
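One way to picture the "data representing an envelope" held by the comparison processor is a small table of rate bands, each with an allowed depth range. The sketch below assumes that representation; the table `ILLUSTRATIVE_ENVELOPE`, its numeric values, and the function `is_perceived_as_vibrato` are placeholders invented for illustration and are not taken from the patent.

```python
# Each row: (rate_low_hz, rate_high_hz, min_depth, max_depth).
# The numbers are illustrative placeholders, NOT values from the patent.
# Depth is assumed to be in semitones.
ILLUSTRATIVE_ENVELOPE = [
    (4.0, 5.0, 0.3, 2.0),
    (5.0, 7.0, 0.2, 2.0),
    (7.0, 8.0, 0.3, 1.5),
]


def is_perceived_as_vibrato(rate_hz, depth, envelope=ILLUSTRATIVE_ENVELOPE):
    """Return True if the (rate, depth) pair falls inside the envelope.

    Rate bands are half-open, [low, high), so adjacent bands do not overlap.
    """
    for rate_low, rate_high, min_depth, max_depth in envelope:
        if rate_low <= rate_hz < rate_high:
            return min_depth <= depth <= max_depth
    return False  # rate lies outside every band in the table
```

A table keeps the perceptual data separate from the comparison logic, so the envelope can be replaced without touching the code that consults it.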
15. An electronic signal processing component for detecting vibrato in frames that represent an audio signal, the component including:
an input port adapted to receive a stream of data frames including detected pitches;
an FFT means for processing the sequences of data frames in the stream, coupled to the input port, and for estimating rate and pitch depth of oscillations in pitch;
a comparison means for evaluating whether estimated pitch variation rates and pitch depth at a dominant pitch would be perceived as vibrato, based on comparison to data representing a psychoacoustic envelope of perceived vibrato; and
an output port to which the comparison means reports results.
Dependent claims: 16.
17. A computer readable non-volatile storage medium including program instructions for carrying out a method including:
processing a sequence of detected pitches for frames and estimating a rate and a pitch depth of oscillations in the sequence;
comparing the estimated rate and pitch depth to a predetermined vibrato detection envelope and determining whether the sequence of detected pitches would be perceived as vibrato;
wherein the predetermined vibrato detection envelope maps combinations of a dominant pitch variation rate and a pitch depth at the dominant pitch on a perceptual basis to whether the combinations are likely to be perceived by a listener as vibrato;
repeatedly determining vibrato perception of successive sequences of frames by indexing through the sequences of frame and repeating the processing and comparing actions; and
outputting data regarding whether the successive sequences would be perceived as vibrato.
Dependent claims: 18, 19.
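The "indexing through the sequences" step of claim 17 amounts to sliding a window over the pitch track and repeating the same decision for each window. The sketch below shows only that loop. The names `scan_for_vibrato` and `range_exceeds_half_semitone`, the window and hop parameters, and the semitone unit are assumptions; the stand-in classifier is deliberately trivial and is not the envelope comparison the claim recites.

```python
from typing import Callable, List, Sequence, Tuple


def scan_for_vibrato(
    pitches: Sequence[float],
    window: int,
    hop: int,
    classify: Callable[[Sequence[float]], bool],
) -> List[Tuple[int, bool]]:
    """Apply `classify` to successive windows and return (start_index, result).

    `window` and `hop` are counted in frames. Trailing frames that do not fill
    a whole window are skipped.
    """
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    results = []
    for start in range(0, len(pitches) - window + 1, hop):
        segment = pitches[start:start + window]
        results.append((start, classify(segment)))
    return results


def range_exceeds_half_semitone(segment: Sequence[float]) -> bool:
    """Stand-in classifier for demonstration only: flags any window whose
    pitch range exceeds 0.5 semitone. A real detector would estimate rate
    and depth and compare them to a perceptual envelope instead."""
    return max(segment) - min(segment) > 0.5
```

Passing the classifier as a callable keeps the windowing loop independent of how each window is judged. Overlapping windows are obtained by choosing `hop` smaller than `window`.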
Specification