Method of analysing an audio signal
First Claim
1. A method of analysing an audio signal, the audio signal comprising a speech signal, the method comprising the steps of:
(a) receiving a digital representation of the audio signal;
(b) generating a first output function, said first output function being a response of a physiological model of a human cochlea to the digital representation, the amplitude of the response representing presence of speech in the audio signal in terms of time and space;
(c) selecting a temporal region of the first output function;
(d) identifying a plurality of peaks from the selected temporal region of the first output function, said plurality of peaks being identified according to a rate of change of the amplitude in the temporal region;
(e) comparing a first one of the plurality of peaks in a first temporal location with a spatially adjacent peak at a second temporal location to determine at least one property of the first output function by
(i) comparing said first peak to said spatially adjacent peak to determine if said spatially adjacent peak is in a neighbourhood of said first peak; and
(ii) generating a track function using the results of step (i), the track function storing locations of a plurality of said peaks in terms of time and space, wherein if said spatially adjacent peak is within the neighbourhood, said spatially adjacent peak is part of the same track as said first peak; and
(f) determining one or more values for use in analysing the audio signal, based on the determined property of the first output function by
(i) selecting a relevant spatial range according to a signal dependent threshold of energy of the first output function, and temporal distance between a plurality of neighbouring tracks of the first output function;
(ii) determining a track center point for each of the plurality of tracks within the spatial range, each track center point representing a center of mass of a corresponding track of the first output function in the spatial range, each track centre point belonging to a plurality of second output functions; and
(iii) determining a centre of mass of the determined track centre points to generate a salient formant point, wherein the salient formant point belongs to the plurality of the second output functions.
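Steps (d) and (e) of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the first output function is already available as a list of frames (one list of amplitudes per time step, indexed by place along the cochlear model), treats a peak as a place where the slope changes from rising to non-rising, and takes the neighbourhood to be a fixed place radius. The names `find_peaks`, `build_tracks`, and `radius` are hypothetical.

```python
from typing import List, Tuple

Peak = Tuple[int, int]  # (time index, place index)


def find_peaks(frame: List[float]) -> List[int]:
    """Return place indices where the amplitude slope turns from rising to non-rising."""
    peaks = []
    for p in range(1, len(frame) - 1):
        rising = frame[p] - frame[p - 1] > 0
        not_rising_after = frame[p + 1] - frame[p] <= 0
        if rising and not_rising_after:
            peaks.append(p)
    return peaks


def build_tracks(response: List[List[float]], radius: int = 1) -> List[List[Peak]]:
    """Link peaks in successive frames into tracks (assumed fixed-radius neighbourhood)."""
    tracks: List[List[Peak]] = []
    open_tracks: List[List[Peak]] = []  # tracks that reached the previous frame
    for t, frame in enumerate(response):
        still_open: List[List[Peak]] = []
        for p in find_peaks(frame):
            # Find an open track whose latest peak lies within the neighbourhood.
            match = next(
                (tr for tr in open_tracks if abs(tr[-1][1] - p) <= radius), None
            )
            if match is not None:
                open_tracks.remove(match)  # each track takes one peak per frame
                match.append((t, p))
                still_open.append(match)
            else:
                new_track = [(t, p)]
                tracks.append(new_track)
                still_open.append(new_track)
        open_tracks = still_open
    return tracks
```

Called on a small time-by-place matrix, `build_tracks` returns one list of (time, place) pairs per track, which is one possible form for the track function recited in step (e)(ii).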
Abstract
A method of analyzing an audio signal is disclosed. A digital representation of an audio signal is received and a first output function is generated based on a response of a physiological model to the digital representation. At least one property of the first output function may be determined. One or more values are determined for use in analyzing the audio signal, based on the determined property of the first output function.
12 Claims
Claim 1 is set out in full under First Claim above. Claims 2 to 10 depend on claim 1.
11. An apparatus for analysing an audio signal, the audio signal comprising a speech signal, the apparatus comprising:
means for receiving a digital representation of an audio signal;
means for generating a first output function, said first output function being a response of a physiological model of the human cochlea to the digital representation, the amplitude of the response representing presence of speech in the audio signal in terms of time and space;
means for selecting a temporal region of the first output function;
means for identifying a plurality of peaks from the selected temporal region of the first output function, said plurality of peaks being identified according to a rate of change of the amplitude in the temporal region;
means for comparing a first one of the plurality of peaks in a first temporal location with a spatially adjacent peak at a second temporal location to determine at least one property of the first output function by
(i) comparing said first peak to said spatially adjacent peak to determine if said spatially adjacent peak is in a neighbourhood of said first peak; and
(ii) generating a track function using the results of step (i), the track function storing locations of a plurality of said peaks in terms of time and space, wherein if said spatially adjacent peak is within the neighbourhood, said spatially adjacent peak is part of the same track as said first peak; and
means for determining one or more values for use in analysing the audio signal, based on the determined property of the first output function by
(i) selecting a relevant spatial range according to a signal dependent threshold of energy of the first output function, and temporal distance between a plurality of neighbouring tracks of the first output function;
(ii) determining a track center point for each of the plurality of tracks within the spatial range, each track center point representing a center of mass of a corresponding track of the first output function in the spatial range, each track centre point belonging to a plurality of second output functions; and
(iii) determining a centre of mass of the determined track centre points to generate a salient formant point, wherein the salient formant point belongs to the plurality of the second output functions.
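The two centre-of-mass computations recited at the end of claim 11 can be sketched as follows. This is an illustration under stated assumptions, not the claimed apparatus: each track is assumed to be a list of (time, place) pairs, the response a time-by-place matrix of amplitudes, the "mass" of each peak its amplitude, and the spatial range an inclusive pair of place indices `lo` and `hi`. Weighting each track centre by its total amplitude in the second step is also an assumption. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

Peak = Tuple[int, int]  # (time index, place index)


def track_centre(
    track: List[Peak], response: List[List[float]], lo: int, hi: int
) -> Optional[Tuple[float, float, float]]:
    """Amplitude-weighted centre (time, place) of a track inside [lo, hi], plus its mass."""
    pts = [(t, p, response[t][p]) for t, p in track if lo <= p <= hi]
    mass = sum(w for _, _, w in pts)
    if mass <= 0:
        return None  # track has no energy inside the spatial range
    t_c = sum(t * w for t, _, w in pts) / mass
    p_c = sum(p * w for _, p, w in pts) / mass
    return (t_c, p_c, mass)


def salient_formant_point(
    tracks: List[List[Peak]], response: List[List[float]], lo: int, hi: int
) -> Optional[Tuple[float, float]]:
    """Centre of mass of the track centre points (assumed weighted by track mass)."""
    centres = [track_centre(tr, response, lo, hi) for tr in tracks]
    centres = [c for c in centres if c is not None]
    total = sum(c[2] for c in centres)
    if total <= 0:
        return None
    t_s = sum(c[0] * c[2] for c in centres) / total
    p_s = sum(c[1] * c[2] for c in centres) / total
    return (t_s, p_s)
```

An unweighted mean of the track centre points would be an equally plausible reading of step (iii); the claim text does not say which weighting applies.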
12. A system for analysing an audio signal, the audio signal comprising a speech signal, the system comprising:
a memory comprising data and a computer program;
a processor coupled to the memory for executing the computer program, the computer program comprising instructions for:
(a) receiving a digital representation of an audio signal;
(b) generating a first output function, said first output function being a response of a physiological model of the human cochlea to the digital representation, the amplitude of the response representing presence of speech in the audio signal in terms of time and space;
(c) selecting a temporal region of the first output function;
(d) identifying a plurality of peaks from the selected temporal region of the first output function, said plurality of peaks being identified according to a rate of change of the amplitude in the temporal region;
(e) comparing a first one of the plurality of peaks in a first temporal location with a spatially adjacent peak at a second temporal location to determine at least one property of the first output function by
(i) comparing said first peak to said spatially adjacent peak to determine if said spatially adjacent peak is in a neighbourhood of said first peak; and
(ii) generating a track function using the results of step (i), the track function storing locations of a plurality of said peaks in terms of time and space, wherein if said spatially adjacent peak is within the neighbourhood, said spatially adjacent peak is part of the same track as said first peak; and
(f) determining one or more values for use in analysing the audio signal, based on the determined property of the first output function by
(i) selecting a relevant spatial range according to a signal dependent threshold of energy of the first output function, and temporal distance between a plurality of neighbouring tracks of the first output function;
(ii) determining a track center point for each of the plurality of tracks within the spatial range, each track center point representing a center of mass of a corresponding track of the first output function in the spatial range, each track centre point belonging to a plurality of second output functions; and
(iii) determining a centre of mass of the determined track centre points to generate a salient formant point, wherein the salient formant point belongs to the plurality of the second output functions.
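The energy part of step (f)(i) can be sketched as below. This is only one reading of a "signal dependent threshold": the threshold is assumed to be a fraction of the largest per-place energy found in the response itself, and the spatial range is assumed to span the lowest to the highest place index at or above that threshold. The temporal-distance criterion between neighbouring tracks is not modelled here. The name `select_spatial_range` and the parameter `fraction` are hypothetical.

```python
from typing import List, Optional, Tuple


def select_spatial_range(
    response: List[List[float]], fraction: float = 0.5
) -> Optional[Tuple[int, int]]:
    """Return (lo, hi) place indices whose energy reaches a signal-dependent threshold."""
    n_places = len(response[0])
    # Energy at each place: sum of squared amplitudes over the temporal region.
    energy = [sum(frame[p] ** 2 for frame in response) for p in range(n_places)]
    threshold = fraction * max(energy)  # scales with the signal (assumption)
    keep = [p for p, e in enumerate(energy) if e > 0 and e >= threshold]
    if not keep:
        return None  # silent region: no place carries energy
    return (keep[0], keep[-1])
```

Because the threshold scales with the strongest place in the current temporal region, the same `fraction` yields a narrower range for a signal dominated by one place and a wider range for a signal whose energy is spread evenly.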
Specification