Method of analysing an audio signal

US 8,990,081 B2
Filed: 09/11/2009
Issued: 03/24/2015
Est. Priority Date: 09/19/2008
Status: Active Grant

Abstract

A method of analyzing an audio signal is disclosed. A digital representation of an audio signal is received and a first output function is generated based on a response of a physiological model to the digital representation. At least one property of the first output function may be determined. One or more values are determined for use in analyzing the audio signal, based on the determined property of the first output function.

56 Citations

12 Claims

1. A method of analysing an audio signal, the audio signal comprising a speech signal, the method comprising the steps of:
- (a) receiving a digital representation of the audio signal;
  
  (b) generating a first output function, said first output function being a response of a physiological model of a human cochlea to the digital representation, the amplitude of the response representing presence of speech in the audio signal in terms of time and space;
  
  (c) selecting a temporal region of the first output function;
  
  (d) identifying a plurality of peaks from the selected temporal region of the first output function, said plurality of peaks being identified according to a rate of change of the amplitude in the temporal region;
  
  (e) comparing a first one of the plurality of peaks in a first temporal location with a spatially adjacent peak at a second temporal location to determine at least one property of the first output function by:

    (i) comparing said first peak to said spatially adjacent peak to determine if said spatially adjacent peak is in a neighbourhood of said first peak; and

    (ii) generating a track function using the results of step (i), the track function storing locations of a plurality of said peaks in terms of time and space, wherein if said spatially adjacent peak is within the neighbourhood, said spatially adjacent peak is part of the same track as said first peak; and

  (f) determining one or more values for use in analysing the audio signal, based on the determined property of the first output function by:

    (i) selecting a relevant spatial range according to a signal dependent threshold of energy of the first output function, and temporal distance between a plurality of neighbouring tracks of the first output function;

    (ii) determining a track center point for each of the plurality of tracks within the spatial range, each track center point representing a center of mass of a corresponding track of the first output function in the spatial range, each track centre point belonging to a plurality of second output functions; and

    (iii) determining a centre of mass of the determined track centre points to generate a salient formant point, wherein the salient formant point belongs to the plurality of the second output functions.
- - 2. The method according to claim 1, wherein the determination is made at step (f) using the first output function.
  - 3. A method according to claim 1, wherein the physiological model is a one, two or three dimensional hydro-mechanical cochlear model, wherein the dimension refers to spatial dimensions.
  - 4. A method according to claim 3, wherein the first output function includes a basilar membrane response.
  - 5. A method according to claim 3, wherein the first output function includes an inner hair cell response.
  - 6. A method according to claim 4, wherein the first output function is a dimensional matrix comprising first and second dimensions, the first dimension corresponding to a temporal axis and the second dimension corresponding to a spatial axis.
  - 7. A method according to claim 5, wherein the first output function is a dimensional matrix comprising first and second dimensions, the first dimension corresponding to a temporal axis and the second dimension corresponding to a spatial axis.
  - 8. The method according to claim 1, further comprising the step of determining an objective measure of speech quality based on the determined values extracted from the audio signal.
  - 9. The method according to claim 1, further comprising the step of matching a word based on the determined values extracted from the audio signal.
  - 10. The method according to claim 1, further comprising the step of identifying a speaker based on the determined values extracted from the audio signal.
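
Steps (d) and (e) of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `find_peaks` and `link_tracks`, the `radius` parameter standing in for the "neighbourhood", the greedy nearest-peak linking, and the choice to pick peaks along the spatial axis of each time frame are all assumptions made for illustration. The cochlear model response is assumed to be already available as a list of time frames, each a list of amplitudes indexed by place.

```python
from typing import List, Tuple

Peak = Tuple[int, int]  # (time index, place index)


def find_peaks(frame: List[float]) -> List[int]:
    """Return place indices where the slope turns from positive to non-positive.

    Assumption: "rate of change of the amplitude" is read here as the
    first difference along the spatial axis of one time frame.
    """
    peaks = []
    for p in range(1, len(frame) - 1):
        rising = frame[p] - frame[p - 1] > 0
        falling = frame[p + 1] - frame[p] <= 0
        if rising and falling:
            peaks.append(p)
    return peaks


def link_tracks(response: List[List[float]], radius: int = 1) -> List[List[Peak]]:
    """Greedily join peaks in consecutive frames that lie within `radius` places.

    `radius` is a hypothetical stand-in for the claim's "neighbourhood".
    """
    finished: List[List[Peak]] = []
    open_tracks: List[List[Peak]] = []
    for t, frame in enumerate(response):
        unclaimed = find_peaks(frame)
        still_open: List[List[Peak]] = []
        for track in open_tracks:
            last_place = track[-1][1]
            near = [p for p in unclaimed if abs(p - last_place) <= radius]
            if near:
                # The closest peak in the neighbourhood extends this track.
                best = min(near, key=lambda p: abs(p - last_place))
                unclaimed.remove(best)
                track.append((t, best))
                still_open.append(track)
            else:
                # No peak in the neighbourhood: the track ends here.
                finished.append(track)
        # Any peak not absorbed by an existing track starts a new one.
        still_open.extend([[(t, p)] for p in unclaimed])
        open_tracks = still_open
    return finished + open_tracks


# Toy response: 3 time frames by 7 places (made-up numbers).
toy_response = [
    [0, 1, 0, 0, 0, 2, 0],
    [0, 0, 1, 0, 0, 2, 0],
    [0, 0, 0, 1, 0, 0, 0],
]
toy_tracks = link_tracks(toy_response, radius=1)
```

On the toy data, one track drifts from place 1 to place 3 over three frames and a second track stays at place 5 for two frames. Each returned track is a list of (time, place) pairs, which is one simple way to store peak locations "in terms of time and space".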

11. An apparatus for analysing an audio signal, the audio signal comprising a speech signal, the apparatus comprising:
- means for receiving a digital representation of an audio signal;
  
  means for generating a first output function, said first output function being a response of a physiological model of the human cochlea to the digital representation, the amplitude of the response representing presence of speech in the audio signal in terms of time and space;
  
  means for selecting a temporal region of the first output function;
  
  means for identifying a plurality of peaks from the selected temporal region of the first output function, said plurality of peaks being identified according to a rate of change of the amplitude in the temporal region;
  
  means for comparing a first one of the plurality of peaks in a first temporal location with a spatially adjacent peak at a second temporal location to determine at least one property of the first output function by:

    (i) comparing said first peak to said spatially adjacent peak to determine if said spatially adjacent peak is in a neighbourhood of said first peak; and

    (ii) generating a track function using the results of step (i), the track function storing locations of a plurality of said peaks in terms of time and space, wherein if said spatially adjacent peak is within the neighbourhood, said spatially adjacent peak is part of the same track as said first peak; and

  means for determining one or more values for use in analysing the audio signal, based on the determined property of the first output function by:

    (i) selecting a relevant spatial range according to a signal dependent threshold of energy of the first output function, and temporal distance between a plurality of neighbouring tracks of the first output function;

    (ii) determining a track center point for each of the plurality of tracks within the spatial range, each track center point representing a center of mass of a corresponding track of the first output function in the spatial range, each track centre point belonging to a plurality of second output functions; and

    (iii) determining a centre of mass of the determined track centre points to generate a salient formant point, wherein the salient formant point belongs to the plurality of the second output functions.

12. A system for analysing an audio signal, the audio signal comprising a speech signal, the system comprising:
- a memory comprising data and a computer program;
  
  a processor coupled to the memory for executing the computer program, the computer program comprising instructions for:
  
  (a) receiving a digital representation of an audio signal;
  
  (b) generating a first output function, said first output function being a response of a physiological model of the human cochlea to the digital representation, the amplitude of the response representing presence of speech in the audio signal in terms of time and space;
  
  (c) selecting a temporal region of the first output function;
  
  (d) identifying a plurality of peaks from the selected temporal region of the first output function, said plurality of peaks being identified according to a rate of change of the amplitude in the temporal region;
  
  (e) comparing a first one of the plurality of peaks in a first temporal location with a spatially adjacent peak at a second temporal location to determine at least one property of the first output function by:

    (i) comparing said first peak to said spatially adjacent peak to determine if said spatially adjacent peak is in a neighbourhood of said first peak; and

    (ii) generating a track function using the results of step (i), the track function storing locations of a plurality of said peaks in terms of time and space, wherein if said spatially adjacent peak is within the neighbourhood, said spatially adjacent peak is part of the same track as said first peak; and

  (f) determining one or more values for use in analysing the audio signal, based on the determined property of the first output function by:

    (i) selecting a relevant spatial range according to a signal dependent threshold of energy of the first output function, and temporal distance between a plurality of neighbouring tracks of the first output function;

    (ii) determining a track center point for each of the plurality of tracks within the spatial range, each track center point representing a center of mass of a corresponding track of the first output function in the spatial range, each track centre point belonging to a plurality of second output functions; and

    (iii) determining a centre of mass of the determined track centre points to generate a salient formant point, wherein the salient formant point belongs to the plurality of the second output functions.
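
Sub-steps (ii) and (iii) of step (f), as recited in the independent claims, can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the names `track_centre` and `salient_formant_point` are hypothetical, the centre of mass is assumed to be weighted by response amplitude, and each track centre is assumed to contribute to the final point in proportion to its track's total amplitude. The spatial range of sub-step (i) is simply passed in as `place_range`; how it is derived from the energy threshold and track spacing is not modelled here.

```python
from typing import List, Optional, Sequence, Tuple

Peak = Tuple[int, int]              # (time index, place index)
Centre = Tuple[float, float, float]  # (time, place, total amplitude)


def track_centre(
    response: Sequence[Sequence[float]],
    track: List[Peak],
    place_range: Tuple[int, int],
) -> Optional[Centre]:
    """Amplitude-weighted centre of mass of one track inside place_range."""
    lo, hi = place_range
    mass = t_sum = p_sum = 0.0
    for t, p in track:
        if lo <= p <= hi:
            w = response[t][p]  # assumed weight: response amplitude at the peak
            mass += w
            t_sum += w * t
            p_sum += w * p
    if mass <= 0:
        return None  # the track has no weight inside the range
    return (t_sum / mass, p_sum / mass, mass)


def salient_formant_point(
    response: Sequence[Sequence[float]],
    tracks: List[List[Peak]],
    place_range: Tuple[int, int],
) -> Optional[Tuple[float, float]]:
    """Centre of mass of the track centres, weighted by track amplitude."""
    centres = []
    for track in tracks:
        centre = track_centre(response, track, place_range)
        if centre is not None:
            centres.append(centre)
    total = sum(m for _, _, m in centres)
    if total <= 0:
        return None
    t = sum(ct * m for ct, _, m in centres) / total
    p = sum(cp * m for _, cp, m in centres) / total
    return (t, p)


# Toy data (made-up numbers): 3 time frames by 7 places, two tracks.
toy_response = [
    [0, 1, 0, 0, 0, 2, 0],
    [0, 0, 1, 0, 0, 2, 0],
    [0, 0, 0, 1, 0, 0, 0],
]
toy_tracks = [[(0, 1), (1, 2), (2, 3)], [(0, 5), (1, 5)]]
point = salient_formant_point(toy_response, toy_tracks, (0, 6))
```

With the full range (0, 6), both toy tracks contribute and the stronger track at place 5 pulls the resulting point towards it. Narrowing the range to (0, 3) leaves only the first track, so the salient point coincides with that track's own centre.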

Current Assignee: NewSouth Innovations Pty Limited (University Of New South Wales)
Original Assignee: NewSouth Innovations Pty Limited (University Of New South Wales)
Inventors: Lu, Wenliang; Sen, Dipanjan
Primary Examiner(s): Shah, Paras D
Assistant Examiner(s): Sirjani, Fariba
Application Number: US13/119,898
Publication Number: US 20110213614A1
Time in Patent Office: 2,020 Days
Field of Search: 704/236
US Class Current: 704/236
CPC Class Codes:
G10L 15/02   Feature extraction for spee...
G10L 17/02   Preprocessing operations, e...
G10L 25/15   the extracted parameters be...
G10L 25/69   for evaluating synthetic or...
