Trained artificial neural networks using an imperfect vocal tract model for assessment of speech signal quality

US 6,035,270 A
Filed: 02/03/1998
Issued: 03/07/2000
Est. Priority Date: 07/27/1995
Status: Expired due to Term

Abstract

A speech signal is subjected to an imperfect vocal tract analysis model, and the output therefrom is analyzed by a neural network. The output from the neural network is compared with the parameters stored in the network definition function, to derive a measurement of the quality of the speech signal supplied to the source. The network definition function is determined by applying to the trainable processing apparatus a distortion perception measure indicative of the extent to which a distortion would be perceptible to a human listener.
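The analysis-then-weighting flow described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: it assumes linear predictive coding (LPC) as a stand-in for the vocal tract model, a tiny feed-forward network with random, untrained weights as a placeholder for the network definition function, and hypothetical names (`lpc_coefficients`, `quality_score`, `assess`), frame length, and model order.

```python
import numpy as np

def lpc_coefficients(frame, order=10):
    """Autocorrelation-method LPC via Levinson-Durbin; returns a[1..order].

    Assumption: LPC stands in for the vocal tract model named in the record.
    """
    frame = frame * np.hamming(len(frame))
    n = len(frame)
    # Autocorrelation at lags 0..order (zero lag sits at index n - 1).
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0] + 1e-12  # small guard so silent frames do not divide by zero
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient for this order
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a[1:]

def quality_score(params, w_hidden, b_hidden, w_out, b_out):
    """Weight the parameters with a one-hidden-layer network; output in (0, 1)."""
    hidden = np.tanh(params @ w_hidden + b_hidden)
    return float(1.0 / (1.0 + np.exp(-(hidden @ w_out + b_out))))

def assess(signal, weights, frame_len=256, order=10):
    """Score each non-overlapping frame and average; no reference signal is used."""
    scores = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        params = lpc_coefficients(signal[start:start + frame_len], order)
        scores.append(quality_score(params, *weights))
    return float(np.mean(scores))

# Placeholder weights: random and untrained, for illustration only.
rng = np.random.default_rng(0)
weights = (rng.normal(size=(10, 4)), np.zeros(4), rng.normal(size=4), 0.0)
signal = np.sin(2 * np.pi * 0.05 * np.arange(2048)) + 0.01 * rng.normal(size=2048)
print(assess(signal, weights))
```

With untrained weights the printed score carries no meaning; the point is only that the score is computed from the degraded signal alone, which is what makes the assessment non-intrusive.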

25 Claims

1. A non-intrusive method of assessing the quality of a first signal carrying speech, said method comprising the steps of:
- analyzing said signal carrying speech to generate output parameters according to a spectral representation imperfect vocal tract model capable of generating coefficients that can parametrically represent both speech and distortion signal elements, and
- weighting the output parameters according to a network definition function to generate an output derived from the weighted output parameters, the network definition function being generated using a trainable process, using well conditioned and/or ill-conditioned samples of a test signal, modeled by the imperfect vocal tract model.
  - 2. A method as in claim 1, wherein:
    - the network definition function is established by means of the following steps:
      
      providing a training sequence comprising a first signal and a distorted version of the first signal; and
      
      determining the network definition function by measuring the perceptual degree of distortion present in each segment, as determined by an analysis process in which a distortion perception measure is generated which indicates the extent to which the distortion of said signal will be perceptible to a human listener.
  - 3. A method as in claim 2 in which the analysis process:
    - estimates the effect which would be produced on the human auditory system by distorted and undistorted versions of the same signal, determines the differences between the said effects, and generates said distortion perception measure in dependence upon said difference.
  - 4. A method as in claim 2, in which:
    - the analysis process generates said distortion perception measure to depend (a) upon perceptual intensity of said distortion, and (b) nonlinearly upon the amplitude of said distortion.
  - 5. A method as in claim 2, in which:
    - said analysis process estimates the effect which said distortion would produce on the human auditory system taking into account the temporal persistence of said effect.
  - 6. A method as in claim 2, in which the analysis process:
    - decomposes the distorted signal into a plurality of spectral component bands, the spectral component bands being shaped to provide spectral masking;
      
      calculates the temporal masking of the signal due to preceding and/or succeeding temporal portions thereof;
      
      forms, for each of the spectral component signals, a representation of the difference between the component signal of the distorted signal and a correspondingly calculated component of the test signal; and
      
      generates said distortion perception measure from said difference representation.
  - 7. A method as in claim 2, in which:
    - the analysis process estimates, for each spectral component signal, the masking effect which that spectral component signal would produce on the human auditory system.
  - 8. A method as in claim 6 in which:
    - the analysis process generates a measure of the spectral and temporal distribution of the distortion from said difference signal.
  - 9. A method as in claim 1 in which:
    - the network definition function weightings are dependent on the temporal context of the output parameters.
  - 10. A method as in claim 9, wherein:
    - sequences of parameters are classified with weighting values derived from a control set of parameter sequences.
  - 11. A method as in claim 10, wherein:
    - the parameters identified for each member of the sequence are stored in shortened form, and weighted according to a labelled set of sequences also stored in shortened form.
  - 12. A method as in claim 1 wherein the spectral representation is an imperfect vocal tract model.
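The training label described in claims 2 to 8, a distortion perception measure computed from a signal and its distorted version, can be illustrated with a deliberately crude sketch. Everything here is an assumption made for illustration: uniform frequency bands rather than bands shaped for spectral masking, a fixed decibel threshold in place of any masking model, no temporal masking at all, and hypothetical names (`band_energies`, `distortion_measure`).

```python
import numpy as np

def band_energies(signal, n_bands=8, frame_len=256):
    """Log energy per frame and band; returns an array of shape (n_frames, n_bands)."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    spectrum = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1)) ** 2
    # Assumption: uniform bands; the claims call for bands shaped for masking.
    bands = np.array_split(spectrum, n_bands, axis=1)
    energy = np.stack([b.sum(axis=1) for b in bands], axis=1)
    return 10.0 * np.log10(energy + 1e-12)

def distortion_measure(reference, distorted, threshold_db=1.0):
    """Mean band-level difference above a fixed threshold (hypothetical stand-in)."""
    diff = np.abs(band_energies(distorted) - band_energies(reference))
    # Differences below the threshold are treated as inaudible and ignored.
    audible = np.maximum(diff - threshold_db, 0.0)
    return float(audible.mean())

rng = np.random.default_rng(0)
reference = np.sin(2 * np.pi * 0.05 * np.arange(2048))
distorted = reference + 0.5 * rng.normal(size=2048)
print(distortion_measure(reference, reference))  # identical signals give 0.0
print(distortion_measure(reference, distorted))
```

A value like this would serve only as the target during training; once the weights are fixed, the reference signal is no longer needed to score a new signal.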

13. Apparatus for non-intrusively assessing the quality of a first signal carrying speech, said apparatus comprising:
- means for analysing said first signal carrying speech using a spectral representation imperfect vocal tract model capable of generating coefficients that can parametrically represent both speech and distortion signal elements to generate output parameters,
  
  storage means for storing a set of weightings defining a network definition function,
  
  means for generating an output value derived from the output parameters and the network definition function; and
  
  training means for generating the stored set of weightings, the training means comprising means for supplying a sample of speech to the analysis means; and
  
  means for generating weightings relating to the speech sample, and inserting them in the storage means.
  - 14. Apparatus as in claim 13, the training means comprising:
    - means for providing a training sequence comprising a first signal and a distorted version of the first signal, analysis means for receiving the training sequence and generating a distortion perception measure for indicating the extent to which the distortion would be perceptible to a human listener, and means for applying the distortion perception measure to the trainable processing apparatus to determine the network definition function.
  - 15. Apparatus as in claim 14, in which the analysis means comprises:
    - measurement means for estimating the effect which would be produced on the human auditory system by distorted and undistorted versions of the same signal, means for determining the differences between the said effects, and means for generating said distortion perception measure in dependence upon said difference.
  - 16. Apparatus as in claim 15, in which the analysis means comprises:
    - measurement means for decomposing the distorted signal into a plurality of spectral component bands, the spectral component bands being shaped to provide spectral masking,
      
      means for calculating the temporal masking of the signal due to preceding and/or succeeding temporal portions thereof;
      
      means for forming, for each of the spectral component signals, a representation of the difference between the component signal of the distorted signal and a correspondingly calculated component of the test signal; and
      
      calculation means for generating said distortion perception measure from said difference representation.
  - 17. Apparatus as in claim 16, in which:
    - the measurement means estimates, for each spectral component signal, the masking effect which that spectral component signal would produce on the human auditory system.
  - 18. Apparatus as in claim 16 in which:
    - the calculation means generates a measure of the spectral and temporal distribution of the distortion from said difference signal.
  - 19. Apparatus as in claim 14 in which:
    - the analysis means generates a distortion perception measure whose value is dependent (a) upon perceptual intensity of said distortion, and (b) nonlinearly upon the amplitude of said distortion.
  - 20. Apparatus as in claim 14, in which the analysis means includes:
    - measurement means for estimating the effect which said distortion would produce on the human auditory system taking into account the temporal persistence of said effect.
  - 21. Apparatus as in claim 20, in which the analysis means comprises:
    - measurement means for generating a time sequence of successive processed signal segments from said test signal and/or said distorted signal, the value of at least some signal segments being generated in dependence upon portions of said test signal and/or distorted signal which precede and/or succeed said signal segments.
  - 22. Apparatus as in claim 13 in which the weightings defining the network definition function are dependent on the temporal context of the output parameters, and comprising:
    - means for storing output parameters relating to a plurality of temporal instants, the means for generating an output value being arranged to derive the output value from the stored output parameters and the network definition function.
  - 23. Apparatus as in claim 22, comprising:
    - means for storing a sequence of the output parameters as they are generated; and
      
      means for generating an output from said sequence in accordance with a set of predetermined weightings for such sequences.
  - 24. Apparatus as in claim 23, comprising:
    - means for storing the parameters of the sequences in shortened form.
  - 25. Apparatus as in claim 13 wherein the spectral representation is an imperfect vocal tract model.
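The temporal-context arrangement of claims 22 and 23, in which output parameters from several temporal instants are stored and the output value is derived from the stored sequence, might be sketched as below. This is only one possible reading: the class name `ContextScorer`, the fixed-length buffer, and the single linear weighting layer are assumptions for illustration and are not taken from the patent.

```python
from collections import deque

import numpy as np

class ContextScorer:
    """Hypothetical scorer that weights a short history of parameter vectors."""

    def __init__(self, weights, bias=0.0, context=3):
        # One weight per stored parameter: shape (context * n_params,).
        self.weights = np.asarray(weights, dtype=float)
        self.bias = bias
        # Holds the parameters of the most recent `context` instants.
        self.buffer = deque(maxlen=context)

    def push(self, params):
        """Store one instant's parameters; return a value once the buffer is full."""
        self.buffer.append(np.asarray(params, dtype=float))
        if len(self.buffer) < self.buffer.maxlen:
            return None
        stacked = np.concatenate(list(self.buffer))
        return float(stacked @ self.weights + self.bias)

scorer = ContextScorer(weights=np.ones(4), context=2)
print(scorer.push([1.0, 2.0]))  # None: only one instant stored so far
print(scorer.push([3.0, 4.0]))  # 10.0: weighted sum over both stored instants
```

Because the weights span the whole window, the same parameter vector can contribute differently depending on what preceded it, which is the sense in which the weighting depends on temporal context.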

Current Assignee: Psytechnics Ltd. (NetScout Systems Incorporated)
Original Assignee: British Telecommunications PLC (BT Group PLC)
Inventors: Sheppard, Philip J; Gray, Philip; Hollier, Michael P
Primary Examiner(s): Hudspeth, David R.
Assistant Examiner(s): Smits, Talivaldis Ivars
Application Number: US09/000,270
Time in Patent Office: 763 Days
Field of Search: 704/202, 704/228, 706/20, 706/21
US Class Current: 704/202
CPC Class Codes:
- G10L 19/018   Audio watermarking, i.e. em...
- G10L 25/30   using neural networks
- G10L 25/69   for evaluating synthetic or...
- H04B 1/665   using psychoacoustic proper...
- H04M 2201/40   using speech recognition sp...
- H04M 3/22   Arrangements for supervisio...
- H04M 3/2236   Quality of speech transmiss...
