Method and system for objectively evaluating speech
Abstract
A method and system for objectively evaluating the quality of speech in a voice communication system. A plurality of speech reference vectors is first obtained based on a plurality of clean speech samples. A corrupted speech signal is received and processed to determine a plurality of distortions derived from a plurality of distortion measures based on the plurality of speech reference vectors. The plurality of distortions are processed by a non-linear neural network model to generate a subjective score representing user acceptance of the corrupted speech signal. The non-linear neural network model is first trained on clean speech samples as well as corrupted speech samples through the use of backpropagation to obtain the weights and bias terms necessary to predict subjective scores from several objective measures.
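As an illustration only, and not the patented implementation, the training step described in the abstract can be pictured as a small multi-layer perceptron fitted by backpropagation. The hidden-layer size, tanh activation, squared-error loss, learning rate, and the helper names init_mlp, forward, train_step, and mean_loss are all assumptions made for this sketch; the source does not specify them.

```python
import math
import random


def init_mlp(n_in, n_hidden, seed=0):
    """Small random weights and zero biases for a one-hidden-layer network."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "W2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(net, x):
    """Return (hidden activations, predicted score) for one distortion vector."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(net["W1"], net["b1"])]
    y = sum(w * hi for w, hi in zip(net["W2"], h)) + net["b2"]
    return h, y


def train_step(net, x, target, lr=0.05):
    """One backpropagation update; returns the squared-error loss before the update."""
    h, y = forward(net, x)
    err = y - target  # derivative of 0.5 * err**2 with respect to y
    for j, hj in enumerate(h):
        # Propagate the error back through the tanh unit using the old W2[j].
        grad_h = err * net["W2"][j] * (1.0 - hj * hj)
        net["W2"][j] -= lr * err * hj
        for i, xi in enumerate(x):
            net["W1"][j][i] -= lr * grad_h * xi
        net["b1"][j] -= lr * grad_h
    net["b2"] -= lr * err
    return 0.5 * err * err


def mean_loss(net, data):
    """Average squared-error loss over (distortion vector, score) pairs."""
    return sum(0.5 * (forward(net, x)[1] - t) ** 2 for x, t in data) / len(data)


if __name__ == "__main__":
    # Synthetic, made-up training pairs: three distortion values per sample,
    # with a hypothetical score that falls as the mean distortion rises.
    rng = random.Random(1)
    data = []
    for _ in range(50):
        x = [rng.random() for _ in range(3)]
        data.append((x, 4.5 - 3.0 * sum(x) / 3.0))
    net = init_mlp(n_in=3, n_hidden=4)
    print("loss before:", mean_loss(net, data))
    for _ in range(200):
        for x, t in data:
            train_step(net, x, t)
    print("loss after:", mean_loss(net, data))
```

In a real system the training targets would be subjective scores collected from listeners rather than the synthetic values used here.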
20 Claims
1. An output-based objective method for evaluating the quality of speech in a voice communication system comprising:
providing a plurality of speech reference vectors, the speech reference vectors corresponding to a plurality of known clean speech samples obtained in a quiet environment;
receiving an unknown corrupted speech signal from an unavailable clean speech signal that is corrupted with distortions;
determining a plurality of distortions by comparing the unknown corrupted speech signal to at least one of the plurality of speech reference vectors; and
generating a score representing a subjective quality of the unknown corrupted speech signal based on the plurality of distortions.
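The comparison step recited in claim 1 can be sketched, under stated assumptions, as a nearest-reference-vector search. The claim does not name a distance measure or say how per-frame values are combined, so the Euclidean distance, the summary statistics, and the function names frame_distortions and summarize below are hypothetical choices made only for illustration.

```python
import math
import statistics


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def frame_distortions(frames, reference_vectors, dist=euclidean):
    """For each frame of the corrupted signal, the distance to its closest
    reference vector. No clean version of the signal is required."""
    return [min(dist(f, r) for r in reference_vectors) for f in frames]


def summarize(distortions):
    """Collapse per-frame distortions into a few values (assumed summary set)
    that a scoring model could take as input."""
    return {
        "mean": statistics.mean(distortions),
        "median": statistics.median(distortions),
        "max": max(distortions),
    }


if __name__ == "__main__":
    refs = [[0.0, 0.0], [1.0, 1.0]]                 # toy reference vectors
    frames = [[0.0, 0.0], [1.0, 2.0], [3.0, 1.0]]   # toy corrupted-speech frames
    print(summarize(frame_distortions(frames, refs)))
```

Passing a different function as dist is one way several distortion measures could be computed against the same reference vectors.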
receiving a plurality of clean speech samples in the quiet environment;
performing a spectral analysis on the plurality of clean speech samples in a plurality of domains to generate analyzed speech samples; and
performing a clustering technique on the analyzed speech samples.
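One clustering technique that fits the step above is k-means, which the claims name as an option. The following is a minimal sketch using Lloyd's algorithm on already-analyzed feature vectors; the function name kmeans, the seeded random initialization, and the iteration cap are assumptions, not details taken from the source.

```python
import random


def kmeans(vectors, k, iters=50, seed=0):
    """Cluster feature vectors with Lloyd's algorithm and return k centroids,
    which play the role of speech reference vectors in this sketch."""
    rng = random.Random(seed)
    # Assumed initialization: k distinct input vectors chosen at random.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            # Assign each vector to the centroid with the smallest squared distance.
            nearest = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(v, centroids[j])),
            )
            clusters[nearest].append(v)
        # Move each centroid to the mean of its members; keep it if it has none.
        updated = [
            [sum(col) / len(members) for col in zip(*members)]
            if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    return centroids


if __name__ == "__main__":
    # Two well-separated toy groups standing in for analyzed clean-speech frames.
    samples = [[0, 0], [0, 1], [1, 0], [1, 1],
               [10, 10], [10, 11], [11, 10], [11, 11]]
    print(sorted(kmeans(samples, k=2)))
```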
6. The method as recited in claim 5 wherein the clustering technique is a vector quantization.
7. The method as recited in claim 5 wherein the clustering technique is a k-means clustering technique.
8. The method as recited in claim 5 wherein performing the spectral analysis includes performing a linear predictive analysis.
9. The method as recited in claim 5 wherein performing the spectral analysis includes performing a perceptual linear predictive analysis.
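For readers unfamiliar with linear predictive analysis, the sketch below shows one common way to compute linear prediction coefficients for a single frame: the autocorrelation method solved with the Levinson-Durbin recursion. The claims do not state which variant is used, so this choice, the function names autocorrelation and levinson_durbin, and the toy input are assumptions. Perceptual linear predictive analysis adds auditory-model steps that are not shown here.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation values r[0..max_lag] of one frame of samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations.

    Returns (a, err) where the predictor is
    x[n] ~ a[0]*x[n-1] + a[1]*x[n-2] + ... and err is the residual energy.
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        if err <= 0.0:
            break  # silent or degenerate frame: stop early
        # Reflection coefficient for model order i + 1.
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= (1.0 - k * k)
    return a, err


if __name__ == "__main__":
    # Toy frame: a decaying exponential, which a first-order predictor
    # with coefficient close to 0.5 should fit well.
    frame = [0.5 ** n for n in range(40)]
    coeffs, residual = levinson_durbin(autocorrelation(frame, 2), order=2)
    print(coeffs, residual)
```

Coefficient vectors like these, computed frame by frame from clean speech, are the kind of analyzed samples that a clustering step could then reduce to reference vectors.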
10. An output-based objective system for evaluating the quality of speech in a voice communication system comprising:
a plurality of speech reference vectors, the speech reference vectors corresponding to a plurality of known clean speech samples obtained in a quiet environment;
means for receiving an unknown corrupted speech signal from an unavailable clean speech signal that is corrupted with distortions;
means for determining a plurality of distortions by comparing the unknown corrupted speech signal to at least one of the plurality of speech reference vectors; and
a non-linear model responsive to the plurality of distortions to generate a score representing a subjective quality of the unknown corrupted speech signal.
means for receiving a plurality of clean speech samples in the quiet environment;
means for performing a spectral analysis on the plurality of clean speech samples in a plurality of domains to generate analyzed speech samples; and
means for performing a clustering technique on the analyzed speech samples to generate the speech reference vectors.
15. The system as recited in claim 14 wherein the means for performing the clustering technique includes means for performing a vector quantization.
16. The system as recited in claim 14 wherein the means for performing the clustering technique includes means for performing a k-means clustering technique.
17. The system as recited in claim 14 wherein the means for performing the spectral analysis includes means for performing a linear predictive analysis.
18. The system as recited in claim 14 wherein the means for performing the spectral analysis includes means for performing a perceptual linear predictive analysis.
19. A computer readable storage medium having information stored thereon representing instructions executable by a computer to evaluate the quality of speech in a voice communication system, the computer readable storage medium further comprising:
instructions for providing a plurality of speech reference vectors, the speech reference vectors corresponding to a plurality of known clean speech samples obtained in a quiet environment;
instructions for receiving an unknown corrupted speech signal from an unavailable clean speech signal that is corrupted with distortions;
instructions for determining a plurality of distortions by comparing the unknown corrupted speech signal to at least one of the plurality of speech reference vectors; and
instructions for generating a score representing a subjective quality of the unknown corrupted speech signal based on the plurality of distortions.
instructions for providing a multi-layer perceptron neural network for processing the plurality of distortions.
Specification