System and method for sampling rate transformation in speech recognition
Abstract
A method and system for transforming a sampling rate in speech recognition systems, in accordance with the present invention, includes the steps of providing cepstral based data including utterances comprised of segments at a reference frequency, the segments being represented by cepstral vector coefficients, converting the cepstral vector coefficients to energy bands in logarithmic spectra, filtering the energy bands of the logarithmic spectra to remove energy bands having a frequency above a predetermined portion of a target frequency and converting the filtered logarithmic spectra to modified cepstral vector coefficients at the target frequency. Another method and system convert system prototypes for speech recognition systems from a reference frequency to a target frequency.
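The four steps named in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes an orthonormal DCT pair, a cepstrum zero-padded to the number of energy bands, and a caller-supplied list of band center frequencies. The names `transform_cepstrum`, `band_centers_hz`, and `num_ceps` are hypothetical, and any rescaling of the mel-filters is omitted.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a list of floats."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(
            c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(1, n)
        )
        out.append(s)
    return out


def transform_cepstrum(cepstrum, band_centers_hz, target_rate_hz, num_ceps):
    """Map one cepstral vector from the reference rate to the target rate.

    Assumption: band_centers_hz holds the center frequency of each energy
    band used at the reference rate, in ascending order.
    """
    n_bands = len(band_centers_hz)
    # Zero-pad the (possibly truncated) cepstrum to one value per band.
    padded = list(cepstrum) + [0.0] * (n_bands - len(cepstrum))
    # Step 2: cepstral coefficients -> energy bands in the log spectrum.
    log_bands = idct(padded)
    # Step 3: drop bands above one-half the target sampling frequency.
    cutoff_hz = target_rate_hz / 2.0
    kept = [e for e, f in zip(log_bands, band_centers_hz) if f <= cutoff_hz]
    # Step 4: remaining log-spectral bands -> modified cepstral coefficients.
    return dct(kept)[:num_ceps]
```

When the cutoff lies above every band, nothing is removed and the sketch returns the input cepstrum unchanged, which is a convenient sanity check for the DCT pair.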
30 Claims
-
1. A method for transforming a sampling rate in speech recognition systems comprising the steps of:
-
providing cepstral based data including utterances comprised of segments at a reference frequency, the segments being represented by cepstral vector coefficients;
converting the cepstral vector coefficients to energy bands in logarithmic spectra;
filtering the energy bands of the logarithmic spectra to remove energy bands having a frequency above a predetermined portion of a target frequency; and
converting the filtered logarithmic spectra to modified cepstral vector coefficients at the target frequency, the target frequency being different than the reference frequency.
-
2. The method as recited in claim 1, [...] rescaling the mel-filters.
-
3. The method as recited in claim 1, wherein the step of converting the cepstral vector coefficients to energy bands in logarithmic spectra includes converting the cepstral vector coefficients to energy bands in logarithmic spectra by employing an inverse discrete cosine transform (IDCT).
-
4. The method as recited in claim 1, wherein the step of filtering the energy bands includes the step of filtering the energy bands to remove energy bands above one-half the target frequency.
-
5. The method as recited in claim 1, wherein the step of converting the filtered logarithmic spectra to modified cepstral vector coefficients at the target frequency includes the step of converting the filtered logarithmic spectra to modified cepstral vector coefficients at the target frequency by performing a discrete cosine transform (DCT).
-
6. The method as recited in claim 1, further comprising the step of estimating maximum and mean values of segment energies at the reference frequency and at the target frequency.
-
7. The method as recited in claim 6, further comprising the step of outputting a global maximum and mean at the reference frequency for denormalizing system prototypes of a speech recognition system.
-
8. The method as recited in claim 6, further comprising the step of outputting a global maximum and mean at the target frequency for energy normalization of system prototypes of a speech recognition system.
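The energy statistics of claims 6 to 8 might be gathered as in the sketch below. The claims do not say how a segment energy is obtained, so the code assumes, purely for illustration, that the first coefficient of each cepstral vector carries the segment energy. The function name `energy_stats` and the nested-list data layout are hypothetical.

```python
def energy_stats(utterances):
    """Return (global maximum, global mean) of segment energies.

    Assumption: each utterance is a list of cepstral vectors, and element 0
    of each vector is treated as that segment's energy.
    """
    energies = [segment[0] for utterance in utterances for segment in utterance]
    if not energies:
        raise ValueError("no segments supplied")
    return max(energies), sum(energies) / len(energies)
```

Running the function once on the reference-frequency data and once on the transformed target-frequency data would yield the two pairs of values that claims 7 and 8 output for denormalizing and normalizing system prototypes.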
-
9. A method for transforming a sampling rate in speech recognition systems comprising the steps of:
-
providing system prototypes including distributions of normalized cepstral vectors at a reference frequency;
denormalizing the normalized cepstral vectors at the reference frequency;
converting the denormalized cepstral vectors to energy bands in logarithmic spectra;
filtering the energy bands of the logarithmic spectra to truncate energy bands having a frequency above a predetermined portion of a target frequency;
converting the filtered energy bands to modified cepstral vectors; and
normalizing the modified cepstral vectors at the target frequency such that the system prototypes are sampled at the target frequency.
-
10. The method as recited in claim 9, [...] rescaling the mel-filters.
-
11. The method as recited in claim 9, wherein the step of converting the denormalized cepstral vectors to energy bands in logarithmic spectra includes converting the denormalized cepstral vectors to energy bands in logarithmic spectra by employing an inverse discrete cosine transform (IDCT).
-
12. The method as recited in claim 9, wherein the step of filtering the energy bands includes the step of filtering the energy bands to remove energy bands above one-half the target frequency.
-
13. The method as recited in claim 9, wherein the step of converting the filtered energy bands to modified cepstral vectors includes the step of converting the filtered energy bands to modified cepstral vectors by performing a discrete cosine transform (DCT).
-
14. The method as recited in claim 9, wherein the step of denormalizing the normalized cepstral vectors at the reference frequency further comprises the step of inputting global maximum and mean values of segment energies at the reference frequency to denormalize the normalized cepstral vectors of the system prototypes at the reference frequency.
-
15. The method as recited in claim 9, wherein the step of normalizing the modified cepstral vectors further comprises the step of inputting global maximum and mean values of segment energies at the target frequency to normalize the cepstral vectors of the system prototypes at the target frequency.
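The prototype conversion of claims 9 to 15 wraps the cepstral transformation between a denormalizing and a normalizing step. The sketch below shows only that wrapping for a single prototype mean vector. The normalization formula is an assumption made for illustration (the energy term is shifted by the global mean and scaled by the gap between global maximum and mean), since the claims name the inputs but not the formula. All function names are hypothetical, `resample` stands in for any function that performs the IDCT, band filtering, and DCT steps, and prototype variances are not handled.

```python
def denormalize_c0(c0_norm, e_max, e_mean):
    """Undo the assumed energy normalization of the first coefficient."""
    return c0_norm * (e_max - e_mean) + e_mean


def normalize_c0(c0, e_max, e_mean):
    """Apply the assumed energy normalization to the first coefficient."""
    return (c0 - e_mean) / (e_max - e_mean)


def transform_prototype_mean(mean_vec, ref_stats, tgt_stats, resample):
    """Convert one prototype mean from the reference to the target frequency.

    ref_stats and tgt_stats are (global maximum, global mean) pairs of
    segment energies at the reference and target frequencies.
    """
    # Denormalize at the reference frequency.
    denorm = [denormalize_c0(mean_vec[0], *ref_stats)] + list(mean_vec[1:])
    # IDCT -> filter energy bands -> DCT, delegated to the caller.
    modified = resample(denorm)
    # Normalize at the target frequency.
    return [normalize_c0(modified[0], *tgt_stats)] + list(modified[1:])
```

Passing the transformation in as a callable keeps the normalization bookkeeping separate from the spectral processing, so either part can be swapped out without touching the other.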
-
16. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for transforming a sampling rate in speech recognition systems, the method steps comprising:
-
providing cepstral based data including utterances comprised of segments at a reference frequency, the segments being represented by cepstral vector coefficients;
converting the cepstral vector coefficients to energy bands in logarithmic spectra;
filtering the energy bands of the logarithmic spectra to remove energy bands having a frequency above a predetermined portion of a target frequency; and
converting the filtered logarithmic spectra to modified cepstral vector coefficients at the target frequency, the target frequency being different than the reference frequency.
-
17. The program storage device as recited in claim 16, [...] rescaling the mel-filters.
-
18. The program storage device as recited in claim 16, wherein the step of converting the cepstral vector coefficients to energy bands in logarithmic spectra includes converting the cepstral vector coefficients to energy bands in logarithmic spectra by employing an inverse discrete cosine transform (IDCT).
-
19. The program storage device as recited in claim 16, wherein the step of filtering the energy bands includes the step of filtering the energy bands to remove energy bands above one-half the target frequency.
-
20. The program storage device as recited in claim 16, wherein the step of converting the filtered logarithmic spectra to modified cepstral vector coefficients at the target frequency includes the step of converting the filtered logarithmic spectra to modified cepstral vector coefficients at the target frequency by performing a discrete cosine transform (DCT).
-
21. The program storage device as recited in claim 16, further comprising the step of estimating maximum and mean values of segment energies at the reference frequency and at the target frequency.
-
22. The program storage device as recited in claim 21, further comprising the step of outputting a global maximum and mean at the reference frequency for denormalizing system prototypes of a speech recognition system.
-
23. The program storage device as recited in claim 21, further comprising the step of outputting a global maximum and mean at the target frequency for energy normalization of system prototypes of a speech recognition system.
-
24. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for transforming a sampling rate in speech recognition systems, the method steps comprising:
-
providing system prototypes including distributions of normalized cepstral vectors at a reference frequency;
denormalizing the normalized cepstral vectors at the reference frequency;
converting the denormalized cepstral vectors to energy bands in logarithmic spectra;
filtering the energy bands of the logarithmic spectra to truncate energy bands having a frequency above a predetermined portion of a target frequency;
converting the filtered energy bands to modified cepstral vectors; and
normalizing the modified cepstral vectors at the target frequency such that the system prototypes are sampled at the target frequency.
-
25. The program storage device as recited in claim 24, [...] rescaling the mel-filters.
-
26. The program storage device as recited in claim 24, wherein the step of converting the denormalized cepstral vectors to energy bands in logarithmic spectra includes converting the denormalized cepstral vectors to energy bands in logarithmic spectra by employing an inverse discrete cosine transform (IDCT).
-
27. The program storage device as recited in claim 24, wherein the step of filtering the energy bands includes the step of filtering the energy bands to remove energy bands above one-half the target frequency.
-
28. The program storage device as recited in claim 24, wherein the step of converting the filtered energy bands to modified cepstral vectors includes the step of converting the filtered energy bands to modified cepstral vectors by performing a discrete cosine transform (DCT).
-
29. The program storage device as recited in claim 24, wherein the step of denormalizing the normalized cepstral vectors at the reference frequency further comprises the step of inputting global maximum and mean values of segment energies at the reference frequency to denormalize the normalized cepstral vectors of the system prototypes at the reference frequency.
-
30. The program storage device as recited in claim 24, wherein the step of normalizing the modified cepstral vectors further comprises the step of inputting global maximum and mean values of segment energies at the target frequency to normalize the cepstral vectors of the system prototypes at the target frequency.
Specification