Method and apparatus for speaker recognition using selected spectral information
Abstract
A method and apparatus are disclosed for robust, text-independent (and text-dependent) speaker recognition in which identification of a speaker is based on selected spectral information from the speaker's voice. Traditionally, speaker recognition systems (i) render a speech sample in the frequency domain to produce a spectrum, (ii) produce cepstrum coefficients from the spectrum, (iii) produce a codebook from the cepstrum coefficients, and (iv) use the codebook as the feature measure for comparing training speech samples with testing speech samples. The present invention, on the other hand, introduces the important and previously unknown step of truncating the spectrum prior to producing the cepstrum coefficients. Through the use of selected spectra as the feature measure for speaker recognition, the present invention has been shown to yield significant improvements in performance over prior art systems.
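The truncation step described in the abstract can be illustrated for a single speech frame with the minimal sketch below. It is not the patented implementation: the function names, the naive DFT, the 1.5 kHz to 6 kHz band, and the use of a DCT of the log magnitude as a stand-in for "cepstrum coefficients" are all assumptions chosen for illustration.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitude for bins 0..N/2 (O(N^2); adequate for a short demo frame)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(acc))
    return spectrum


def truncate_spectrum(spectrum, sample_rate, f_min, f_max):
    """Keep only the bins whose centre frequency lies in [f_min, f_max] (in Hz).

    Assumes the spectrum came from an even-length frame, so it has N/2 + 1 bins.
    """
    n_fft = 2 * (len(spectrum) - 1)
    bin_hz = sample_rate / n_fft
    return [m for k, m in enumerate(spectrum) if f_min <= k * bin_hz <= f_max]


def cepstrum(selected, n_coeffs):
    """Assumed cepstrum variant: DCT-II of the log magnitude of the selected spectrum."""
    log_mag = [math.log(m + 1e-10) for m in selected]  # small offset avoids log(0)
    size = len(log_mag)
    return [
        sum(v * math.cos(math.pi * q * (i + 0.5) / size) for i, v in enumerate(log_mag))
        for q in range(n_coeffs)
    ]


# Hypothetical demo frame: 64 samples of a 2 kHz tone sampled at 16 kHz.
SAMPLE_RATE = 16000
frame = [math.sin(2 * math.pi * 2000 * t / SAMPLE_RATE) for t in range(64)]
spectrum = magnitude_spectrum(frame)
selected = truncate_spectrum(spectrum, SAMPLE_RATE, f_min=1500, f_max=6000)
coeffs = cepstrum(selected, n_coeffs=8)
```

The only difference from the traditional pipeline is the call to `truncate_spectrum` between the spectrum and the cepstrum; everything outside the chosen band is discarded before any coefficients are computed.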
10 Claims
- 1. In a method for speaker recognition, of the type wherein a speech sample is processed, the improvement comprising band-pass filtering the speech sample with a filter having a minimum frequency greater than 1 kHz and a maximum frequency between 5 and 10 kHz, wherein said band-pass filtering preserves spectral information in the speech sample for a range of frequencies between the minimum frequency and the maximum frequency, and excludes spectral information outside of the range.
- 4. In a method for speaker recognition, of the type wherein a speech sample is transformed into the frequency domain to produce a spectrum for processing, improvement comprising truncating the spectrum, wherein said truncation preserves spectral information in the spectrum for a range of frequencies between a minimum frequency greater than 1 kHz and a maximum frequency between 5 and 10 kHz, and excludes spectral information outside of the range.
- 7. A method for speaker recognition which relies on an accumulated vector distance between a training speech sample and a testing speech sample, which method comprises the steps of:
  (a) dividing the training speech sample into a plurality of speech frames;
  (b) for each speech frame of the training speech sample;
    transforming the speech frame into the frequency domain to produce a spectrum;
    truncating the spectrum to produce a selected spectrum, said selected spectrum preserving spectral information in the speech frame for a range of frequencies between a minimum frequency greater than 1 kHz and a maximum frequency between 5 and 10 kHz, and excluding spectral information outside of the range; and
    producing cepstrum coefficients based on the selected spectrum;
  (c) preparing a codebook collectively based on the cepstrum coefficients produced in step (b) for each speech frame, the codebook (i) having a plurality of codebook vectors and (ii) being the feature measure for comparing the training speech sample with the testing speech sample;
  (d) dividing the testing speech sample into a plurality of speech frames;
  (e) for each speech frame of the testing speech sample;
    transforming the speech frame into the frequency domain to produce a spectrum;
    truncating the spectrum to produce a selected spectrum, said selected spectrum preserving spectral information in the speech frame for the range of frequencies between the minimum frequency and the maximum frequency, and excluding spectral information outside of the range;
    producing cepstrum coefficients based on the selected spectrum;
    for each codebook vector of the codebook prepared in step (c), computing a vector distance between the codebook vector and the cepstrum coefficients; and
    selecting a minimum vector distance from the vector distances for all the codebook vectors; and
  (f) computing the accumulated vector distance based on a summation of the minimum vector distances produced in step (e) for each speech frame, the accumulated vector distance being the feature measure for determining speaker recognition.
  Dependent claims: 8, 9
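Steps (c), (e), and (f) of claim 7 can be sketched as follows, taking per-frame cepstrum vectors as given. This is an illustrative sketch under stated assumptions, not the method as implemented in the patent: the claims do not say how the codebook is prepared, so plain k-means with evenly spaced seeds is assumed here, Euclidean distance is assumed as the vector distance, and all function names are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two coefficient vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_codebook(vectors, size, iterations=10):
    """Step (c), assumed here to be plain k-means seeded with evenly spaced vectors."""
    if len(vectors) < size:
        raise ValueError("need at least as many training vectors as codebook entries")
    step = len(vectors) // size
    codebook = [list(vectors[i * step]) for i in range(size)]
    for _ in range(iterations):
        clusters = [[] for _ in codebook]
        for v in vectors:
            nearest = min(range(len(codebook)), key=lambda i: distance(v, codebook[i]))
            clusters[nearest].append(v)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def accumulated_distance(codebook, test_vectors):
    """Steps (e)-(f): per frame, take the minimum distance to any codebook vector,
    then sum those minima over all frames."""
    return sum(min(distance(v, c) for c in codebook) for v in test_vectors)


# Hypothetical two-dimensional "cepstrum" vectors, for illustration only.
training = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]]
codebook = build_codebook(training, size=2)
close_score = accumulated_distance(codebook, [[0.05, 0.0], [1.05, 1.0]])
far_score = accumulated_distance(codebook, [[5.0, 5.0], [6.0, 6.0]])
```

In this sketch a smaller accumulated distance means the testing frames lie closer to the training speaker's codebook; how that score is thresholded or compared across speakers is not specified by the claim and is left out.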
Specification