Method and apparatus for speaker recognition using selected spectral information

US 5,666,466 A
Filed: 12/27/1994
Issued: 09/09/1997
Est. Priority Date: 12/27/1994
Status: Expired due to Term

First Claim

1. In a method for speaker recognition, of the type wherein a speech sample is processed, the improvement comprising band-pass filtering the speech sample with a filter having a minimum frequency greater than 1 kHz and a maximum frequency between 5 and 10 kHz, wherein said band-pass filtering preserves spectral information in the speech sample for a range of frequencies between the minimum frequency and the maximum frequency, and excludes spectral information outside of the range.

Abstract

A method and apparatus are disclosed for robust, text-independent (and text-dependent) speaker recognition in which identification of a speaker is based on selected spectral information from the speaker's voice. Traditionally, speaker recognition systems (i) render a speech sample in the frequency domain to produce a spectrum, (ii) produce cepstrum coefficients from the spectrum, (iii) produce a codebook from the cepstrum coefficients, and (iv) use the codebook as the feature measure for comparing training speech samples with testing speech samples. The present invention, on the other hand, introduces the important and previously unknown step of truncating the spectrum prior to producing the cepstrum coefficients. Through the use of selected spectra as the feature measure for speaker recognition, the present invention has been shown to yield significant improvements in performance over prior art systems.

41 Citations

10 Claims

1. In a method for speaker recognition, of the type wherein a speech sample is processed, the improvement comprising band-pass filtering the speech sample with a filter having a minimum frequency greater than 1 kHz and a maximum frequency between 5 and 10 kHz, wherein said band-pass filtering preserves spectral information in the speech sample for a range of frequencies between the minimum frequency and the maximum frequency, and excludes spectral information outside of the range.
- Dependent claims (2, 3, 10)
  - 2. The method of claim 1, wherein said minimum frequency is greater than 2 kHz and is chosen to substantially exclude phonetic/linguistic information from the speech sample, and said maximum frequency is less than 8 kHz and is chosen to substantially include identity information from the speech sample.
  - 3. The method of claim 1, wherein said minimum frequency is 3 kHz and said maximum frequency is 8 kHz.
  - 10. A machine for speaker recognition according to the method of any one of claims 1, 2, 3, 4, 5, 6, 7, 8, or 9.
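
The band-pass filtering recited in claims 1 to 3 can be illustrated with a minimal Python sketch. This is not the patented implementation: it assumes a windowed-sinc FIR design with a Hamming window, and the function names `bandpass_fir` and `fir_filter`, the default tap count, and the direct-form convolution are hypothetical choices that do not come from the patent.

```python
import math


def bandpass_fir(low_hz, high_hz, sample_rate, num_taps=101):
    """Return windowed-sinc band-pass FIR taps (Hamming window).

    Assumed design, not taken from the patent: the ideal band-pass response
    is the difference of two ideal low-pass responses.
    """
    if not 0 < low_hz < high_hz < sample_rate / 2:
        raise ValueError("need 0 < low_hz < high_hz < Nyquist frequency")
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd so the filter has a centre tap")
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        if t == 0:
            ideal = 2.0 * (high_hz - low_hz) / sample_rate
        else:
            ideal = (math.sin(2 * math.pi * high_hz * t / sample_rate)
                     - math.sin(2 * math.pi * low_hz * t / sample_rate)) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def fir_filter(samples, taps):
    """Direct-form FIR convolution with zero initial state.

    The output has the same length as the input.
    """
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            if i - k < 0:
                break  # no earlier samples are available
            acc += h * samples[i - k]
        out.append(acc)
    return out
```

With the 3 kHz and 8 kHz band edges of claim 3 and an assumed 20 kHz sampling rate, `fir_filter(speech, bandpass_fir(3000, 8000, 20000))` would keep energy between the two edges and strongly attenuate energy outside them. The sampling rate must exceed twice the maximum frequency for the upper edge to be representable.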

4. In a method for speaker recognition, of the type wherein a speech sample is transformed into the frequency domain to produce a spectrum for processing, the improvement comprising truncating the spectrum, wherein said truncation preserves spectral information in the spectrum for a range of frequencies between a minimum frequency greater than 1 kHz and a maximum frequency between 5 and 10 kHz, and excludes spectral information outside of the range.
- Dependent claims (5, 6)
  - 5. The method of claim 4, wherein said minimum frequency is greater than 2 kHz and is chosen to substantially exclude phonetic/linguistic information from the speech sample, and said maximum frequency is less than 8 kHz and is chosen to substantially include identity information from the speech sample.
  - 6. The method of claim 4, wherein said minimum frequency is 3 kHz and said maximum frequency is 8 kHz.
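
The spectrum truncation of claims 4 to 6 can be sketched for a single speech frame as follows. This is an illustrative sketch under stated assumptions rather than the method of the patent: the naive DFT, the function names, and the cepstrum recipe (a DCT-II of the log magnitudes of the retained bins) are hypothetical choices, since the claims only require that spectral information outside the selected range be excluded before further processing.

```python
import cmath
import math


def magnitude_spectrum(frame, sample_rate):
    """Naive DFT of one frame.

    Returns (frequencies_hz, magnitudes) for bins 0 through N/2.
    """
    n = len(frame)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        freqs.append(k * sample_rate / n)
        mags.append(abs(acc))
    return freqs, mags


def truncate_spectrum(freqs, mags, min_hz, max_hz):
    """Keep only the bins whose frequency lies in [min_hz, max_hz]."""
    return [m for f, m in zip(freqs, mags) if min_hz <= f <= max_hz]


def cepstrum_from_selected(selected, num_coeffs, floor=1e-10):
    """Cepstrum-like coefficients from the selected spectrum.

    Assumption: DCT-II of the log magnitudes; the floor avoids log(0).
    """
    logs = [math.log(max(m, floor)) for m in selected]
    n = len(logs)
    return [sum(v * math.cos(math.pi * q * (i + 0.5) / n)
                for i, v in enumerate(logs))
            for q in range(num_coeffs)]
```

For a 200-sample frame at an assumed 20 kHz sampling rate, the bins are 100 Hz apart, so `truncate_spectrum(freqs, mags, 3000, 8000)` retains the 51 bins from 3 kHz to 8 kHz inclusive, and a component at 500 Hz no longer contributes to the coefficients.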

7. A method for speaker recognition which relies on an accumulated vector distance between a training speech sample and a testing speech sample, which method comprises the steps of:
- (a) dividing the training speech sample into a plurality of speech frames;
- (b) for each speech frame of the training speech sample;
  - transforming the speech frame into the frequency domain to produce a spectrum;
  - truncating the spectrum to produce a selected spectrum, said selected spectrum preserving spectral information in the speech frame for a range of frequencies between a minimum frequency greater than 1 kHz and a maximum frequency between 5 and 10 kHz, and excluding spectral information outside of the range; and
  - producing cepstrum coefficients based on the selected spectrum;
- (c) preparing a codebook collectively based on the cepstrum coefficients produced in step (b) for each speech frame, the codebook (i) having a plurality of codebook vectors and (ii) being the feature measure for comparing the training speech sample with the testing speech sample;
- (d) dividing the testing speech sample into a plurality of speech frames;
- (e) for each speech frame of the testing speech sample;
  - transforming the speech frame into the frequency domain to produce a spectrum;
  - truncating the spectrum to produce a selected spectrum, said selected spectrum preserving spectral information in the speech frame for the range of frequencies between the minimum frequency and the maximum frequency, and excluding spectral information outside of the range;
  - producing cepstrum coefficients based on the selected spectrum;
  - for each codebook vector of the codebook prepared in step (c), computing a vector distance between the codebook vector and the cepstrum coefficients; and
  - selecting a minimum vector distance from the vector distances for all the codebook vectors; and
- (f) computing the accumulated vector distance based on a summation of the minimum vector distances produced in step (e) for each speech frame, the accumulated vector distance being the feature measure for determining speaker recognition.
- Dependent claims (8, 9)
  - 8. The method of claim 7, wherein said minimum frequency is greater than 2 kHz and is chosen to substantially exclude phonetic/linguistic information from the training speech sample, and said maximum frequency is less than 8 kHz and is chosen to substantially include identity information from the training speech sample.
  - 9. The method of claim 7, wherein said minimum frequency is 3 kHz and said maximum frequency is 8 kHz.
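
The codebook comparison in steps (c), (e) and (f) of claim 7 can be sketched as follows, taking per-frame cepstrum vectors as given. The claim does not say how the codebook is prepared or which vector distance is used, so the plain Lloyd (k-means) clustering and the Euclidean distance below are assumptions, and all function names are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train_codebook(vectors, size, iterations=10):
    """Step (c), sketched with plain Lloyd (k-means) clustering.

    Assumption: initial codebook vectors are evenly spaced picks from the
    training vectors; the patent excerpt does not specify a design method.
    """
    step = max(1, len(vectors) // size)
    codebook = [list(v) for v in vectors[::step][:size]]
    for _ in range(iterations):
        cells = [[] for _ in codebook]
        for v in vectors:
            nearest = min(range(len(codebook)),
                          key=lambda j: distance(v, codebook[j]))
            cells[nearest].append(v)
        for j, cell in enumerate(cells):
            if cell:  # an empty cell keeps its previous codebook vector
                codebook[j] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook


def accumulated_distance(codebook, test_vectors):
    """Steps (e) and (f): sum, over test frames, of the minimum distance
    from the frame's cepstrum vector to any codebook vector."""
    return sum(min(distance(v, c) for c in codebook) for v in test_vectors)
```

In an identification setting, one codebook would be trained per enrolled speaker and the test sample attributed to the speaker whose codebook gives the smallest accumulated distance. That decision rule is a common convention for codebook-based recognizers and is stated here as an assumption, since claim 7 only names the accumulated distance as the feature measure.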

Current Assignee: Rutgers University
Original Assignee: Rutgers University
Inventors: Jan, Ea-Ee; Flanagan, James L.; Lin, Qiguang
Primary Examiner(s): MacDonald, Allen R.
Assistant Examiner(s): Smits, Talivaldis Ivars
Application Number: US08/365,598
Time in Patent Office: 987 Days
Field of Search: 395/2.55, 395/2.56, 395/2.3, 395/2.31, 395/2.5
US Class Current: 704/246
CPC Class Codes:
- G10L 17/00   Speaker identification or v...
- G10L 17/02   Preprocessing operations, e...
- G10L 17/04   Training, enrolment or mode...
- G10L 25/24   the extracted parameters be...
