Speaker identification and verification system

US 5,522,012 A
Filed: 02/28/1994
Issued: 05/28/1996
Est. Priority Date: 02/28/1994
Status: Expired due to Term

Abstract

The present invention relates to a speaker recognition method and system which applies adaptive component weighting to each frame of speech for attenuating non-vocal tract components and normalizing speech components. A linear predictive all pole model is used to select frames for an adaptively weighted cepstrum. Frames with a predetermined number of resonances are selected for cepstrum analysis. An adaptively weighted cepstrum is determined from a new transfer function. A normalized cepstrum is determined having improved characteristics for speech components. From the improved speech components, improved speaker recognition over a channel is obtained.
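
The frame-selection idea in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names (`frame_signal`, `lpc`, `count_resonances`, `select_frames`), the Hamming window, the autocorrelation (Levinson-Durbin) LPC method, and the rule of counting one resonance per complex pole with positive imaginary part inside the unit circle are all choices made here for illustration.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split x into overlapping Hamming-windowed frames (one frame per row)."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)

def lpc(frame, order):
    """Autocorrelation-method LPC (Levinson-Durbin).

    Returns a[1..order] for A(z) = 1 + sum_k a_k z^-k.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]
    if r[0] <= 0.0:  # silent frame: no model to fit
        return np.zeros(order)
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        # Order update: a_j += k * a_{i-j} for j < i, and a_i = k.
        a[1:i + 1] = a[1:i + 1] + k * np.concatenate((a[i - 1:0:-1], [1.0]))
        err *= 1.0 - k * k
    return a[1:]

def count_resonances(a, tol=1e-8):
    """Count complex poles inside the unit circle, one per conjugate pair."""
    poles = np.roots(np.concatenate(([1.0], a)))
    return int(np.sum((np.abs(poles) < 1.0) & (poles.imag > tol)))

def select_frames(frames, order, n_resonances):
    """Indices of frames whose LP model has exactly n_resonances resonances."""
    return [i for i, f in enumerate(frames)
            if count_resonances(lpc(f, order)) == n_resonances]
```

A call such as `select_frames(frame_signal(x, 256, 128), order=12, n_resonances=4)` would keep only the frames with four resonances; the frame length, hop, order, and resonance count in that call are hypothetical values, not figures taken from the patent.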

11 Claims

1. A method for speaker recognition comprising the steps of:
- windowing a speech segment into a plurality of speech frames;
  
  determining linear prediction coefficients from a linear predictive polynomial for each said frame of speech;
  
  determining a first cepstral coefficient from said linear prediction coefficients in which first cepstrum information comprises said first cepstral coefficient;
  
  applying an all pole filter to said linear prediction polynomial;
  
  determining a plurality of roots of said linear prediction polynomial from the poles of said all pole filter, each said root including a residue component;
  
  selecting one of said frames having a predetermined number of said roots within a unit circle of the z-plane in which said selected frames form said predetermined components of said first cepstrum information;
  
  applying weightings to predetermined components from said first cepstrum information for producing an adaptive component weighting cepstrum to attenuate broad bandwidth components in said speech signal, by determining a finite impulse response filter for emphasizing the speech formants of said speech signal and attenuating said residue components comprising the steps of determining a finite impulse response filter for emphasizing the speech formants of said speech signal and attenuating said residue components, determining adaptive component weighting coefficients from said finite impulse response filter, determining a second cepstral coefficient from said adaptive component weighting coefficients, and subtracting said second cepstral coefficient from said first cepstral coefficient for forming said adaptive component weighting cepstrum; and
  
  recognizing said adaptive component weighting cepstrum by calculating similarity of said adaptive component weighting cepstrum and a plurality of speech patterns which were produced by a plurality of speaking persons in advance.
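
The cepstral steps of claim 1 can be sketched as below. This is an illustration under assumptions, not the claimed implementation. It assumes the LP polynomial is written A(z) = 1 + sum_k a_k z^-k, and that the finite impulse response filter is the numerator obtained when every residue of 1/A(z) is set to one, which gives weighting coefficients b_i = ((P - i)/P) a_i. That relation is derived here from the unit-residue assumption; it is not quoted from the source. The function names are hypothetical.

```python
import numpy as np

def lp_cepstrum(a, n_ceps):
    """Cepstrum c[1..n_ceps] of the all-pole model 1 / (1 + sum_k a[k-1] z^-k).

    Uses the standard recursion
        c_n = -a_n - (1/n) * sum_{k=1}^{n-1} k * c_k * a_{n-k},
    with a_n taken as 0 for n > P.
    """
    a = np.asarray(a, dtype=float)
    p = len(a)
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = -a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc -= (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]

def acw_coefficients(a):
    """Assumed weighting coefficients b_i = ((P - i) / P) * a_i, i = 1..P.

    b_P is always 0, so the numerator has order P - 1.
    """
    a = np.asarray(a, dtype=float)
    p = len(a)
    i = np.arange(1, p + 1)
    return (p - i) / p * a

def acw_cepstrum(a, n_ceps):
    """First cepstrum (from a) minus second cepstrum (from b)."""
    return lp_cepstrum(a, n_ceps) - lp_cepstrum(acw_coefficients(a), n_ceps)
```

Applying the same recursion to both coefficient sets keeps the subtraction in the last step symmetric: the second cepstrum is that of 1/(1 + sum_i b_i z^-i), so the difference is the cepstrum of the unit-residue transfer function up to a gain term that does not affect c_n for n >= 1.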

2. A system for speaker recognition comprising:
- means for converting a speech signal into a plurality of frames of digital speech;
  
  speech parameter extracting means for converting said digital speech into first cepstrum information, said speech parameter extracting means comprising an all pole linear predictive (LPC) filter means, for determining a plurality of roots of said LPC filter, each said root including a residue component, and means for selecting ones of said frames having a predetermined number of said roots within a unit circle of the z-plane wherein said selected frames form said predetermined components of said first cepstrum information;
  
  speech parameter enhancing means for applying adaptive weightings to said first cepstrum information for producing an adaptive component weighting cepstrum to attenuate broad bandwidth components in said speech signal, said speech parameter enhancing means comprising, a finite impulse response filter for emphasizing the speech formants of said speech signal and attenuating said residue components, means for computing adaptive component weighting coefficients from said finite impulse response filter, means for computing a second cepstral coefficient from said adaptive component weighting coefficients, and means for subtracting said second cepstral coefficient from said first cepstral coefficient for forming said adaptive component weighting cepstrum; and
  
  evaluation means for determining a similarity of said adaptive component weighting cepstrum with a plurality of speech samples which were produced by a plurality of speaking persons in advance.
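
The "residue component" attached to each root can be made concrete with a partial-fraction expansion of the all-pole model. The sketch below assumes distinct, nonzero poles and the convention A(z) = 1 + sum_k a_k z^-k; the function names are hypothetical, and treating "attenuating said residue components" as replacing every residue with one is an interpretation made here, not a statement from the source.

```python
import numpy as np

def poles_and_residues(a):
    """Poles z_i and residues r_i with 1/A(z) = sum_i r_i / (1 - z_i z^-1).

    Assumes the poles are distinct and nonzero.
    """
    poles = np.roots(np.concatenate(([1.0], a)))
    residues = np.array([
        1.0 / np.prod(1.0 - np.delete(poles, i) / poles[i])
        for i in range(len(poles))
    ])
    return poles, residues

def impulse_response(poles, residues, n_samples):
    """h[n] = sum_i r_i * z_i**n, the response of the summed first-order sections."""
    n = np.arange(n_samples)
    return np.real(np.sum(residues[:, None] * poles[:, None] ** n[None, :], axis=0))
```

Calling `impulse_response(poles, np.ones(len(poles)), n)` gives the response with all residues normalized to one, so each pole contributes with equal weight regardless of how the original residues varied from frame to frame.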

3. A method for speaker recognition comprising the steps of:
- windowing a speech segment into a plurality of speech frames;
  
  determining linear prediction coefficients from a linear predictive polynomial for each said frame of speech;
  
  determining a first cepstral coefficient from said linear prediction coefficients in which first cepstrum information comprises said first cepstral coefficient;
  
  applying an all pole filter to said linear prediction polynomial;
  
  determining a plurality of roots of said linear prediction polynomial from the poles of said all pole filter, each said root including a residue component;
  
  selecting one of said frames having a predetermined number of said roots within a unit circle of the z-plane in which said selected frames form said predetermined components of said first cepstrum information;
  
  applying weightings to predetermined components from said first cepstrum information for producing an adaptive component weighting cepstrum to attenuate broad bandwidth components in said speech signal, by determining a finite impulse response filter for emphasizing the speech formants of said speech signal and attenuating said residue components and determining adaptive component weighting coefficients from said finite impulse response filter; and
  
  recognizing said adaptive component weighting cepstrum by calculating similarity of said adaptive component weighting cepstrum and a plurality of speech patterns which were produced by a plurality of speaking persons in advance.

4. The method of claim 3 wherein said finite impulse response filter normalizes said residue components of said first spectrum.

5. The method of claim 4 wherein said finite impulse response filter corresponds to an adaptive component weighting spectrum of the form ##EQU12## wherein b_i are said adaptive component weighting coefficients and P is the order of the LP analysis.

6. The method of claim 5 further comprising the step of:
- classifying said adaptive component weighting cepstrum in a classification means as said plurality of speech patterns.

7. The method of said claim 6 further comprising the step of:
- determining said similarity of said adaptive component weighting cepstrum with said speech patterns by matching said adaptive component weighting cepstrum with said classified adaptive component weighting cepstrum in said classification means.
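
The classification and matching steps in claims 6 and 7 could look roughly like the following. The claims do not name a classifier or a distance measure, so everything here is an assumption: stored patterns are kept as arrays of cepstral vectors per enrolled speaker, similarity is the average Euclidean distance from each test vector to its nearest stored vector, and the names `average_min_distance`, `identify`, and `verify` are hypothetical.

```python
import numpy as np

def average_min_distance(test_vectors, templates):
    """Mean over test vectors of the distance to the nearest stored vector."""
    test_vectors = np.asarray(test_vectors, dtype=float)
    templates = np.asarray(templates, dtype=float)
    d = np.linalg.norm(test_vectors[:, None, :] - templates[None, :, :], axis=2)
    return float(d.min(axis=1).mean())

def identify(test_vectors, enrolled):
    """Return the enrolled speaker with the smallest distance, plus all scores.

    enrolled maps a speaker name to an array of stored cepstral vectors.
    """
    scores = {name: average_min_distance(test_vectors, t)
              for name, t in enrolled.items()}
    return min(scores, key=scores.get), scores

def verify(test_vectors, templates, threshold):
    """Accept the claimed identity if the distance is within the threshold."""
    return average_min_distance(test_vectors, templates) <= threshold
```

Identification picks the closest enrolled speaker, while verification compares one claimed speaker against a threshold that would have to be tuned on held-out data.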

8. A system for speaker recognition comprising:
- means for converting a speech signal into a plurality of frames of digital speech;
  
  speech parameter extracting means for converting said digital speech into first cepstrum information, said speech parameter extracting means comprising an all pole linear predictive (LPC) filter means, for determining a plurality of roots of said LPC filter, each said root including a residue component, and means for selecting ones of said frames having a predetermined number of said roots within a unit circle of the z-plane wherein said selected frames form said predetermined components of said first cepstrum information;
  
  speech parameter enhancing means for applying adaptive weightings to said first cepstrum information for producing an adaptive component weighting cepstrum to attenuate broad bandwidth components in said speech signal, said speech parameter enhancing means comprising, a finite impulse response filter for emphasizing the speech formants of said speech signal and attenuating said residue components, means for computing adaptive component weighting coefficients from said finite impulse response filter; and
  
  evaluation means for determining a similarity of said adaptive component weighting cepstrum with a plurality of speech samples which were produced by a plurality of speaking persons in advance.

9. The system of said components of claim 8 wherein said finite impulse response filter corresponds to an adaptive component weighting spectrum of the form ##EQU13## wherein b_i are said adaptive component weighting coefficients and P is the order of analysis.

10. The system of claim 9 further comprising:
- means for classifying said adaptive component weighting cepstrum as said plurality of speech patterns.

11. The system of said claim 10 further comprising:
- means for determining said similarity of said adaptive component weighting cepstrum with said speech patterns by matching said adaptive component weighting cepstrum with said stored adaptive component weighting cepstrum in said classification means.

Current Assignee: Bank One Colorado NA As Agent
Original Assignee: Rutgers University
Inventors: Assaleh, Khaled T.; Mammone, Richard J.
Primary Examiner(s): Moore, David K.
Assistant Examiner(s): Hafiz, Tariq R
Application Number: US08/203,988
Time in Patent Office: 820 Days
Field of Search: 381/41-43, 381/29, 395/2, 395/2.4, 395/2.5, 395/2.55, 395/2.6, 395/2.63, 395/2.17, 395/2.81
US Class Current: 704/250
CPC Class Codes:
G10L 17/02   Preprocessing operations, e...
G10L 25/12   the extracted parameters be...
G10L 25/18   the extracted parameters be...
G10L 25/24   the extracted parameters be...
