Method and apparatus for training a speaker recognition system

US 5,864,807 A
Filed: 02/25/1997
Issued: 01/26/1999
Est. Priority Date: 02/25/1997
Status: Expired due to Term

4 Assignments

0 Petitions

Abstract

A method and apparatus for training a system to assess the identity of a person from the audio characteristics of their voice. The system feeds an audio input (10) into an A/D converter (20), and the digitized signal is processed in a digital signal processor (30). The system then applies neural-network-type processing, using a polynomial pattern classifier (60), to train the speaker recognition system.
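The abstract describes the polynomial pattern classifier (60) as neural-network-type processing. A common construction, assumed here rather than quoted from the patent, expands each feature vector x into all of its monomials up to a fixed degree, p(x), and scores it with a single weighted sum w · p(x), which behaves like a one-layer network on expanded inputs. The Python sketch below shows that expansion; the function names, the degree-2 default and the term ordering are illustrative choices, not details given in this record.

```python
# Illustrative sketch only: the monomial basis, degree and names are assumptions.
from itertools import combinations_with_replacement
from typing import List, Sequence


def poly_expand(x: Sequence[float], degree: int = 2) -> List[float]:
    """Return every monomial of the entries of x up to `degree`.

    Order: the constant 1, then x[i], then x[i]*x[j] with i <= j, and so on.
    """
    terms = [1.0]
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(len(x)), d):
            prod = 1.0
            for i in idx:
                prod *= x[i]
            terms.append(prod)
    return terms


def classifier_output(w: Sequence[float], x: Sequence[float], degree: int = 2) -> float:
    """Single-layer 'network' output: weighted sum of the expanded features."""
    p = poly_expand(x, degree)
    if len(w) != len(p):
        raise ValueError(f"expected {len(p)} weights, got {len(w)}")
    return sum(wi * pi for wi, pi in zip(w, p))


if __name__ == "__main__":
    print(poly_expand([2.0, 3.0]))  # [1.0, 2.0, 3.0, 4.0, 6.0, 9.0]
```

With a 2-dimensional x and degree 2 the expansion has six terms; in general its length is the binomial coefficient C(n + d, d) for n features and degree d, which is what drives the size of the weight vector w.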

29 Claims

1. A method for training a speaker recognition system by a computer, comprising steps of:
- extracting speech parameters from a digitized audio signal to produce a set of differentiating factors (r);
  
  storing said set of differentiating factors (r) in a data base to produce a stored set of differentiating factors (r);
  
  polynomial pattern classifying by the computer said stored set of differentiating factors (r) to produce a first digital audio signature (w);
  
  storing said first digital audio signature (w) in said data base to produce a stored first digital audio signature (w);
  
  specifying a first speaker as having audio signature features X¹, X², . . . , X^M ;
  
  specifying a second speaker as having audio signature features Y¹, Y², . . . , Y^M ;
  
  discriminating between said first speaker and said second speaker;
  
  training, for said first digital audio signature (w), a polynomial for a 2-norm to an ideal output of 1 for features of said first speaker and an ideal output of 0 for said second speaker;
  
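  representing a matrix whose rows are polynomial expansions of said first and second speakers' audio signature features, $M = [\,p(X^1)\ p(X^2)\ \cdots\ p(X^M)\ p(Y^1)\ p(Y^2)\ \cdots\ p(Y^M)\,]^{t}$, where $p(\cdot)$ denotes the polynomial expansion of a feature vector, and where o₁ is a column vector of length 2M whose first M entries are 1 and remaining entries are 0, and o₂ = 1 - o₁; and training for said first speaker and said second speaker respectively being:
  
  $w_1 = \arg\min_{w} \lVert M w - o_1 \rVert_2, \quad w_2 = \arg\min_{w} \lVert M w - o_2 \rVert_2$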
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
- - 2. A method for training a speaker recognition system by said computer as claimed in claim 1, wherein there is further included a step of providing an audio input of a voice of a speaker from a recorded medium.
  - 3. A method for training a speaker recognition system by said computer as claimed in claim 1, wherein there is further included a step of converting an audio input to said digitized audio signal.
  - 4. A method for training a speaker recognition system by said computer as claimed in claim 1, wherein there is further included a step of sending said set of differentiating factors (r) to said data base by a data link.
  - 5. A method for training a speaker recognition system by said computer as claimed in claim 1, wherein the step of extracting speech parameters includes a step of determining a frequency domain representation of short-time power spectra and cepstra of said digitized audio signal.
  - 6. A method for training a speaker recognition system by said computer as claimed in claim 1, wherein the step of extracting speech parameters includes a step of determining a frequency domain representation of transitional information (dynamic) short-time power spectra and delta-cepstra of said digitized audio signal.
  - 7. A method for training a speaker recognition system by said computer as claimed in claim 1, wherein the step of extracting speech parameters includes a step of determining non-linearly processed filter bank output of said digitized audio signal.
  - 8. A method for training a speaker recognition system by said computer as claimed in claim 1, wherein the step of extracting speech parameters includes a step of determining linear-predictive coefficients of the digitized audio signal.
  - 9. A method for training a speaker recognition system by said computer as claimed in claim 1, wherein the step of extracting speech parameters resulting in said set of differentiating factors (r) includes a step of adding a new set of said differentiating factors (r) to a stored set of differentiating factors (r).
  - 10. A method for training a speaker recognition system by said computer as claimed in claim 1, wherein there is further included the step of extracting, storing said set of differentiating factors (r), and polynomial pattern classifying by the computer for a second digitized audio signal to produce a second digital audio signature (w).
  - 11. A method for training a speaker recognition system by said computer as claimed in claim 1, wherein there is further included the step of comparing said first and second digital audio signatures (w) to recognize a speaker.
  - 12. A method for training a speaker recognition system by said computer as claimed in claim 1, wherein there is further included the step of permitting communication access by the speaker if said first and second digital audio signatures (w) correlate.
  - 13. A method for training a speaker recognition system by said computer as claimed in claim 1, wherein there is further included the step of training said first digital audio signature (w) with said second digital audio signature (w).
  - 14. A method for training a speaker recognition system by said computer as claimed in claim 1, wherein there is further included the step of storing said second digital audio signature (w) in the data base.
  - 15. A method for training a speaker recognition system by said computer as claimed in claim 1, wherein there is further included the step of training with said differentiating factors (r), where the rows in M corresponding to said first speaker's and said second speaker's audio signatures are denoted as M₁ and M₂, respectively:
    - $M^{t} M w_1 = M^{t} o_1$
      $(M_1^{t} M_1 + M_2^{t} M_2)\, w_1 = M_1^{t} \mathbf{1}$
      $(R_1 + R_2)\, w_1 = M_1^{t} \mathbf{1}$
      where $\mathbf{1}$ is a vector of all ones.
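Claim 15 gives the normal equations of the 2-norm training problem in claim 1. Because o₁ has ones only in the first-speaker rows, Mᵗ o₁ reduces to M₁ᵗ 1, and the third equation reads naturally with R₁ = M₁ᵗ M₁ and R₂ = M₂ᵗ M₂ (inferred from the two preceding equations; the claim does not define R₁ and R₂). The pure-Python sketch below solves (R₁ + R₂) w₁ = M₁ᵗ 1 directly. The monomial basis, the helper names (`poly_expand`, `gram`, `solve`, `train`, `score`) and the averaged-output scoring rule are hypothetical, and it assumes R₁ + R₂ is nonsingular, i.e. there are enough distinct frames for the chosen degree.

```python
# Illustrative sketch; basis, helper names and scoring rule are assumptions.
from itertools import combinations_with_replacement
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def poly_expand(x: Sequence[float], degree: int = 2) -> Vector:
    """Monomials of x up to `degree`, constant term first (assumed basis)."""
    terms = [1.0]
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(len(x)), d):
            prod = 1.0
            for i in idx:
                prod *= x[i]
            terms.append(prod)
    return terms


def gram(rows: List[Vector]) -> Matrix:
    """R = M_i^t M_i, where the rows of M_i are `rows`."""
    n = len(rows[0])
    return [[sum(r[a] * r[b] for r in rows) for b in range(n)] for a in range(n)]


def solve(A: Matrix, b: Vector) -> Vector:
    """Solve A w = b by Gaussian elimination with partial pivoting (A nonsingular)."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (aug[r][n] - tail) / aug[r][r]
    return w


def train(X: List[Vector], Y: List[Vector], degree: int = 2) -> Vector:
    """Solve (R1 + R2) w1 = M1^t 1, the normal equations of min ||M w1 - o1||_2."""
    M1 = [poly_expand(x, degree) for x in X]  # rows of M for the first speaker
    M2 = [poly_expand(y, degree) for y in Y]  # rows of M for the second speaker
    R1, R2 = gram(M1), gram(M2)
    n = len(R1)
    A = [[R1[i][j] + R2[i][j] for j in range(n)] for i in range(n)]
    rhs = [sum(row[i] for row in M1) for i in range(n)]  # M1^t times all-ones
    return solve(A, rhs)


def score(w: Vector, frames: List[Vector], degree: int = 2) -> float:
    """Mean classifier output w . p(x) over test frames (assumed scoring rule)."""
    outs = [sum(wi * pi for wi, pi in zip(w, poly_expand(x, degree))) for x in frames]
    return sum(outs) / len(outs)


if __name__ == "__main__":
    X = [[1.0], [2.0]]    # toy 1-D features for the first speaker
    Y = [[-1.0], [-2.0]]  # toy 1-D features for the second speaker
    w1 = train(X, Y, degree=1)
    print(w1, score(w1, X, degree=1), score(w1, Y, degree=1))
```

Keeping R₁ and R₂ rather than the raw frames has a practical side effect: each is a sum of per-frame outer products p(x)p(x)ᵗ, so a new batch of frames can be folded in by adding its Gram matrix and re-solving. That is one plausible reading of the "adding a new set of said differentiating factors (r)" language in claim 9, though the claim does not say how the update is performed.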

16. An apparatus for training a speaker recognition system comprising:
- a processor for extracting speech parameters from a digitized audio signal to produce a set of differentiating factors (r);
  
  a computer for storing said set of differentiating factors (r) in a data base, said computer coupled to said processor;
  
  a polynomial pattern classifier operating on said set of differentiating factors (r) to produce a first digital audio signature (w);
  
  means for storing said first digital audio signature (w) in said data base to produce a stored first digital audio signature (w);
  
  means for specifying a first speaker as having audio signature features X¹, X², . . . , X^M ;
  
  means for specifying a second speaker as having audio signature features Y¹, Y², . . . , Y^M ;
  
  means for discriminating between said first speaker and said second speaker;
  
  means for training said audio signatures of said first and second speakers to provide a polynomial for a 2-norm to an ideal output of 1 for features of said first speaker and an ideal output of 0 for said second speaker;
  
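  means for representing a matrix whose rows are polynomial expansions of said first and second speakers' audio signature features, $M = [\,p(X^1)\ p(X^2)\ \cdots\ p(X^M)\ p(Y^1)\ p(Y^2)\ \cdots\ p(Y^M)\,]^{t}$, where $p(\cdot)$ denotes the polynomial expansion of a feature vector, and where o₁ is a column vector of length 2M whose first M entries are 1 and remaining entries are 0, and o₂ = 1 - o₁; and
  
  said means for training of said first speaker and said second speaker respectively being:
  
  $w_1 = \arg\min_{w} \lVert M w - o_1 \rVert_2, \quad w_2 = \arg\min_{w} \lVert M w - o_2 \rVert_2$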
- Dependent Claims (17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29)
- - 17. The apparatus as claimed in claim 16, wherein said polynomial pattern classifier includes means for training with said differentiating factors (r), where the rows in M corresponding to said first speaker's and said second speaker's audio signatures are denoted as M₁ and M₂, respectively:
    - $M^{t} M w_1 = M^{t} o_1$
      $(M_1^{t} M_1 + M_2^{t} M_2)\, w_1 = M_1^{t} \mathbf{1}$
      $(R_1 + R_2)\, w_1 = M_1^{t} \mathbf{1}$
      where $\mathbf{1}$ is a vector of all ones.
  - 18. The apparatus as claimed in claim 16, wherein there is further included a source of an audio input.
  - 19. The apparatus as claimed in claim 18, wherein there is further included an analog/digital converter for producing said digitized audio signal from said audio input, said analog/digital converter coupled to said source and to said processor.
  - 20. The apparatus as claimed in claim 16, wherein there is further included a data link for sending said set of differentiating factors (r) to said data base, said data link coupled to said processor and to said computer.
  - 21. The apparatus as claimed in claim 16, wherein said processor further includes means for determining a frequency domain representation of short-time power spectra or cepstra of said digitized audio signal, said means for determining operated by said computer.
  - 22. The apparatus as claimed in claim 16, wherein said processor further includes means for determining a frequency domain representation of transitional information (dynamic) short-time power spectra or delta-cepstra of said digitized audio signal.
  - 23. The apparatus as claimed in claim 16, wherein said processor further includes means for determining non-linearly processed filter bank outputs of said digitized audio signal.
  - 24. The apparatus as claimed in claim 16, wherein said processor further includes means for determining linear-predictive coefficients of the digitized audio signal.
  - 25. The apparatus as claimed in claim 16, wherein said processor includes means for extracting speech parameters resulting in said set of differentiating factors (r).
  - 26. The apparatus as claimed in claim 16, wherein said polynomial pattern classifier includes means for comparing speech parameters in said set of differentiating factors (r).
  - 27. The apparatus as claimed in claim 16, wherein said polynomial pattern classifier includes means for recognizing speech parameters of a person in said set of differentiating factors (r).
  - 28. The apparatus as claimed in claim 16, wherein said polynomial pattern classifier includes means for adding a new set of said differentiating factors (r) to a set of differentiating factors (r) already in a data base to train said speaker recognition system.
  - 29. The apparatus as claimed in claim 16, wherein said polynomial pattern classifier includes means for adding a new set of said differentiating factors (r) to update said set of differentiating factors (r) in a data base to retrain said speaker recognition system.
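Claims 21, 22 and 24 (mirroring method claims 5, 6 and 8) list short-time cepstra, delta-cepstra and linear-predictive coefficients as ways to obtain the differentiating factors (r). The patent text reproduced here gives no frame sizes, model orders or recursions, so the pure-Python sketch below fills them with textbook choices that are assumptions: Hamming-windowed frames (240 samples with an 80-sample hop, i.e. 30 ms and 10 ms at an assumed 8 kHz rate), autocorrelation-method LPC via the Levinson-Durbin recursion, the standard LPC-to-cepstrum recursion, and regression-style delta-cepstra. All function names are hypothetical.

```python
# Illustrative sketch; frame sizes, orders and function names are assumptions.
import math
from typing import List, Sequence, Tuple


def autocorr(frame: Sequence[float], max_lag: int) -> List[float]:
    """Autocorrelation r[k] = sum_n s[n] * s[n - k] for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], float]:
    """LPC coefficients a_1..a_p with s[n] ~ sum_k a_k s[n - k], plus residual energy."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or fully predictable frame: keep remaining a_i = 0
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def lpc_to_cepstrum(a: Sequence[float], n_ceps: int) -> List[float]:
    """Cepstrum c_1..c_N of the all-pole model 1 / (1 - sum_k a_k z^-k); gain term omitted."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # index 0 unused
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


def deltas(frames: List[List[float]], width: int = 2) -> List[List[float]]:
    """Regression deltas d_t = sum_th th * (c_{t+th} - c_{t-th}) / (2 * sum_th th^2)."""
    T = len(frames)
    denom = 2.0 * sum(th * th for th in range(1, width + 1))
    out = []
    for t in range(T):
        d = [0.0] * len(frames[t])
        for th in range(1, width + 1):
            nxt, prv = frames[min(t + th, T - 1)], frames[max(t - th, 0)]  # clamp at edges
            for i in range(len(d)):
                d[i] += th * (nxt[i] - prv[i])
        out.append([v / denom for v in d])
    return out


def cepstral_features(signal: Sequence[float], frame_len: int = 240, hop: int = 80,
                      order: int = 12, n_ceps: int = 12) -> List[List[float]]:
    """One row per frame: Hamming-windowed LPC cepstra followed by their deltas."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    ceps = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * h for s, h in zip(signal[start:start + frame_len], window)]
        a, _ = levinson_durbin(autocorr(frame, order), order)
        ceps.append(lpc_to_cepstrum(a, n_ceps))
    return [c + d for c, d in zip(ceps, deltas(ceps))]
```

Each returned row could serve as one feature vector in the polynomial training step. This excerpt does not say whether the original system used LPC-derived cepstra, FFT-based cepstra or the filter-bank outputs of claim 23; the recursion above is only one conventional option.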

Current Assignee
Google Technology Holdings LLC (Alphabet Inc.)
Original Assignee
Motorola, Inc. (Motorola Solutions, Inc.)
Inventors
Assaleh, Khaled Talal; Campbell, William Michael
Primary Examiner(s)
Hudspeth, David R.
Assistant Examiner(s)
MCFADDEN, SUSAN IRIS

Application Number
US08/805,540
Time in Patent Office
700 Days
Field of Search
704/232, 704/231, 704/236, 704/243, 704/244, 704/245, 704/246, 704/247, 704/251, 395/22, 395/23, 395/24
US Class Current
704/244
CPC Class Codes
G10L 17/04 Training, enrolment or mode...
