Anti-spoofing

US 9,767,806 B2
Filed: 09/24/2014
Issued: 09/19/2017
Est. Priority Date: 09/24/2013
Status: Active Grant

3 Assignments

0 Petitions

Abstract

System for classifying whether audio data received in a speaker recognition system is genuine or a spoof using a Gaussian classifier and method for classifying whether audio data received in a speaker recognition system is genuine or a spoof using a Gaussian classifier.

30 Claims

1. A speaker recognition system adapted for receiving audio data, the system being adapted for:
- receiving audio data under test;
  
  obtaining a Medium Frequency Relative Energy (MF) parameter, comprising a ratio between an energy of the received audio data under test in a predetermined frequency band and an energy of a complete frequency spectrum of the received audio data under test; and
  
  classifying using a Gaussian classifier whether the received audio data under test is genuine or represents a recording replayed through a loudspeaker, based on the Medium Frequency Relative Energy (MF) parameter, wherein the Gaussian classifier is trained by the following steps;
  
  a first Gaussian is obtained by;
  
  receiving genuine audio data;
  
  obtaining a first Medium Frequency Relative Energy (MF) parameter, comprising the ratio between the energy of the genuine audio data in a predetermined frequency band and the energy of the complete frequency spectrum of the genuine audio data;
  
  receiving audio data representing recordings replayed through a loudspeaker; and
  
  modelling the genuine audio data;
  
  and wherein a second Gaussian is obtained by;
  
  receiving audio data representing recordings replayed through a loudspeaker;
  
  obtaining a second Medium Frequency Relative Energy (MF) parameter, comprising the ratio between the energy of the audio data representing recordings replayed through a loudspeaker in a predetermined frequency band and the energy of the complete frequency spectrum of the audio data representing recordings replayed through a loudspeaker; and
  
  modelling the audio data representing recordings replayed through a loudspeaker with a second Gaussian.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
  - 2. A speaker recognition system according to claim 1, wherein a lower end of the predetermined frequency band is in a range from 30 Hz to 150 Hz.
  - 3. A speaker recognition system according to claim 1, wherein an upper end of the predetermined frequency band is in a range from 150 Hz to 270 Hz.
  - 4. A speaker recognition system according to claim 1, wherein the considered parameters of the audio data further comprise a feature vector distance.
  - 5. A speaker recognition system according to claim 4, wherein the feature vector distance is calculated with regard to a constant value.
  - 6. A speaker recognition system according to claim 4, wherein the feature vector distance is calculated with regard to average feature vectors derived from enrolment data.
  - 7. A speaker recognition system according to claim 1, wherein the considered parameters of the audio data further comprise a spectral ratio.
  - 8. A speaker recognition system according to claim 1, wherein new parameters for the Gaussian classifier are found by adaptation of previous parameters of the Gaussian classifier using adaptation audio data.
  - 9. A speaker recognition system according to claim 8, wherein the number of available samples of adaptation audio data is considered in the adaptation process.
  - 10. A speaker recognition system according to claim 8, wherein mean vector(s) and/or covariance matrices and/or an a priori probability of one, two, three, four or more Gaussians representing the region of audio data parameters from genuine audio data and/or wherein mean vector(s) and/or the covariance matrices and/or an a priori probability of one, two, three, four or more Gaussians representing the region of audio data parameters from audio data representing recordings replayed through loudspeakers are adapted.
  - 11. A speaker recognition system according to claim 8, wherein enrollment audio data comprises the adaptation audio data.
  - 12. A speaker recognition system according to claim 8, wherein the adaptation audio data comprises genuine audio data and/or audio data representing a recording replayed through a loudspeaker.
  - 13. A speaker recognition system according to claim 8, wherein the adaptation audio data is chosen depending on information that the Gaussian classifier should model.
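The MF parameter recited in claims 1 to 3 is a band-energy ratio. A minimal sketch, assuming NumPy, computes it from the FFT power spectrum of a whole utterance. The function name `mf_parameter` and the default 100-200 Hz band are illustrative assumptions (the band is merely chosen to sit inside the ranges of claims 2 and 3), not values disclosed in the claims.

```python
import numpy as np


def mf_parameter(signal, sample_rate, band=(100.0, 200.0)):
    """Medium Frequency Relative Energy: energy in `band` / energy of the full spectrum.

    Hypothetical helper. The default 100-200 Hz band is an assumption that falls
    inside the ranges of claims 2 (lower end 30-150 Hz) and 3 (upper end 150-270 Hz).
    """
    x = np.asarray(signal, dtype=float)
    # One-sided power spectrum of the complete utterance.
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / sample_rate)
    total = power.sum()
    if total == 0.0:
        return 0.0  # silent input: no energy anywhere
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return float(power[in_band].sum() / total)
```

For a one-second 150 Hz tone sampled at 8 kHz the function returns a value close to 1.0, for a 1 kHz tone a value close to 0.0, and for an equal-amplitude mix of the two about 0.5.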

14. A method in a speaker recognition system for classifying whether audio data is genuine or represents a recording replayed through a loudspeaker, the method comprising:
- receiving the audio data, and
  
  classifying using a Gaussian classifier whether the received audio data is genuine or represents a recording replayed through a loudspeaker, wherein Gaussians are used to model a region of audio data parameters from genuine audio data and wherein Gaussians are used to model a region of audio data parameters from audio data representing recordings replayed through loudspeakers, based on a Medium Frequency Relative Energy (MF) parameter, and wherein;
  
  the Medium Frequency Relative Energy (MF) parameter comprises a ratio between an energy of the audio data in a predetermined frequency band and an energy of a complete frequency spectrum of the audio data; and
  
  the Gaussian classifier is trained by the following steps;
  
  a first Gaussian is obtained by;
  
  receiving genuine audio data;
  
  obtaining a first Medium Frequency Relative Energy (MF) parameter, comprising the ratio between the energy of the genuine audio data in a predetermined frequency band and the energy of the complete frequency spectrum of the genuine audio data;
  
  receiving audio data representing recordings replayed through a loudspeaker; and
  
  modelling the genuine audio data;
  
  and wherein a second Gaussian is obtained by;
  
  receiving audio data representing recordings replayed through a loudspeaker;
  
  obtaining a second Medium Frequency Relative Energy (MF) parameter, comprising the ratio between the energy of the audio data representing recordings replayed through a loudspeaker in a predetermined frequency band and the energy of the complete frequency spectrum of the audio data representing recordings replayed through a loudspeaker; and
  
  modelling the audio data representing recordings replayed through a loudspeaker with a second Gaussian.
- Dependent Claims (15)
  - 15. A non-transitory computer-readable medium comprising computer-readable instructions that, when executed on a computer, are adapted to carry out a method according to claim 14.
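Claim 14 trains a first Gaussian on MF values from genuine audio and a second Gaussian on MF values from replayed recordings, then classifies test audio between them. A minimal one-dimensional sketch follows; the maximum-likelihood fit, the variance floor, the decision by log-likelihood ratio plus log prior ratio, and the names `Gaussian1D` and `ReplayDetector` are all assumptions, since the claims do not specify them.

```python
import math
from dataclasses import dataclass
from statistics import fmean, pvariance


@dataclass
class Gaussian1D:
    mean: float
    var: float

    @classmethod
    def fit(cls, values, var_floor=1e-6):
        # Maximum-likelihood mean and variance; the floor keeps the variance positive.
        return cls(fmean(values), max(pvariance(values), var_floor))

    def log_pdf(self, x):
        return -0.5 * (math.log(2.0 * math.pi * self.var) + (x - self.mean) ** 2 / self.var)


@dataclass
class ReplayDetector:
    genuine: Gaussian1D  # first Gaussian: MF values of genuine audio
    replay: Gaussian1D   # second Gaussian: MF values of replayed recordings
    prior_genuine: float = 0.5

    @classmethod
    def train(cls, genuine_mf, replay_mf, prior_genuine=0.5):
        return cls(Gaussian1D.fit(genuine_mf), Gaussian1D.fit(replay_mf), prior_genuine)

    def score(self, mf):
        # Log-likelihood ratio plus log prior ratio; positive favours "genuine".
        llr = self.genuine.log_pdf(mf) - self.replay.log_pdf(mf)
        return llr + math.log(self.prior_genuine / (1.0 - self.prior_genuine))

    def is_genuine(self, mf):
        return self.score(mf) > 0.0


if __name__ == "__main__":
    # Toy MF values (hypothetical, not data from the patent).
    detector = ReplayDetector.train([0.20, 0.22, 0.25, 0.23], [0.05, 0.07, 0.06, 0.08])
    print(detector.is_genuine(0.21), detector.is_genuine(0.06))  # True False
```

The toy values assume replayed audio has a lower MF than genuine speech; that direction is an illustrative assumption, and in practice both Gaussians would be fitted to MF values measured on real genuine and replayed recordings.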

16. A speaker recognition system adapted for receiving audio data, the system being adapted for:
- receiving audio data under test;
  
  obtaining a Low Frequency Mel Frequency Cepstral Coefficients (LF-MFCC) parameter, comprising a ratio between an energy of the received audio data under test in a predetermined frequency band and an energy of a complete frequency spectrum of the received audio data under test; and
  
  classifying using a Gaussian classifier whether the received audio data under test is genuine or represents a recording replayed through a loudspeaker, based on the Low Frequency Mel Frequency Cepstral Coefficients (LF-MFCC) parameter, wherein the Gaussian classifier is trained by the following steps;
  
  a first Gaussian is obtained by;
  
  receiving genuine audio data;
  
  obtaining a first Low Frequency Mel Frequency Cepstral Coefficients (LF-MFCC) parameter, comprising the ratio between the energy of the genuine audio data in a predetermined frequency band and the energy of the complete frequency spectrum of the genuine audio data;
  
  receiving audio data representing recordings replayed through a loudspeaker; and
  
  modelling the genuine audio data;
  
  and wherein a second Gaussian is obtained by;
  
  receiving audio data representing recordings replayed through a loudspeaker;
  
  obtaining a Low Frequency Mel Frequency Cepstral Coefficients (LF-MFCC) parameter, comprising the ratio between the energy of the audio data representing recordings replayed through a loudspeaker in a predetermined frequency band and the energy of the complete frequency spectrum of the audio data representing recordings replayed through a loudspeaker; and
  
  modelling the audio data representing recordings replayed through a loudspeaker with a second Gaussian.
- Dependent Claims (17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 30)
  - 17. A speaker recognition system according to claim 16, wherein the predetermined cut-off frequency is lower than 750 Hz.
  - 18. A speaker recognition system according to claim 16, wherein the considered parameters of the audio data further comprise a feature vector distance.
  - 19. A speaker recognition system according to claim 18, wherein the feature vector distance is calculated with regard to a constant value.
  - 20. A speaker recognition system according to claim 18, wherein the feature vector distance is calculated with regard to average feature vectors derived from enrolment data.
  - 21. A speaker recognition system according to claim 18, wherein the considered parameters of the audio data further comprise a spectral ratio.
  - 22. A speaker recognition system according to claim 16, wherein new parameters for the Gaussian classifier are found by adaptation of previous parameters of the Gaussian classifier using adaptation audio data.
  - 23. A speaker recognition system according to claim 22, wherein the number of available samples of adaptation audio data is considered in the adaptation process.
  - 24. A speaker recognition system according to claim 22, wherein mean vector(s) and/or covariance matrices and/or an a priori probability of one, two, three, four or more Gaussians representing the region of audio data parameters from genuine audio data and/or wherein mean vector(s) and/or the covariance matrices and/or an a priori probability of one, two, three, four or more Gaussians representing the region of audio data parameters from audio data representing recordings replayed through loudspeakers are adapted.
  - 25. A speaker recognition system according to claim 22, wherein enrollment audio data comprises the adaptation audio data.
  - 26. A speaker recognition system according to claim 22, wherein the adaptation audio data comprises genuine audio data and/or audio data representing a recording replayed through a loudspeaker.
  - 27. A speaker recognition system according to claim 22, wherein the adaptation audio data is chosen depending on the information that the Gaussian classifier should model.
  - 30. A speaker recognition system according to claim 16, wherein the predetermined cut-off frequency is between 250 Hz and 750 Hz.
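Claims 8 to 10 and 22 to 24 update the classifier by adapting its previous parameters with adaptation audio data, taking the number of available samples into account. The claims do not state the update rule, so the sketch below substitutes the relevance-factor MAP adaptation familiar from GMM-UBM speaker verification, applied to a single one-dimensional Gaussian. The function `map_adapt` and the relevance factor of 16 are assumptions.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass(frozen=True)
class Gaussian1D:
    mean: float
    var: float


def map_adapt(prior, samples, relevance=16.0, var_floor=1e-6):
    """Relevance-factor MAP update of a 1-D Gaussian (a substituted rule, not the patent's).

    alpha = n / (n + relevance) grows with the number n of adaptation samples,
    so a handful of samples barely moves the model while many samples dominate it.
    """
    n = len(samples)
    if n == 0:
        return prior  # nothing to adapt with
    alpha = n / (n + relevance)
    first = fmean(samples)
    second = fmean([x * x for x in samples])
    mean = alpha * first + (1.0 - alpha) * prior.mean
    # Interpolate the second moment, then re-centre it on the adapted mean.
    moment2 = alpha * second + (1.0 - alpha) * (prior.var + prior.mean ** 2)
    return Gaussian1D(mean, max(moment2 - mean ** 2, var_floor))


if __name__ == "__main__":
    prior = Gaussian1D(mean=0.0, var=1.0)
    print(map_adapt(prior, [2.0] * 16))   # Gaussian1D(mean=1.0, var=1.5)
    print(map_adapt(prior, [2.0] * 160))  # mean moves closer to 2.0
```

Applying the same update separately to the genuine and replay Gaussians would adapt each class model with its own adaptation data, in the spirit of claims 12 and 26.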

28. A method in a speaker recognition system for classifying whether audio data is genuine or represents a recording replayed through a loudspeaker, the method comprising:
- receiving the audio data, and
  
  classifying using a Gaussian classifier whether the received audio data is genuine or represents a recording replayed through a loudspeaker, wherein Gaussians are used to model a region of audio data parameters from genuine audio data and wherein Gaussians are used to model a region of audio data parameters from audio data representing recordings replayed through loudspeakers, based on the Low Frequency Mel Frequency Cepstral Coefficients (LF-MFCC) parameter, and wherein;
  
  the Low Frequency Mel Frequency Cepstral Coefficients (LF-MFCC) parameter comprises 1, 2, 3 or more or all LF-MFCC extracted from a region of the audio data having frequencies lower than a predetermined cut-off frequency; and
  
  wherein the Gaussian classifier is trained by the following steps;
  
  a first Gaussian is obtained by;
  
  receiving genuine audio data;
  
  obtaining a first Low Frequency Mel Frequency Cepstral Coefficients (LF-MFCC) parameter, comprising the ratio between the energy of the genuine audio data in a predetermined frequency band and the energy of the complete frequency spectrum of the genuine audio data;
  
  receiving audio data representing recordings replayed through a loudspeaker; and
  
  modelling the genuine audio data;
  
  and wherein a second Gaussian is obtained by;
  
  receiving audio data representing recordings replayed through a loudspeaker;
  
  obtaining a Low Frequency Mel Frequency Cepstral Coefficients (LF-MFCC) parameter, comprising the ratio between the energy of the audio data representing recordings replayed through a loudspeaker in a predetermined frequency band and the energy of the complete frequency spectrum of the audio data representing recordings replayed through a loudspeaker; and
  
  modelling the audio data representing recordings replayed through a loudspeaker with a second Gaussian.
- Dependent Claims (29)
  - 29. A non-transitory computer-readable medium comprising computer-readable instructions that, when executed on a computer, are adapted to carry out a method according to claim 28.
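Claim 28 defines the LF-MFCC parameter as one or more MFCCs extracted from the region of the audio below a predetermined cut-off frequency, and claims 17 and 30 bound that cut-off (below 750 Hz; between 250 Hz and 750 Hz). One way to obtain such coefficients, sketched here with NumPy as an assumption rather than the patent's procedure, is to place the whole mel filterbank between 0 Hz and the cut-off before the usual log and DCT-II steps. The function `lf_mfcc`, the 500 Hz cut-off, 12 filters, 6 coefficients and 512/256-sample framing are illustrative choices.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def lf_mfcc(signal, sample_rate, cutoff_hz=500.0, n_filters=12, n_coeffs=6,
            frame_len=512, hop=256):
    """Per-frame MFCCs from a mel filterbank confined to 0..cutoff_hz (hypothetical helper).

    cutoff_hz=500 is an illustrative value inside the 250-750 Hz range of claim 30.
    Returns an array of shape (n_frames, n_coeffs).
    """
    x = np.asarray(signal, dtype=float)
    if x.size < frame_len:
        x = np.pad(x, (0, frame_len - x.size))  # guarantee at least one frame
    n_frames = 1 + (x.size - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (n_frames, frame_len // 2 + 1)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)

    # Triangular filters with edges equally spaced on the mel scale between 0 Hz
    # and the cut-off, so spectral bins above the cut-off get zero weight.
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(cutoff_hz), n_filters + 2))
    fbank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fbank[i] = np.clip(np.minimum(rising, falling), 0.0, None)

    log_energy = np.log(power @ fbank.T + 1e-10)       # (n_frames, n_filters)
    # DCT-II of the log filterbank energies yields the cepstral coefficients.
    k = np.arange(n_coeffs)[:, None]
    n = np.arange(n_filters)[None, :]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    return log_energy @ dct.T
```

Averaging the rows over frames gives one fixed-length vector per utterance, which is the kind of input a per-class Gaussian such as those recited in claim 28 could model.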

Current Assignee
Cirrus Logic Incorporated
Original Assignee
Cirrus Logic International Semiconductor Ltd. (Cirrus Logic Incorporated)
Inventors
Giménez, Alfonso Ortega; Rodriguez, Luis Buera; Avilés-Casco, Carlos Vaquero
Primary Examiner(s)
Sharma, Neeraj

Application Number

US14/495,391
Publication Number

US 20150088509A1
Time in Patent Office

1,091 Days
Field of Search

704243, 704205, 704234, 704246, 704256, 704247, 704270, 704249, 341200, 381 941, 700 94, 726 19, 382115
CPC Class Codes

G10L 17/02   Preprocessing operations, e...

G10L 17/04   Training, enrolment or mode...

G10L 17/22   Interactive procedures; Man...
