Speaker verification utilizing compressed audio formants
Abstract
The identity of a remote speaker is verified by receiving compressed audio formants from a remote Internet telephony client and comparing the compressed audio formants with sample compressed audio formants known to represent the person the remote speaker purports to be. The compressed audio formants include energy and pitch data characterizing the residue of the speaker uttering a predetermined pass phrase and a plurality of formant coefficients characterizing the resonance of the speaker.
12 Claims
1. A method of performing speaker verification to determine whether a speaker is a registered speaker, the method comprising:
a) obtaining an array of frames of compressed audio formants representing the speaker uttering a predetermined pass phrase, each frame within the array including:
i) energy data and pitch data characterizing the residue of the speaker uttering the predetermined pass phrase; and
ii) a plurality of formant coefficients characterizing the resonance of the speaker uttering the predetermined pass phrase; and
b) performing a time domain normalization of the array of frames of compressed audio formants to a sample array of frames of compressed audio formants such that the two arrays are of an equal quantity of frames;
c) determining whether the speaker is the registered speaker by:
generating an array of discrepancy values, each discrepancy value representing the difference between one of:
i) an energy value;
ii) a pitch value; and
iii) a formant coefficient value
of a frame of the array and a corresponding:
i) energy value;
ii) pitch value; and
iii) formant coefficient value
of a corresponding frame in the sample array; and
determining whether the array of discrepancy values is within a predetermined threshold.
Dependent claims: 2, 3, 4
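Step c) of claim 1 can be illustrated with a minimal Python sketch. It assumes the two arrays have already been normalized to an equal quantity of frames. The `Frame` type, the function names, the use of absolute differences, and the rule that the mean discrepancy must not exceed the threshold are all assumptions made for illustration; the claim does not specify how the array of discrepancy values is tested against the predetermined threshold.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Frame:
    """Hypothetical container for one frame of compressed audio formants."""
    energy: float               # energy data characterizing the residue
    pitch: float                # pitch data characterizing the residue
    formants: Sequence[float]   # formant coefficients characterizing the resonance


def discrepancy_values(array: Sequence[Frame], sample: Sequence[Frame]) -> List[float]:
    """Build the array of discrepancy values, frame by corresponding frame."""
    if len(array) != len(sample):
        raise ValueError("arrays must first be normalized to an equal quantity of frames")
    values: List[float] = []
    for a, s in zip(array, sample):
        values.append(abs(a.energy - s.energy))
        values.append(abs(a.pitch - s.pitch))
        values.extend(abs(x - y) for x, y in zip(a.formants, s.formants))
    return values


def within_threshold(values: Sequence[float], threshold: float) -> bool:
    """Assumed decision rule: the mean discrepancy must not exceed the threshold."""
    return bool(values) and sum(values) / len(values) <= threshold
```

In practice the energy, pitch, and formant discrepancies lie on different scales, so a real system would likely weight or threshold them separately; the single mean used here only keeps the sketch short.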
5. A method of determining whether a speaker is a registered speaker, the method comprising:
a) obtaining compressed audio formants for each frame of an array of frames representing the speaker uttering a predetermined pass phrase;
b) performing a time domain normalization of the array to a sample array of frames stored in a memory and representing the registered speaker uttering the predetermined pass phrase to decimate a portion of the frames of the larger of the two arrays such that the two arrays, after decimation, are of an equal quantity of frames, the portion of the frames to be decimated being selected by:
selecting a plurality of audio formant decimation groups, each audio formant decimation group being a selection of frames from the larger of the two arrays which, if decimated, yields the best alignment between a formant coefficient value of each frame of the array and the corresponding formant coefficient value of each frame of the sample array; and
determining a decimation group of frames from the larger of the two arrays, the decimation group being a quantity of frames equal to the quantity of frames to be decimated and being the frames which are selected by weighted average from each of the audio formant decimation groups;
c) generating an array of discrepancy values, each discrepancy value representing the difference between one of an audio formant value of a frame of the array and a corresponding audio formant value of a corresponding frame of the sample array; and
d) determining that the remote speaker is the registered speaker if the array of discrepancy values is within a predetermined threshold.
Dependent claims: 6, 7, 8
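The decimation-based time domain normalization of step b) in claim 5 can be sketched in Python as follows. This is one possible reading, not the patented procedure. Each frame is represented only by its list of formant coefficients, one audio formant decimation group is computed per coefficient by exhaustive search (practical only for very short arrays), "best alignment" is taken to mean the smallest sum of absolute differences, and "selected by weighted average" is interpreted as a weighted vote across the groups. All function names and the `weights` parameter are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Set

FormantFrame = Sequence[float]  # assumption: a frame is its list of formant coefficients


def best_group_for_coefficient(larger: Sequence[FormantFrame],
                               smaller: Sequence[FormantFrame],
                               k: int) -> Set[int]:
    """Frame indices whose removal best aligns coefficient k (brute force)."""
    n_drop = len(larger) - len(smaller)
    best, best_cost = (), float("inf")
    for drop in combinations(range(len(larger)), n_drop):
        dropped = set(drop)
        kept = [f for i, f in enumerate(larger) if i not in dropped]
        cost = sum(abs(a[k] - b[k]) for a, b in zip(kept, smaller))
        if cost < best_cost:
            best, best_cost = drop, cost
    return set(best)


def decimate(larger: Sequence[FormantFrame],
             smaller: Sequence[FormantFrame],
             weights: Sequence[float]) -> List[FormantFrame]:
    """Drop frames from the larger array so both arrays have equal length."""
    n_drop = len(larger) - len(smaller)
    if n_drop < 0:
        raise ValueError("first argument must be the larger array")
    if n_drop == 0:
        return list(larger)
    # Weighted vote: each coefficient's decimation group votes for its frames.
    scores = [0.0] * len(larger)
    for k, w in enumerate(weights):
        for i in best_group_for_coefficient(larger, smaller, k):
            scores[i] += w
    # Decimate the n_drop highest-scoring frames (ties go to the earlier frame).
    ranked = sorted(range(len(larger)), key=lambda i: (-scores[i], i))
    chosen = set(ranked[:n_drop])
    return [f for i, f in enumerate(larger) if i not in chosen]
```

The exhaustive search grows combinatorially with array length; a dynamic-programming alignment would be the usual replacement in a practical system, though the claim itself does not name a search method.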
9. A speaker verification server for determining whether a remote speaker is a registered speaker, the server comprising:
a) a network interface for receiving, via a packet switched network, compressed audio formants for each frame of an array of frames representing the remote speaker uttering a predetermined pass phrase as audio input to a remote telephony client;
b) a database storing compressed audio formants for each frame of a sample array of frames representing the registered speaker uttering the predetermined pass phrase as audio input; and
c) a verification application operatively coupled to each of the network interface and the database for comparing the compressed audio formants of the array of frames to the compressed audio formants of the sample array of frames to determine whether the remote speaker is the registered speaker by:
performing a time domain normalization of the array to the sample array such that the two arrays are of an equal quantity of frames;
generating an array of discrepancy values, each discrepancy value representing the difference between one of an audio formant value of a frame of the array and a corresponding audio formant value of a corresponding frame of the sample array; and
determining that the remote speaker is the registered speaker if the array of discrepancy values is within a predetermined threshold.
Dependent claims: 10, 11, 12
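How the verification application of claim 9 sits between the database and the received frames can be outlined with a short Python sketch. The network interface is omitted, the database is stood in for by an in-memory dictionary keyed by a hypothetical speaker identifier, and the time domain normalization is replaced by a simple uniform frame selection, since claim 9 does not say how the arrays are brought to equal length. The class name, method names, and mean-based threshold rule are likewise assumptions.

```python
from typing import Dict, List, Sequence

FrameValues = Sequence[float]  # assumption: one flat vector of audio formant values per frame


class VerificationApplication:
    """Illustrative stand-in for the verification application of the server."""

    def __init__(self, database: Dict[str, List[FrameValues]], threshold: float):
        self.database = database    # sample arrays of registered speakers
        self.threshold = threshold  # predetermined threshold

    @staticmethod
    def _normalize(array: Sequence[FrameValues], target_len: int) -> List[FrameValues]:
        # Stand-in normalization: pick frames at uniform intervals.
        step = len(array) / target_len
        return [array[int(i * step)] for i in range(target_len)]

    def verify(self, speaker_id: str, array: Sequence[FrameValues]) -> bool:
        sample = self.database.get(speaker_id)
        if not sample or not array:
            return False
        n = min(len(array), len(sample))
        a = self._normalize(array, n)
        s = self._normalize(sample, n)
        discrepancies = [abs(x - y) for fa, fs in zip(a, s) for x, y in zip(fa, fs)]
        if not discrepancies:
            return False
        # Assumed rule: mean discrepancy must not exceed the threshold.
        return sum(discrepancies) / len(discrepancies) <= self.threshold
```

A caller would construct the application once with the stored sample arrays and invoke `verify` for each set of frames received from a remote telephony client.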
Specification