Speaker recognition using spectral coefficients normalized with respect to unequal frequency bands
Abstract
Apparatus and method for speaker recognition include generating, in response to a speech signal, a plurality of feature data having a series of coefficient sets, each set having a plurality of coefficients indicating the short term spectral amplitude in a plurality of frequency bands. The feature data is compared with predetermined speaker reference data, and recognition of a corresponding speaker is indicated in dependence upon such comparison. The frequency bands are unevenly spaced along the frequency axis, and a long term average spectral magnitude of at least one of said coefficients is derived and used for normalizing the at least one coefficient.
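The processing chain summarized in the abstract can be illustrated with a minimal Python sketch. Several details here are assumptions rather than anything stated above: the bands are spaced on the mel scale (the text only says they are unevenly spaced), the coefficients are log magnitudes so that normalization becomes a subtraction, and the comparison is a plain frame-by-frame Euclidean distance over sequences of equal length. All function names are hypothetical.

```python
import math

def mel_band_edges(n_bands, f_max_hz):
    """Edges equally spaced on the mel scale, so unevenly spaced in Hz (assumed spacing)."""
    mel_max = 2595.0 * math.log10(1.0 + f_max_hz / 700.0)
    return [700.0 * (10.0 ** (mel_max * i / (n_bands * 2595.0)) - 1.0)
            for i in range(n_bands + 1)]

def band_log_magnitudes(spectrum, bin_hz, edges):
    """Log of the summed magnitude in each band for one frame's magnitude spectrum."""
    coeffs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        total = sum(m for k, m in enumerate(spectrum) if lo <= k * bin_hz < hi)
        coeffs.append(math.log(total + 1e-10))  # small floor avoids log(0)
    return coeffs

def normalize_long_term(frames):
    """Subtract from each coefficient its average taken over all frames."""
    n = len(frames)
    means = [sum(f[i] for f in frames) / n for i in range(len(frames[0]))]
    return [[c - m for c, m in zip(f, means)] for f in frames]

def mean_frame_distance(a, b):
    """Average Euclidean distance between corresponding frames (equal lengths assumed)."""
    return sum(math.dist(x, y) for x, y in zip(a, b)) / len(a)

# Usage: a fixed gain applied to every frame disappears after normalization.
edges = mel_band_edges(n_bands=8, f_max_hz=4000.0)
spectra = [[1.0 + ((7 * t + 3 * k) % 11) for k in range(64)] for t in range(20)]
louder = [[3.0 * m for m in frame] for frame in spectra]
feats = normalize_long_term([band_log_magnitudes(s, 62.5, edges) for s in spectra])
feats_louder = normalize_long_term([band_log_magnitudes(s, 62.5, edges) for s in louder])
print(mean_frame_distance(feats, feats_louder))  # a value very close to zero
```

In the log domain a stationary gain adds the same constant to every frame, so subtracting the long term average removes it while leaving the frame-to-frame variation intact.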
77 Citations
49 Claims
1. A method of speaker recognition, said method comprising the steps of:
deriving recognition feature data from an input speech signal represented by plural successive frames of digital data for a speech utterance, said recognition feature data comprising a plurality of coefficients each related to speech signal magnitude in a predetermined frequency band;
comparing said feature data with predetermined speaker reference data;
indicating recognition of a speaker in dependence upon the comparison;
said frequency bands being unevenly spaced with respect to frequency, said deriving step including a step of deriving a long term average spectral magnitude extending over plural of said frames of digital data; and
processing at least one of said coefficients so as to generate a normalized coefficient in which the effect of said long term magnitude is substantially reduced.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. Apparatus for speaker recognition which comprises:
means for generating a plurality of feature data comprising a series of coefficient sets from a speech signal represented by plural successive frames of digital data for a speech utterance, each set comprising a plurality of coefficients indicating short term spectral magnitude in a plurality of unevenly spaced frequency bands,
means for comparing said feature data with predetermined speaker reference data and for indicating recognition of a corresponding speaker in dependence upon said comparison, and
means for deriving a long term average spectral magnitude of at least one of said coefficients extending over plural of said frames of digital data and for normalizing the said at least one coefficient by said long term average.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
23. Apparatus for recognition processing of a voice signal represented by plural successive frames of digital data for a speech utterance, said apparatus comprising:
means for deriving recognition data comprising a plurality of signals each related to short term amplitude in a corresponding frequency band of said voice signal, said frequency bands being unevenly spaced in the frequency domain,
means for performing recognition processing using said recognition data,
means for periodically generating or updating a moving long term average spectral amplitude extending over plural of said frames of digital data in said frequency bands; and
means for processing feature data based on said recognition data using said long term average to reduce their dependence upon stationary spectral envelope components.
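One way to picture the "moving long term average" of claim 23 is a per-band running average that is updated as each frame arrives. The sketch below assumes an exponential moving average over log band amplitudes; the exponential form, the `alpha` value, and the class name are illustrative assumptions, not details taken from the claim.

```python
class MovingAverageNormalizer:
    """Tracks a moving long term average per band and subtracts it from each frame."""

    def __init__(self, alpha=0.05):
        self.alpha = alpha    # update rate (hypothetical value); smaller means longer memory
        self.average = None   # one running average per frequency band

    def process(self, log_coeffs):
        """Update the running average with this frame, then return the normalized frame."""
        if self.average is None:
            self.average = list(log_coeffs)  # seed the average with the first frame
        else:
            a = self.alpha
            self.average = [(1.0 - a) * m + a * c
                            for m, c in zip(self.average, log_coeffs)]
        return [c - m for c, m in zip(log_coeffs, self.average)]

# Usage: a stationary offset that appears after the first frame is gradually removed.
norm = MovingAverageNormalizer(alpha=0.05)
outputs = [norm.process([0.0])] + [norm.process([5.0]) for _ in range(200)]
print(outputs[1][0], outputs[-1][0])  # the offset shrinks toward zero over time
```

Because the average follows only slow changes, stationary spectral envelope components are suppressed while short term variation passes through.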
24. A method of speaker recognition, said method comprising:
generating recognition feature data from an input speech signal, said recognition feature data comprising a plurality of coefficients each related to the speech signal magnitude in a predetermined frequency band, said frequency bands being unevenly spaced along the frequency axis, the step of generating said coefficients including a step of deriving a long term average spectral magnitude and processing at least one of said coefficients so as to generate a normalized coefficient in which the effect of said long term magnitude is substantially reduced;
comparing said feature data with predetermined speaker reference data; and
indicating recognition of a speaker in dependence upon the comparison.
Dependent claims: 25, 26, 27, 28, 29, 30, 31, 32, 33
34. Apparatus for speaker recognition, said apparatus comprising:
means for generating from a speech signal, a plurality of feature data comprising a series of coefficient sets, each set comprising a plurality of coefficients indicating the short term spectral magnitude in a plurality of frequency bands, said frequency bands being unevenly spaced along the frequency axis;
means for deriving a long term average spectral magnitude of at least one of said coefficients;
means for normalizing the or each of said at least one coefficient by said long term average;
means for comparing said feature data with predetermined speaker reference data; and
means for indicating recognition of a corresponding speaker in dependence upon said comparison.
Dependent claims: 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45
46. Apparatus for recognition processing of a voice signal, said apparatus comprising:
means for deriving recognition data comprising a plurality of signals each related to the short term amplitude in a corresponding frequency band of said voice signal, said frequency bands being unevenly spaced in the frequency domain;
means for periodically generating or updating a moving long term average spectral amplitude in said frequency bands,
means for processing said feature data using said long term average to reduce their dependence upon stationary spectral envelope components; and
means for performing recognition processing in dependence thereon.
47. A method of speaker recognition comprising:
generating recognition feature data from an input speech signal,
comparing said feature data with predetermined speaker reference data; and
indicating recognition of a speaker in dependence upon the comparison;
wherein said recognition feature generating step comprises:
identifying a portion of the input speech signal as representing a single contiguous utterance;
generating a plurality of coefficients each relating to the signal magnitude in one of a plurality of predetermined frequency bands of an identified portion of the signal, said frequency bands being unevenly spaced in the frequency domain;
deriving a long term average spectral magnitude of the coefficients of the single contiguous utterance; and
processing at least one of said coefficients so as to generate a normalized coefficient in which the effect of said long term magnitude is substantially reduced.
Dependent claims: 48, 49
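Claim 47 ties the long term average to a single contiguous utterance. A rough sketch of that idea follows, under the assumption that the utterance is located with a simple frame-energy threshold (the claim does not say how the portion is identified). The function names and the threshold are hypothetical.

```python
def longest_active_run(energies, threshold):
    """Return (start, end) of the longest run of frames above threshold; end is exclusive."""
    best = (0, 0)
    start = None
    # A sentinel below any threshold closes a run that reaches the last frame.
    for i, e in enumerate(list(energies) + [float("-inf")]):
        if e > threshold and start is None:
            start = i
        elif e <= threshold and start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best

def normalize_utterance(frames, energies, threshold):
    """Subtract the per-band average computed over the detected utterance only."""
    start, end = longest_active_run(energies, threshold)
    utterance = frames[start:end]
    if not utterance:
        return []
    n_bands = len(utterance[0])
    means = [sum(f[i] for f in utterance) / len(utterance) for i in range(n_bands)]
    return [[c - m for c, m in zip(f, means)] for f in utterance]

# Usage: frames 2 to 4 form the utterance; silence and a stray blip are ignored.
energies = [0, 0, 5, 6, 7, 0, 0, 5]
frames = [[float(i), 2.0 * i] for i in range(8)]
print(normalize_utterance(frames, energies, threshold=1))
# [[-1.0, -2.0], [0.0, 0.0], [1.0, 2.0]]
```

Restricting the average to the utterance keeps silence and background frames from biasing the normalization.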
Specification