Speaker recognition system using neural network
Abstract
A speaker recognition system that recognizes a speaker from an input voice using a neural network. A feature quantity extracted from the input voice is averaged timewise to create the input pattern for the neural network: the input voice is simply divided timewise into a plurality of equal blocks, and the feature quantity is averaged over each block. The feature quantity includes a frequency characteristic, pitch frequency, linear prediction coefficient, and partial autocorrelation (PARCOR) coefficient of the voice.
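The abstract lists linear prediction and PARCOR coefficients among the usable feature quantities but does not say how they are computed. As a minimal sketch, assuming the textbook autocorrelation method with the Levinson-Durbin recursion (an assumption, not a procedure stated in the patent), the per-frame coefficients could be obtained as follows. The function names `autocorrelation` and `levinson_durbin` are hypothetical, and the sign convention for reflection (PARCOR) coefficients differs between references.

```python
import numpy as np

def autocorrelation(frame, order):
    """Autocorrelation r[0..order] of one frame (no normalization)."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    return np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])

def levinson_durbin(r, order):
    """Solve the LPC normal equations by the Levinson-Durbin recursion.

    Returns (a, k):
      a[0..order] : prediction-error filter, a[0] == 1
                    (predictor coefficients are -a[1:])
      k[0..order-1] : reflection coefficients; PARCOR coefficients equal
                    these up to a sign that depends on the convention used.
    """
    r = np.asarray(r, dtype=float)
    a = np.zeros(order + 1)
    a[0] = 1.0
    k = np.zeros(order)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:          # silent or degenerate frame: stop early
            break
        # r[i] + sum_{j=1}^{i-1} a[j] * r[i-j]
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        ki = -acc / err
        # a[j] <- a[j] + ki * a[i-j] for j = 1..i-1 (right side uses old values)
        a[1:i] = a[1:i] + ki * a[i - 1:0:-1]
        a[i] = ki
        k[i - 1] = ki
        err *= 1.0 - ki * ki
    return a, k

# Usage: autocorrelation of an idealized first-order process, r[k] = 0.5**k
a, k = levinson_durbin(0.5 ** np.arange(3), order=2)
```

For this idealized input the second reflection coefficient comes out as zero, which is the expected behavior for a first-order process.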
8 Claims
1. A speaker recognition system for identifying a speaker and verifying registration of a speaker based on an input voice, comprising:
a voice input section for inputting a voice;
a preprocessing section for extracting a feature quantity from said inputted voice and averaging said extracted feature quantity timewise;
a layered neural network for performing a predetermined operation based on an input pattern from said preprocessing section; and
a speaker judgment section for judging identification or registration of said speaker based on an output from said neural network;
wherein said preprocessing section divides said inputted voice into a plurality of frames that are timewise equal to one another, extracts a feature quantity for each frame, and averages together said feature quantity of each frame over a group of frames, thereby producing an input pattern for said layered neural network consisting of an extracted feature quantity which is timewise averaged.
Dependent claims: 2, 3.
4. A speaker recognition system using a neural network comprising:
a function selection section for selecting a speaker identification function, a speaker verification function, or both;
a registered speaker count setting section for setting the number of speakers to be preregistered in said system;
a mode selection section for selecting a mode in which learning is executed by the neural network, or a mode in which speaker recognition is activated using said neural network that has completed said learning;
a voice input section for inputting a voice and detecting a voice block from said voice;
a preprocessing section for extracting a feature quantity from said inputted voice by dividing said inputted voice into a plurality of frames that are timewise equal to one another, extracting a feature quantity from each frame, and averaging said extracted feature quantity for each frame over a group of frames, thereby producing an input pattern;
a layered neural network for performing a predetermined operation based on an input pattern from said preprocessing section;
a speaker judgment section for judging identity or registration of said speaker based on an output from said neural network;
wherein in said learning mode, in response to a voice being input, together with speaker information identifying a speaker or indicating whether said speaker is a registered speaker, and a voice block being detected, said preprocessing section equally divides said voice block into m subblocks consisting of at least one frame, calculates spectral power of n frequency bands, said n frequency bands being set on the frequency domain for each frame, averages together the spectral power of each frame in a subblock for every equally divided subblock, generates an input pattern obtained from the result of averaging for input to said neural network, calculates an error between said obtained output pattern and a target value corresponding to said speaker information, determines a degree of strength of connection between units of the neural network for correction so that said error is decreased, and repeats said calculation of said error and said correction of said degree of strength of connection between said units until said error becomes below a predetermined value; and
in said activation mode, in response to a voice being inputted and a voice block being detected, said preprocessing section equally divides said voice block into m subblocks, calculates spectral power for each of n frequency bands, said n frequency bands being set on the frequency domain for each frame, averages together said spectral power of each frame in a subblock for every equally divided subblock, and transmits an input pattern obtained from the result of averaging to said neural network that has completed said learning; and
said judgment section judges the identity of the speaker from said obtained output pattern in an identification function and the registration of the speaker in a verification function wherein m and n are integers.
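The preprocessing recited in claim 4 (spectral power in n frequency bands per frame, averaged over m equal subblocks) can be sketched in NumPy. This is an illustration under assumptions the claim leaves open: the bands are contiguous groups of FFT bins of nearly equal width, trailing frames that do not fill a subblock are dropped, and the names `band_power_pattern`, `frame_len`, `n_bands`, and `m_subblocks` are hypothetical.

```python
import numpy as np

def band_power_pattern(voice_block, frame_len, n_bands, m_subblocks):
    """Build an m*n input pattern from one detected voice block."""
    x = np.asarray(voice_block, dtype=float)

    # 1. Divide the voice block into frames of equal length.
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)

    # 2. Spectral power per frame, then summed into n bands.
    #    Assumption: linear bands of nearly equal width in FFT bins.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    if power.shape[1] < n_bands:
        raise ValueError("frame too short for the requested number of bands")
    bands = np.array_split(power, n_bands, axis=1)
    band_power = np.stack([b.sum(axis=1) for b in bands], axis=1)  # (frames, n)

    # 3. Equal division into m subblocks; average the frames in each one.
    per_sub = n_frames // m_subblocks
    if per_sub == 0:
        raise ValueError("fewer frames than subblocks")
    used = band_power[: per_sub * m_subblocks]
    averaged = used.reshape(m_subblocks, per_sub, n_bands).mean(axis=1)  # (m, n)

    # 4. Flatten to a single vector for the network's input layer.
    return averaged.ravel()

# Usage: a pure tone at 5 cycles per 100-sample frame, 16 frames in total
tone = np.sin(2 * np.pi * 5 * np.arange(1600) / 100)
pattern = band_power_pattern(tone, frame_len=100, n_bands=4, m_subblocks=4)
```

Because the pattern length is always m times n, utterances of different duration map to an input layer of fixed size, which is the practical effect of the timewise averaging.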
5. A speaker recognition system for identifying a speaker and verifying registration of a speaker based on an input voice, comprising:
a voice input section for inputting a voice;
a preprocessing section for extracting a feature quantity from each of a plurality of frames which said input voice is divided into with a predetermined period;
a layered neural network for performing a predetermined operation based on an input pattern from said preprocessing section; and
a speaker judgement section for judging identification or registration of said speaker based on an output from said neural network;
wherein said preprocessing section includes means for producing said input pattern in the manner that said plurality of frames are grouped into a plurality of groups, a total number of the groups is less than a total number of the frames, each of the groups includes a substantially equal number of frames, and said feature quantities in each of the groups are averaged to produce said input pattern of the neural network.
Dependent claims: 6, 7.
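Claim 5 requires only that each group hold a substantially equal number of frames, so the frame count need not be a multiple of the group count. One possible reading, shown here as a sketch and not as the patent's stated method, is to let group sizes differ by at most one frame, which is what NumPy's `np.array_split` does. The function name `average_in_groups` is hypothetical.

```python
import numpy as np

def average_in_groups(frame_features, n_groups):
    """Average per-frame feature vectors over nearly equal groups of frames.

    frame_features: array of shape (n_frames, n_features)
    Returns an array of shape (n_groups, n_features).
    """
    feats = np.asarray(frame_features, dtype=float)
    # The claim requires fewer groups than frames.
    if not 0 < n_groups < len(feats):
        raise ValueError("need 0 < n_groups < number of frames")
    # array_split tolerates uneven division: sizes differ by at most one.
    groups = np.array_split(feats, n_groups, axis=0)
    return np.stack([g.mean(axis=0) for g in groups])

# Usage: 10 frames with one feature each, grouped as 4 + 3 + 3 frames
features = np.arange(10.0).reshape(10, 1)
pattern = average_in_groups(features, 3)
```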
8. A speaker recognition system using a neural network comprising:
a function selection section for selecting at least one of a speaker identification function and a speaker verification function;
a registered speaker count setting section for setting the number of speakers to be preregistered in said system;
a mode selection section for selecting a mode in which learning is executed by the neural network, or a mode in which speaker recognition is activated using said neural network that has completed said learning;
a voice input section for inputting a voice and detecting a voice block from said voice;
a preprocessing section for extracting a feature quantity from each of a plurality of frames which said input voice is divided into with a predetermined period;
a layered neural network for performing a predetermined operation based on an input pattern from said preprocessing section; and
a speaker judgement section for judging an identity or a registration of said speaker based on an output from said neural network;
wherein said preprocessing section includes means for producing said input pattern in the manner that said plurality of frames are grouped into a plurality of groups, a total number of the groups is less than a total number of the frames, each of the groups includes a substantially equal number of frames, and said feature quantities in each of the groups are averaged to produce said input pattern of the neural network; and
wherein, in said learning mode, in response to a voice being input, together with speaker information identifying a speaker or indicating whether said speaker is a registered speaker, and a voice block being detected, said preprocessing section equally divides said voice block into m subblocks including at least one frame, calculates spectral power of n frequency bands, said frequency bands being set on the frequency domain for each frame, averages together the spectral power of each frame in a subblock for every equally divided subblock, generates an input pattern obtained from the result of averaging for input to said neural network, calculates an error between said obtained output pattern and a target value corresponding to said speaker information, determines a degree of strength of connection between units of the neural network for correction so that said error is decreased, and repeats said calculation of said error and said correction of said degree of strength of connection between said units until said error becomes below a predetermined value; and
in said activation mode, in response to a voice being inputted and a voice block being detected, said preprocessing section equally divides said voice block into m subblocks, calculates spectral power for each of n frequency bands, said n frequency bands being set on the frequency domain for each frame, averages together said spectral power of each frame in a subblock for every equally divided subblock, and transmits an input pattern obtained from the result of averaging to said neural network that has completed said learning; and
said judgement section judges the identity of the speaker from said obtained output pattern in an identification function mode and the registration of the speaker in a verification function mode wherein m and n are integers.
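The learning mode in claim 8 (compute the error against a target, correct the connection strengths so the error decreases, repeat until the error falls below a predetermined value) and the two judgement functions can be sketched with a small layered network. Everything specific here is an assumption for illustration: one hidden layer with sigmoid units, batch gradient descent on squared error, one output unit per registered speaker, identification by the largest output, and verification by comparing the largest output with a threshold. The names `LayeredNet`, `identify`, and `verify` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class LayeredNet:
    """Input layer -> one hidden layer -> output layer (assumed topology)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.5, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, x):
        h = sigmoid(x @ self.w1 + self.b1)
        return h, sigmoid(h @ self.w2 + self.b2)

    def train(self, patterns, targets, lr=1.0, tol=0.01, max_epochs=5000):
        """Learning mode: repeat error calculation and weight correction
        until the error is below tol (or max_epochs is reached).
        Returns the last error that was evaluated."""
        x = np.asarray(patterns, dtype=float)
        t = np.asarray(targets, dtype=float)
        err = float("inf")
        for _ in range(max_epochs):
            h, y = self.forward(x)
            err = 0.5 * np.mean(np.sum((y - t) ** 2, axis=1))
            if err < tol:
                break
            # Backpropagate the error through the sigmoid layers.
            d_out = (y - t) * y * (1.0 - y)
            d_hid = (d_out @ self.w2.T) * h * (1.0 - h)
            # Correct the connection strengths so the error decreases.
            self.w2 -= lr * h.T @ d_out / len(x)
            self.b2 -= lr * d_out.mean(axis=0)
            self.w1 -= lr * x.T @ d_hid / len(x)
            self.b1 -= lr * d_hid.mean(axis=0)
        return err

def identify(net, pattern):
    """Identification: index of the registered speaker with the largest output."""
    _, y = net.forward(np.asarray(pattern, dtype=float))
    return int(np.argmax(y))

def verify(net, pattern, threshold=0.5):
    """Verification: accept as registered if some output reaches the threshold."""
    _, y = net.forward(np.asarray(pattern, dtype=float))
    return bool(y.max() >= threshold)

# Usage: two registered speakers with toy 4-element input patterns
patterns = [[1, 0, 1, 0], [0, 1, 0, 1]]
targets = [[1, 0], [0, 1]]
net = LayeredNet(n_in=4, n_hidden=3, n_out=2)
final_error = net.train(patterns, targets)
speaker = identify(net, patterns[0])
```

In practice the input patterns would come from the preprocessing section, and the threshold for verification would need tuning on held-out speech from unregistered speakers.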
Specification