Speaker independent isolated word recognition system using neural networks
First Claim
1. A speaker independent isolated word recognition apparatus, comprising:
digitizing means for digitizing a speech signal and subjecting the digitized speech signal to spectral analysis at constant temporal intervals using fast Fourier transform, to obtain an analysis result;
means connected to said digitizing means for subjecting the analysis result to an orthogonal transformation to obtain cepstral parameters and a logarithm of a total energy contained in each temporal interval to yield characteristic parameters of the speech signal for each temporal interval;
means for detecting word ends through an energy level of the respective speech signal; and
a recognizer (RNA), in which complete words are modelled with Markov model automata of a left-to-right type with recursion on states, each of which corresponds to an acoustic portion of the word, and in which the recognition is carried out through a dynamic programming according to a Viterbi algorithm on all automata for finding the one with a minimum cost path, which corresponds to the recognized word indicated at output (PR), emission probabilities being calculated with a neural network with feedback having parallel processing neurons, the neural network being trained by:
initialization:
a. initialization of the neural network with small random synaptic weights;
b. creation of a first segmentation by segmenting a training set of words uniformly;
iteration by:
initialization of the training set with all the segmented words;
random choice of a word not already learned;
updating of synaptic weights wij for a word by applying a correlative training by varying a neural network input according to a window sliding from left to right on the word and supplying for every input window a suitable objective vector at an output, constructed by setting a 1 on the neuron corresponding to a state to which the input window belongs, according to the segmentation, and by setting 0 on all the other neurons;
segmentation recomputation for the considered word, by using the neural network as previously trained, and performing a dynamic programming only with the correct model;
updating of the segmentation St+1;
if there are still words not yet considered in the training set, repeat from the random choice;
recomputation of transition probabilities of automata; and
if the number of iterations on the training set is greater than a maximum preset number NMAX, terminate; otherwise, return to the initialization of the training set.
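The training iteration of claim 1 can be sketched in Python. This is a minimal sketch, not the patented implementation: the correlative weight update and the re-segmentation by dynamic programming are passed in as callbacks, and the sizes N_STATES and NMAX are illustrative assumptions.

```python
import numpy as np

# Illustrative sizes; the claim does not fix these values.
N_STATES = 4      # states of one word automaton
NMAX = 3          # maximum number of passes over the training set

rng = np.random.default_rng(0)

def uniform_segmentation(n_frames, n_states=N_STATES):
    """Step b: the first segmentation splits a word's frames evenly over the states."""
    reps = int(np.ceil(n_frames / n_states))
    return np.repeat(np.arange(n_states), reps)[:n_frames]

def objective_vector(state, n_states=N_STATES):
    """Objective for one input window: 1 on the neuron of the owning state, 0 elsewhere."""
    t = np.zeros(n_states)
    t[state] = 1.0
    return t

def train(words, update_weights, resegment):
    """Iteration of claim 1: pick not-yet-learned words at random, update the
    weights window by window against the current segmentation, then recompute
    the segmentation of the word with the freshly trained network."""
    for _ in range(NMAX):
        for idx in rng.permutation(len(words)):
            frames, seg = words[idx]
            for frame, state in zip(frames, seg):
                update_weights(frame, objective_vector(state))
            words[idx] = (frames, resegment(frames))
        # here the transition probabilities of the automata would be re-estimated
```

The sliding left-to-right window of the claim corresponds to the inner loop over `frames`; each window is paired with the objective vector of the state that owns it under the current segmentation.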
Abstract
A speech recognition apparatus in which the speech signal is digitized and subjected to spectral analysis, word end detection is effected by energy analysis of the speech signal, and the recognition system utilizes a Markov model in combination with a neural network trained by specific training steps.
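The front end summarized in the abstract (digitization, FFT-based spectral analysis at constant intervals, cepstral parameters plus log energy per interval) can be sketched as follows. The window length, hop, number of coefficients, and the use of the real cepstrum (inverse FFT of the log spectrum) as the orthogonal transformation are illustrative assumptions, not the patent's exact choices.

```python
import numpy as np

def characteristic_parameters(signal, frame_len=256, hop=128, n_cep=8):
    """Per-interval features (sketch): cepstral coefficients obtained by an
    orthogonal transformation of the log FFT spectrum, plus the logarithm
    of the total energy contained in the interval."""
    out = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        win = signal[start:start + frame_len]
        log_spec = np.log(np.abs(np.fft.rfft(win)) + 1e-10)
        cepstrum = np.fft.irfft(log_spec)              # real cepstrum
        log_energy = np.log(np.sum(win ** 2) + 1e-10)
        out.append(np.concatenate([cepstrum[:n_cep], [log_energy]]))
    return np.array(out)
```

The log energy per interval is also the quantity the word-end detector of claim 1 would threshold.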
31 Citations
8 Claims
3. calculate error E, defined as the square error between output vector O and desired vector T, according to the formula:

E = 1/2 · sum_k (t_k − o_k)^2

where the objective is defined according to the correlation formula of the outputs:

t_k = o_k · o_h if t_k ≠ 1 and t_h = 1
t_k unvaried if t_k = 1

where t_k is the k-th element of the objective vector, and o_k and o_h are the outputs of the k-th and h-th neuron of the output level of the network;
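As a sketch, the correlated objective and the square error above can be computed as follows (the index h is located as the position of the 1 in the original objective vector):

```python
import numpy as np

def correlative_target(o, t):
    """t_k = o_k * o_h for every k with t_k != 1; the element at h
    (where t_h = 1) is left unvaried."""
    h = int(np.argmax(t))          # neuron with t_h = 1
    new_t = o * o[h]
    new_t[h] = 1.0
    return new_t

def square_error(o, t):
    """E = 1/2 * sum_k (t_k - o_k)^2."""
    return 0.5 * np.sum((t - o) ** 2)
```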
4. calculate the partial derivative ∂E/∂w_ij of the error with respect to the weights, used in the updating of the synaptic weights:

Δw_ij = η · δ_i · o_j + β · Δw_ij(previous step)

where w_ij is the synaptic weight from neuron j to neuron i, η is a coefficient determining the learning speed, β is a coefficient, called moment, determining an inertia in the weight updating, δ_i is the backpropagated error on neuron i and o_j is the output of neuron j;
starting from the error defined at step 3, new backpropagation error laws for the correlative training are obtained, defined as follows:

for output neurons:

δ_i = (t_i − o_i) · F'(net_i) if t_i = 1
δ_i = −o_i · (o_h − 1)^2 · F'(net_i) if t_i ≠ 1 and t_h = 1

for internal neurons:

δ_i = F'(net_i) · sum_k δ_k · w_ki

5. where index k moves on the neurons of the upper level;
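A minimal numpy sketch of these error laws, assuming a sigmoid activation so that F'(net_i) = o_i · (1 − o_i) (the claim does not bind F to a particular function here):

```python
import numpy as np

def output_deltas(o, t):
    """Backpropagated error of the correlative training (sketch),
    with sigmoid derivative F'(net_i) = o_i * (1 - o_i)."""
    h = int(np.argmax(t))                          # neuron with t_h = 1
    fprime = o * (1.0 - o)
    delta = -o * (o[h] - 1.0) ** 2 * fprime        # case t_i != 1, t_h = 1
    delta[h] = (t[h] - o[h]) * fprime[h]           # case t_i = 1
    return delta

def hidden_deltas(o_hidden, delta_upper, w_upper):
    """Internal neurons: delta_i = F'(net_i) * sum_k delta_k * w_ki,
    with k running over the upper level; w_upper[k, i] is the weight
    from hidden neuron i to upper neuron k."""
    fprime = o_hidden * (1.0 - o_hidden)
    return fprime * (w_upper.T @ delta_upper)
```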
6. update every synaptic weight w_ij, according to the equation:

w_ij(new) = w_ij(old) + Δw_ij
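The per-weight update of step 6, combined with the momentum term of step 4, can be sketched as follows (the values of η and β are illustrative assumptions):

```python
import numpy as np

def update_weights(w, dw_prev, delta, o_prev, eta=0.1, beta=0.9):
    """Step 6 (sketch): dw_ij = eta * delta_i * o_j + beta * dw_ij(previous
    step); the new weight matrix is w + dw. Returns (new weights, new dw)
    so dw can be carried to the next update as the momentum term."""
    dw = eta * np.outer(delta, o_prev) + beta * dw_prev
    return w + dw, dw
```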
Specification