Method and apparatus for developing a neural network for phoneme recognition
Abstract
An automated speech recognition system converts a speech signal into a compact, coded representation that correlates to a speech phoneme set. A number of different neural network pattern matching schemes may be used to perform the necessary speech coding. An integrated user interface guides a user unfamiliar with the details of speech recognition or neural networks to quickly develop and test a neural network for phoneme recognition. To train the neural network, digitized voice data containing known phonemes that the user wants the neural network to ultimately recognize are processed by the integrated user interface. The digitized speech is segmented into phonemes with each segment being labelled with a corresponding phoneme code. Based on a user selected transformation method and transformation parameters, each segment is transformed into a series of multiple dimension vectors representative of the speech characteristics of that segment. These vectors are iteratively presented to a neural network to train/adapt that neural network to consistently distinguish and recognize these vectors and assign an appropriate phoneme code to each vector. Simultaneous display of the digitized speech, segments, vector sets, and a representation of the trained neural network assist the user in visually confirming the acceptability of the phoneme training set. A user may also selectively audibly confirm the acceptability of the digitization scheme, the segments, and the transform vectors so that satisfactory training data are presented to the neural network. If the user finds a particular step or parameter produces an unacceptable result, the user may modify one or more of the parameters and verify whether the modification effected an improvement in performance. 
The trained neural network is also automatically tested by presenting a test speech signal to the integrated user interface and observing both audibly and visually automatic segmentation of the speech, transformation into multidimensional vectors, and the resulting neural network assigned phoneme codes. A method of decoding such phoneme codes using the neural network is also disclosed.
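The training-set generation described in the abstract (labelled segments transformed into series of vectors) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the DFT-magnitude transform, the function names `dft_magnitudes` and `build_training_set`, the parameters `frame_len` and `n_bins`, the phoneme codes, and the synthetic sine-wave "speech" are all hypothetical choices made for the example.

```python
import cmath
import math


def dft_magnitudes(frame, n_bins):
    """Magnitudes of the first n_bins DFT coefficients of one frame.

    Assumed stand-in for the "user selected transformation method".
    """
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n_bins)
    ]


def build_training_set(samples, segments, frame_len=8, n_bins=4):
    """Turn labelled segments into (vector, phoneme_code) training pairs.

    segments: list of (start, end, phoneme_code), with end exclusive.
    frame_len and n_bins play the role of transformation parameters.
    """
    training_set = []
    for start, end, code in segments:
        # Non-overlapping frames; a tail shorter than frame_len is dropped.
        for pos in range(start, end - frame_len + 1, frame_len):
            frame = samples[pos:pos + frame_len]
            training_set.append((dft_magnitudes(frame, n_bins), code))
    return training_set


# Synthetic "digitized speech": two segments with different dominant frequencies.
samples = ([math.sin(2 * math.pi * 1 * t / 8) for t in range(16)]
           + [math.sin(2 * math.pi * 2 * t / 8) for t in range(16)])
segments = [(0, 16, "AA"), (16, 32, "IY")]  # hypothetical phoneme codes

training_set = build_training_set(samples, segments)
for vector, code in training_set:
    print(code, [round(v, 3) for v in vector])
```

Changing `frame_len` or `n_bins` and regenerating the set mirrors, in a very reduced form, the user modifying transformation parameters and re-checking whether the resulting training data are acceptable.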
76 Citations
37 Claims
1. A method for developing a neural network to recognize phonemes in speech comprising the steps of:
- generating a training set of phonemes for later training the neural network to recognize phonemes in speech;
- an operator visibly or audibly determining at selected stages of generating the training set the quality of a training set of phonemes used to train the neural network to later recognize phonemes in speech; and
- varying one or more parameters used to generate the training set of phonemes to affect how the training set is generated until the operator determines that the generated training set is acceptable.
Dependent claims: 2, 10-31, 34-37
3. A method for testing a neural network for speech recognition comprising:
- automatically assigning by the neural network codes to input speech;
- decoding the automatically assigned code to generate an estimate of a vector representing a corresponding phoneme;
- reverse transforming the estimate vector into a time varying signal; and
- reproducing the time varying signal to evaluate the performance of the speech recognition system; and
- modifying one or more parameters to change processing of speech subsequently input to the neural network.
Dependent claims: 4, 5, 6, 32, 33
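The decode and reverse-transform steps of claim 3 can be illustrated with a small sketch. Everything specific here is an assumption for illustration only: the codebook of prototype spectra, the use of a full complex DFT as the transform (so that it can be inverted exactly), and the names `dft`, `inverse_dft`, and `decode_to_signal`. Audio playback of the reconstructed signal is omitted.

```python
import cmath
import math


def dft(frame):
    """Forward DFT of one frame (assumed transform for this sketch)."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(frame))
        for k in range(n)
    ]


def inverse_dft(spectrum):
    """Reverse transform: complex spectrum back to real time samples."""
    n = len(spectrum)
    return [
        (sum(c * cmath.exp(2j * math.pi * k * t / n)
             for k, c in enumerate(spectrum)) / n).real
        for t in range(n)
    ]


def decode_to_signal(codes, codebook):
    """Decode each assigned code to its estimate vector, then reverse
    transform and concatenate the resulting frames into one signal."""
    signal = []
    for code in codes:
        estimate_vector = codebook[code]  # hypothetical prototype lookup
        signal.extend(inverse_dft(estimate_vector))
    return signal


# Hypothetical codebook with one prototype frame for the code "AA".
prototype = [math.sin(2 * math.pi * t / 8) for t in range(8)]
codebook = {"AA": dft(prototype)}

# Codes as they might be assigned by a network to two consecutive frames.
signal = decode_to_signal(["AA", "AA"], codebook)
print([round(s, 3) for s in signal])
```

Listening to (or plotting) `signal` and comparing it with the original input is the kind of check the claim describes; if the result is poor, parameters such as the frame length would be adjusted before further speech is processed.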
7. A method for evaluating the acceptability of training data used to develop a neural network for speech recognition, comprising:
- digitizing a speech signal having predetermined phonemes;
- segmenting the speech signal with each segment corresponding to one phoneme;
- audibly reproducing selected segments to detect a similarity with the predetermined phonemes;
- based on the detected similarity between audibly reproduced segments and the predetermined phonemes, modifying one or more parameters that affect how the speech signal is digitized or how the speech signal is segmented; and
- determining the acceptability of the training data after the one or more parameters is modified.
Dependent claims: 8, 9
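How a segmentation parameter can change the resulting segments, as contemplated in claim 7, might be sketched as below. The claim does not name a segmentation method; the frame-energy threshold used here, the function name `segment_by_energy`, and the parameters `frame_len` and `threshold` are assumptions introduced purely for illustration, and the audible comparison step is left to the operator.

```python
def segment_by_energy(samples, frame_len=4, threshold=0.1):
    """Return (start, end) sample ranges whose mean frame energy exceeds
    threshold. A simple stand-in for phoneme segmentation."""
    segments = []
    seg_start = None
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy > threshold and seg_start is None:
            seg_start = i * frame_len  # a segment begins
        elif energy <= threshold and seg_start is not None:
            segments.append((seg_start, i * frame_len))  # a segment ends
            seg_start = None
    if seg_start is not None:
        # Close a segment that runs to the end of the signal.
        segments.append((seg_start, n_frames * frame_len))
    return segments


# Synthetic signal: silence, a loud burst, silence, a quiet burst, silence.
samples = [0.0] * 4 + [1.0] * 8 + [0.0] * 4 + [0.3] * 4 + [0.0] * 4

strict = segment_by_energy(samples, threshold=0.5)
loose = segment_by_energy(samples, threshold=0.05)
print("strict:", strict)
print("loose: ", loose)
```

With the stricter threshold the quiet burst is missed; lowering the threshold recovers it. An operator who hears that a segment is missing or cut short could make this kind of parameter change and then re-evaluate the training data.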
Specification