Operator interactions for developing phoneme recognition by neural networks
First Claim
1. A method, comprising the steps of:
digitizing a set of speech signals, dividing the digitized signals into individual segments, each segment representing one or more speech phonemes;
transforming each segment into one or more vectors having plural dimensions;
assigning a code to each of the vectors, each code corresponding to a phoneme;
developing a speech recognition system including training a neural network using the coded vectors; and
an operator modifying the transforming step based on the acceptability of the trained neural network.
Abstract
An automated speech recognition system converts a speech signal into a compact, coded representation that correlates to a speech phoneme set. A number of different neural network pattern matching schemes may be used to perform the necessary speech coding. An integrated user interface guides a user unfamiliar with the details of speech recognition or neural networks to quickly develop and test a neural network for phoneme recognition. To train the neural network, digitized voice data containing known phonemes that the user wants the neural network to ultimately recognize are processed by the integrated user interface. The digitized speech is segmented into phonemes, with each segment being labelled with a corresponding phoneme code. Based on a user-selected transformation method and transformation parameters, each segment is transformed into a series of multidimensional vectors representative of the speech characteristics of that segment. These vectors are iteratively presented to a neural network to train/adapt that neural network to consistently distinguish and recognize these vectors and assign an appropriate phoneme code to each vector. Simultaneous display of the digitized speech, segments, vector sets, and a representation of the trained neural network assists the user in visually confirming the acceptability of the phoneme training set. A user may also selectively audibly confirm the acceptability of the digitization scheme, the segments, and the transform vectors so that satisfactory training data are presented to the neural network. If the user finds that a particular step or parameter produces an unacceptable result, the user may modify one or more of the parameters and verify whether the modification improved performance.
The trained neural network is also automatically tested by presenting a test speech signal to the integrated user interface and observing, both audibly and visually, the automatic segmentation of the speech, its transformation into multidimensional vectors, and the phoneme codes assigned by the neural network. A method of decoding such phoneme codes using the neural network is also disclosed.
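The training flow described in the abstract (labelled segments, transformation into vectors, iterative training, then a check of acceptability) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: noisy sine tones stand in for digitized speech segments, the transform is a spectral magnitude at a few probe frequencies, the "neural network" is a single-layer multiclass perceptron, and the names `synth_segment`, `transform`, `train`, `TONES`, and `params` are hypothetical.

```python
import math
import random

SAMPLE_RATE = 8000  # Hz; assumed for this sketch

def synth_segment(freq_hz, rng, n_samples=200, noise=0.05):
    """Stand-in for one digitized, labelled speech segment (a noisy tone)."""
    return [math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE)
            + rng.uniform(-noise, noise) for t in range(n_samples)]

def transform(segment, frame_len, probe_hz):
    """Transform a segment into a series of multidimensional vectors.

    One vector per non-overlapping frame; each dimension is the spectral
    magnitude of that frame at one probe frequency.
    """
    vectors = []
    for start in range(0, len(segment) - frame_len + 1, frame_len):
        frame = segment[start:start + frame_len]
        vec = []
        for f in probe_hz:
            w = 2 * math.pi * f / SAMPLE_RATE
            re = sum(x * math.cos(w * n) for n, x in enumerate(frame))
            im = sum(x * math.sin(w * n) for n, x in enumerate(frame))
            vec.append(math.hypot(re, im) / frame_len)
        vectors.append(vec)
    return vectors

def classify(weights, vec):
    """Return the phoneme code whose output unit scores highest."""
    x = vec + [1.0]  # constant bias input
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return scores.index(max(scores))

def train(coded_vectors, n_codes, epochs=20):
    """Iteratively present coded vectors to a single-layer perceptron."""
    dim = len(coded_vectors[0][0]) + 1
    weights = [[0.0] * dim for _ in range(n_codes)]
    for _ in range(epochs):
        for vec, code in coded_vectors:
            guess = classify(weights, vec)
            if guess != code:  # update only on a misclassification
                for i, xi in enumerate(vec + [1.0]):
                    weights[code][i] += xi
                    weights[guess][i] -= xi
    return weights

def build_coded_vectors(tone_by_code, rng, frame_len, probe_hz, per_code=5):
    """Generate segments, transform them, and attach a code to each vector."""
    coded = []
    for code, freq in tone_by_code.items():
        for _ in range(per_code):
            for vec in transform(synth_segment(freq, rng), frame_len, probe_hz):
                coded.append((vec, code))
    return coded

def accuracy(weights, coded_vectors):
    hits = sum(classify(weights, v) == c for v, c in coded_vectors)
    return hits / len(coded_vectors)

TONES = {0: 400, 1: 800, 2: 1600}  # hypothetical phoneme code -> tone in Hz
rng = random.Random(0)
params = {"frame_len": 100, "probe_hz": (400, 800, 1600)}  # operator-chosen
net = train(build_coded_vectors(TONES, rng, **params), n_codes=len(TONES))
score = accuracy(net, build_coded_vectors(TONES, rng, **params))
print(f"held-out accuracy: {score:.2f}")
```

In this sketch the operator's role corresponds to inspecting `score` and, if it is unacceptable, changing the entries of `params` (the transformation parameters) before retraining.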
79 Citations
36 Claims
1. A method, comprising the steps of:
digitizing a set of speech signals, dividing the digitized signals into individual segments, each segment representing one or more speech phonemes;
transforming each segment into one or more vectors having plural dimensions;
assigning a code to each of the vectors, each code corresponding to a phoneme;
developing a speech recognition system including training a neural network using the coded vectors; and
an operator modifying the transforming step based on the acceptability of the trained neural network.
Dependent claims: 2-10, 12-28
11. A method, comprising the steps of:
digitizing a set of speech signals, dividing the digitized signals into individual segments, each segment representing one or more speech phonemes;
transforming each segment into one or more vectors having plural dimensions;
assigning a code to each of the vectors, each code corresponding to a phoneme; and
developing a speech recognition system including training a neural network using the coded vectors,
wherein the dividing step is performed either manually under the control of an operator or using an automatic segmenting procedure and the operator modifies the segments either by manually changing boundaries defining ones of the segments or by changing automatic segmenting procedure parameters.
29. A speech signal processing system, comprising:
an analog to digital converter for converting speech signals to be analyzed to digital form;
a data processor connected to the analog to digital converter and performing the following data processing tasks;
analyzing the digitized signals as speech segments, each segment representing one or more speech phonemes;
transforming each segment into one or more vectors;
assigning a phoneme to each of the vectors;
an operator selecting a neural network from a plurality of different types of neural networks; and
training the selected neural network using the vectors, wherein the trained neural network is used to recognize speech.
Dependent claims: 30-36
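The two ways of modifying segments recited in claim 11 (changing automatic segmenting procedure parameters, or manually changing segment boundaries) can be illustrated with a short sketch. The claims do not specify a segmenting algorithm, so the energy-threshold procedure below is only an assumed example, and `auto_segment`, `move_boundary`, `tone`, and the synthetic `signal` are hypothetical names.

```python
import math

def auto_segment(samples, frame_len=50, threshold=0.1, min_frames=2):
    """Automatic segmenting (assumed method): return (start, end) sample
    indices of runs of frames whose mean energy exceeds `threshold`."""
    segments, run_start = [], None
    n_frames = len(samples) // frame_len
    for k in range(n_frames + 1):
        if k < n_frames:
            frame = samples[k * frame_len:(k + 1) * frame_len]
            active = sum(x * x for x in frame) / frame_len > threshold
        else:
            active = False  # sentinel frame closes a run that reaches the end
        if active and run_start is None:
            run_start = k
        elif not active and run_start is not None:
            if k - run_start >= min_frames:
                segments.append((run_start * frame_len, k * frame_len))
            run_start = None
    return segments

def move_boundary(segments, index, start=None, end=None):
    """Manual edit: return a copy with one segment's boundaries changed."""
    s, e = segments[index]
    s = s if start is None else start
    e = e if end is None else end
    if not 0 <= s < e:
        raise ValueError("segment must satisfy 0 <= start < end")
    return segments[:index] + [(s, e)] + segments[index + 1:]

def tone(amplitude, n_samples, period=20):
    """Synthetic stand-in for voiced speech: a sine with the given period."""
    return [amplitude * math.sin(2 * math.pi * n / period)
            for n in range(n_samples)]

# Silence, a loud tone, silence, a quiet tone, silence.
signal = ([0.0] * 100 + tone(1.0, 200) + [0.0] * 100
          + tone(0.3, 200) + [0.0] * 100)

strict = auto_segment(signal, threshold=0.1)    # default-like parameter
relaxed = auto_segment(signal, threshold=0.02)  # operator lowers threshold
edited = move_boundary(relaxed, 1, start=380)   # operator drags a boundary
print(strict, relaxed, edited)
```

Lowering `threshold` is the parameter-change route: it lets the quieter tone register as a segment. `move_boundary` is the manual route, leaving the automatic result untouched and returning an edited copy.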
Specification