Method and neural network for speech recognition using a correlogram as input
Abstract
A method and device for recognizing individual words of spoken speech can be used to control technical processes. The method proposed by the invention is based on feature extraction which is particularly efficient in terms of computing capacity and recognition rate, plus subsequent classification of the individual words using a neural network.
9 Claims
1. A method for recognizing individual words of speech, which comprises:
converting speech during an expectation time period into an electrical speech signal;
ascertaining an instantaneous spectral amplitude distribution of the speech signal during time intervals defined by a duration of a phoneme and representing the instantaneous spectral amplitude distribution as a spectral vector S_i (i = 0, 1, ..., m-1), wherein each element (S_{i,0}, S_{i,1}, ..., S_{i,n-1}) of the spectral vector S_i represents an amplitude of a frequency band having a predetermined bandwidth, and n is an integer representing a number of divisions of a total detected frequency band into the frequency bands having the predetermined bandwidth;
forming a spectrogram S from the spectral vectors S_i in accordance with ##EQU2##
deriving a correlogram K from the spectrogram S, wherein the correlogram K has coordinates j, h, k and each element K_{j,h,k} of the correlogram K is formed in accordance with ##EQU3## and
classifying an individual spoken word with a word-typical characteristic pattern with the correlogram K.
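The first steps of claim 1 (spectral vectors S_i with n band amplitudes each, stacked into a spectrogram S with m rows) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names `spectral_vector` and `spectrogram`, the use of a plain discrete Fourier transform, non-overlapping frames, and equal-width bands averaged over their bins. The claim does not prescribe any of these choices, and the sketch stops before the correlogram because the formula ##EQU3## is not reproduced in this text.

```python
import cmath
import math


def spectral_vector(frame, n_bands):
    """Return n_bands band amplitudes for one frame (one spectral vector S_i).

    Assumption: amplitudes come from a plain DFT whose bins are grouped
    into equal-width bands; the claim does not fix the analysis method.
    """
    size = len(frame)
    half = size // 2
    # Magnitude of each DFT bin from 0 up to (not including) the Nyquist bin.
    mags = []
    for b in range(half):
        acc = sum(
            x * cmath.exp(-2j * math.pi * b * t / size)
            for t, x in enumerate(frame)
        )
        mags.append(abs(acc))
    # Group the bins into n_bands bands of equal width and average each band.
    width = half // n_bands
    return [
        sum(mags[j * width:(j + 1) * width]) / width
        for j in range(n_bands)
    ]


def spectrogram(signal, frame_len, n_bands):
    """Stack the spectral vectors of consecutive, non-overlapping frames.

    Row i of the result is S_i; the result has m = len(signal) // frame_len rows.
    """
    m = len(signal) // frame_len
    return [
        spectral_vector(signal[i * frame_len:(i + 1) * frame_len], n_bands)
        for i in range(m)
    ]
```

In this sketch `frame_len` stands in for the time interval "defined by a duration of a phoneme", and it must be at least `2 * n_bands` samples so that every band contains at least one DFT bin.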

4. The method according to claim 1, which comprises classifying the spoken individual word with a neural network.

5. The method according to claim 4, which comprises
assigning each element of the correlogram K to each of a first number of neurons in an input plane of the neural network;
assigning each of the neurons of the input plane to each of a second number of neurons in an output plane of the neural network; and
indicating a defined recognized individual word with an output of a respective one of the neurons of the output plane.
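The arrangement in claim 5 (every input-plane neuron connected to every output-plane neuron, one output neuron per word) can be sketched as a single fully connected layer. This is a hedged illustration, not the patented network: the names `sigmoid` and `classify`, the bias terms, and the argmax decision rule are assumptions, and the logistic sigmoid is only a stand-in for the nonlinear transfer function of claim 6, whose formula (##EQU4##) is not reproduced in this text.

```python
import math


def sigmoid(x):
    # Assumed stand-in for the nonlinear transfer function of claim 6.
    return 1.0 / (1.0 + math.exp(-x))


def classify(features, weights, biases, words):
    """Feed a flat feature list through one fully connected layer.

    features: correlogram elements flattened to one value per input neuron.
    weights[o][i]: connection from input neuron i to output neuron o.
    Returns the word whose output neuron is most active, plus all outputs.
    """
    outputs = [
        sigmoid(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(weights, biases)
    ]
    best = max(range(len(outputs)), key=outputs.__getitem__)
    return words[best], outputs
```

A call such as `classify([1.0, 0.0, 2.0], [[1, 0, 0], [0, 0, 1]], [0, 0], ["yes", "no"])` selects the word attached to the second output neuron, because that neuron receives the larger weighted input. In practice the weights would come from training on example words, which this sketch does not cover.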

6. The method according to claim 4, which comprises calculating in each of the neurons with a nonlinear transfer function ##EQU4##

7. The method according to claim 1, which further comprises initiating a dialing process in a telephone set as a function of the recognized individual word of a telephone number associated with the individual word.

8. An apparatus for recognizing spoken words, comprising:
a digital signal processor connected to a bus system, said bus system including data, address and control lines, and a program memory, a working memory, and an input/output unit each connected to said digital signal processor via said bus system;
said digital signal processor including means for converting speech spoken during an expectation time period into an electrical speech signal;
means for ascertaining an instantaneous spectral amplitude distribution of the speech signal during a time interval defined by a duration of a phoneme;
means for representing the instantaneous spectral amplitude distribution as a spectral vector, each element of the spectral vector representing an amplitude of a frequency band having a predetermined bandwidth; and
means for classifying an individual spoken word with a word-typical characteristic pattern derived from the spectral vector representing the instantaneous spectral amplitude distribution of the individual spoken word.

Specification