Method and neural network for speech recognition using a correlogram as input

US 5,721,807 A
Filed: 01/21/1994
Issued: 02/24/1998
Est. Priority Date: 07/25/1991
Status: Expired due to Fees



Abstract

A method and device for recognizing individual words of spoken speech can be used to control technical processes. The method proposed by the invention is based on feature extraction which is particularly efficient in terms of computing capacity and recognition rate, plus subsequent classification of the individual words using a neural network.


9 Claims

1. A method for recognizing individual words of speech, which comprises:
- converting speech during an expectation time period into an electrical speech signal;
  
  ascertaining an instantaneous spectral amplitude distribution of the speech signal during time intervals defined by a duration of a phoneme and representing the instantaneous spectral amplitude distribution as a spectral vector Sⁱ (i=0, 1, . . . , m-1), wherein each element (Sⁱ₀, Sⁱ₁, . . . , Sⁱ_(n-1)) of the spectral vector Sⁱ represents an amplitude of a frequency band having a predetermined bandwidth, and n is an integer representing a number of divisions of a total detected frequency band into the frequency bands having the predetermined bandwidth;
  
  forming a spectrogram S from the spectral vectors Sⁱ in accordance with ##EQU2##
  
  deriving a correlogram K from the spectrogram S, wherein the correlogram K has coordinates j, h, k and each element K_j,h,k of the correlogram K is formed in accordance with ##EQU3## and
  
  classifying an individual spoken word with a word-typical characteristic pattern with the correlogram K.
- 2. The method according to claim 1, which comprises defining the time interval t_s between two successive spectral vectors Sⁱ and Sⁱ⁺¹ equal to 32 ms.
  - 3. The method according to claim 1, which comprises selecting each of the indices j, h, k of the correlogram K according to
    
    k > 0; k * b_c < 1 kHz
    
    j + k ≤ n-1
    
    j ≥ 0; h ≥ 0; and
    
    h * t_s < 500 ms;
    
    wherein b_c is the predetermined bandwidth of a frequency band and t_s is a time interval defined by the duration of the phonemes.
- 4. The method according to claim 1, which comprises classifying the spoken individual word with a neural network.
- 5. The method according to claim 4, which comprises:
  - assigning each element of the correlogram K to each of a first number of neurons in an input plane of the neural network;
  - assigning each of the neurons of the input plane to each of a second number of neurons in an output plane of the neural network; and
  - indicating a defined recognized individual word with an output of a respective one of the neurons of the output plane.
- 6. The method according to claim 4, which comprises calculating in each of the neurons with a nonlinear transfer function ##EQU4##
- 7. The method according to claim 1, which further comprises initiating a dialing process in a telephone set as a function of the recognized individual word of a telephone number associated with the individual word.
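
The feature extraction recited in claims 1 to 3 can be illustrated with a short sketch. The defining equation for the correlogram (##EQU3##) is not reproduced in this text, so the code below assumes one plausible form, a sum over time of products of band amplitudes, K[j,h,k] = Σ_i Sⁱ_j * Sⁱ⁺ʰ_(j+k); this is an assumption, not the formula of the patent. The function names `index_triples` and `correlogram`, the dictionary layout, and the 400 Hz bandwidth in the usage lines are likewise hypothetical. Only the index constraints are taken from claim 3, and the 32 ms interval from claim 2.

```python
from typing import Dict, List, Sequence, Tuple

Index = Tuple[int, int, int]  # (j, h, k)


def index_triples(n: int, b_c_hz: float, t_s_ms: float) -> List[Index]:
    """Enumerate the (j, h, k) triples permitted by the constraints of claim 3."""
    triples: List[Index] = []
    k = 1
    while k * b_c_hz < 1000.0:            # k > 0 and k * b_c < 1 kHz
        h = 0
        while h * t_s_ms < 500.0:         # h >= 0 and h * t_s < 500 ms
            for j in range(0, n - k):     # j >= 0 and j + k <= n - 1
                triples.append((j, h, k))
            h += 1
        k += 1
    return triples


def correlogram(
    spectrogram: Sequence[Sequence[float]], b_c_hz: float, t_s_ms: float
) -> Dict[Index, float]:
    """Correlate band j at time i with band j + k at time i + h.

    spectrogram[i][j] is the amplitude of band j in spectral vector i.
    ASSUMPTION: the sum-of-products form stands in for the equation that is
    missing from this text; it is not taken from the patent itself.
    """
    m = len(spectrogram)      # number of spectral vectors
    n = len(spectrogram[0])   # number of frequency bands
    K: Dict[Index, float] = {}
    for j, h, k in index_triples(n, b_c_hz, t_s_ms):
        # range(m - h) is empty when h >= m, so such elements are 0.
        K[(j, h, k)] = sum(
            spectrogram[i][j] * spectrogram[i + h][j + k] for i in range(m - h)
        )
    return K


# Usage with a toy spectrogram of m = 2 vectors and n = 3 bands.
S = [[1, 2, 3], [4, 5, 6]]
K = correlogram(S, b_c_hz=400.0, t_s_ms=32.0)
print(len(K), K[(0, 0, 1)], K[(0, 1, 1)])
```

With t_s = 32 ms, the constraint h * t_s < 500 ms admits h = 0 to 15, since 15 * 32 ms = 480 ms while 16 * 32 ms = 512 ms. The time-shift axis of the correlogram therefore has 16 values regardless of how long the word is, which is what gives the classifier a fixed-size input.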

8. An apparatus for recognizing spoken words, comprising:
- a digital signal processor connected to a bus system, said bus system including data, address and control lines, and a program memory, a working memory, and an input/output unit each connected to said digital signal processor via said bus system;
  
  said digital signal processor including means for converting speech spoken during an expectation time period into an electrical speech signal;
  
  means for ascertaining an instantaneous spectral amplitude distribution of the speech signal during a time interval defined by a duration of a phoneme;
  
  means for representing the instantaneous spectral amplitude distribution as a spectral vector, each element of the spectral vector representing an amplitude of a frequency band having a predetermined bandwidth; and
  
  means for classifying an individual spoken word with a word-typical characteristic pattern derived from the spectral vector representing the instantaneous spectral amplitude distribution of the individual spoken word.
- 9. The apparatus according to claim 8, wherein said digital signal processor, said bus system, said program memory, said working memory, and said input/output unit are formed as an integrated circuit.
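
The classification recited in claims 4 to 6, and the classifying means of claim 8, can be sketched as a network with one input plane and one output plane. The nonlinear transfer function of claim 6 (##EQU4##) is not reproduced in this text, so the sketch assumes a logistic sigmoid purely for illustration. The function names, the weight values, and the example words are hypothetical, and the input neurons are assumed to pass the feature values through unchanged.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Logistic function, written so that math.exp never overflows.

    ASSUMPTION: stands in for the transfer function missing from this text.
    """
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def classify(
    features: Sequence[float],
    weights: Sequence[Sequence[float]],
    words: Sequence[str],
) -> Tuple[str, List[float]]:
    """Return the word whose output neuron has the largest activation.

    features   : correlogram elements, one per input neuron
    weights[o] : hypothetical weights from every input neuron to output neuron o
    words[o]   : the word indicated by output neuron o
    """
    activations = [
        sigmoid(sum(w * f for w, f in zip(row, features))) for row in weights
    ]
    best = max(range(len(words)), key=lambda o: activations[o])
    return words[best], activations


# Usage with two input neurons and two output neurons (made-up weights).
word, acts = classify(
    features=[1.0, 0.0],
    weights=[[2.0, 0.0], [-2.0, 0.0]],
    words=["home", "office"],
)
print(word, acts)
```

Every input neuron is connected to every output neuron, which mirrors the assignment described in claim 5; how the weights are obtained is not covered by this sketch. In the telephone application of claim 7, the recognized word would then be looked up to find the associated number to dial.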


Current Assignee: Siemens AG
Original Assignee: Siemens AG
Inventors: Tschirk, Wolfgang
Primary Examiner(s): MacDonald, Allen R.
Assistant Examiner(s): Mattson, Robert
Application Number: US08/185,800
Time in Patent Office: 1,495 Days
Field of Search: 395/2.11, 395/2.41, 395/2.68, 395/2.64, 395/2.6-2.63, 395/2.75
US Class Current: 704/255
CPC Class Codes: G10L 15/16 using artificial neural net...
