Speech recognition method using time-frequency masking mechanism

US 5,459,815 A
Filed: 06/21/1993
Issued: 10/17/1995
Est. Priority Date: 06/25/1992
Status: Expired due to Fees

Abstract

A speech recognition method in which input speech signals are converted to digital signals and then time-sequentially converted to cepstrum coefficients or logarithmic spectra. A dynamic cepstrum time sequence is obtained by time-frequency filtering of the cepstrum coefficients, or a masked spectrum time sequence is obtained by time-frequency masking of the logarithmic spectrum time sequence. Speech is recognized based on the dynamic cepstrum time sequence or masked spectrum time sequence obtained in this manner.
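
The masking operation described above can be illustrated with a minimal sketch. It assumes the logarithmic spectra are held in a NumPy array of shape (frames, frequency bins). The function name `masked_spectrum`, the moving-average frequency smoothing, the exponentially decaying time weights, and all parameter values are illustrative assumptions, not taken from the patent text.

```python
import numpy as np

def masked_spectrum(log_spec, n_prev=3, decay=0.5, smooth_width=3):
    """Subtract a masking pattern from each frame of a log spectrogram.

    log_spec: array of shape (T, F), one logarithmic spectrum per frame.
    The masking pattern for frame t is a weighted sum of the n_prev
    preceding frames, each smoothed along frequency (assumed scheme).
    """
    log_spec = np.asarray(log_spec, dtype=float)
    if smooth_width % 2 == 0:
        raise ValueError("smooth_width must be odd")
    n_frames, n_bins = log_spec.shape

    # Frequency smoothing: moving average with edge padding so that
    # the output keeps the same number of bins as the input.
    pad = smooth_width // 2
    kernel = np.ones(smooth_width) / smooth_width
    padded = np.pad(log_spec, ((0, 0), (pad, pad)), mode="edge")
    smoothed = np.stack([np.convolve(row, kernel, mode="valid") for row in padded])

    # Time smoothing: decaying weights over preceding frames, normalized
    # so that a stationary spectrum is masked completely (assumption).
    weights = decay ** np.arange(1, n_prev + 1)
    weights = weights / weights.sum()

    masked = np.empty_like(log_spec)
    for t in range(n_frames):
        pattern = np.zeros(n_bins)
        for lag, w in enumerate(weights, start=1):
            if t - lag >= 0:
                pattern += w * smoothed[t - lag]
        masked[t] = log_spec[t] - pattern
    return masked

# Usage: a stationary spectrum is suppressed once enough history exists.
steady = np.full((5, 8), 2.0)
result = masked_spectrum(steady)
```

With normalized weights, frames that repeat the preceding spectrum are driven toward zero, while a sudden spectral change passes through largely unmasked. This is one way to read the emphasis on dynamic features.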

29 Citations

13 Claims

1. A speech recognition method in which an input speech is converted to a time sequence of a feature vector, said feature vector including one of a spectrum and a cepstrum, and a distance or probability between the input speech time sequence and a time sequence of the feature vector or a statistical model thereof, is calculated for recognition, comprising the steps of:
- effecting a time frequency masking by an operation of obtaining a masked speech spectrum by subtracting, from speech spectrum at present, a masking pattern which is a function of frequency obtained by smoothing immediately preceding speech spectrum by time and frequency; and
- recognizing the speech by using the masked speech spectrum obtained by the above described operation at every time point.

2. A speech recognition method, comprising the steps of:
- converting an input speech to a digitized speech signal;
- converting said digitized speech signal to cepstrum coefficients at every prescribed time interval;
- obtaining a time sequence of dynamic cepstrum by subtracting a masking pattern from an input speech cepstrum at present; and
- recognizing the speech by using said dynamic cepstrum.
  - 3. The speech recognition method according to claim 2, wherein said step of converting to said cepstrum coefficients includes the steps of:
    - segmenting said digitized speech signal at every prescribed time interval and obtaining an auto-correlation coefficient vector; and
    - calculating a linear predictive coefficient vector based on said auto-correlation coefficient vector.
  - 4. The speech recognition method according to claim 2, wherein said step of converting to said cepstrum coefficients includes the step of segmenting said digitized speech signal at every prescribed time interval and obtaining a logarithmic spectrum by Fourier transform and calculating a cepstrum coefficient vector by inverse Fourier transform of the logarithmic spectrum.
  - 5. The speech recognition method according to claim 2, wherein said step of recognizing the speech includes the steps of:
    - assigning the closest one of the centroid vectors obtained from a number of training samples of dynamic cepstrum vectors to the time sequence of centroid vectors of said dynamic cepstrum for an input speech, to generate a sequence of vector code numbers; and
    - recognizing said sequence of vector code numbers.
  - 6. The speech recognition method according to claim 5, further comprising the step of:
    - collecting training samples represented by said sequence of vector code numbers and learning the same in accordance with a prescribed algorithm;
    - wherein said step of generating said sequence of vector code numbers includes the step of recognizing a sequence of vector code numbers of the input speech to be recognized, based on the result of learning in accordance with said prescribed algorithm.
  - 7. The speech recognition method according to claim 6, wherein said step of learning includes the step of learning by using Hidden Markov Models.
  - 8. The speech recognition method according to claim 2, wherein said step of recognizing an input speech sound includes the step of learning the probability of the spectral features of training speech units including phonemes or words.
  - 9. The speech recognition method according to claim 8, wherein said step of recognizing the speech includes the step of recognizing the input speech represented by the dynamic cepstrum time sequence by using the result of said learning.
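
The cepstrum-based steps in claims 2, 4, and 5 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the helper names `fft_cepstrum`, `dynamic_cepstrum`, and `assign_codes` are hypothetical, the Hamming window, Gaussian lifter, and decay weights are assumed choices, and the centroids are taken as given rather than trained.

```python
import numpy as np

def fft_cepstrum(frame, n_ceps=16):
    """Cepstrum of one frame: Fourier transform, log magnitude, inverse transform."""
    windowed = frame * np.hamming(len(frame))            # assumed window
    log_spectrum = np.log(np.abs(np.fft.rfft(windowed)) + 1e-10)
    cepstrum = np.fft.irfft(log_spectrum)
    return cepstrum[:n_ceps]                             # keep low quefrencies

def dynamic_cepstrum(ceps, n_prev=3, decay=0.5, lifter_width=8.0):
    """Subtract a masking pattern built from preceding cepstra.

    Smoothing along frequency corresponds to attenuating high quefrencies,
    modeled here by a Gaussian lifter (an assumption of this sketch).
    """
    ceps = np.asarray(ceps, dtype=float)
    n_frames, n_coef = ceps.shape
    lifter = np.exp(-0.5 * (np.arange(n_coef) / lifter_width) ** 2)
    out = ceps.copy()
    for t in range(n_frames):
        for lag in range(1, min(n_prev, t) + 1):
            out[t] -= (decay ** lag) * lifter * ceps[t - lag]
    return out

def assign_codes(vectors, centroids):
    """Vector quantization: index of the nearest centroid for each vector."""
    dists = np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

# Usage with synthetic frames; a real system would use segmented speech.
frames = np.random.default_rng(0).standard_normal((5, 256))
ceps = np.stack([fft_cepstrum(f) for f in frames])
dyn = dynamic_cepstrum(ceps)
codes = assign_codes(
    np.array([[1.0, 1.0], [9.0, 9.0], [0.0, 2.0]]),
    np.array([[0.0, 0.0], [10.0, 10.0]]),
)
```

The resulting sequence of code numbers is what a discrete recognizer, such as the Hidden Markov Models named in claim 7, would consume.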

10. A speech recognition method, comprising the steps of:
- converting an input speech to a digitized speech signal;
- segmenting said digitized speech signal at every prescribed time interval in order to obtain a logarithmic spectrum time sequence by Fourier transform;
- effecting a time frequency masking by an operation of obtaining masked speech spectrum by subtracting, from speech spectrum at present, a masking pattern which is a function of frequency obtained by smoothing immediately preceding speech spectrum by time and frequency for obtaining a masked spectrum time sequence; and
- recognizing the speech by using said masked spectrum time sequence.
  - 11. The speech recognition method according to claim 10, wherein said step of recognizing the speech includes the step of recognizing the speech by calculating a feature vector representing the same content as a dynamic cepstrum including said masked spectrum.
  - 12. The speech recognition method according to claim 11, wherein said step of recognizing the input speech includes the step of recognizing the speech by a method of dynamic time warping.
  - 13. The speech recognition method according to claim 11, wherein said step of recognizing the input speech includes the steps of:
    - storing as a template, typical speech sound of a word to be recognized as it is, or storing as a template, an average of a plurality of typical speech sounds of the word to be recognized; and
    - calculating a distance between said registered word template and the time sequence of said masked spectrum of the input speech to be recognized by dynamic time warping, and recognizing the speech based on this distance.
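
The template matching in claims 12 and 13 can be illustrated with a textbook dynamic time warping sketch. The function names `dtw_distance` and `recognize`, the Euclidean local distance, and the symmetric step pattern are assumptions of this example. The claims do not specify them.

```python
import numpy as np

def dtw_distance(template, sequence):
    """Accumulated DTW distance between two feature sequences.

    template, sequence: arrays of shape (n, D) and (m, D), for example
    masked spectra per frame. Local cost is Euclidean (assumed).
    """
    template = np.asarray(template, dtype=float)
    sequence = np.asarray(sequence, dtype=float)
    n, m = len(template), len(sequence)
    local = np.linalg.norm(template[:, None, :] - sequence[None, :, :], axis=2)

    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Allow a match, or a stretch of either sequence.
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = local[i - 1, j - 1] + best_prev
    return acc[n, m]

def recognize(templates, sequence):
    """Return the word whose stored template is closest under DTW."""
    return min(templates, key=lambda word: dtw_distance(templates[word], sequence))

# Usage with toy one-dimensional features standing in for masked spectra.
templates = {
    "up": np.array([[0.0], [1.0], [2.0]]),
    "down": np.array([[2.0], [1.0], [0.0]]),
}
utterance = np.array([[0.0], [0.0], [1.0], [2.0]])
best = recognize(templates, utterance)
```

A template averaged over several utterances, as the claim allows, would simply replace the single stored array per word.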

Current Assignee: ATR Auditory and Visual Perception Research Laboratories
Original Assignee: ATR Auditory and Visual Perception Research Laboratories
Inventors: Aikawa, Kiyoaki; Tohkura, Yoh'ichi; Kawahara, Hideki
Primary Examiner(s): MacDonald, Allen R.
Assistant Examiner(s): Onka, Thomas
Application Number: US08/079,425
Time in Patent Office: 848 Days
Field of Search: 381/42, 381/50, 395/2.4-2.63
US Class Current: 704/254
CPC Class Codes: G10L 15/142 Hidden Markov Models [HMMs]
