Adaptive speech recognition with selective input data to a speech classifier

US 6,044,343 A
Filed: 06/27/1997
Issued: 03/28/2000
Est. Priority Date: 06/27/1997
Status: Expired due to Term

10 Assignments

0 Petitions

Abstract

One embodiment of a speech recognition system is organized with speech input signal preprocessing and feature extraction followed by a fuzzy matrix quantizer (FMQ) designed with respective codebook sets at multiple signal to noise ratios. The FMQ quantizes various training words from a set of vocabulary words and produces observation sequence O output data to train hidden Markov model (HMM) processes λj and produces fuzzy distance measure output data for each vocabulary word codebook. A fuzzy Viterbi algorithm is used by a processor to compute maximum likelihood probabilities PR(O|λj) for each vocabulary word. The fuzzy distance measures and maximum likelihood probabilities are mixed in a variety of ways to preferably optimize speech recognition accuracy and speech recognition speed performance.
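
The fuzzy distance measure mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the patent specification: the fuzzy c-means style membership formula, the squared Euclidean distance, and the names `squared_distance`, `fuzzy_memberships`, `fuzzy_distance`, and `distances_per_word` are all assumptions chosen for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equally shaped matrices (lists of rows)."""
    return sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def fuzzy_memberships(distances, fuzziness=2.0):
    """Assumed fuzzy c-means style memberships (fuzziness > 1).

    Nearer codewords get larger weights, and the weights sum to 1.
    """
    exact = [i for i, d in enumerate(distances) if d == 0.0]
    if exact:  # the input coincides with one or more codewords
        return [1.0 / len(exact) if i in exact else 0.0 for i in range(len(distances))]
    power = 1.0 / (fuzziness - 1.0)
    inverse = [(1.0 / d) ** power for d in distances]
    total = sum(inverse)
    return [v / total for v in inverse]


def fuzzy_distance(matrix, codebook, fuzziness=2.0):
    """Return (memberships, membership-weighted distance) for one input matrix."""
    distances = [squared_distance(matrix, codeword) for codeword in codebook]
    memberships = fuzzy_memberships(distances, fuzziness)
    weighted = sum((u ** fuzziness) * d for u, d in zip(memberships, distances))
    return memberships, weighted


def distances_per_word(matrix, codebooks):
    """One fuzzy distance per vocabulary word, given a dict of word -> codebook."""
    return {word: fuzzy_distance(matrix, cb)[1] for word, cb in codebooks.items()}
```

In this sketch the membership vector plays the role of the observation sequence entry for one input matrix, and the weighted sum plays the role of the fuzzy distance to one codebook; the patent's actual formulas may differ.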

35 Claims

1. A speech recognition system comprising:
- a first speech signal preprocessor to receive first input data representing a speech input signal and having first speech input signal preclassifying output data;
  
  a second speech signal preprocessor to receive second input data representing the speech input signal and having second speech input signal preclassifying output data;
  
  a mixer to receive the first and second speech input signal preclassifying output data and having output data represented by a selected mix of the first and second speech input signal preclassifying output data;
  
  a selection control circuit coupled to the mixer to determine the selected mix of the first and second speech input signal preclassifying output data by determining an appropriate balance between speech recognition accuracy of the speech recognition system and a speech recognition processing speed of the speech recognition system; and
  
  a speech classifier to receive the selected mix and having output data to classify the speech input signal as recognized speech.
  - 2. The speech recognition system of claim 1 wherein the selection control circuit is capable of dynamically selecting the selected mix based on predetermined parameters.
  - 3. The speech recognition system of claim 1 further comprising:
    - a noise level detection sensor to provide a noise level parameter output signal to the selection control circuit.
  - 4. The speech recognition system of claim 1 wherein the first speech signal preprocessor comprises:
    - a fuzzy matrix quantizer, wherein the first speech input signal preclassifying output data of the fuzzy matrix quantizer are fuzzy distance measures between a speech input signal representation matrix and respective fuzzy matrix codebooks.
  - 5. The speech recognition system of claim 1 wherein the second speech signal preprocessor comprises:
    - a plurality of hidden Markov models each modeling a respective word in a predetermined vocabulary, wherein the second input data representing the speech input signal is an observation sequence produced by the first speech signal preprocessor; and
      
      a probability module to determine respective probabilities of each hidden Markov model producing the observation sequence representing the speech input signal.
  - 6. The speech recognition system of claim 5 wherein the probability module includes a Viterbi algorithm.
  - 7. The speech recognition system of claim 1 wherein the first input data representing the speech input signal comprises X order line spectral pair coefficients.
  - 8. The speech recognition system of claim 1 wherein the speech classifier is a multilevel perceptron neural network.
  - 9. The speech recognition system of claim 1 wherein the selected mix of the first and second speech input signal preclassifying output data is selected from the group comprised of (i) the first speech input signal preclassifying output data alone, (ii) the second speech input signal preclassifying output data alone, (iii) a combination of the first and second speech input signal preclassifying output data, (iv) the first speech input signal preclassifying output data and the second speech input signal preclassifying output data, (v) the first speech input signal preclassifying output data and the combination of the first and second speech input signal preclassifying output data, (vi) the second speech input signal preclassifying output data and the combination of the first and second speech input signal preclassifying output data, and (vii) the first speech input signal preclassifying output data, the combination of the first and second speech input signal preclassifying output data, and the second speech input signal preclassifying output data.
  - 10. The speech recognition system of claim 1 wherein the first speech input signal preclassifying output data is fuzzy distance measures between the first input data representing the speech input signal and respective reference codebooks of the first speech signal preprocessor.
  - 11. The speech recognition system of claim 1 further comprising:
    - decision logic coupled to the speech classifier to receive the output data from the speech classifier and to classify the speech input signal as a word selected from a predetermined vocabulary.
  - 12. The speech recognition system of claim 1 further comprising:
    - a processor;
      
      a memory coupled to the processor and having processor executable code for implementing the first and second speech signal preprocessors, the mixer and the speech classifier.
  - 13. The speech recognition system of claim 1 wherein the selection control circuit is capable of determining an appropriate balance between the speech recognition accuracy of the speech recognition system and the speech recognition processing speed of the speech recognition system in accordance with factors affecting speech recognition accuracy and speech recognition processing speed, wherein such factors are selected from the group comprising a vocabulary size of the speech recognition system and noise levels of an environment of the speech recognition system.
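
The seven selectable mixes enumerated in claim 9 can be sketched as a small mixer. This is an illustration only: the claims do not define how the "combination" is formed, so the element-wise weighted sum in `combine`, the `weight` parameter, and the names `Mix` and `mix_outputs` are hypothetical.

```python
from enum import Enum


class Mix(Enum):
    """The seven mixes of claim 9, in the order (i) through (vii)."""
    FIRST_ONLY = 1
    SECOND_ONLY = 2
    COMBINED_ONLY = 3
    FIRST_AND_SECOND = 4
    FIRST_AND_COMBINED = 5
    SECOND_AND_COMBINED = 6
    FIRST_COMBINED_SECOND = 7


def combine(first, second, weight=0.5):
    """Hypothetical combination: element-wise weighted sum of the two outputs."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(first, second)]


def mix_outputs(first, second, mix, weight=0.5):
    """Concatenate the parts named by `mix` into one classifier input vector."""
    combined = combine(first, second, weight)
    parts = {
        Mix.FIRST_ONLY: [first],
        Mix.SECOND_ONLY: [second],
        Mix.COMBINED_ONLY: [combined],
        Mix.FIRST_AND_SECOND: [first, second],
        Mix.FIRST_AND_COMBINED: [first, combined],
        Mix.SECOND_AND_COMBINED: [second, combined],
        Mix.FIRST_COMBINED_SECOND: [first, combined, second],
    }[mix]
    return [value for part in parts for value in part]
```

A selection control component would pick one `Mix` member and pass the resulting vector to the speech classifier; smaller mixes give the classifier fewer inputs to process.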

14. A speech recognition system comprising:
- a speech input signal feature extractor to provide parameters representing features of T groups of N speech input signal frames;
  
  a vocabulary of u words;
  
  a matrix quantizer to receive the parameters and to provide (i) a series of observation sequences for each of the T groups of the N speech input signal frames and (ii) distance measure output data between the parameters and u respective matrix codebooks;
  
  a plurality of u hidden Markov models coupled to the matrix quantizer to receive the observation sequences;
  
  a Viterbi algorithm module to receive the observation sequences and provide respective probabilities that the respective hidden Markov models produced a respective observation sequence;
  
  a selection control circuit to determine when the distance measure output, the probabilities, and a combination of the distance measure output and the probabilities are included in a plurality of selected mixes by determining an appropriate balance between speech recognition accuracy of the speech recognition system and a speech recognition processing speed of the speech recognition system;
  
  a mixer coupled to the matrix quantizer and the Viterbi algorithm module for mixing the distance measure output and the probabilities into one set of mixed output data based on the selected mixes; and
  
  a neural network coupled to the mixer to receive the mixed output data set and determine which of the u vocabulary words most probably represents the speech input signal.
  - 15. The speech recognition system of claim 14 wherein the matrix quantizer is a fuzzy matrix quantizer, the distance measures are fuzzy distance measures, and the observation sequence is a vector of indices representing the relative closeness of each of the parameters and codewords in the respective matrix codebooks.
  - 16. The speech recognition system of claim 14 wherein the predetermined mixed output data sets include:
    - (i) the distance measure output preclassifying output data alone, (ii) the probabilities preclassifying output data alone, (iii) a combination of the distance measure output and probabilities preclassifying output data, (iv) the distance measure output preclassifying output data and the probabilities preclassifying output data, (v) the distance measure output preclassifying output data and the combination of the distance measure output and probabilities preclassifying output data, (vi) the probabilities preclassifying output data and the combination of the distance measure output and probabilities preclassifying output data, and (vii) the distance measure output preclassifying output data, the combination of the distance measure output and probabilities preclassifying output data, and the probabilities preclassifying output data.
  - 17. The speech recognition system of claim 14 wherein the speech input signal feature extractor comprises:
    - an X order linear predictive code (LPC) module to determine X LPC coefficients; and
      
      a line spectral pair (LSP) module to determine X LSPs from the X LPC coefficients.
  - 18. The speech recognition system of claim 14 wherein the selection control circuit is capable of determining an appropriate balance between the speech recognition accuracy of the speech recognition system and the speech recognition processing speed of the speech recognition system in accordance with factors affecting speech recognition accuracy and speech recognition processing speed, wherein such factors are selected from the group comprising a vocabulary size of the speech recognition system and noise levels of an environment of the speech recognition system.
  - 19. The speech recognition system of claim 14 further comprising a noise level detector to provide a noise level parameter output signal to the selection control circuit.
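
The Viterbi algorithm module of claim 14 scores an observation sequence against each of the u hidden Markov models. The sketch below uses the standard (non-fuzzy) Viterbi recursion in the log domain for a discrete HMM; the fuzzy variant named in the abstract is not reproduced here, and the function names and the `(pi, a, b)` model layout are assumptions.

```python
import math


def _log(p):
    """Natural log that maps zero probability to negative infinity."""
    return math.log(p) if p > 0.0 else float("-inf")


def viterbi_log_prob(obs, pi, a, b):
    """Log probability of the single best state path for a discrete HMM.

    obs: sequence of observation symbol indices
    pi:  initial state probabilities
    a:   a[r][s] is the transition probability from state r to state s
    b:   b[s][o] is the probability of emitting symbol o in state s
    """
    n = len(pi)
    delta = [_log(pi[s]) + _log(b[s][obs[0]]) for s in range(n)]
    for o in obs[1:]:
        delta = [
            max(delta[r] + _log(a[r][s]) for r in range(n)) + _log(b[s][o])
            for s in range(n)
        ]
    return max(delta)


def score_words(obs, models):
    """Score one observation sequence against a dict of word -> (pi, a, b)."""
    return {word: viterbi_log_prob(obs, *hmm) for word, hmm in models.items()}
```

Working in the log domain avoids numerical underflow on long observation sequences, which is a common design choice rather than something the claims require.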

20. A speech recognition system comprising:
- means for processing first speech input signal data to preclassify the speech input signal and produce first preclassification output data, wherein the first speech input signal data represents a speech input signal;
  
  means for processing second speech input signal data to preclassify the speech input signal and produce second preclassification output data;
  
  means, coupled to both means for processing, for determining when to include the first speech input signal, the second speech input signal, and a combination of the first and second speech input signals in a preferred mix of the preclassification output data by determining an appropriate balance between speech recognition accuracy of the speech recognition system and a speech recognition processing speed of the speech recognition system;
  
  means, coupled to the means for determining, for mixing the first and second preclassification output data in accordance with the determined preferred mix;
  
  means, coupled to the means for mixing, for classifying the speech input signal based on the preferred mix of preclassification output data.
  - 21. The speech recognition system of claim 20 further comprising means to provide a noise level parameter output signal to the means for determining.

22. A speech recognition method comprising the steps of:
- processing first speech input signal data to preclassify the speech input signal and produce first preclassification output data, wherein the first speech input signal data represents a speech input signal;
  
  processing second speech input signal data to preclassify the speech input signal and produce second preclassification output data;
  
  determining when to include the first speech input signal, the second speech input signal, and a combination of the first and second speech input signals in a preferred mix of the preclassification output data by determining at least an appropriate balance between speech recognition accuracy and a speech recognition processing speed;
  
  mixing the first and second preclassification output data in accordance with the preferred mix; and
  
  classifying the speech input signal based on the preferred mix of preclassification output data.
  - 23. The speech recognition method of claim 22 wherein the step of processing first speech input signal data comprises the step of:
    - fuzzy matrix quantizing a plurality of the first speech input signal data;
      
      determining a fuzzy distance measure between the fuzzy matrix quantized first speech input signal data and a plurality of fuzzy matrix codebooks, wherein the first preclassification output data includes the fuzzy distance measure.
  - 24. The speech recognition method of claim 22 further comprising the steps of:
    - training a first speech processor for processing the first speech input signal data with temporally related data from speech input signals corrupted with acoustic noise at a plurality of signal to noise ratios;
      
      training a second speech processor for processing the second speech input signal data with temporally related data from the speech input signals corrupted with the acoustic noise at the plurality of signal to noise ratios; and
      
      training a speech classifier to classify the speech input signal with a plurality of preclassification output data mixes.
  - 25. The speech recognition method of claim 22 wherein the processing first speech input signal data step further comprises the step of:
    - determining an observation sequence of indices representing a relative closeness between the first speech input signal data and a plurality of codebooks.
  - 26. The speech recognition method of claim 22 further comprising the steps of:
    - receiving TO speech input signals, wherein the TO speech input signals define an input speech word;
      
      representing each of the TO speech input signals with P LSP coefficients;
      
      representing each group of N frames of the speech input signals with a respective P×N matrix;
      
      determining the relative closeness between each P×N matrix and each codeword in a fuzzy matrix codebook, wherein an observation sequence vector of indices is produced for each P×N matrix, and the indices are the second speech input signal data;
      
      determining a distance between each P×N matrix and each of the codewords; and
      
      weighting the distance between each P×N matrix and each of the codewords with respective indices of the observation sequence vector corresponding to the respective P×N matrix to obtain an overall fuzzy distance measure, wherein the fuzzy distance measure is the first preclassification output data.
  - 27. The speech recognition method of claim 22 wherein the step of determining the preferred mix of the preclassification output data comprises the steps of:
    - selecting a mix of the preclassification output data to obtain a predetermined satisfactory recognition accuracy in the least amount of time.
  - 28. The speech recognition method of claim 27 wherein the preferred mix is selected from the group comprising (i) the first speech input signal preclassifying output data alone, (ii) the second speech input signal preclassifying output data alone, (iii) a combination of the first and second speech input signal preclassifying output data, (iv) the first speech input signal preclassifying output data and the second speech input signal preclassifying output data, (v) the first speech input signal preclassifying output data and the combination of the first and second speech input signal preclassifying output data, (vi) the second speech input signal preclassifying output data and the combination of the first and second speech input signal preclassifying output data, and (vii) the first speech input signal preclassifying output data, the combination of the first and second speech input signal preclassifying output data, and the second speech input signal preclassifying output data.
  - 29. The speech recognition method of claim 22 wherein second speech input signal data is an observation sequence of indices of relative closeness of a representation of the speech input signal to codewords in a reference codebook, and the step of processing second speech input signal data comprises the step of:
    - determining with a fuzzy Viterbi algorithm a respective probability for each of u hidden Markov models that the hidden Markov model produced the observation sequence, wherein the second preclassification output data are the u determined respective probabilities.
  - 30. The speech recognition method of claim 22 wherein the step of classifying the speech input signal comprises the step of:
    - classifying the speech input signal with a multilayer perceptron neural network.
  - 31. The speech recognition method of claim 22 wherein determining an appropriate balance between the speech recognition accuracy and the speech recognition processing speed comprises utilizing factors affecting speech recognition accuracy and speech recognition processing speed, wherein such factors are selected from the group comprising a vocabulary size and noise levels of an environment.
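
Claim 27 selects the mix that reaches a predetermined satisfactory recognition accuracy in the least amount of time. One way to read that rule is sketched below. The `MixProfile` record, the idea of profiling each mix ahead of time, and the fallback to the most accurate mix when none meets the threshold are all assumptions, not details from the claims.

```python
from typing import NamedTuple, Sequence


class MixProfile(NamedTuple):
    """Hypothetical per-mix measurements gathered for one operating condition."""
    name: str
    accuracy: float  # fraction of words recognized correctly, 0.0 to 1.0
    seconds: float   # processing time per utterance


def select_mix(profiles: Sequence[MixProfile], min_accuracy: float) -> MixProfile:
    """Pick the fastest mix that meets the accuracy threshold.

    Assumed fallback: if no mix is accurate enough, return the most accurate one.
    """
    if not profiles:
        raise ValueError("at least one mix profile is required")
    satisfactory = [p for p in profiles if p.accuracy >= min_accuracy]
    if not satisfactory:
        return max(profiles, key=lambda p: p.accuracy)
    return min(satisfactory, key=lambda p: p.seconds)
```

Under this reading, factors such as vocabulary size and noise level (claim 31) would determine which set of profiles applies, since both affect the accuracy and the time of every mix.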

32. A speech recognition system comprising:
- a first speech signal preprocessor to receive first input data representing a speech input signal and having first speech input signal preclassifying output data;
  
  a second speech signal preprocessor to receive second input data representing the speech input signal and having second speech input signal preclassifying output data;
  
  a mixer to receive the first and second speech input signal preclassifying output data and having output data represented by a selected mix of the first and second speech input signal preclassifying output data;
  
  a non-neural network selection control circuit coupled to the mixer to determine when to include the first speech input signal, the second speech input signal, and a combination of the first and second speech input signals in the selected mix; and
  
  a speech classifier to receive the selected mix and having output data to classify the speech input signal as recognized speech.

33. A speech recognition system comprising:
- a first speech signal preprocessor to receive first input data representing a speech input signal and having first speech input signal preclassifying output data;
  
  a second speech signal preprocessor to receive second input data representing the speech input signal and having second speech input signal preclassifying output data;
  
  a mixer to receive the first and second speech input signal preclassifying output data and having output data represented by a selected mix of the first and second speech input signal preclassifying output data;
  
  a selection control circuit coupled to the mixer to determine when to include the first speech input signal, the second speech input signal, and a combination of the first and second speech input signals in the selected mix;
  
  a speech classifier to receive the selected mix and having output data to classify the speech input signal as recognized speech; and
  
  a noise level detector to provide a noise level parameter output signal to the selection control circuit.
  - 34. The speech recognition system of claim 33 wherein the noise level detector comprises a noise level detection sensor to detect noise levels which may corrupt at least one of the first input data and the second input data.
  - 35. The speech recognition system of claim 33 wherein the noise level detector comprises:
    - a database of noise level information corresponding to noise levels at different traveling speeds of a vehicle; and
      
      a data retriever to retrieve noise level information from the database of noise level information corresponding to a traveling speed of the vehicle.
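
The database and data retriever of claim 35 can be pictured as a lookup table keyed by vehicle speed. The sketch below is one possible arrangement, not the patent's implementation: the class name `SpeedNoiseTable`, the step-wise lookup (highest tabulated speed not exceeding the current speed), and the units are all assumptions.

```python
import bisect


class SpeedNoiseTable:
    """Hypothetical database mapping vehicle speed to an expected noise level."""

    def __init__(self, entries):
        # entries: iterable of (speed, noise_level) pairs in any order
        self._entries = sorted(entries)
        if not self._entries:
            raise ValueError("the table needs at least one entry")
        self._speeds = [speed for speed, _ in self._entries]

    def noise_level(self, speed):
        """Return the noise level stored for the highest tabulated speed <= speed.

        Speeds below the first tabulated speed are clamped to the first entry.
        """
        index = bisect.bisect_right(self._speeds, speed) - 1
        return self._entries[max(index, 0)][1]
```

The retrieved value would serve as the noise level parameter output signal supplied to the selection control circuit, standing in for a direct sensor reading as in claim 34.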

Current Assignee: RPX Corporation
Original Assignee: Advanced Micro Devices, Inc.
Inventors: Cong, Lin; Asghar, Safdar M.
Primary Examiner(s): Hudspeth, David R.
Assistant Examiner(s): Storm, Donald L.
Application Number: US08/883,978
Time in Patent Office: 1,005 Days
Field of Search: 704/236, 704/222, 704/255, 704/256, 704/232, 704/243, 704/244, 704/245
US Class Current: 704/236
CPC Class Codes: G10L 15/063 Training; G10L 15/20 Speech recognition techniqu...
