Method of speech recognition resistant to convolutive distortion and additive distortion

US 7,165,028 B2
Filed: 09/20/2002
Issued: 01/16/2007
Est. Priority Date: 12/12/2001
Status: Active Grant

Abstract

A speech recognizer that operates under both ambient noise (additive distortion) and microphone changes (convolutive distortion) is provided. For each utterance to be recognized, the recognizer system adapts the HMM mean vectors using a noise estimate calculated from the pre-utterance pause and a channel estimate calculated from previous utterances with an Expectation-Maximization algorithm.
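The mean adaptation described in the abstract can be illustrated with a short sketch. It assumes the commonly used log-spectral mismatch function, in which a channel bias adds to the clean mean and noise combines with it in the linear spectral domain. That formula, the function name `adapt_mean`, and its parameter names are assumptions made for illustration; none of them is quoted from this record.

```python
import numpy as np

def adapt_mean(clean_mean, channel_bias, noise_mean):
    """Adapt a clean log-spectral HMM mean to a channel bias and additive noise.

    Assumed mismatch function (not taken from the patent text):
        adapted = log(exp(clean_mean + channel_bias) + exp(noise_mean))
    All arguments are log-spectral vectors of the same length.
    """
    clean_mean = np.asarray(clean_mean, dtype=float)
    channel_bias = np.asarray(channel_bias, dtype=float)
    noise_mean = np.asarray(noise_mean, dtype=float)
    # logaddexp evaluates log(exp(a) + exp(b)) without overflow.
    return np.logaddexp(clean_mean + channel_bias, noise_mean)
```

Under this assumption, a bin where noise is negligible is simply shifted by the channel bias, while a bin dominated by noise moves toward the noise estimate.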

136 Citations

20 Claims

1. A method of speech recognition comprising the steps of:
- receiving speech utterances and sensing noise during speech pauses;
- providing models for recognizing speech;
- estimating additive noise during pauses in speech to provide additive noise estimate;
- adapting the models to additive noise and convolutional channel bias to provide adapted models with adapted model states;
- comparing input speech utterances with the adapted models for recognizing the speech and providing an alignment between an input speech utterance and recognized models; and
- estimating the convolutional channel bias by an iterative statistical maximum-likelihood method using said alignment and said additive noise estimate that maximizes the channel bias based on the probability likelihood of each speech data input frame feature vector of said input speech utterance mapping to the existing adapted model states to generate the maximum-likelihood channel bias estimate.
  - 2. The method of claim 1 wherein parameters of the models are based on log spectral parameters.
  - 3. The method of claim 1 wherein parameters of the models are mel frequency cepstral parameters.
  - 4. The method of claim 1 wherein said estimating step includes calculating the average of the log spectral domain during speech pauses.
  - 5. The method of claim 1 wherein said adapting step includes adapting log spectral domain model parameters to a log spectral domain noise estimate and a log spectral domain channel estimate.
  - 6. The method of claim 1 wherein said convolutional channel bias estimate includes estimating the log spectral domain channel bias by the maximum-likelihood method using said output alignment and said additive noise estimate to generate the maximum-likelihood spectral domain channel bias estimate.
  - 7. The method of claim 1 wherein the models include first order time derivative features and the adapting step includes adaptation of first order time derivative features.
  - 8. The method of claim 7 wherein the first order time derivative features are first order time derivatives of log spectral domain parameters.
  - 9. The method of claim 1 wherein the models include second order time derivative features and the adapting step includes adaptation of second order time derivative features.
  - 10. The method of claim 9 wherein said second order time derivative features are second order time derivatives of log spectral domain parameters.
  - 12. The recognizer of claim 1 wherein parameters of the models are based on log spectral parameters.
  - 13. The recognizer of claim 1 wherein parameters of the models are mel frequency cepstral parameters.
  - 14. The recognizer of claim 1, wherein said convolutional channel bias estimator estimates the log spectral domain channel bias by the maximum-likelihood method using said output alignment and said additive noise estimate to generate the maximum-likelihood spectral domain channel bias estimate.
  - 15. The recognizer of claim 1 wherein the models include first order time derivative features and the adapter includes adaptation of first order time derivative features.
  - 16. The recognizer of claim 15 wherein the first order time derivative features are first order time derivatives of log spectral domain parameters.
  - 17. The recognizer of claim 1 wherein the models include second order time derivative features and the adapter includes adaptation of second order time derivative features.
  - 18. The recognizer of claim 17 wherein said second order time derivative features are second order time derivatives of log spectral domain parameters.
  - 19. The recognizer of claim 1, wherein said noise estimate is determined by calculating the average of the log spectral domain during speech pauses.
  - 20. The recognizer of claim 19, wherein said model adapter subsystem adapts log spectral domain model parameters to a log spectral domain noise estimate and a log spectral domain channel estimate.
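
Claims 4 and 19 describe the noise estimate as an average taken in the log spectral domain during speech pauses. A minimal sketch of one reading of that step follows. It assumes the frames are already log-spectral vectors and that a pause/speech decision is available for each frame. The function name `estimate_noise_log_spectrum` and the boolean mask input are hypothetical, and how pauses are detected is not covered here.

```python
import numpy as np

def estimate_noise_log_spectrum(log_spectra, is_pause):
    """Average the log-spectral frames that were marked as speech pauses.

    log_spectra: array of shape (num_frames, num_bins), log-spectral features.
    is_pause:    boolean array of shape (num_frames,), True for pause frames.
                 (Hypothetical input; pause detection is out of scope.)
    """
    frames = np.asarray(log_spectra, dtype=float)
    mask = np.asarray(is_pause, dtype=bool)
    if not mask.any():
        raise ValueError("no pause frames available for noise estimation")
    # Mean over pause frames only, one value per spectral bin.
    return frames[mask].mean(axis=0)
```

Averaging in the log domain, rather than averaging linear power and then taking the log, is the literal reading of the claim wording; the two generally give different values.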

11. A speech recognizer comprising:
- a microphone and/or sensor for receiving speech utterances and sensing noise during pauses;
- models for recognizing speech;
- an additive noise estimator for estimating additive noise during pauses in speech to provide an additive noise estimate;
- a model adapter subsystem for adapting the models to additive noise and convolutional channel bias to provide adapted models with adapted model states;
- a recognizer subsystem which compares input speech utterances with the adapted models for recognizing the speech and outputs an alignment between an input speech utterance and recognized models; and
- a convolutional channel bias estimator that estimates the convolutional channel bias by an iterative statistical maximum-likelihood method using said alignment and said additive noise estimate that maximizes the channel bias based on the probability likelihood of each speech data input frame feature vector of said input speech utterance mapping to the existing adapted model states to generate a maximum-likelihood channel bias estimate.
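
The iterative maximum-likelihood channel bias estimate in claims 1 and 11 can be pictured with a simplified sketch. It is not the patent's EM update. It assumes a hard alignment (each frame paired with a single state's clean log-spectral mean), unit-variance Gaussians, and the log-add mismatch function log(exp(mean + bias) + exp(noise)). Under those assumptions, maximizing the likelihood reduces to a least-squares problem, which is solved below with Gauss-Newton steps. The name `estimate_channel_bias` and all of its parameters are hypothetical.

```python
import numpy as np

def estimate_channel_bias(observed, aligned_clean_means, noise_mean, n_iter=20):
    """Iteratively estimate a log-spectral channel bias from aligned frames.

    observed:            (num_frames, num_bins) observed log-spectral frames.
    aligned_clean_means: (num_frames, num_bins) clean mean of the state each
                         frame is aligned to (hard alignment; an assumption).
    noise_mean:          (num_bins,) additive noise estimate in the log domain.
    """
    obs = np.asarray(observed, dtype=float)
    means = np.asarray(aligned_clean_means, dtype=float)
    noise = np.asarray(noise_mean, dtype=float)
    bias = np.zeros(obs.shape[1])  # start from "no channel distortion"
    for _ in range(n_iter):
        # Adapted means under the current bias (assumed mismatch function).
        predicted = np.logaddexp(means + bias, noise)
        # Derivative of the adapted mean with respect to the bias, in (0, 1].
        slope = np.exp(means + bias - predicted)
        residual = obs - predicted
        # Gauss-Newton step, computed independently for every spectral bin.
        step = (slope * residual).sum(axis=0) / ((slope ** 2).sum(axis=0) + 1e-12)
        bias = bias + step
    return bias
```

A fuller treatment would weight each frame by its state posterior and by the model variances. The sketch only shows how the alignment and the noise estimate enter the bias update.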

Current Assignee: Texas Instruments, Inc.
Original Assignee: Texas Instruments, Inc.
Inventors: Gong, Yifan
Primary Examiner(s): Opsasnick; Michael N.
Application Number: US10/251,734
Publication Number: US 20030115055A1
Time in Patent Office: 1,579 Days
Field of Search: 704/233, 704/256
US Class Current: 704/233
CPC Class Codes:
G10L 15/065 Adaptation
G10L 15/20 Speech recognition techniqu...
