System and method for speech recognition

US 20040260546A1
Filed: 04/23/2004
Published: 12/23/2004
Est. Priority Date: 04/25/2003
Status: Abandoned Application

Abstract

A system and method use an initial noise model produced from pre-estimated noise of a service environment and an initial synthesized model of a voice containing noise. At recognition time, the system and method produce an utterance environment noise model from background noise of the service environment, as well as a sequence of feature vectors from noise-superimposed speech including an uttered voice and the background noise. They also produce an adaptive model by adapting the initial synthesized model using the utterance environment noise model, the initial noise model, and a compensation model; the adaptive model is then checked against the sequence of feature vectors to perform speech recognition. The compensation model is created at recognition time and reflects the signal-to-noise ratio between the uttered voice and the background noise present when the voice is actually uttered.

20 Claims

1. A speech recognition system having an initial noise model produced based on pre-estimated noise of a service environment, a clean speech model of noiseless speech, and an initial synthesized model produced by combining the initial noise model and the clean speech model, the system performing speech recognition by producing an utterance environment noise model from background noise of the service environment upon speech recognition, producing a sequence of feature vectors from noise-superimposed speech including an uttered voice and the background noise, producing an adaptive model by adapting the initial synthesized model using the utterance environment noise model and the initial noise model, and checking the adaptive model against the sequence of feature vectors, the speech recognition system comprising:
- compensation means for providing compensation in accordance with the sequence of feature vectors upon producing the adaptive model.
  - 2. The speech recognition system according to claim 1, wherein the compensation means provides compensation in accordance with the sequence of feature vectors, the utterance environment noise model, and the clean speech model.
  - 3. The speech recognition system according to claim 1, wherein the compensation means provides compensation so as to make a signal to noise ratio of the adaptive model equal to a signal to noise ratio of the sequence of feature vectors.
  - 4. The speech recognition system according to claim 1, wherein the compensation means allows a compensation model for compensating a noise level upon the adaptation to compensate an adaptive parameter calculated using the utterance environment noise model and the initial noise model at the time of the adaptation.
  - 5. The speech recognition system according to claim 4, wherein the compensation means produces:
    - a differential vector by determining a difference between the sequence of feature vectors to be checked and the utterance environment noise model; and
    - the compensation model by determining a difference between the clean speech model corresponding to the adaptive model to be checked and the differential vector.
  - 6. The speech recognition system according to claim 4, wherein the compensation means produces the compensation model for making a signal to noise ratio of the adaptive model equal to a signal to noise ratio of the sequence of feature vectors.
  - 7. The speech recognition system according to claim 5, wherein the compensation means comprises detection means for detecting a feature vector of a vowel from the sequence of feature vectors to be checked, produces the differential vector by determining a difference between the feature vector detected by the detection means and the utterance environment noise model, and produces the compensation model by determining a difference between the clean speech model corresponding to the vowel and the differential vector.
  - 8. The speech recognition system according to claim 5, wherein the compensation means comprises detection means for detecting a feature vector having a predetermined power level or more in the sequence of feature vectors to be checked, produces the differential vector by determining a difference between the feature vector detected by the detection means and the utterance environment noise model, and produces the compensation model by determining a difference between the clean speech model corresponding to a feature vector having the predetermined power level or more and the differential vector.
  - 9. The speech recognition system according to claim 4, wherein the compensation means comprises calculation means for determining an average of the compensation models generated in a predetermined period, and delivers an averaged compensation model provided by the calculation means.
  - 10. The speech recognition system according to claim 4, wherein the compensation means comprises calculation means for determining an average of a plurality of compensation models determined in accordance with a plurality of uttered voices, and delivers an averaged compensation model provided by the calculation means.
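
As a rough illustration of claims 5 and 8, the two differences can be sketched in Python with NumPy. This is only a sketch under stated assumptions, not the applicants' implementation: it assumes all vectors live in a log-spectral or cepstral domain where plain subtraction is meaningful, that the noise model and the clean speech model are each reduced to a single mean vector, and that the selected frames are averaged before subtraction. The function names, the `frame_power` array, and the threshold value are hypothetical.

```python
import numpy as np


def select_high_power_frames(features, frame_power, threshold):
    """Keep frames whose power is at or above a threshold (cf. claim 8).

    `frame_power` and `threshold` are hypothetical inputs; the claims do not
    say how power is measured.
    """
    mask = frame_power >= threshold
    return features[mask]


def differential_vector(features, noise_mean):
    """Difference between observed feature vectors and the noise model.

    Assumption: the selected frames are averaged into one vector first.
    """
    return features.mean(axis=0) - noise_mean


def compensation_model(clean_mean, diff):
    """Difference between the clean speech model and the differential vector."""
    return clean_mean - diff


# Toy data: 4 frames of 3-dimensional features (values are made up).
features = np.array([
    [1.0, 2.0, 3.0],
    [5.0, 6.0, 7.0],
    [7.0, 8.0, 9.0],
    [0.5, 0.5, 0.5],
])
frame_power = np.array([0.1, 2.0, 3.0, 0.05])
noise_mean = np.array([1.0, 1.0, 1.0])   # utterance environment noise model
clean_mean = np.array([6.0, 7.0, 8.0])   # clean speech model

voiced = select_high_power_frames(features, frame_power, threshold=1.0)
diff = differential_vector(voiced, noise_mean)
comp = compensation_model(clean_mean, diff)
```

Under the log-domain assumption, `diff` behaves like an estimate of the signal-to-noise ratio of the observed speech, which is one way to read the goal stated in claims 3 and 6. A vowel detector (claim 7) could replace the power threshold as the frame selector.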

11. A speech recognition method comprising the steps of:
- providing an initial noise model produced based on pre-estimated noise of a service environment, a clean speech model of noiseless speech, and an initial synthesized model produced by combining the initial noise model and the clean speech model;
- producing an utterance environment noise model from background noise of the service environment upon speech recognition;
- producing a sequence of feature vectors from noise-superimposed speech including an uttered voice and the background noise;
- producing an adaptive model by adapting the initial synthesized model using the utterance environment noise model and the initial noise model; and
- checking the adaptive model against the sequence of feature vectors to perform speech recognition, wherein the step of producing the adaptive model includes the step of providing compensation in accordance with the sequence of feature vectors.
  - 12. The speech recognition method according to claim 11, wherein the step of providing compensation is carried out by providing compensation in accordance with the sequence of feature vectors, the utterance environment noise model, and the clean speech model.
  - 13. The speech recognition method according to claim 11, wherein the step of providing compensation is carried out by providing compensation so as to make a signal to noise ratio of the adaptive model equal to a signal to noise ratio of the sequence of feature vectors.
  - 14. The speech recognition method according to claim 11, wherein the step of providing compensation is carried out by allowing a compensation model for compensating a noise level upon the adaptation to compensate an adaptive parameter calculated using the utterance environment noise model and the initial noise model at the time of the adaptation.
  - 15. The speech recognition method according to claim 14, wherein the step of providing compensation produces:
    - a differential vector by determining a difference between the sequence of feature vectors to be checked and the utterance environment noise model; and
    - the compensation model by determining a difference between the clean speech model corresponding to the adaptive model to be checked and the differential vector.
  - 16. The speech recognition method according to claim 14, wherein the step of providing compensation produces the compensation model for making a signal to noise ratio of the adaptive model equal to a signal to noise ratio of the sequence of feature vectors.
  - 17. The speech recognition method according to claim 15, wherein the step of providing compensation comprises the steps of:
    - detecting a feature vector of a vowel from the sequence of feature vectors to be checked;
    - producing the differential vector by determining a difference between the feature vector detected by the step of detecting the feature vector and the utterance environment noise model; and
    - producing the compensation model by determining a difference between the clean speech model corresponding to the vowel and the differential vector.
  - 18. The speech recognition method according to claim 15, wherein the step of providing compensation comprises the steps of:
    - detecting a feature vector having a predetermined power level or more in the sequence of feature vectors to be checked;
    - producing the differential vector by determining a difference between the feature vector detected in the step of detecting the feature vector and the utterance environment noise model; and
    - producing the compensation model by determining a difference between the clean speech model corresponding to a feature vector having the predetermined power level or more and the differential vector.
  - 19. The speech recognition method according to claim 14, wherein the step of providing compensation comprises the steps of:
    - determining an average of the compensation models generated in a predetermined period; and
    - delivering an averaged compensation model.
  - 20. The speech recognition method according to claim 14, wherein the step of providing compensation comprises the steps of:
    - determining an average of a plurality of compensation models determined in accordance with a plurality of uttered voices; and
    - delivering an averaged compensation model.
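
The averaging in claims 19 and 20 can be sketched as a simple running mean over compensation models, one per utterance or per time window. This is a minimal sketch assuming each compensation model is a fixed-length vector and that an unweighted arithmetic mean is intended; the claims do not specify the averaging scheme, and the class name `CompensationAverager` is hypothetical.

```python
import numpy as np


class CompensationAverager:
    """Running average of compensation models (cf. claims 19 and 20)."""

    def __init__(self, dim):
        self.total = np.zeros(dim)  # element-wise sum of models seen so far
        self.count = 0              # number of models accumulated

    def update(self, comp):
        """Add one compensation model, e.g. from one uttered voice."""
        self.total += comp
        self.count += 1

    def average(self):
        """Return the averaged compensation model."""
        if self.count == 0:
            raise ValueError("no compensation models accumulated yet")
        return self.total / self.count


# Toy usage: three made-up 2-dimensional compensation models.
averager = CompensationAverager(dim=2)
for comp in ([1.0, 2.0], [3.0, 4.0], [5.0, 6.0]):
    averager.update(np.asarray(comp))
avg = averager.average()
```

Resetting the accumulator at the end of a predetermined period would correspond to the windowed reading of claim 19, while keeping it across utterances matches claim 20.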

Current Assignee: Pioneer Corporation
Original Assignee: Pioneer Corporation
Inventors: Seo, Hiroshi; Toyama, Soichi
Application Number: US10/830,458
Publication Number: US 20040260546A1
US Class Current: 704/233
CPC Class Codes: G10L 15/20 Speech recognition techniqu...
