Apparatus and method for speech recognition

US 20030220791A1
Filed: 04/25/2003
Published: 11/27/2003
Est. Priority Date: 04/26/2002
Status: Abandoned Application

Abstract

A true/false judgment on a result of speech recognition is made with high accuracy and a smaller volume of processing. By comparing acoustic models HMMsb against the feature vector sequence V(n) of utterances, a recognition result RCG specifying the acoustic model HMMsb having the maximum likelihood, a first score FSCR indicating the value of the maximum likelihood, and a second score SSCR indicating the value of the second highest likelihood are found. Then, by comparing an evaluation value FSCR×(FSCR−SSCR) based on the first score FSCR and the second score SSCR with a pre-set threshold value THD, a true/false judgment on the recognition result RCG is made. When the recognition result RCG is judged as being true, speaker adaptation is applied to the acoustic models HMMsb, and when the recognition result RCG is judged as being false, speaker adaptation is not applied to the acoustic models HMMsb. It is thus possible to improve the accuracy of speaker adaptation.
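The judgment described above can be illustrated with a minimal Python sketch. The evaluation value FSCR×(FSCR−SSCR) and the threshold THD come from the abstract; the function names, the `adapt` callback, and the choice of "greater than THD" as the "specific relation" are assumptions made only for illustration.

```python
from typing import Callable, TypeVar

Models = TypeVar("Models")


def evaluation_value(fscr: float, sscr: float) -> float:
    """Evaluation value FSCR x (FSCR - SSCR), as given in the abstract."""
    return fscr * (fscr - sscr)


def judge_true(fscr: float, sscr: float, thd: float) -> bool:
    """Judge the recognition result.

    Assumption: the result counts as true when the evaluation value
    exceeds THD. The abstract only says the value is compared with THD.
    """
    return evaluation_value(fscr, sscr) > thd


def maybe_adapt(
    models: Models,
    fscr: float,
    sscr: float,
    thd: float,
    adapt: Callable[[Models], Models],
) -> Models:
    """Apply the caller-supplied `adapt` only when the result is judged true.

    `adapt` is a hypothetical stand-in for the speaker adaptation step.
    """
    if judge_true(fscr, sscr, thd):
        return adapt(models)
    return models  # judged false: the models are left unchanged
```

With FSCR = 10, SSCR = 4 and THD = 50, the evaluation value is 60 and adaptation runs; with SSCR = 9 the value drops to 10 and the models are returned untouched.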

Robust speaker adaptation that is not susceptible to the influence of background noise is achieved. Initial acoustic models Mc are stored in advance in a speaker adapted model storing section, and a noise adapting section generates noise adapted models Mc′ by applying noise adaptation to the initial acoustic models Mc pre-stored in the speaker adapted model storing section. A speaker adaptation parameter calculating section generates speaker adaptation parameters P based on the noise adapted models Mc′ and a feature vector sequence V(n) of utterances from the speaker, and an acoustic model updating section generates speaker adapted models Mc″ by applying speaker adaptation processing to the initial acoustic models Mc using the speaker adaptation parameters P. The initial acoustic models Mc are replaced with the speaker adapted models Mc″, which are newly stored in the speaker adapted model storing section. At the time of speech recognition, the noise adapting section generates speaker adapted models Mreg adapted to noises by applying noise adaptation to the updated and newly stored speaker adapted models Mc″ instead of the initial acoustic models Mc. Then, a speech recognition section performs speech recognition by comparing sequences formed by the speaker adapted models Mreg adapted to noises against the feature vector sequence V(n) of utterances to be recognized.
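The data flow in this paragraph (Mc → Mc′ → P → Mc″ → Mreg) can be traced with a toy Python sketch. Everything numerical here is an assumption: a model is reduced to a single mean vector, noise adaptation is a plain additive shift, and P is a simple mean offset. None of these is the adaptation technique of the application; the sketch only shows that P is estimated against the noise adapted models Mc′ but applied to the initial models Mc.

```python
from typing import List

Vector = List[float]


def noise_adapt(model_mean: Vector, noise_mean: Vector) -> Vector:
    """Toy noise adaptation: shift the model mean by the noise mean.

    Assumption for illustration only; a placeholder for the real method.
    """
    return [m + n for m, n in zip(model_mean, noise_mean)]


def speaker_param(noise_adapted: Vector, features: List[Vector]) -> Vector:
    """Toy parameter P: offset from the noise adapted mean (Mc') to the
    average of the feature vector sequence V(n)."""
    count = len(features)
    average = [sum(v[d] for v in features) / count
               for d in range(len(noise_adapted))]
    return [a - m for a, m in zip(average, noise_adapted)]


def update_model(initial_mean: Vector, p: Vector) -> Vector:
    """Apply P to the initial model Mc (not to Mc') to obtain Mc''."""
    return [m + s for m, s in zip(initial_mean, p)]


def adapt_to_speaker(mc: Vector, noise: Vector,
                     features: List[Vector]) -> Vector:
    """Speaker adaptation pass: Mc -> Mc' -> P -> Mc''."""
    mc_noisy = noise_adapt(mc, noise)        # Mc'
    p = speaker_param(mc_noisy, features)    # P
    return update_model(mc, p)               # Mc'', stored in place of Mc


def models_for_recognition(stored: Vector, noise: Vector) -> Vector:
    """Recognition-time pass: noise adapt the stored Mc'' to obtain Mreg."""
    return noise_adapt(stored, noise)
```

Calling `adapt_to_speaker` once and storing its result, then calling `models_for_recognition` with the background noise observed at recognition time, mirrors the two phases described above.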

16 Claims

1. A speech recognition apparatus for applying speaker adaptation to acoustic models based on feature vectors of utterances, said apparatus comprising:
- speech recognition device for comparing said acoustic models against said feature vector of utterances, and outputting a recognition result specifying a sequence of acoustic models having maximum likelihood, a first score indicating a value of the maximum likelihood, and a second score indicating a value of second highest likelihood;
  
  judging device for comparing an evaluation value based on said first score and said second score with a pre-set threshold value, and judging said recognition result as being true when said evaluation value is in a specific relation with respect to said threshold value; and
  
  speaker adaptation processing device for applying speaker adaptation to said acoustic models once said judging device judges said recognition result as being true.
- - 2. The speech recognition apparatus according to claim 1, wherein:
    - said judging device judges said recognition result as being false when said evaluation value is not in the specific relation with respect to said threshold value; and
      
      said speaker adaptation processing device does not apply speaker adaptation to said acoustic models when said recognition result is judged as being false.
  - 3. The speech recognition apparatus according to claim 1, wherein said evaluation value is computed from a difference value between said first score and said second score.
  - 4. The speech recognition apparatus according to claim 2, further comprising device for inhibiting an output of said recognition result and providing information indicating that said recognition result is false when said judging device judges said recognition result as being false.
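Claims 1 to 4 can be read together as a small pipeline: pick the two highest likelihoods, derive an evaluation value from their difference (claim 3), and inhibit the output while reporting a false judgment when the test fails (claim 4). The Python sketch below is one hedged reading. The candidate dictionary, the function names, the use of the raw difference as the evaluation value, and the "greater than" relation are all assumptions, not claim language.

```python
from typing import Dict, Optional, Tuple


def recognize(likelihoods: Dict[str, float]) -> Tuple[str, float, float]:
    """Return the best candidate, the first score, and the second score.

    Assumes at least two candidates are supplied.
    """
    ranked = sorted(likelihoods.items(), key=lambda kv: kv[1], reverse=True)
    (best, first_score), (_, second_score) = ranked[0], ranked[1]
    return best, first_score, second_score


def gated_output(
    likelihoods: Dict[str, float], threshold: float
) -> Tuple[Optional[str], bool]:
    """Output the recognition result only when it is judged true.

    Returns (result, True) when judged true, and (None, False) when judged
    false, so the caller is told the result was rejected.
    """
    result, first_score, second_score = recognize(likelihoods)
    # Assumption: the evaluation value is the plain score difference,
    # and the result is true when that difference exceeds the threshold.
    judged_true = (first_score - second_score) > threshold
    if not judged_true:
        return None, False
    return result, True
```

A wide margin between the top two candidates lets the result through, while a narrow margin suppresses it and returns the rejection flag instead.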

5. A speech recognition method for applying speaker adaptation to acoustic models based on feature vectors of utterances, said method comprising:
- a first step of comparing said acoustic models against said feature vector of utterances, and outputting a recognition result specifying a sequence of acoustic models having maximum likelihood, a first score indicating a value of the maximum likelihood, and a second score indicating a value of second highest likelihood;
  
  a second step of comparing an evaluation value based on said first score and said second score with a pre-set threshold value, and judging said recognition result as being true when said evaluation value is in a specific relation with respect to said threshold value; and
  
  a third step of applying speaker adaptation to said acoustic models when said recognition result is judged as being true in said second step.
- - 6. The speech recognition method according to claim 5, wherein:
    - in said second step, said recognition result is judged as being false when said evaluation value is not in the specific relation with respect to said threshold value; and
      
      in said third step, speaker adaptation is not applied to said acoustic models when said recognition result is judged as being false.
  - 7. The speech recognition method according to claim 5, wherein said evaluation value is computed from a difference value between said first score and said second score.
  - 8. The speech recognition method according to claim 6, wherein when said recognition result is judged as being false in said second step, an output of said recognition result is inhibited and information is provided to indicate that said recognition result is false.

9. A speech recognition apparatus, comprising:
- storage device having initial acoustic models;
  
  noise adapting device for generating noise adapted models by applying noise adaptation to said initial acoustic models in said storage device using background noises at a time of speaker adaptation;
  
  speaker adaptation parameter calculating device for performing speaker adaptation computation with respect to said noise adapted models generated in said noise adapting device, using utterances uttered at the time of said speaker adaptation, and thereby calculating a speaker adaptation parameter for converting said noise adapted models into noise-superimposed speaker adapted models; and
  
  acoustic model updating device for generating speaker adapted models by applying speaker adaptation to said initial acoustic models in said storage device using said speaker adaptation parameter, and replacing said initial acoustic models with said speaker adapted models so as to be updated and newly stored in said storage device.
- - 10. The speech recognition apparatus according to claim 9, further comprising:
    - recognition processing device for performing speech recognition processing at a time of speech recognition, wherein said noise adapting device generates speaker adapted models adapted to noises by applying noise adaptation to said speaker adapted models updated and newly stored in said storage device, using background noises during a silent period at the time of said speech recognition, and supplies said speaker adapted models adapted to noises to said speech recognition device as acoustic models for speech recognition of the utterances.

11. A speech recognition apparatus, comprising:
- storage device having initial acoustic models;
  
  noise adapting device for generating noise adapted models by applying noise adaptation to said initial acoustic models in said storage device, using background noises during a silent period at a time of speech recognition;
  
  recognition processing device for performing speech recognition by comparing utterances uttered during an utterance period at the time of said speech recognition and to be subjected to speech recognition, against said noise adapted models generated in said noise adapting device;
  
  speaker adaptation parameter calculating device for performing speaker adaptation computation with respect to said noise adapted models generated in said noise adapting device, using said utterances to be subjected to speech recognition, and thereby calculating a speaker adaptation parameter for converting said noise adapted models into noise-superimposed speaker adapted models; and
  
  acoustic model updating device for generating speaker adapted models by applying speaker adaptation to said initial acoustic models in said storage device using said speaker adaptation parameter, and replacing said initial acoustic models with said speaker adapted models so as to be updated and newly stored in said storage device.
- - 12. The speech recognition apparatus according to claim 11, wherein said speaker adaptation parameter calculating device and said acoustic model updating device generate said speaker adapted models and replace said initial acoustic models with said speaker adapted models so as to be updated and newly stored in said storage device when a reliability of a recognition result from said recognition processing device is high.

13. A speech recognition method, comprising:
- a noise adaptation processing step of generating noise adapted models by applying noise adaptation to initial acoustic models stored in storage device, using background noises at a time of speaker adaptation;
  
  a speaker adaptation parameter calculating step of performing speaker adaptation computation with respect to said noise adapted models generated in said noise adaptation processing step, using utterances uttered at the time of said speaker adaptation, and thereby calculating a speaker adaptation parameter for converting said noise adapted models into noise-superimposed speaker adapted models; and
  
  an acoustic model updating step of generating speaker adapted models by applying speaker adaptation to said initial acoustic models in said storage device using said speaker adaptation parameter, and replacing said initial acoustic models with said speaker adapted models so as to be updated and newly stored in said storage device.
- - 14. The speech recognition method according to claim 13, wherein:
    - in said noise adaptation processing step, speaker adapted models adapted to noises are generated by applying noise adaptation to said speaker adapted models updated and newly stored in said storage device, using background noises during a silent period at a time of speech recognition; and
      
      said method further comprises a speech recognition processing step of performing speech recognition by comparing said speaker adapted models adapted to noises against utterances to be subjected to speech recognition during an utterance period at the time of said speech recognition.

15. A speech recognition method, comprising:
- a noise adaptation processing step of generating noise adapted models by applying noise adaptation to initial acoustic models stored in storage device, using background noises during a silent period at a time of speech recognition;
  
  a recognition processing step of performing speech recognition by comparing utterances to be uttered during an utterance period at the time of said speech recognition and to be subjected to speech recognition, against said noise adapted models generated in said noise adaptation processing step;
  
  a speaker adaptation parameter calculating step of performing speaker adaptation computation with respect to said noise adapted models generated in said noise adaptation processing step, using said utterances to be subjected to speech recognition, and thereby calculating a speaker adaptation parameter for converting said noise adapted models into noise-superimposed speaker adapted models; and
  
  an acoustic model update processing step of generating speaker adapted models by applying speaker adaptation to said initial acoustic models in said storage device using said speaker adaptation parameter, and replacing said initial acoustic models with said speaker adapted models so as to be updated and newly stored in said storage device.
- - 16. The speech recognition method according to claim 15, wherein, in said speaker adaptation parameter calculating step and said acoustic model update processing step, said speaker adapted models are generated in such a manner that said initial acoustic models are replaced with said speaker adapted models so as to be updated and newly stored in said storage device when a reliability of a recognition result in said recognition processing step is high.

Current Assignee: Pioneer Corporation
Original Assignee: Pioneer Corporation
Inventors: Toyama, Soichi
Application Number: US10/422,969
Publication Number: US 20030220791A1
US Class Current: 704/256
CPC Class Codes:
G10L 15/07   to the speaker
G10L 15/20   Speech recognition techniqu...
G10L 21/0216   characterised by the method...
