Acoustic model creation method as well as acoustic model creation apparatus and speech recognition apparatus

US 7,366,669 B2
Filed: 03/08/2004
Issued: 04/29/2008
Est. Priority Date: 03/14/2003
Status: Expired due to Fees

1 Assignment

0 Petitions

Abstract

To provide an acoustic model that can absorb fluctuations of the phonemic environment over an interval longer than a syllable while keeping the number of acoustic-model parameters small, a phoneme-connected syllable HMM/syllable-connected HMM set is generated as follows. First, a phoneme-connected syllable HMM set corresponding to individual syllables is generated by combining phoneme HMMs. A preliminary experiment is then conducted using the phoneme-connected syllable HMM set and training speech data. Any misrecognized syllable and the syllable preceding it are checked using the results of the preliminary experiment and syllable label data. The combination of the correct answer syllable for the misrecognized syllable and its preceding syllable is extracted as a syllable connection, and a syllable-connected HMM corresponding to this syllable connection is added into the phoneme-connected syllable HMM set. The resulting phoneme-connected syllable HMM/syllable-connected HMM set is trained using the training speech data and the syllable label data.
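The extraction step summarized above can be illustrated with a short Python sketch. It is not the patented implementation: it assumes each recognition result from the preliminary experiment is already aligned position-by-position with its syllable label sequence (so only substitution errors are handled), it takes the preceding syllable from the label data, and the function name and romanized example syllables are hypothetical.

```python
from collections import Counter
from typing import Sequence


def extract_syllable_connections(
    labels: Sequence[Sequence[str]],
    recognized: Sequence[Sequence[str]],
) -> Counter:
    """Count (preceding syllable, correct answer syllable) pairs at misrecognitions.

    Hypothetical simplification: recognized[i] must already be aligned
    position-by-position with labels[i], so a misrecognition is simply a
    position where the two sequences differ (substitution errors only).
    """
    connections: Counter = Counter()
    for ref, hyp in zip(labels, recognized):
        if len(ref) != len(hyp):
            raise ValueError("sketch assumes pre-aligned, equal-length sequences")
        # Start at 1: a syllable at position 0 has no preceding syllable.
        for pos in range(1, len(ref)):
            if hyp[pos] != ref[pos]:
                # Preceding syllable is taken from the label data (assumption).
                connections[(ref[pos - 1], ref[pos])] += 1
    return connections
```

For example, if the labels `ka ta na` and `ka ta ri` are both recognized with `da` in place of `ta`, the sketch counts the connection `(ka, ta)` twice. A practical system would likely need an edit-distance alignment between recognition output and labels first, so that insertions and deletions are handled too.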

79 Citations

14 Claims

1. An acoustic model creation method to create a syllabic HMM (Hidden Markov Model) which is an acoustic model, comprising:
- generating a phoneme HMM set which includes phoneme HMMs corresponding to individual phonemes;
  
  combining the phoneme HMMs of the phoneme HMM set so as to generate an initial phoneme-connected syllable HMM set which includes initial phoneme-connected syllable HMMs corresponding to individual syllables;
  
  training the initial phoneme-connected syllable HMM set, thereby generating a phoneme-connected syllable HMM set being the acoustic model; and
  
  conducting a preliminary experiment for the phoneme-connected syllable HMM set by using training speech data, any misrecognized syllable and a syllable connected to the misrecognized syllable being checked using results of the preliminary experiment and syllable label data prepared in correspondence with the training speech data, a combination between a correct answer syllable for the misrecognized syllable and a syllable connected to the misrecognized syllable being extracted as a syllable connection, that a syllable-connected HMM corresponding to the syllable connection being added into the phoneme-connected syllable HMM set so as to generate an initial phoneme-connected syllable HMM/syllable-connected HMM set, and then the initial phoneme-connected syllable HMM/syllable-connected HMM set being trained using the training speech data and the syllable label data, thereby generating a phoneme-connected syllable HMM/syllable-connected HMM set being the acoustic model, the numbers of times of misrecognition of such syllable connections in the preliminary experiment results being counted, and that, a syllable-connected HMM corresponding to any syllable connection whose number of times of misrecognition is at least a preset number, among the syllable connections extracted using the preliminary experiment results, is made a candidate for addition into the phoneme-connected syllable HMM set.
- Dependent claims: 2, 3, 4, 5, 6, 13
  - 2. The acoustic model creation method as recited in claim 1, the number of times which such syllable connections occur in syllable label data corresponding to the training speech data being counted in addition to the numbers of times of misrecognition, and a syllable-connected HMM corresponding to any syllable connection whose number of times of occurrence in syllable label data corresponding to the training speech data is at most a preset number, among the syllable connections whose numbers of times of misrecognition are at least the preset number, being excluded as the candidate for the addition into the phoneme-connected syllable HMM set.
  - 3. The acoustic model creation method as recited in claim 1, the syllable label data being corrected using any syllable connection which corresponds to the syllable-connected HMM made a candidate for addition into the phoneme-connected syllable HMM set, and subject to a plurality of syllable connections repeatedly applicable in a case where the syllable connection corresponding to the syllable-connected HMM made a candidate for addition into the phoneme-connected syllable HMM set is applied to the syllable label data, the syllable connection whose number of times of misrecognition is larger being preferentially applied so as to correct the corresponding syllable label data.
  - 4. The acoustic model creation method as recited in claim 1, in a case where any common phoneme HMM is used in the training of initial phoneme-connected syllable HMMs as proceeded in generating the phoneme-connected syllable HMM set and in the training of initial phoneme-connected syllable HMMs/syllable-connected HMMs as proceeded in generating the phoneme-connected syllable HMM/syllable-connected HMM set, Gaussian distributions being tied in respective states of the common phoneme HMM.
  - 5. The acoustic model creation method as recited in claim 1, the syllable connected to any misrecognized syllable being a preceding syllable of the misrecognized syllable, and a combination between the preceding syllable and a correct answer syllable for the misrecognized syllable being extracted as the syllable connection.
  - 6. The acoustic model creation method as recited in claim 1, distribution number optimization processing using a Minimum Description Length criterion being executed for the phoneme-connected syllable HMM set, thereby generating a phoneme-connected syllable HMM set whose distribution numbers are optimized, and which is used in subsequent processing.
  - 13. A speech recognition apparatus to recognize input speech by employing an HMM (Hidden Markov Model) which is an acoustic model, for feature data obtained by subjecting the input speech to a feature analysis, characterized in that any acoustic model created by the acoustic model creation method as recited in claim 3 is used as the HMM being the acoustic model.
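The candidate selection of claims 1 and 2 and the label correction of claim 3 can be sketched in Python as follows. This is an illustrative interpretation only: `min_errors` and `rare_limit` stand in for the two unspecified preset numbers, the `+` joiner is a hypothetical notation for a syllable-connected label, and reading "repeatedly applicable" as overlapping connections (for example `ka ta` and `ta na` within `ka ta na`) resolved greedily by misrecognition count is an assumption.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Connection = Tuple[str, str]  # (preceding syllable, correct answer syllable)


def select_candidates(
    error_counts: Dict[Connection, int],
    labels: Sequence[Sequence[str]],
    min_errors: int,
    rare_limit: int,
) -> Dict[Connection, int]:
    """Keep connections misrecognized at least `min_errors` times (claim 1),
    dropping any that occur at most `rare_limit` times in the label data
    (claim 2)."""
    occurrences: Counter = Counter()
    for seq in labels:
        occurrences.update(zip(seq, seq[1:]))  # adjacent syllable pairs
    return {
        conn: n
        for conn, n in error_counts.items()
        if n >= min_errors and occurrences[conn] > rare_limit
    }


def correct_labels(
    labels: Sequence[Sequence[str]],
    candidates: Dict[Connection, int],
    joiner: str = "+",
) -> List[List[str]]:
    """Rewrite each label sequence so candidate connections become single
    syllable-connected units (claim 3). Where connections overlap, the one
    with the larger misrecognition count is applied first."""
    # Higher misrecognition count first; ties broken alphabetically.
    order = sorted(candidates, key=lambda c: (-candidates[c], c))
    corrected = []
    for seq in labels:
        taken = [False] * len(seq)  # syllables already merged into a unit
        starts = set()  # indices where a merged unit begins
        for first, second in order:
            j = 0
            while j < len(seq) - 1:
                if (not taken[j] and not taken[j + 1]
                        and seq[j] == first and seq[j + 1] == second):
                    taken[j] = taken[j + 1] = True
                    starts.add(j)
                    j += 2
                else:
                    j += 1
        out, i = [], 0
        while i < len(seq):
            if i in starts:
                out.append(seq[i] + joiner + seq[i + 1])
                i += 2
            else:
                out.append(seq[i])
                i += 1
        corrected.append(out)
    return corrected
```

With labels `ka ta na`, `ka ta ri`, `ta na ka` and misrecognition counts of 5 for `(ka, ta)`, 3 for `(ta, na)` and 1 for `(na, ka)`, calling `select_candidates(..., min_errors=2, rare_limit=1)` keeps the first two connections, and `correct_labels` then yields `ka+ta na`, `ka+ta ri` and `ta+na ka`. In the first sequence `(ka, ta)` wins the overlap because it was misrecognized more often.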

7. An acoustic model creation apparatus to create a syllable HMM (Hidden Markov Model) which is an acoustic model, comprising:
- an initial phoneme-connected syllable HMM set generation device to combine phoneme HMMs trained in correspondence with individual phonemes, so as to generate an initial phoneme-connected syllable HMM set which includes initial phoneme-connected syllable HMMs corresponding to individual syllables; and
  
  an HMM retraining device to retrain the initial phoneme-connected syllable HMM set so as to generate a phoneme-connected syllable HMM set being the acoustic model;
  
  a preliminary experiment device to conduct a preliminary experiment which uses training speech data, for a phoneme-connected syllable HMM set;
  
  a misrecognized-syllabic-part extraction device to check any misrecognized syllable and a syllable connected to the misrecognized syllable by using results of the preliminary experiment obtained by the preliminary experiment device and syllable label data prepared in correspondence with the training speech data, and to extract as a syllable connection, a combination between a correct answer syllable for the misrecognized syllable and a syllable connected to the misrecognized syllable;
  
  an initial phoneme-connected syllable HMM/syllable-connected HMM set generation device to add a syllable-connected HMM which corresponds to the syllable connection extracted by the misrecognized-syllabic-part extraction device, into the phoneme-connected syllable HMM set, thereby generating an initial phoneme-connected syllable HMM/syllable-connected HMM set;
  
  the HMM retraining device to retrain the initial phoneme-connected syllable HMM/syllable-connected HMM set generated by the initial phoneme-connected syllable HMM/syllable-connected HMM set generation device, by using the training speech data and the syllable label data, thereby generating a phoneme-connected syllable HMM/syllable-connected HMM set being the acoustic model; and
  
  characterized in that the misrecognized-syllabic-part extraction device counts the numbers of times of misrecognition of the syllable connections in the preliminary experiment results, and that, a syllable-connected HMM corresponding to any syllable connection whose number of times of misrecognition is at least a preset number, among the syllable connections extracted using the preliminary experiment results, is made a candidate for addition into the phoneme-connected syllable HMM set.
- Dependent claims: 8, 9, 10, 11, 12, 14
  - 8. The acoustic model creation apparatus as recited in claim 7, the numbers of times which such syllable connections occur in syllable label data corresponding to the training speech data being counted in addition to the numbers of times of misrecognition, and that a syllable-connected HMM corresponding to any syllable connection whose number of times of occurrence in syllable label data corresponding to the training speech data is at most a preset number, among the syllable connections whose numbers of times of misrecognition are at least the preset number, being excluded as a candidate for addition into the phoneme-connected syllable HMM set.
  - 9. The acoustic model creation apparatus as recited in claim 7, a syllable label data correction device to correct the syllable label data being provided, the syllable label data correction device correcting the syllable label data by using any syllable connection which corresponds to the syllable-connected HMM made a candidate for addition into the phoneme-connected syllable HMM set, and that, subject to a plurality of syllable connections repeatedly applicable in a case where the syllable connection corresponding to the syllable-connected HMM made a candidate for addition into the phoneme-connected syllable HMM set being applied to the syllable label data, the syllable connection whose number of times of misrecognition is larger being preferentially applied so as to correct the corresponding syllable label data.
  - 10. The acoustic model creation apparatus as recited in claim 7, in a case where any common phoneme HMM is used in the training of initial phoneme-connected syllable HMMs as proceeded in generating the phoneme-connected syllable HMM set and in the training of initial phoneme-connected syllable HMMs/syllable-connected HMMs as proceeded in generating the phoneme-connected syllable HMM/syllable-connected HMM set, Gaussian distributions being tied in respective states of the common phoneme HMM.
  - 11. The acoustic model creation apparatus as recited in claim 7, the syllable connected to any misrecognized syllable is a preceding syllable of the misrecognized syllable, and a combination between the preceding syllable and a correct answer syllable for the misrecognized syllable being extracted as the syllable connection.
  - 12. The acoustic model creation apparatus as recited in claim 7, a distribution number optimization device to subject the phoneme-connected syllable HMM set to distribution number optimization processing using a Minimum Description Length criterion being provided and a phoneme-connected syllable HMM set whose distribution numbers are optimized being generated by the distribution number optimization device and being used in subsequent processing.
  - 14. A speech recognition apparatus to recognize input speech by employing an HMM (Hidden Markov Model) which is an acoustic model, for feature data obtained by subjecting the input speech to a feature analysis, characterized in that any acoustic model created by the acoustic model creation apparatus as recited in claim 7 is used as the HMM being the acoustic model.
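Claims 6 and 12 name a Minimum Description Length criterion for optimizing distribution numbers without stating its form here. A minimal Python sketch follows, assuming that the "distribution number" is the number of Gaussian mixture components per HMM state, that covariances are diagonal, and that the standard two-part description length, -log L + (p/2) log N, is used; the criterion in the full specification may differ, and all function names are hypothetical.

```python
import math
from typing import Dict


def gmm_free_parameters(num_mixtures: int, dim: int) -> int:
    """Free parameters of a diagonal-covariance Gaussian mixture: a mean and a
    variance per dimension for each component, plus num_mixtures - 1
    independent mixture weights."""
    return num_mixtures * 2 * dim + (num_mixtures - 1)


def description_length(log_likelihood: float, num_mixtures: int,
                       dim: int, num_frames: int) -> float:
    """Two-part MDL: -log L + (p / 2) * log N, using natural logarithms."""
    p = gmm_free_parameters(num_mixtures, dim)
    return -log_likelihood + 0.5 * p * math.log(num_frames)


def optimize_distribution_number(log_likelihoods: Dict[int, float],
                                 dim: int, num_frames: int) -> int:
    """Return the mixture count with the smallest description length.

    log_likelihoods maps each candidate mixture count to the total
    log-likelihood of the frames aligned to one HMM state (a hypothetical
    input; obtaining it requires training each candidate model)."""
    return min(
        log_likelihoods,
        key=lambda k: description_length(log_likelihoods[k], k, dim, num_frames),
    )
```

With hypothetical per-state log-likelihoods of -500, -470 and -460 for 1, 2 and 4 components, `dim=2` and `num_frames=100`, the description lengths are about 509.2, 490.7 and 503.7, so 2 components are chosen: the likelihood gain from 4 components does not pay for its extra parameters.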

Current Assignee
Seiko Epson Corporation (Seiko Group)
Original Assignee
Seiko Epson Corporation (Seiko Group)
Inventors
Matsumoto, Hiroshi; Nishitani, Masanobu; Yamamoto, Kazumasa; Miyazawa, Yasunaga
Primary Examiner(s)
Šmits, Talivaldis Ivars
Assistant Examiner(s)
Hernandez, Josiah J.

Application Number
US10/793,859
Publication Number
US 20040236577A1
Time in Patent Office
1,513 Days
Field of Search
704/256
US Class Current
704/256
CPC Class Codes
G10L 15/144 Training of HMMs
G10L 2015/027 Syllables being the recogni...
