Speaker-independent model generation apparatus and speech recognition apparatus each equipped with means for splitting state having maximum increase in likelihood

US 5,839,105 A
Filed: 11/29/1996
Issued: 11/17/1998
Est. Priority Date: 11/30/1995
Status: Expired due to Term

3 Assignments

0 Petitions

Abstract

There is provided a speaker-independent model generation apparatus and a speech recognition apparatus which require a processing unit to have less memory capacity and which allow its computation time to be reduced, as compared with a conventional counterpart. A single Gaussian HMM is generated with a Baum-Welch training algorithm based on spoken speech data from a plurality of specific speakers. A state having a maximum increase in likelihood as a result of splitting one state in contextual or temporal domains is searched. Then, the state having a maximum increase in likelihood is split in a contextual or temporal domain corresponding to the maximum increase in likelihood. Thereafter, a single Gaussian HMM is generated with the Baum-Welch training algorithm, and these steps are iterated until the states within the single Gaussian HMM can no longer be split or until a predetermined number of splits is reached. Thus, a speaker-independent HMM is generated. Also, speech is recognized with reference to the generated speaker-independent HMM.
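The loop described in the abstract can be illustrated with a toy sketch. It is not the patented implementation: the Baum-Welch retraining step is replaced here by a plain maximum-likelihood refit of a single 1-D Gaussian over hard-assigned samples, the contextual split is simplified to "one context versus the rest", and the temporal split to "first half versus second half" of a segment. All names (`grow_states`, `best_split`, `VAR_FLOOR`, `MIN_SAMPLES`, `MIN_GAIN`, `DEMO`) and thresholds are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Optional, Tuple

# (context label, relative time within the segment in [0, 1), feature value)
Sample = Tuple[str, float, float]
State = List[Sample]

VAR_FLOOR = 1e-3   # hypothetical variance floor to avoid degenerate Gaussians
MIN_SAMPLES = 2    # hypothetical minimum number of samples per new state
MIN_GAIN = 1.0     # hypothetical minimum log-likelihood gain for a split


def log_likelihood(state: State) -> float:
    """Log-likelihood of a state's samples under its own ML single Gaussian."""
    values = [v for _, _, v in state]
    n = len(values)
    mean = sum(values) / n
    sq = sum((v - mean) ** 2 for v in values)
    var = max(sq / n, VAR_FLOOR)
    return -0.5 * (n * math.log(2 * math.pi * var) + sq / var)


def candidate_splits(state: State) -> List[Tuple[str, State, State]]:
    """Enumerate one-context-vs-rest splits and a single temporal split."""
    splits = []
    for ctx in sorted({c for c, _, _ in state}):
        inside = [s for s in state if s[0] == ctx]
        outside = [s for s in state if s[0] != ctx]
        splits.append(("contextual", inside, outside))
    early = [s for s in state if s[1] < 0.5]
    late = [s for s in state if s[1] >= 0.5]
    splits.append(("temporal", early, late))
    # Discard splits that would leave either new state with too little data.
    return [(d, a, b) for d, a, b in splits
            if len(a) >= MIN_SAMPLES and len(b) >= MIN_SAMPLES]


def best_split(
    states: List[State],
) -> Optional[Tuple[float, int, str, State, State]]:
    """Search all states for the split with the maximum likelihood increase."""
    best = None
    for idx, state in enumerate(states):
        base = log_likelihood(state)
        for domain, a, b in candidate_splits(state):
            gain = log_likelihood(a) + log_likelihood(b) - base
            if gain >= MIN_GAIN and (best is None or gain > best[0]):
                best = (gain, idx, domain, a, b)
    return best


def grow_states(samples: State, max_states: int):
    """Split repeatedly until nothing can be split or max_states is reached."""
    states = [list(samples)]
    history = []  # domain chosen at each iteration
    while len(states) < max_states:
        found = best_split(states)
        if found is None:      # no state can be split any further
            break
        _, idx, domain, a, b = found
        states[idx:idx + 1] = [a, b]   # replace the state by its two halves
        history.append(domain)
    return states, history


DEMO: State = [
    ("a", 0.1, 0.0), ("a", 0.6, 0.2), ("a", 0.3, -0.2), ("a", 0.8, 0.1),
    ("b", 0.2, 10.0), ("b", 0.7, 10.1), ("b", 0.4, 9.9), ("b", 0.9, 10.2),
]
states, history = grow_states(DEMO, max_states=2)
```

On the demo data the two contexts have well-separated means, so the first split chosen is the contextual one and each resulting state holds samples from a single context.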

123 Citations

10 Claims

1. A speaker-independent model generation apparatus comprising:
- model generation means for generating a hidden Markov model of a single Gaussian distribution using a Baum-Welch training algorithm based on spoken speech data from a plurality of specific speakers, and thereafter for generating a speaker-independent hidden Markov model by iterations of splitting a state having a maximum increase in likelihood upon splitting one state in contextual or temporal domains on the hidden Markov model of the single Gaussian distribution.
- - 2. The speaker-independent model generation apparatus as claimed in claim 1, wherein said model generation means comprises:
    - initial model generation means for generating an initial hidden Markov model of a single Gaussian distribution using the Baum-Welch training algorithm based on spoken speech data from a plurality of specific speakers;
      
      search means for searching a state having a maximum increase in likelihood upon splitting one state in contextual or temporal domains on the initial hidden Markov model of the single Gaussian distribution generated by said initial model generation means;
      
      generation means for splitting the state having the maximum increase in likelihood searched by said search means, in a contextual or temporal domain corresponding to the maximum increase in likelihood and thereafter for generating a hidden Markov model of a single Gaussian distribution using the Baum-Welch training algorithm; and
      
      control means for generating a speaker-independent hidden Markov model by iterating a process of said search means and a process of said generation means until at least one of the following conditions is satisfied;
      
      (a) the states within the hidden Markov model of the single Gaussian distribution can no longer be split; and
      
      (b) a number of states within the hidden Markov model of the single Gaussian distribution reaches a predetermined number of splits.
  - 3. The speaker-independent model generation apparatus as claimed in claim 2, wherein the states searched by the search means are limited to two new states split by said generation means in the preceding process.
  - 4. The speaker-independent model generation apparatus as claimed in claim 2, wherein the states searched by the search means are limited to two new states split by said generation means in the preceding step and a state which is away from the two new states by a distance of one.
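
The restricted searches of claims 3 and 4 can be pictured as choosing which states to re-examine after each split. The sketch below assumes that "a distance of one" means directly connected in the HMM topology, which the claim text does not spell out; the function name `restricted_candidates`, the `neighbors` adjacency map, and the `CHAIN` example are hypothetical.

```python
from typing import Dict, Iterable, Set


def restricted_candidates(new_states: Iterable[int],
                          neighbors: Dict[int, Set[int]],
                          include_adjacent: bool) -> Set[int]:
    """Return the state indices to search in the next iteration.

    include_adjacent=False mirrors claim 3 (only the two new states);
    include_adjacent=True mirrors claim 4 (new states plus states one
    step away, under the adjacency assumption stated above).
    """
    candidates = set(new_states)
    if include_adjacent:
        # Iterate over a copy because the set grows inside the loop.
        for state in list(candidates):
            candidates |= neighbors.get(state, set())
    return candidates


# Hypothetical five-state chain 0-1-2-3-4, where states 2 and 3 were
# produced by the most recent split.
CHAIN = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
only_new = restricted_candidates((2, 3), CHAIN, include_adjacent=False)
with_adjacent = restricted_candidates((2, 3), CHAIN, include_adjacent=True)
```

Limiting the candidate set this way trades search completeness for less computation per iteration, since likelihood gains only need to be re-evaluated for the returned states.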

5. A speech recognition apparatus comprising:
- model generation means for generating a hidden Markov model of a single Gaussian distribution using a Baum-Welch training algorithm based on spoken speech data from a plurality of specific speakers, and thereafter for generating a speaker-independent hidden Markov model by iterations of splitting a state having a maximum increase in likelihood upon splitting one state in contextual or temporal domains on the hidden Markov model of the single Gaussian distribution; and
  
  speech recognition means for, in response to an input speech signal of a spoken speech, recognizing the spoken speech with reference to the speaker-independent hidden Markov model generated by said model generation means.

6. A speech recognition apparatus comprising:
- initial model generation means for generating a hidden Markov model of a single Gaussian distribution using the Baum-Welch training algorithm based on spoken speech data from a plurality of specific speakers;
  
  search means for searching a state having a maximum increase in likelihood upon splitting one state in contextual or temporal domains on the hidden Markov model of the single Gaussian distribution generated by said initial model generation means;
  
  generation means for splitting the state having the maximum increase in likelihood searched by said search means, in a contextual or temporal domain corresponding to the maximum increase in likelihood and thereafter for generating a hidden Markov model of a single Gaussian distribution using the Baum-Welch training algorithm;
  
  control means for generating a speaker-independent hidden Markov model by iterating a process of said search means and a process of said generation means until at least one of the following conditions is satisfied;
  
  (a) the states within the hidden Markov model of the single Gaussian distribution can no longer be split; and
  
  (b) a number of states within the hidden Markov model of the single Gaussian distribution reaches a predetermined number of splits; and
  
  speech recognition means for, in response to an input speech signal of a spoken speech, recognizing the spoken speech with reference to the speaker-independent hidden Markov model generated by said control means.
- - 7. The speech recognition apparatus as claimed in claim 6, wherein the states searched by the search means are limited to two new states split by said generation means in the preceding process.
  - 8. The speech recognition apparatus as claimed in claim 6, wherein the states searched by the search means are limited to two new states split by said generation means in the preceding step and a state which is away from the two new states by a distance of one.

9. A method for generating a speaker-independent model, including the following steps:
- generating a hidden Markov model of a single Gaussian distribution using a Baum-Welch training algorithm based on spoken speech data from a plurality of specific speakers; and
  
  thereafter, generating a speaker-independent hidden Markov model by iterations of splitting a state having a maximum increase in likelihood upon splitting one state in contextual or temporal domains on the hidden Markov model of the single Gaussian distribution.

10. A method for generating a speaker-independent model, including the following steps:
- generating an initial hidden Markov model of a single Gaussian distribution using the Baum-Welch training algorithm based on spoken speech data from a plurality of specific speakers;
  
  searching a state having a maximum increase in likelihood upon splitting one state in contextual or temporal domains on the generated initial hidden Markov model of the single Gaussian distribution;
  
  splitting the searched state having the maximum increase in likelihood, in a contextual or temporal domain corresponding to the maximum increase in likelihood, and thereafter, generating a hidden Markov model of a single Gaussian distribution using the Baum-Welch training algorithm; and
  
  generating a speaker-independent hidden Markov model by iterating said searching step and said splitting and generating step until at least one of the following conditions is satisfied;
  
  (a) the states within the hidden Markov model of the single Gaussian distribution can no longer be split; and
  
  (b) a number of states within the hidden Markov model of the single Gaussian distribution reaches a predetermined number of splits.

Current Assignee: DENSO Corporation
Original Assignee: ATR Interpreting Telephony Research Laboratories
Inventors: Ostendorf, Mari; Singer, Harald
Primary Examiner(s): Hudspeth, David R.
Assistant Examiner(s): Zintel, Harold Albert
Application Number: US08/758,378
Time in Patent Office: 718 Days
Field of Search: 704/255, 704/256, 704/257, 704/244, 704/245, 704/231
US Class Current: 704/256
CPC Class Codes:
G10L 15/144 Training of HMMs
G10L 2015/0631 Creating reference template...
