Acoustic model generating method for speech recognition

US 5,799,277 A
Filed: 10/25/1995
Issued: 08/25/1998
Est. Priority Date: 10/25/1994
Status: Expired due to Fees

2 Assignments

0 Petitions

Abstract

The acoustic model generating method for speech recognition achieves high representational power with the smallest possible number of model parameters. Starting from an initial model having a small number of signal sources, the acoustic model is generated by successively and repeatedly selecting either the splitting processing or the merging processing for the signal sources. The merging processing is executed prior to the splitting processing. When the merged result is not appropriate, the splitting processing is executed on the model obtained before the merging processing (without use of the merged result). The splitting processing offers three methods: (1) splitting the signal source into two and reconstructing a shared structure between a plurality of states having common signal sources to be split; (2) splitting one state into two states corresponding to different phoneme context categories in the phoneme context direction; and (3) splitting one state into two states corresponding to different speech sections in the time direction. One of the three methods is selected by obtaining the maximum likelihood for each of the three splitting steps and choosing the step that yields the largest likelihood.
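The control flow described in the abstract can be sketched roughly as follows. This is a toy illustration, not the patented implementation: the names `grow_model`, `merge`, `split_candidates`, `score`, and `DATA` are hypothetical, a model is reduced to a tuple of 1-D centres (one per signal source), and a negative squared-error score stands in for the likelihood. The sketch also assumes that a merge with no previously obtained model of the same size to compare against is not adopted.

```python
from typing import Callable, Dict, Iterable, Tuple

Model = Tuple[float, ...]  # toy stand-in: one 1-D centre per signal source

DATA = (0.0, 1.0, 10.0, 11.0)  # hypothetical training samples


def score(model: Model) -> float:
    """Toy evaluation value: negative squared error to the nearest centre."""
    return -sum(min((x - c) ** 2 for c in model) for x in DATA)


def merge(model: Model) -> Model:
    """Merge the two closest centres; return the model unchanged if it has one."""
    if len(model) < 2:
        return model
    i = min(range(len(model) - 1), key=lambda k: model[k + 1] - model[k])
    return model[:i] + ((model[i] + model[i + 1]) / 2,) + model[i + 2:]


def split_candidates(model: Model) -> Iterable[Model]:
    """Yield alternative splits (a stand-in for the three splitting methods)."""
    for i, c in enumerate(model):
        for d in (0.5, 4.5):
            yield tuple(sorted(model[:i] + (c - d, c + d) + model[i + 1:]))


def grow_model(
    initial: Model,
    target: int,
    merge: Callable[[Model], Model],
    split_candidates: Callable[[Model], Iterable[Model]],
    score: Callable[[Model], float],
) -> Model:
    """Select merging or splitting repeatedly until `target` sources exist."""
    model = initial
    best_score: Dict[int, float] = {len(model): score(model)}
    while len(model) < target:
        merged = merge(model)  # merging is tried before splitting
        n = len(merged)
        if n in best_score and score(merged) > best_score[n]:
            model = merged  # adopt the merged result
        else:
            # Discard the merged result and split the pre-merge model,
            # keeping the candidate with the highest evaluation value.
            model = max(split_candidates(model), key=score)
        size = len(model)
        best_score[size] = max(score(model), best_score.get(size, float("-inf")))
    return model
```

With this toy data, `grow_model((5.5,), 2, merge, split_candidates, score)` rejects the trivial merge and picks the split that places one centre near each cluster of samples.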

8 Claims

1. An acoustic model generating method for a speech recognition dependent upon phoneme context, for executing speech data processing using hidden Markov models obtained by modeling static speech features indicative of speech feature pattern shape in minute time and dynamic speech features indicative of speech change with the lapse of time, as a chain of signal sources composed of one output probability distribution and one set of state transition probability, which comprises the steps of:
- reiterating splitting processing or merging processing of the output probability distribution of at least one signal source of an initial model by selecting one of the processing successively to generate a plurality of signal sources, until a specific number of the generated signal sources reaches a predetermined value for achieving optimum speech recognition; and
- deciding, when the number reaches the predetermined value, a sharing structure of states used for representing a model among a plurality of models, a sharing structure of each signal source among the states, and a parameter of each output probability distribution, all under a common evaluation criterion.
  - 2. The acoustic model generating method for a speech recognition of claim 1, wherein the merging processing includes a step of merging two different signal sources having similar characteristics into a single signal source, to reduce the number of signal sources without deteriorating precision of the acoustic models.
  - 3. The acoustic model generating method for a speech recognition of claim 2, the merging step including the steps of:
    - calculating a magnitude of distribution on an acoustic parameter space obtained by synthesizing each pair of the signal sources; and
    - merging two signal sources of a pair having the minimum calculated distribution.
  - 4. The acoustic model generating method for a speech recognition of claim 1, wherein the merging processing is executed prior to the splitting processing, and the method further comprising the steps of:
    - adopting a merging processing result only when a first evaluation value of learning samples obtained as a result of the merging processing is higher than a second evaluation value calculated on the basis of already-obtained models having signal sources whose number is the same as that of the model obtained as the result of the merging processing; and
    - re-executing the splitting processing by use of the models already obtained before the merging processing when the first evaluation value is not higher than the second evaluation value.
  - 5. The acoustic model generating method for a speech recognition of claim 4, wherein the adopting step includes the step of obtaining two pieces of sum total likelihood as the evaluation values.
  - 6. The acoustic model generating method for a speech recognition of claim 1, wherein the splitting processing includes the step of splitting a first signal source into a second and a third signal source, allocating two mixture distributions of the first signal source to the second and third signal sources, respectively, as output probability distribution, and copying self transition probability value of the first signal source and transition probability value to the succeeding signal source to the second and third signal sources.
  - 7. The acoustic model generating method for a speech recognition of claim 1, wherein the splitting processing includes either of:
    - a first splitting step of splitting two signal sources into two, and reconstructing a shared structure between a plurality of states having common signal sources to be split;
    - a second splitting step of splitting one state into two states corresponding to two different phoneme context categories in phoneme context direction, in order to absorb fluctuations of the static speech features due to difference in phoneme context; and
    - a third splitting step of splitting one state into two states corresponding to two different speech sections in time direction, in order to absorb fluctuations of the dynamic speech features existing in some phoneme context category,

    further the acoustic model generating method comprises a step of selecting one of the three splitting steps so that an evaluation value for actual speech samples can be obtained.
  - 8. The acoustic model generating method for a speech recognition of claim 7, wherein the selecting step includes the steps:
    - obtaining three pieces of maximum likelihood for the three splitting steps; and
    - judging which one of the three pieces of maximum likelihood is the biggest to select the splitting step for which the biggest likelihood is obtained.
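
The merging step of claim 3 and the splitting step of claim 6 can be illustrated with a small sketch. Everything here is an assumption made for illustration: the names `Gaussian`, `Source`, `pooled_variance`, `closest_pair`, and `split_source` are hypothetical, distributions are 1-D Gaussians, the "magnitude of distribution" of a synthesized pair is read as the variance of an equal-weight mixture of the two, and each signal source to be split carries exactly two mixture components.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Gaussian:
    mean: float
    var: float


@dataclass(frozen=True)
class Source:
    mixture: Tuple[Gaussian, ...]  # output probability distribution
    self_prob: float               # self transition probability
    next_prob: float               # transition probability to the next source


def pooled_variance(a: Gaussian, b: Gaussian) -> float:
    """Variance of an equal-weight mixture of a and b (assumed 'magnitude')."""
    mean = (a.mean + b.mean) / 2
    return (a.var + a.mean ** 2 + b.var + b.mean ** 2) / 2 - mean ** 2


def closest_pair(dists: Sequence[Gaussian]) -> Tuple[int, int]:
    """Indices of the pair whose synthesized distribution is smallest."""
    return min(
        combinations(range(len(dists)), 2),
        key=lambda p: pooled_variance(dists[p[0]], dists[p[1]]),
    )


def split_source(src: Source) -> Tuple[Source, Source]:
    """Give each mixture component its own source, copying both transitions."""
    first, second = src.mixture  # assumes exactly two components
    return (
        Source((first,), src.self_prob, src.next_prob),
        Source((second,), src.self_prob, src.next_prob),
    )
```

For example, among `Gaussian(0.0, 1.0)`, `Gaussian(0.5, 1.0)`, and `Gaussian(10.0, 1.0)`, `closest_pair` selects the first two, since pooling them spreads the distribution least.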

Current Assignee
Victor Company of Japan Limited (JVC Kenwood Corporation)
Original Assignee
Victor Company of Japan Limited (JVC Kenwood Corporation)
Inventors
Takami, Junichi
Primary Examiner(s)
Dorvil, Richemond

Application Number

US08/547,794
Time in Patent Office

1,035 Days
Field of Search

395/2.65, 395/2.41, 395/2.54, 395/2.45, 395/2.49, 395/2.51, 395/2.52, 395/2.09, 395/2.64, 395/2.53, 704/256, 704/254, 704/255, 704/232, 704/245, 704/236, 704/240, 704/242, 704/243, 704/244, 704/200
US Class Current

704/256.4
CPC Class Codes

G10L 15/063   Training

G10L 15/144   Training of HMMs

G10L 15/148   Duration modelling in HMMs,...

G10L 2015/0635   updating or merging of old ...
