Acoustic model generating method for speech recognition
First Claim
1. An acoustic model generating method for a speech recognition dependent upon phoneme context, for executing speech data processing using hidden Markov models obtained by modeling static speech features indicative of speech feature pattern shape in minute time and dynamic speech features indicative of speech change with the lapse of time, as a chain of signal sources composed of one output probability distribution and one set of state transition probability, which comprises the steps of:
reiterating splitting processing or merging processing of the output probability distribution of at least one signal source of an initial model by selecting one of the processing successively to generate a plurality of signal sources, until a specific number of the generated signal sources reaches a predetermined value for achieving optimum speech recognition; and
deciding, when the number reaches the predetermined value, a sharing structure of states used for representing a model among a plurality of models, a sharing structure of each signal source among the states, and a parameter of each output probability distribution, all under a common evaluation criterion.
Abstract
The acoustic model generating method for speech recognition achieves high representational power with as few model parameters as possible. Starting from an initial model having a small number of signal sources, the acoustic model is generated by successively and repeatedly selecting either splitting processing or merging processing for the signal sources. The merging processing is executed before the splitting processing. When the merged result is not appropriate, it is discarded and the splitting processing is executed on the model obtained before the merging processing. The splitting processing considers three methods in parallel: (1) splitting the signal source into two and reconstructing the sharing structure among the states that have the signal source to be split in common, (2) splitting one state into two states corresponding to different phoneme context categories in the phoneme context direction, and (3) splitting one state into two states corresponding to different speech sections in the time direction. The maximum likelihood attained by each of the three methods is computed, and the method yielding the largest likelihood is selected.
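For illustration only, the control flow described in the abstract (attempt a merge first, fall back to the pre-merge model if the merge does not help, then apply whichever of three splits gives the largest likelihood) can be sketched in Python. This is a toy under stated assumptions, not the patented procedure: each "signal source" is reduced to a group of scalar samples scored by a fixed-variance Gaussian log-likelihood, the three splitters are hypothetical stand-ins for methods (1) to (3), and the test used to decide whether a merged result is "appropriate" is an assumption, since the abstract does not define it. All function names are invented for this sketch.

```python
from typing import Callable, Optional, Sequence, Tuple

Group = Tuple[float, ...]
Model = Tuple[Group, ...]  # toy model: one group of samples per "signal source"
Splitter = Callable[[Model], Optional[Model]]

def log_likelihood(model: Model) -> float:
    """Fixed-variance Gaussian log-likelihood up to constants (negative scatter)."""
    total = 0.0
    for g in model:
        mean = sum(g) / len(g)
        total -= sum((x - mean) ** 2 for x in g)
    return total

def _worst(model: Model) -> int:
    """Index of the group with the largest scatter (the one most worth splitting)."""
    return max(range(len(model)), key=lambda i: -log_likelihood((model[i],)))

def _split_at(model: Model, idx: int, threshold: float) -> Optional[Model]:
    g = model[idx]
    lo = tuple(x for x in g if x <= threshold)
    hi = tuple(x for x in g if x > threshold)
    if not lo or not hi:
        return None  # this splitter cannot divide the group
    return model[:idx] + (lo, hi) + model[idx + 1:]

# Three hypothetical splitters standing in for the three splitting methods.
def split_at_mean(model: Model) -> Optional[Model]:
    i = _worst(model)
    return _split_at(model, i, sum(model[i]) / len(model[i]))

def split_at_median(model: Model) -> Optional[Model]:
    i = _worst(model)
    g = sorted(model[i])
    return _split_at(model, i, g[(len(g) - 1) // 2])

def split_at_midrange(model: Model) -> Optional[Model]:
    i = _worst(model)
    return _split_at(model, i, (min(model[i]) + max(model[i])) / 2)

def best_split(model: Model, methods: Sequence[Splitter]) -> Optional[Model]:
    """Run every splitter and keep the candidate with the largest likelihood."""
    candidates = [c for c in (m(model) for m in methods) if c is not None]
    return max(candidates, key=log_likelihood) if candidates else None

def merge_best_pair(model: Model) -> Optional[Model]:
    """Merge the pair of groups whose merger costs the least likelihood."""
    if len(model) < 2:
        return None
    def merged(i: int, j: int) -> Model:
        rest = tuple(g for k, g in enumerate(model) if k not in (i, j))
        return rest + (model[i] + model[j],)
    pairs = [(i, j) for i in range(len(model)) for j in range(i + 1, len(model))]
    return max((merged(i, j) for i, j in pairs), key=log_likelihood)

def grow(model: Model, target: int, methods: Sequence[Splitter]) -> Model:
    while len(model) < target:
        # Merging is attempted first. Assumed criterion: keep the merge only if
        # merging and then re-splitting beats the model we had before merging.
        merged = merge_best_pair(model)
        if merged is not None:
            resplit = best_split(merged, methods)
            if resplit is not None and log_likelihood(resplit) > log_likelihood(model):
                model = resplit
        # Otherwise the merged result is discarded and the pre-merge model is split.
        grown = best_split(model, methods)
        if grown is None:
            break  # nothing left that can be split
        model = grown
    return model

METHODS = (split_at_mean, split_at_median, split_at_midrange)
data = (0.0, 1.0, 2.0, 10.0, 11.0, 12.0, 30.0, 31.0, 32.0)
final_model = grow((data,), target=3, methods=METHODS)
```

In this toy run the single initial source is grown to the predetermined count of three, and each pass through the loop adds exactly one source, so the loop always terminates. A real implementation would replace the scalar groups with HMM states and output distributions and would re-estimate parameters after every split or merge.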
8 Claims
Specification