Recognizing speech data using a state transition model
Abstract
To detect an unknown word in input speech data while reducing the search space and the memory capacity needed for the unknown word, an HMM data memory stores data describing a state transition model for the unknown word, defined by a number of states and the transition probabilities between the states. An output probability calculation unit obtains, at each time of the speech data, the state of maximum likelihood among the plural states of the state transition model for known words that is used in recognizing the known words. The result is applied to the state transition model for the unknown word stored in the HMM data memory, yielding the state transition model of the unknown word. A second output probability calculation unit determines the likelihood of the state transition model for the known word. A language search unit then performs the language search process, using the likelihoods determined by the two output probability calculation units, in the portions where the dictionary permits the presence of the unknown word.
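The unknown-word scoring outlined in the abstract can be illustrated with a minimal sketch. It assumes that the output likelihood of every known-word state is already available for each frame, that one shared transition probability is applied once per frame, and that scores are accumulated in the log domain. The function name, the argument names, and the sample numbers are hypothetical and do not come from the patent.

```python
import math


def unknown_word_log_likelihood(frame_state_likelihoods, transition_prob):
    """Score a speech segment as an unknown word.

    frame_state_likelihoods: one list per frame holding the output
        likelihood of every known-word HMM state for that frame
        (assumed strictly positive).
    transition_prob: single transition probability shared by all
        transitions of the unknown-word model (an assumed value).
    Returns (log_likelihood, best_state_indices).
    """
    log_trans = math.log(transition_prob)
    total = 0.0
    best_states = []
    for likelihoods in frame_state_likelihoods:
        # Pick the known-word state with the maximum likelihood for this frame.
        best = max(range(len(likelihoods)), key=likelihoods.__getitem__)
        best_states.append(best)
        # Multiplying probabilities becomes adding logarithms.
        total += math.log(likelihoods[best]) + log_trans
    return total, best_states


# Hypothetical likelihoods: 3 frames, 3 known-word states.
frames = [
    [0.10, 0.60, 0.30],
    [0.20, 0.50, 0.30],
    [0.70, 0.20, 0.10],
]
score, states = unknown_word_log_likelihood(frames, transition_prob=0.5)
print(states)           # indices of the best state per frame
print(math.exp(score))  # likelihood of the segment as an unknown word
```

Because the unknown-word score reuses state likelihoods that are already computed for known words, no separate acoustic model has to be trained or stored for the unknown word, which is the memory saving the abstract refers to.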
22 Claims
1. A recognition apparatus for recognizing speech data by comparing an unknown word in speech data with data of known words without learning a model for recognition, comprising:
obtaining means for obtaining a state having the maximum likelihood for each part of the speech data, from plural states employed in a state transition model for known words, to be employed in speech recognition of known words;
ergodic model preparation means for preparing an ergodic model of an unknown word from the obtained states having the maximum likelihood, the obtained states being numbered in a description inserted in a portion where a processing for the unknown word is desired in a recognition grammar;
calculation means for calculating a likelihood for an unknown word in the speech data by multiplying the maximum likelihood of each state by a transition probability commonly used for the entire transitions in unknown words; and
recognition means for effecting recognition of said speech data based on the likelihood calculated by the calculation means.
wherein said recognition means is adapted to effect, in a portion where said dictionary indicates the possibility of the existence of the unknown word, a language search process including the state transition model and the likelihood for the unknown word.
5. A recognition apparatus according to claim 1, further comprising a table storing the transition probabilities among the states in said state transition model for the unknown word.
6. A recognition apparatus according to claim 5, wherein, in said table, the transition probability in the case of a transition to the same state is different from that in the case of a transition to a different state.
7. A recognition apparatus according to claim 5, further comprising study means for varying the transition probabilities stored in said table, by performing a study of the transition probabilities.
8. A recognition apparatus according to claim 1, wherein said speech data is entered by a microphone.
9. A recognition apparatus according to claim 1, further comprising recognition result output means for outputting the result of recognition by said recognition means, in the form of a train of characters.
10. A recognition apparatus according to claim 9, wherein said recognition result output means is an ink jet printer.
11. A recognition method for recognizing speech data by comparing an unknown word in speech data with data of known words without learning a model for recognition, comprising:
an obtaining step of obtaining a state having the maximum likelihood for each part of the speech data, from plural states employed in a state transition model for known words, to be employed in speech recognition of known words;
an ergodic model preparation step of preparing an ergodic model of an unknown word from the obtained states having the maximum likelihood, the obtained states being numbered in a description inserted in a portion where a processing for the unknown word is desired in a recognition grammar;
a calculation step of calculating a likelihood for an unknown word in the speech data by multiplying the maximum likelihood of each state by a transition probability for the entire transitions in unknown words; and
a recognition step of effecting recognition of said speech data based on the likelihood calculated by the calculation means.
wherein said recognition step is adapted to effect, in a portion where said dictionary indicates the possibility of the existence of the unknown word, a language search process including the state transition model and the likelihood for the unknown word.
15. A recognition method according to claim 11, further comprising a table storing the transition probabilities among the states in said state transition model for the unknown word.
16. A recognition method according to claim 15, wherein, in said table, the transition probability in the case of a transition to the same state is different from that in the case of a transition to a different state.
17. A recognition method according to claim 15, further comprising a study step of varying the transition probabilities stored in said table, by performing a study of the transition probabilities.
18. A recognition method according to claim 11, wherein said speech data is entered by a microphone.
19. A recognition method according to claim 11, further comprising a recognition result output step of outputting the result of recognition by said recognition step, in the form of a train of characters.
20. A recognition method according to claim 19, wherein said recognition result output step is adapted to output the result of recognition by an ink jet printer engine.
21. A computer controlled apparatus for effecting a speech recognition process by reading a predetermined program from a memory medium and comparing an unknown word in speech data with data of known words without learning a model for recognition, wherein said memory medium comprises:
a first process code for causing the computer controlled apparatus to obtain a state having maximum likelihood for each part of the speech data, from plural states employed in a state transition model for known words, to be employed in speech recognition of known words;
a second process code for causing the computer controlled apparatus to prepare an unknown word from the obtained states having the maximum likelihood, the obtained states being numbered in a description inserted in a portion where a processing for the unknown word is desired in a recognition grammar;
a third process code for causing the computer controlled apparatus to calculate a likelihood for an unknown word in the speech data by multiplying the maximum likelihood of each state by a transition probability commonly used for the entire transition in unknown words; and
a fourth process code for causing the computer controlled apparatus to effect recognition of said speech data based on the likelihood calculated by the calculation means.
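The dependent claims that recite a table of transition probabilities, with a different value for a transition to the same state than for a transition to a different state, can be pictured with the following sketch. It is only one possible reading: it assumes the per-frame best state is chosen first and that the transition probability is then looked up according to whether the best state changed between consecutive frames. The function name, the two-entry table, and the sample numbers are hypothetical.

```python
import math


def score_with_transition_table(frame_state_likelihoods, self_prob, other_prob):
    """Unknown-word log-likelihood using a two-entry transition table.

    self_prob: probability of staying in the same state (assumed value).
    other_prob: probability of moving to a different state (assumed value).
    All likelihoods are assumed strictly positive.
    """
    log_self = math.log(self_prob)
    log_other = math.log(other_prob)
    total = 0.0
    previous = None
    for likelihoods in frame_state_likelihoods:
        # State with the maximum likelihood for this frame.
        best = max(range(len(likelihoods)), key=likelihoods.__getitem__)
        total += math.log(likelihoods[best])
        if previous is not None:
            # Look up the transition probability for this frame-to-frame step.
            total += log_self if best == previous else log_other
        previous = best
    return total


# Hypothetical likelihoods: 3 frames, 3 known-word states.
frames = [
    [0.10, 0.60, 0.30],
    [0.20, 0.50, 0.30],
    [0.70, 0.20, 0.10],
]
log_score = score_with_transition_table(frames, self_prob=0.6, other_prob=0.2)
print(math.exp(log_score))
```

A recognizer built along these lines would compare this score with the known-word likelihoods only in grammar positions where an unknown word is allowed.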
Specification