Continuous speech recognition apparatus, continuous speech recognition method, continuous speech recognition program, and program recording medium
First Claim
1. A continuous speech recognition apparatus which uses, as a recognition unit, a sub-word determined depending on an adjacent sub-word and which uses context dependent acoustic models dependent on sub-word context to recognize a continuous input speech, comprising:
an acoustic analysis section analyzing the input speech to obtain feature parameter time series;
a word lexicon in which each word included in the vocabulary is stored in a form of a sub-word network or in a sub-word tree structure;
a language model storage unit in which language models representing information regarding connection between words are stored;
a context dependent acoustic model storage unit in which the context dependent acoustic models are stored in a form of sub-word state trees in each of which state sequences of a plurality of sub-word models of the context dependent acoustic models are organized in a tree structure;
a matching unit developing hypotheses of sub-words by referencing the sub-word state tree representing the context dependent acoustic models, the word lexicon and the language models, and performing matching between the feature parameter time series and the developed hypotheses so as to output, as a word lattice, word information including a word, an accumulated score and a beginning start frame with respect to a hypothesis representing a word end portion; and
a search unit for searching the word lattice to generate recognition results.
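The "sub-word tree structure" of the word lexicon can be pictured as a prefix tree over sub-words. The following is a minimal sketch under stated assumptions, not the patented implementation: the sub-words are taken to be phonemes, and the names `LexiconNode`, `build_lexicon_tree`, `count_nodes` and the toy vocabulary are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LexiconNode:
    """One node of a sub-word (here: phoneme) prefix tree."""
    children: Dict[str, "LexiconNode"] = field(default_factory=dict)
    words: List[str] = field(default_factory=list)  # words whose pronunciation ends here


def build_lexicon_tree(pronunciations: Dict[str, List[str]]) -> LexiconNode:
    """Insert every pronunciation so that shared phoneme prefixes share nodes."""
    root = LexiconNode()
    for word, phonemes in pronunciations.items():
        node = root
        for phoneme in phonemes:
            node = node.children.setdefault(phoneme, LexiconNode())
        node.words.append(word)
    return root


def count_nodes(node: LexiconNode) -> int:
    """Count all nodes in the tree, including the root."""
    return 1 + sum(count_nodes(child) for child in node.children.values())


# Hypothetical toy vocabulary (not taken from the patent).
TOY_PRONUNCIATIONS = {
    "asa": ["a", "s", "a"],
    "asahi": ["a", "s", "a", "h", "i"],
    "aki": ["a", "k", "i"],
}

lexicon = build_lexicon_tree(TOY_PRONUNCIATIONS)
```

Because "asa" and "asahi" share their first three phonemes, and all three words share the initial "a", hypotheses for the shared prefix need to be developed only once during matching.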
Abstract
Accuracy is assured by using phoneme context dependent acoustic models even at word boundaries, and the increase in processing amount is suppressed even in large-vocabulary continuous speech recognition. A phoneme context dependent acoustic model storage unit contains phoneme state trees. In each tree, state sequences, each consisting of a preceding phoneme state, a center phoneme state, and a succeeding phoneme state, are organized in a tree structure that collects the triphone models sharing the same preceding phoneme and the same center phoneme. Accordingly, by referencing the phoneme state trees, the language models stored in a language model storage unit, and a word lexicon, a forward matching unit has to develop only one phoneme hypothesis regardless of the leading phoneme of the succeeding word. Hypotheses are thus developed easily both inside words and at word boundaries. Moreover, the amount of computation needed to match hypotheses against the feature parameter sequences from an acoustic analysis unit is markedly reduced.
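The phoneme state tree described in the abstract can be illustrated with a short sketch. It assumes that each triphone model is a sequence of three tied-state identifiers and that triphones with the same preceding and center phoneme are merged into one tree whenever their state sequences share a prefix. The names `StateNode`, `build_state_trees`, `count_states` and the state identifiers are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Triphone = Tuple[str, str, str]  # (preceding, center, succeeding) phoneme


@dataclass
class StateNode:
    """One node of a phoneme state tree; children are keyed by tied-state id."""
    children: Dict[int, "StateNode"] = field(default_factory=dict)
    succeeding: List[str] = field(default_factory=list)  # set on leaves only


def build_state_trees(models: Dict[Triphone, List[int]]) -> Dict[Tuple[str, str], StateNode]:
    """Collect triphones by (preceding, center) and merge shared state prefixes."""
    trees: Dict[Tuple[str, str], StateNode] = {}
    for (left, center, right), states in models.items():
        node = trees.setdefault((left, center), StateNode())
        for state_id in states:
            node = node.children.setdefault(state_id, StateNode())
        node.succeeding.append(right)  # leaf records which succeeding phoneme it serves
    return trees


def count_states(node: StateNode) -> int:
    """Count the state nodes below a tree root (the root itself holds no state)."""
    return sum(1 + count_states(child) for child in node.children.values())


# Hypothetical tied-state ids; real models would come from acoustic training.
TOY_MODELS = {
    ("a", "k", "i"): [10, 11, 12],
    ("a", "k", "a"): [10, 11, 13],
    ("a", "k", "u"): [10, 14, 15],
    ("s", "k", "i"): [20, 21, 12],
}

state_trees = build_state_trees(TOY_MODELS)
```

In this toy data the three triphones with preceding phoneme "a" and center phoneme "k" occupy six state nodes in one tree instead of nine separate states, so a single hypothesis entering the tree covers every possible succeeding phoneme.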
8 Claims
6. A continuous speech recognition method which uses, as a recognition unit, a sub-word determined depending on an adjacent sub-word and which uses context dependent acoustic models dependent on sub-word context to recognize a continuous input speech, comprising:
analyzing the input speech to obtain feature parameter time series by an acoustic analysis section;
developing hypotheses of sub-words by referencing a sub-word state tree formed by placing state sequences of the context dependent acoustic models in a tree structure, a word lexicon describing each word included in the vocabulary in a form of a sub-word network or in a sub-word tree structure, and a language model representing information regarding connection between words, and performing matching between the feature parameter time series and the developed hypotheses so as to generate, as a word lattice, word information including a word, an accumulated score and a beginning start frame with respect to a hypothesis regarding a word end portion, by a matching unit; and
searching the word lattice to generate recognition results by a search unit.
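The final step, searching the word lattice, can be sketched as a simple backtrace. This is a simplified illustration under stated assumptions, not the search the patent specifies: each lattice entry is assumed to carry an end frame in addition to the word, accumulated score and start frame named in the claim, scores are assumed to be log scores where higher is better, and no language-model rescoring is performed. `LatticeEntry`, `backtrace` and the toy lattice are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LatticeEntry:
    word: str
    score: float      # accumulated log score up to end_frame (higher is better)
    start_frame: int  # first frame of the word
    end_frame: int    # assumption: frame at which the word-end hypothesis was emitted


def backtrace(lattice: List[LatticeEntry], last_frame: int) -> List[str]:
    """Walk backward from last_frame, taking the best-scoring word ending at each boundary."""
    words: List[str] = []
    frame = last_frame
    while frame >= 0:
        candidates = [e for e in lattice if e.end_frame == frame]
        if not candidates:
            raise ValueError(f"no lattice entry ends at frame {frame}")
        best = max(candidates, key=lambda e: e.score)
        words.append(best.word)
        frame = best.start_frame - 1  # the previous word must end just before this one starts
    words.reverse()
    return words


# Hypothetical toy lattice covering frames 0..12.
TOY_LATTICE = [
    LatticeEntry("kyou", -10.0, 0, 4),
    LatticeEntry("kyo", -9.0, 0, 3),
    LatticeEntry("wa", -18.0, 5, 7),
    LatticeEntry("uwa", -21.0, 4, 7),
    LatticeEntry("hare", -30.0, 8, 12),
    LatticeEntry("are", -33.0, 9, 12),
]

result = backtrace(TOY_LATTICE, last_frame=12)
```

Storing the start frame with each word-end entry is what makes this walk possible: it tells the search where the preceding word must have ended.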
Specification