Knowledge-guided automatic speech recognition apparatus and method
Abstract
An acoustic pattern of continuous input speech is divided by an acoustic analyzer into frames of a predetermined time interval. A similarity calculator calculates similarities between the frame data and the reference phonemic labels prestored in a dictionary memory, and supplies the similarity data to a main processor, which has memories that prestore speech duration data and connectability data. The main processor extracts, from among the reference phonemic labels, those which satisfy phonetic/phonological conditions with respect to phonemes of the input speech. Similarity sum calculation is conducted only for the similarity data of the extracted labels.
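The first two stages described in the abstract (framing and similarity calculation) can be illustrated with a minimal sketch. The patent text here does not specify the feature representation or the similarity measure, so the sketch assumes that each frame is used directly as a feature vector and that cosine similarity is the measure. The names `split_into_frames`, `cosine_similarity` and `similarity_table` are hypothetical and chosen only for illustration.

```python
import math


def split_into_frames(samples, frame_len):
    """Divide a sequence into consecutive fixed-length frames.

    A trailing partial frame is dropped (an assumption of this sketch).
    """
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def cosine_similarity(a, b):
    """Cosine similarity, used here as a stand-in for the unspecified measure."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_table(frames, dictionary):
    """For each frame, compute its similarity to every reference pattern.

    `dictionary` maps a reference phonemic label to its reference pattern.
    Returns one {label: similarity} dict per frame.
    """
    return [{label: cosine_similarity(frame, ref)
             for label, ref in dictionary.items()}
            for frame in frames]


# Toy data: three 2-sample frames and two made-up reference patterns.
frames = split_into_frames([1, 0, 0, 1, 1, 1, 9], 2)
dictionary = {"a": [1, 0], "i": [0, 1]}
table = similarity_table(frames, dictionary)
```

The resulting per-frame table corresponds to the "similarity data" that the abstract says is supplied to the main processor.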
18 Citations
5 Claims
1. An acoustic pattern recognition apparatus for automatically recognizing continuous input speech, said apparatus comprising:
(a) acoustic analysis means for dividing an acoustic pattern of the continuous input speech at predetermined time intervals, so as to produce a plurality of frame data;
(b) dictionary memory means for storing reference acoustic patterns of phonemes in a selected language as reference phonemic labels;
(c) similarity calculation means, connected to said acoustic analysis means and said dictionary memory means, for calculating similarities between the frame data and the reference acoustic patterns, so as to produce a plurality of similarity data; and
(d) main processor means, connected to said similarity calculation means, for prestoring based upon a preliminary processing, as a phonetic/phonological condition in the processing of phonemic labels, phonetic/phonological data including both speech duration and connectability for phonemes in the selected language and for using both of said phonetic and phonological conditions, for extracting during the main processing, from among the reference phonemic labels to be compared with the frame data, reference phonemic labels which satisfy said phonetic/phonological condition with respect to the phonemes of the input speech in a first data memory which contain upper and lower limit values of the speech duration phonemes in a tabular format, and a second data memory which has prestored data as to the connectability of phonemes, for rejecting the similarity data of the reference phonemic labels which fail to satisfy said phonetic/phonological condition, and for allowing only the similarity data of the extracted reference phonemic labels to be subjected to similarity sum calculation to thereby generate a series of phonemic labels having a maximum similarity sum as a recognition result.
Dependent claims: 2, 3
4. An acoustic pattern recognition method for automatically recognizing continuous input speech, said method comprising the steps of:
(a) dividing an acoustic pattern of the continuous input speech at predetermined time intervals so as to produce a plurality of frame data;
(b) reading out reference acoustic patterns of phonemes in a selected language prestored in a dictionary memory as reference phonemic labels;
(c) calculating similarities between the frame data and the reference acoustic patterns, so as to produce a plurality of similarity data;
(d) verifying as a preliminary process, whether reference phonemic labels, which satisfy phonetic/phonological data as a phonetic/phonological condition having at least both speech duration and connectability with respect to the phonemes of the input speech, are present in the reference phonemic labels to be compared with the frame data, to thereby extract reference phonemic labels which satisfy said phonetic/phonological condition including both speech duration and connectability for phonemes in the selected language;
(e) rejecting the similarity data of the reference phonemic labels which fail to satisfy said phonetic/phonological condition, as a preliminary process, wherein when similarity data is unsatisfied with at least one condition, for speech duration or connectability being present, the similarity data is excluded from candidates for maximum similarity sum calculations; and
(f) allowing only the similarity data of the extracted reference phonemic labels to be subjected to similarity sum calculation, as a main process, to generate a series of phonemic labels having a maximum similarity sum as a final recognition result.
Dependent claims: 5
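The constrained search in steps (d) to (f) of claim 4 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes a dynamic-programming search, duration limits expressed in whole frames, and a connectability table stored as a set of allowed (previous label, next label) pairs. The function name `recognize` and all data values are hypothetical.

```python
def recognize(sims, duration_limits, connectable):
    """Find the label series with the maximum similarity sum.

    sims            -- one {label: similarity} dict per frame
    duration_limits -- {label: (min_frames, max_frames)}, min_frames >= 1
    connectable     -- set of allowed (previous_label, next_label) pairs

    Candidates that violate the duration or connectability condition are
    never scored. Returns (best_sum, [(label, n_frames), ...]) or None if
    no labelling satisfies both conditions.
    """
    n = len(sims)
    # best[t][label] = (sum, start frame of last segment, previous label)
    best = [dict() for _ in range(n + 1)]
    for t in range(1, n + 1):
        for label, (lo, hi) in duration_limits.items():
            for d in range(lo, min(hi, t) + 1):      # duration condition
                start = t - d
                seg = sum(frame[label] for frame in sims[start:t])
                if start == 0:
                    candidates = [(seg, 0, None)]
                else:                                # connectability condition
                    candidates = [(score + seg, start, prev)
                                  for prev, (score, _, _) in best[start].items()
                                  if (prev, label) in connectable]
                for cand in candidates:
                    if label not in best[t] or cand[0] > best[t][label][0]:
                        best[t][label] = cand
    if not best[n]:
        return None
    label = max(best[n], key=lambda k: best[n][k][0])
    total = best[n][label][0]
    path, t = [], n
    while label is not None:                         # trace segments backwards
        _, start, prev = best[t][label]
        path.append((label, t - start))
        t, label = start, prev
    return total, path[::-1]


# Toy similarity data for four frames and two made-up labels.
sims = [{"a": 0.9, "k": 0.1}, {"a": 0.8, "k": 0.3},
        {"a": 0.2, "k": 0.9}, {"a": 0.6, "k": 0.5}]
limits = {"a": (2, 3), "k": (1, 2)}
pairs = {("a", "k"), ("k", "a")}
result = recognize(sims, limits, pairs)
```

In this toy data the per-frame best labels are a, a, k, a, which would end in a one-frame "a" segment below its assumed lower limit of two frames. The constrained search instead returns "a" for two frames followed by "k" for two frames.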
Specification