Speech recognition system
First Claim
1. A speech recognition system comprising:
means for converting audible input speech into an input speech signal;
acoustic signal processing means for extracting first feature data, second feature data and third feature data from the input speech signal, the first feature data being a time-frequency spectrum pattern data comprising a plurality of frame data arranged on a time axis, and each of the frame data being frequency spectrum data obtained at each of predetermined time points on the time axis, the second feature data being phoneme data obtained in respective frames defining respective computation intervals, the frequency range of the input speech signal being divided into a plurality of channels and the frequency spectrum data being obtained at each channel, the phoneme data of each frame being labelled with a prescribed character, and the third feature data being a coded acoustic feature data, the frequency spectrum data of each frame being divided into gross spectrum envelopes;
a buffer memory means for storing the first to third feature data;
a reference pattern memory means for storing first, second and third reference data each
a similarity computation circuit for computing similarities between the first to third feature data and the first to third reference data, respectively;
means for determining a word class pattern having a first reference pattern which gives a largest similarity as being the input speech signal when the largest similarity is larger than a prescribed value and when a difference between the largest similarity and a second largest similarity is larger than a prescribed value;
means for extracting m classes of patterns of reference patterns which give the largest to mth largest similarities when the word class pattern is regarded not to correspond to the input speech signal;
means for computing similarities between the second feature data and the second reference data and between the third feature data and the third reference data for determining whether or not one of said m classes of patterns correspond to the input speech signal.
Abstract
An acoustic signal processing circuit extracts input speech pattern data and subsidiary feature data from an input speech signal. The input speech pattern data comprise frequency spectra, whereas the subsidiary feature data comprise phoneme and acoustic features. These data are then stored in a data buffer memory. The similarity measures between the input speech pattern data stored in the data buffer memory and reference speech pattern data stored in a dictionary memory are computed by a similarity computation circuit. When the largest similarity measure exceeds a first threshold value and when the difference between the largest similarity measure and the second largest measure exceeds a second threshold value, category data of the reference pattern which gives the largest similarity measure is produced by a control circuit to correspond to an input speech. When recognition cannot be performed, the categories of the reference speech patterns which respectively give the largest to mth similarity measures are respectively compared with the subsidiary feature data. In this manner, subsidiary feature recognition of the input voice is performed by a subsidiary feature recognition section.
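The first-stage decision described in the abstract can be sketched in a few lines. This is an illustrative sketch only, not the patented circuit: the function name `first_stage_decision`, the parameter names `t1`, `t2` and `m`, and the use of a plain dictionary mapping each category to its similarity measure are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple


def first_stage_decision(
    similarities: Dict[str, float],
    t1: float,
    t2: float,
    m: int,
) -> Tuple[Optional[str], List[str]]:
    """Return (accepted_category, candidates).

    The top category is accepted only when its similarity exceeds t1 AND
    its margin over the runner-up exceeds t2. Otherwise nothing is accepted
    and the m best categories are returned for subsidiary feature recognition.
    """
    if not similarities:
        return None, []

    # Rank categories from largest to smallest similarity measure.
    ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    best_cat, best_sim = ranked[0]

    # Assumption: with a single category there is no runner-up, so the
    # margin test is treated as automatically satisfied.
    second_sim = ranked[1][1] if len(ranked) > 1 else float("-inf")

    if best_sim > t1 and (best_sim - second_sim) > t2:
        return best_cat, []

    # Recognition could not be performed: hand over the top-m categories.
    return None, [cat for cat, _ in ranked[:m]]
```

With the hypothetical scores `{"one": 0.91, "nine": 0.89, "five": 0.40}`, `t1=0.8`, `t2=0.05` and `m=2`, the top score clears the first threshold but the margin of about 0.02 does not clear the second, so the call returns `(None, ["one", "nine"])` and both categories go on to the subsidiary comparison.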
93 Citations
8 Claims
1. A speech recognition system comprising:
means for converting audible input speech into an input speech signal;
acoustic signal processing means for extracting first feature data, second feature data and third feature data from the input speech signal, the first feature data being a time-frequency spectrum pattern data comprising a plurality of frame data arranged on a time axis, and each of the frame data being frequency spectrum data obtained at each of predetermined time points on the time axis, the second feature data being phoneme data obtained in respective frames defining respective computation intervals, the frequency range of the input speech signal being divided into a plurality of channels and the frequency spectrum data being obtained at each channel, the phoneme data of each frame being labelled with a prescribed character, and the third feature data being a coded acoustic feature data, the frequency spectrum data of each frame being divided into gross spectrum envelopes;
a buffer memory means for storing the first to third feature data;
a reference pattern memory means for storing first, second and third reference data each
a similarity computation circuit for computing similarities between the first to third feature data and the first to third reference data, respectively;
means for determining a word class pattern having a first reference pattern which gives a largest similarity as being the input speech signal when the largest similarity is larger than a prescribed value and when a difference between the largest similarity and a second largest similarity is larger than a prescribed value;
means for extracting m classes of patterns of reference patterns which give the largest to mth largest similarities when the word class pattern is regarded not to correspond to the input speech signal;
means for computing similarities between the second feature data and the second reference data and between the third feature data and the third reference data for determining whether or not one of said m classes of patterns correspond to the input speech signal.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A method for recognizing speech comprising the steps of:
extracting input speech pattern data which consists of time series of first feature parameter data of an input speech from an input speech signal;
storing the input speech pattern data;
storing a plurality of reference speech pattern data;
computing a similarity measure between the input speech pattern data and said plurality of reference speech pattern data;
determining whether or not a category having a reference pattern which gives a largest similarity measure corresponds to the input speech in accordance with a difference between the largest similarity measure and a second largest similarity measure;
extracting m categories of said plurality of reference patterns which give the largest to mth largest similarity measures when the category is regarded not to correspond to the input speech;
comparing the m categories with the input speech by using second feature parameter data; and
storing the second feature parameter data, whereby the category of the reference pattern corresponding to the input speech is determined by comparison results.
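The comparison of the m categories against the second feature parameter data in claim 8 might be sketched as follows. The claim does not state how that comparison is scored, so the frame-by-frame label agreement used here is purely an assumption, as are the names `label_agreement` and `second_stage_decision`, the `min_score` rejection threshold, and the representation of the per-frame phoneme labels as a string with one character per frame.

```python
from typing import Dict, List, Optional


def label_agreement(input_labels: str, reference_labels: str) -> float:
    """Fraction of frame positions whose phoneme labels match.

    Assumed measure: sequences are compared position by position, and any
    length mismatch counts as disagreement on the unmatched frames.
    """
    n = max(len(input_labels), len(reference_labels))
    if n == 0:
        return 0.0
    matches = sum(a == b for a, b in zip(input_labels, reference_labels))
    return matches / n


def second_stage_decision(
    candidates: List[str],
    input_labels: str,
    reference_labels: Dict[str, str],
    min_score: float = 0.5,
) -> Optional[str]:
    """Pick the candidate category whose reference labels best match the input.

    `candidates` is the list of m categories kept after the first stage,
    ordered from largest to smallest similarity measure. Returns None when
    no candidate reaches `min_score` (hypothetical rejection rule).
    """
    scored = [
        (label_agreement(input_labels, reference_labels[cat]), cat)
        for cat in candidates
        if cat in reference_labels
    ]
    if not scored:
        return None
    # max() keeps the earliest item on ties, so the candidate ranked higher
    # in the first stage wins when subsidiary scores are equal.
    best_score, best_cat = max(scored, key=lambda sc: sc[0])
    return best_cat if best_score >= min_score else None
```

For example, with hypothetical reference labels `{"one": "WWAANN", "nine": "NNAAIN"}` and an input labelled `"NNAAIN"`, the candidates `["one", "nine"]` score 0.5 and 1.0 respectively, so `"nine"` is selected even though it ranked second in the first stage.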
Specification