Speech recognizing apparatus
Abstract
A speech-recognizing apparatus for recognizing input speech comprises an analysis unit for computing a characteristic vector for each of frames of the input speech, a correction-value storage unit for storing a correction distance in advance, a vector-to-vector-distance-computing unit for computing a vector-to-vector distance between the characteristic vector and a phoneme characteristic vector, an average-value-computing unit for computing an average value of vector-to-vector distances for one of the frames, a correction unit for computing a corrected vector-to-vector distance as a value of an expression of (the vector-to-vector distance - the average value + the correction distance), and a recognition unit for cumulating corrected vector-to-vector distances into a cumulative vector-to-vector distance and comparing the cumulative vector-to-vector distance with a word standard pattern in order to recognize the input speech.
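The correction expression in the abstract can be written compactly as follows. The symbols are assumptions introduced here for illustration and do not appear in the source: $d_t(p)$ is the vector-to-vector distance between the characteristic vector of frame $t$ and phoneme standard pattern $p$, $P$ is the number of phoneme standard patterns, $c$ is the stored correction distance, and the average is assumed to run over the phoneme patterns of a single frame, as claim 1 states.

```latex
\[
\bar{d}_t = \frac{1}{P}\sum_{p=1}^{P} d_t(p),
\qquad
\tilde{d}_t(p) = d_t(p) - \bar{d}_t + c
\]
```

Under these assumptions, setting $c = 0$ gives the plain subtraction recited in claim 1.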
5 Claims
1. A speech-recognizing apparatus for recognizing input speech, said apparatus comprising:
a phoneme-standard-characteristic-pattern storage unit for storing a phoneme characteristic vector of a plurality of phoneme standard patterns in advance;
an analysis unit for computing a characteristic vector for each of frames of said input speech;
a vector-to-vector-distance-computing unit for computing a vector-to-vector distance between said characteristic vector for each of said frames and said phoneme characteristic vector;
an average-value-computing unit for computing an average value of vector-to-vector distances of phonemes for one of said frames;
a correction unit for correcting said vector-to-vector distance by subtracting said average value from said vector-to-vector distance;
a word-standard-pattern storage unit for storing a word standard pattern defining a combination of said phoneme standard patterns by word models in advance; and
a recognition unit for cumulating corrected vector-to-vector distances each produced by said correction unit into a cumulative vector-to-vector distance for speech inputted at different times, and comparing said cumulative vector-to-vector distance with said word standard pattern in order to recognize said input speech.
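The units of claim 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the Euclidean metric, the monotonic stay-or-advance alignment used to cumulate distances, and the names `corrected_distances`, `cumulative_distance`, and `recognize` are all hypothetical choices made here.

```python
import math


def corrected_distances(frames, phoneme_vectors):
    """Return one row per frame of distances with the frame average removed."""
    corrected = []
    for frame in frames:
        # Vector-to-vector distance to every phoneme standard pattern
        # (Euclidean distance is an assumption; the claim names no metric).
        row = [math.dist(frame, p) for p in phoneme_vectors]
        # Average over the phonemes for this one frame.
        mean = sum(row) / len(row)
        # Correction: subtract the frame average from each distance.
        corrected.append([d - mean for d in row])
    return corrected


def cumulative_distance(corrected, phoneme_ids):
    """Cumulate corrected distances along a word model (a phoneme index sequence).

    Assumed alignment: each frame either stays on the current phoneme of the
    word or advances to the next one; the path starts on the first phoneme
    and must end on the last.
    """
    inf = math.inf
    acc = [inf] * len(phoneme_ids)
    for t, row in enumerate(corrected):
        new = []
        for s, pid in enumerate(phoneme_ids):
            if t == 0:
                prev = 0.0 if s == 0 else inf
            else:
                prev = min(acc[s], acc[s - 1] if s > 0 else inf)
            new.append(prev + row[pid])
        acc = new
    return acc[-1]


def recognize(frames, phoneme_vectors, word_models):
    """Pick the word whose model yields the smallest cumulative distance."""
    corrected = corrected_distances(frames, phoneme_vectors)
    return min(word_models,
               key=lambda w: cumulative_distance(corrected, word_models[w]))
```

After the correction, the distances of each frame sum to zero across phonemes, so a frame that is far from every phoneme pattern no longer dominates the cumulative distance.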
2. A speech-recognizing apparatus for recognizing input speech, said apparatus comprising:
an analysis unit for computing characteristic vectors of intervals in said input speech;
a word-standard-pattern storage unit for storing characteristic vectors of word standard patterns in advance;
a similarity-computing unit for comparing said characteristic vectors of said intervals in said input speech with said characteristic vector of said word standard patterns in order to compute a first similarity to each word standard pattern for a portion of said input speech in each of said intervals;
a first judgment unit for forming a judgment as to whether or not a word of a word standard pattern corresponding to said first similarity is a word represented by said input speech by comparison of said first similarity or a result of computation based on said first similarity with a first threshold value;
a candidate storage unit for storing a second similarity;
a candidate-determining unit, which is used for storing said first similarity into said candidate storage unit if:
an outcome of a judgment formed by said first judgment unit indicates that said word of said word standard patterns corresponding to said first similarity is not said word represented by said input speech as evidenced by the fact that said first similarity is smaller than said first threshold value;
said first similarity is greater than said second similarity stored in said candidate storage unit respectively; and
a second judgment unit, which is used for determining that said word of a word standard pattern corresponding to a value stored in said candidate storage unit is said word represented by said input speech on the basis of said second similarity or a result of computation based on said second similarity stored in said candidate storage unit in case an outcome of a judgment formed by said first judgment unit indicates that said word of said word standard patterns corresponding to said first similarity is not said word represented by said input speech within a predetermined period.
3. A speech-recognizing apparatus for recognizing an input speech, said apparatus comprising:
an analysis unit for computing characteristic vectors of intervals in said input speech;
a word-standard-pattern storage unit for storing a characteristic vector of word standard patterns in advance;
a distance-computing unit for comparing said characteristic vectors of said intervals in said input speech with said characteristic vector of said word standard patterns in order to compute a first distance to each word standard pattern for a portion of said input speech in each of said intervals;
a first judgment unit for forming a judgment as to whether or not a word of said word standard patterns corresponding to said first distance is a word represented by said input speech by comparison of said first distance or a result of computation based on said first distance with a first threshold value;
a candidate storage unit for storing a second distance;
a candidate-determining unit, which is used for storing said first distance as said second distance into said candidate storage unit if:
an outcome of a judgment formed by said first judgment unit indicates that said word of said word standard patterns corresponding to said first distance is not said word represented by said input speech as evidenced by the fact that said first distance is greater than said first threshold value;
said first distance is smaller than a second threshold value greater than said first threshold value; and
said first distance is smaller than said second distance stored in said candidate storage unit; and
a second judgment unit, which is used for determining that a word of said word standard pattern corresponding to said second distance stored in said candidate storage unit is said word represented by said input speech on the basis of said second distance stored in said candidate storage unit in case an outcome of a judgment formed by said first judgment unit indicates that said word of said word standard pattern corresponding to said first distance is not said word represented by said input speech within a predetermined period.
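The two-stage judgment of claim 3 can be sketched as below. This is an illustrative reading only: the function name `recognize_with_candidate`, the representation of each interval as a word-to-distance mapping, and the use of an interval count as the "predetermined period" are assumptions made here. Claim 2 describes the mirror-image logic with similarities, where larger values are better.

```python
import math
from itertools import islice


def recognize_with_candidate(intervals, first_threshold, second_threshold, period):
    """Return the recognized word, or None if no candidate was stored.

    intervals: iterable of dicts mapping each word to its first distance
               for one interval of the input speech (assumed data layout).
    period:    number of intervals to wait before falling back to the
               stored candidate (assumed meaning of "predetermined period").
    """
    if not first_threshold < second_threshold:
        raise ValueError("second threshold must exceed the first threshold")

    candidate_word = None
    candidate_distance = math.inf  # the stored "second distance"

    for distances in islice(intervals, period):
        for word, distance in distances.items():
            # First judgment: a distance within the first threshold is accepted.
            if distance <= first_threshold:
                return word
            # Candidate determination: rejected by the first judgment, yet
            # below the looser second threshold and better than the stored one.
            if distance < second_threshold and distance < candidate_distance:
                candidate_word, candidate_distance = word, distance

    # Second judgment: nothing was accepted within the period,
    # so fall back to the best stored candidate.
    return candidate_word
```

For example, with thresholds 0.3 and 0.5, a word that never scores below 0.3 but reaches 0.45 within the period is still returned as the result instead of being rejected.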
4. A speech-recognizing apparatus for recognizing an input speech, said apparatus comprising:
a phoneme-standard-pattern storage unit for storing a phoneme characteristic vector of a plurality of phoneme standard patterns in advance;
an analysis unit for computing a characteristic vector of each frame in said input speech;
a distance storage unit for storing vector-to-vector distances for each frame;
a vector-to-vector-distance-computing unit for computing a vector-to-vector distance between said characteristic vector of said frame and said phoneme characteristic vector of said phoneme standard patterns and storing said vector-to-vector distance into said distance storage unit;
a word-standard-pattern storage unit for storing a word standard pattern defining side information of said phoneme standard patterns for each word in advance;
a cumulative-distance-computing unit for reading out said vector-to-vector distances in a backward direction, that is, a direction from a most recent vector-to-vector distance to a less recent vector-to-vector distance, from said distance storage unit and computing a cumulative distance in said backward direction for all said words; and
a judgment unit for forming a judgment as to whether or not a word corresponding to said cumulative distance computed by said cumulative-distance-computing unit is a word represented by said input speech on the basis of said cumulative distance.
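The backward accumulation of claim 4 can be sketched as follows. This is a hedged illustration, not the method disclosed in the specification: the stay-or-advance alignment, the normalization by path length, and the name `backward_cumulative_distance` are assumptions introduced here. The stored per-frame distances are read from the most recent frame toward older frames, and the word model is traversed from its last phoneme to its first.

```python
import math


def backward_cumulative_distance(stored_distances, phoneme_ids):
    """Best length-normalized cumulative distance for one word.

    stored_distances: list of per-frame distance lists, oldest frame first,
                      indexed by phoneme (the contents of the distance storage).
    phoneme_ids:      phoneme index sequence of the word, first phoneme first.
    """
    inf = math.inf
    reversed_ids = phoneme_ids[::-1]          # walk the word back to front
    states = len(reversed_ids)
    acc = [inf] * states
    best = inf

    # Read frames from the most recent to the less recent.
    for n, frame in enumerate(reversed(stored_distances), start=1):
        new = []
        for s, pid in enumerate(reversed_ids):
            if n == 1:
                # The most recent frame must align with the word's last phoneme.
                prev = 0.0 if s == 0 else inf
            else:
                prev = min(acc[s], acc[s - 1] if s > 0 else inf)
            new.append(prev + frame[pid])
        acc = new
        # A path that has reached the word's first phoneme is a complete match;
        # dividing by the number of frames is an assumed normalization.
        best = min(best, acc[-1] / n)
    return best
```

Because the accumulation is anchored at the most recent frame, this sketch does not need to know in advance where the word began; each backward step simply considers one more possible start frame.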
5. A speech-recognizing apparatus for recognizing input speech, said apparatus comprising:
a phoneme-standard-pattern storage unit for storing a phoneme characteristic vector of a plurality of phoneme standard patterns in advance;
an analysis unit for computing a characteristic vector of each frame in said input speech;
a similarity storage unit for storing similarities to said phoneme standard patterns for each frame;
a similarity-computing unit for computing a similarity between said characteristic vector of said frame and said phoneme characteristic vector of said phoneme standard patterns and storing said similarity into said similarity storage unit;
a word-standard-pattern storage unit for storing a word standard pattern defining side information of said phoneme standard patterns for each word in advance;
a cumulative-similarity-computing unit for reading out similarities in a backward direction, that is, a direction from a most recent similarity to a less recent similarity, from said similarity storage unit and computing a cumulative similarity in said backward direction for all said words; and
a judgment unit for forming a judgment as to whether or not a word corresponding to said cumulative similarity computed by said cumulative-similarity-computing unit is a word represented by said input speech on the basis of said cumulative similarity.
Specification