Speech recognition device and speech recognition method and recording medium utilizing preliminary word selection
Abstract
A speech recognition apparatus that improves recognition accuracy while limiting the increase in required resources. Words that are probable as the result of the speech recognition are selected on the basis of an acoustic score and a linguistic score, while further words are selected on the basis of a measure other than the acoustic score, such as having a small number of phonemes, belonging to a pre-set part of speech, appearing in past results of speech recognition, or having a linguistic score not less than a pre-set value. The words so selected are subjected to matching processing.
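The two selection paths described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the names `Word`, `preselect`, `top_n`, `short_len`, and `lm_floor`, the toy vocabulary, and the use of plain probabilities multiplied together as a word score are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    phonemes: int  # number of phonemes in the (hypothetical) dictionary entry


def preselect(vocab, acoustic, bigram, prev_word, top_n=2, short_len=3, lm_floor=0.1):
    """Return (first, second) word lists to hand to the matching stage."""
    def lm(w):
        # Linguistic score of w following the directly previous word (assumed bigram).
        return bigram.get((prev_word, w.text), 0.0)

    # First words: ranked by a word score combining acoustic and linguistic scores.
    ranked = sorted(vocab, key=lambda w: acoustic[w.text] * lm(w), reverse=True)
    first = ranked[:top_n]

    # Second words: short words chosen without consulting the acoustic score,
    # kept when they are linguistically likely after the previous word.
    second = [w for w in vocab
              if w not in first and w.phonemes < short_len and lm(w) >= lm_floor]
    return first, second


# Toy data (invented for this sketch).
vocab = [Word("tokyo", 5), Word("kyoto", 5), Word("wa", 2), Word("ga", 2), Word("desu", 4)]
acoustic = {"tokyo": 0.8, "kyoto": 0.6, "wa": 0.05, "ga": 0.04, "desu": 0.1}
bigram = {("watashi", "wa"): 0.5, ("watashi", "ga"): 0.3, ("watashi", "tokyo"): 0.05,
          ("watashi", "kyoto"): 0.05, ("watashi", "desu"): 0.01}

first, second = preselect(vocab, acoustic, bigram, prev_word="watashi")
print([w.text for w in first], [w.text for w in second])
```

In this toy run the short particles "wa" and "ga" have low acoustic scores and would miss the top-ranked list, yet they still reach the matching stage through the second path, which is the situation the abstract targets.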
44 Citations
6 Claims
1. A speech recognition apparatus in which a score reflecting an acoustic likelihood of the results of speech recognition of an input speech is calculated and in which the speech is recognized based on the score, comprising:
input means to receive the input speech;
extraction means for extracting characteristic values of said input speech, the input speech comprising a plurality of input words;
characteristic value storage means for storing characteristic values and an extraction time point associated with each characteristic value;
word concatenation information storage means for storing word concatenation information that is the relation of input words of a word sequence representing the results of speech recognition and includes acoustic scores, linguistic scores, time points of the beginning end and terminal end of speech portions associated with respective input words;
selection means for selecting one or more candidate first words from the plurality of input words to be processed by speech recognition processing, based on the concatenation information, and a word score that represents an evaluation of acoustic scores and language scores calculated using said characteristic values, and for selecting one or more candidate second words from the plurality of input words not based on the acoustic score, the candidate second words having unstable acoustic characteristic values with a number of phonemes and syllables less than a preset value, wherein the selection means is configured to select said candidate second words likely to be concatenated linguistically to a directly previously recognized word and configured to subject the concatenated candidate first and candidate second words to the matching processing;
score calculation means for calculating said score of said concatenated candidate first and candidate second words selected by said selection means referencing concatenation information of said first and second words; and
finalizing means for finalizing a word string, as the recognition result of said speech, based on said score,
wherein the selected one or more candidate first words based on the word score have a number of phonemes and syllables above the number of phonemes and syllables of the candidate second words and more stable acoustic characteristic values than the selected one or more candidate second words,
wherein the word concatenation information is sequentially updated based on the score.
Dependent claims: 2, 3, 4.

5. A speech recognition method in which a score reflecting an acoustic likelihood of the results of speech recognition of an input speech is calculated and in which the speech is recognized based on the score, comprising:
an extraction step of extracting characteristic values of said input speech, said input speech comprising a plurality of input words;
a first storing step of storing characteristic values and an extraction time point associated with each characteristic value;
a second storing step of storing word concatenation information that is the relation of input words of a word sequence representing the results of speech recognition and includes acoustic scores, linguistic scores, time points of the beginning end and terminal end of speech portions associated with respective input words;
a selection step of selecting one or more candidate first words from the plurality of input words to be processed by speech recognition processing, based on the concatenation information, and a word score that represents an evaluation of acoustic scores and language scores calculated using said characteristic values, and for selecting one or more candidate second words from the plurality of input words not based on the acoustic score, the candidate second words having unstable acoustic characteristic values with a number of phonemes and syllables less than a preset value, wherein the selection step selects said candidate second words likely to be concatenated linguistically to a directly previously recognized word and to subject the concatenated candidate first and candidate second words to the matching processing;
a score calculation step of calculating said score of said candidate first and candidate second words selected by said selection step referencing concatenation information of said first and second words; and
a finalizing step of finalizing a word string, as the recognition result of said speech, based on said score,
wherein the selected one or more candidate first words based on the word score have a number of phonemes and syllables above the number of phonemes and syllables of the candidate second words and more stable acoustic characteristic values than the selected one or more candidate second words,
wherein the word concatenation information is sequentially updated based on the score.

6. A non-transitory computer-readable medium having recorded thereon a program for causing a computer to perform speech recognition processing in which a score reflecting an acoustic likelihood of the results of speech recognition of an input speech is calculated and in which the speech is recognized based on the score, comprising:
an extraction step of extracting characteristic values of said input speech, said input speech comprising a plurality of input words;
a first storing step of storing characteristic values and an extraction time point associated with each characteristic value;
a second storing step of storing word concatenation information that is the relation of input words of a word sequence representing the results of speech recognition and includes acoustic scores, linguistic scores, time points of the beginning end and terminal end of speech portions associated with respective input words;
a selection step of selecting one or more candidate first words from the plurality of input words to be processed by speech recognition processing, based on the concatenation information, and a word score that represents an evaluation of acoustic scores and language scores calculated using said characteristic values, and for selecting one or more candidate second words from the plurality of input words not based on the acoustic score, the candidate second words having unstable acoustic characteristic values with a number of phonemes and syllables less than a preset value, wherein the selection step selects said candidate second words likely to be concatenated linguistically to a directly previously recognized word and to subject the concatenated candidate first and candidate second words to the matching processing;
a score calculation step of calculating said score of said candidate first and candidate second words selected by said selection step referencing concatenation information of said first and second words; and
a finalizing step of finalizing a word string, as the recognition result of said speech, based on said score,
wherein the selected one or more candidate first words based on the word score have a number of phonemes and syllables above the number of phonemes and syllables of the candidate second words and more stable acoustic characteristic values than the selected one or more candidate second words,
wherein the word concatenation information is sequentially updated based on the score.

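The claims recite word concatenation information that holds, for each word, an acoustic score, a linguistic score, and the time points of its beginning and end, and a finalizing step that picks a word string based on the score. One possible reading, sketched below, stores each word as an arc between two time points and finalizes by taking the best-scoring path. The names `Arc`, `WordLattice`, and `best_sequence`, the frame indices, the additive log-domain scores, and the example values are assumptions for illustration and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class Arc:
    word: str
    start: int       # time point (frame index) where the word begins
    end: int         # time point (frame index) where the word ends; assumed > start
    acoustic: float  # acoustic score (log domain, assumed)
    language: float  # linguistic score (log domain, assumed)


class WordLattice:
    """Hypothetical store for word concatenation information."""

    def __init__(self):
        self.arcs = []

    def add(self, arc):
        # Called as each matched word is scored, so the store grows sequentially.
        self.arcs.append(arc)

    def best_sequence(self, final_time):
        # best[t] = (cumulative score, word list) of the best path ending at time t.
        best = {0: (0.0, [])}
        # Processing arcs by end time ensures best[arc.start] is already final.
        for arc in sorted(self.arcs, key=lambda a: a.end):
            if arc.start not in best:
                continue  # no path reaches the start of this word
            score = best[arc.start][0] + arc.acoustic + arc.language
            if arc.end not in best or score > best[arc.end][0]:
                best[arc.end] = (score, best[arc.start][1] + [arc.word])
        return best.get(final_time)


lattice = WordLattice()
lattice.add(Arc("kyou", 0, 30, -10.0, -2.0))
lattice.add(Arc("kyoto", 0, 35, -14.0, -3.0))
lattice.add(Arc("wa", 30, 40, -6.0, -1.0))
lattice.add(Arc("wa", 35, 40, -3.0, -1.0))
lattice.add(Arc("ii", 40, 60, -8.0, -2.0))

print(lattice.best_sequence(60))
```

Keeping the time points on each arc lets two hypotheses for the same short word (here the two "wa" arcs) coexist with different boundaries until the cumulative score decides between them.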
Specification