Detection of end of utterance in speech recognition system
Abstract
The present invention relates to speech recognition systems, and in particular to arranging end of utterance detection in such systems. A speech recognizer of the system is configured to determine whether a recognition result determined from received speech data is stabilized. The speech recognizer is configured to process values of best state scores and best token scores associated with frames of received speech data for end of utterance detection purposes. Further, if the recognition result is stabilized, the speech recognizer is configured to determine, based on the processing, whether end of utterance is detected or not.
36 Claims
1. A system comprising a speech recognizer with end of utterance detection, wherein
the speech recognizer is configured to calculate values of state scores and token scores associated with frames of received speech data,
the speech recognizer is configured to determine best state scores and best token scores, a best state score being a score of a state having the best probability amongst a number of states in a state model for speech recognition purposes, and a best token score being the best probability of a token amongst a number of tokens used for speech recognition purposes,
the speech recognizer is configured to, at each received frame of received speech data, determine whether recognition result determined from received speech data is stabilized,
if the recognition result determined from received speech data is not stabilized at a current frame, the speech recognizer is configured to continue speech processing for a next received speech frame and to calculate values of state scores and token scores and to determine the best state score and best token score for the next received speech frame,
if the recognition result determined from speech data is stabilized at the current frame, the speech recognizer is configured to, in place of continuing speech processing for the next received frame, process values of the determined best state scores and best token scores associated with frames of received speech data for end of utterance detection purposes, and on the basis of the processed values of the best state scores and best token scores, whether end of utterance is detected or not,
if the end of utterance is not detected on the basis of the processed values of the best state scores and best token scores, the speech recognizer is configured to continue speech processing for a next received speech frame and to calculate values of state scores and token scores and to determine the best state score and best token score for the next received speech frame, and
if the end of utterance is detected on the basis of the processed values of the best state scores and best token scores, the speech recognizer is configured to end the speech processing.
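The per-frame control flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `run_recognizer` and its parameters are hypothetical, scores are assumed to be log-probabilities (so "best" means maximum), and the stabilization check and the end of utterance test are passed in as callables because the claim does not say how they are computed.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# (state scores, token scores) for one frame; assumed to be log-probabilities.
Scores = Tuple[Sequence[float], Sequence[float]]


def run_recognizer(
    frames: Iterable[object],
    score_frame: Callable[[object], Scores],
    is_stabilized: Callable[[int], bool],
    detect_eou: Callable[[List[float], List[float]], bool],
) -> int:
    """Return the index of the frame at which processing ended.

    Returns the index of the last frame if end of utterance was never
    detected, or -1 if no frames were received. All callables are
    hypothetical placeholders supplied by the caller.
    """
    best_state_scores: List[float] = []
    best_token_scores: List[float] = []
    last_index = -1

    for t, frame in enumerate(frames):
        last_index = t

        # Calculate state and token scores, then keep the best of each.
        state_scores, token_scores = score_frame(frame)
        best_state_scores.append(max(state_scores))
        best_token_scores.append(max(token_scores))

        # Result not stabilized: continue with the next frame.
        if not is_stabilized(t):
            continue

        # Result stabilized: process the collected best scores for
        # end of utterance detection; stop if it is detected.
        if detect_eou(best_state_scores, best_token_scores):
            return t

        # Otherwise fall through and process the next frame.

    return last_index
```

Passing the two decisions in as functions keeps the sketch neutral about how stabilization and end of utterance are actually evaluated; only the branching order follows the claim.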
13. A method comprising:
processing, in a data processing device, values of best state scores and best token scores associated with frames of received speech data for end of utterance detection purposes, the processing comprising:
calculating values of state scores and token scores associated with frames of received speech data,
determining best state scores and best token scores, a best state score being a score of a state having the best probability amongst a number of states in a state model for speech recognition purposes, and a best token score being the best probability of a token amongst a number of tokens used for speech recognition purposes,
determining whether recognition result determined from received speech data is stabilized, and
determining, in response to the recognition result being stabilized, on the basis of the processed values of the best state scores and best token scores, whether end of utterance is detected or not.
Dependent claims: 14, 15, 16, 17
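The definitions of "best state score" and "best token score" used throughout the claims can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the `Token` class, its fields, and both function names are hypothetical, and scores are assumed to be log-probabilities, so the best score is the largest one.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Token:
    """Hypothetical token: one search hypothesis with its accumulated score."""
    path: Tuple[str, ...]   # words or states visited so far
    log_score: float        # accumulated log-probability of this path


def best_state_score(state_log_scores: Dict[str, float]) -> float:
    """Score of the state with the best probability in the state model."""
    return max(state_log_scores.values())


def best_token_score(tokens: List[Token]) -> float:
    """Best probability amongst the tokens currently alive in the search."""
    return max(tok.log_score for tok in tokens)


# Example values for a single frame (made up for illustration).
states = {"s1": math.log(0.2), "s2": math.log(0.7), "s3": math.log(0.1)}
tokens = [Token(("yes",), -4.0), Token(("no",), -1.5)]
frame_best = (best_state_score(states), best_token_score(tokens))
```

Here `frame_best` holds the pair of values that would be recorded for the frame and later processed for end of utterance detection.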
18. An electronic device comprising a speech recognizer, wherein
the speech recognizer is configured to determine whether recognition result determined from received speech data is stabilized,
the speech recognizer is configured to process values of best state scores and best token scores associated with frames of received speech data for end of utterance detection purposes, the processing comprising:
calculating values of state scores and token scores associated with frames of received speech data,
determining best state scores and best token scores, a best state score being a score of a state having the best probability amongst a number of states in a state model for speech recognition purposes, and a best token score being the best probability of a token amongst a number of tokens used for speech recognition purposes, and
the speech recognizer is configured to determine, in response to the recognition result being stabilized, on the basis of the processed values of the best state scores and best token scores whether end of utterance is detected or not.
Dependent claims: 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
31. A non-transitory computer readable medium encoded with a computer program, loadable into the memory of a data processing device, the computer program comprising:
program code for processing values of best state scores and best token scores associated with frames of received speech data for end of utterance detection purposes, the processing comprising calculating values of state scores and token scores associated with frames of received speech data, determining best state scores and best token scores, a best state score being a score of a state having the best probability amongst a number of states in a state model for speech recognition purposes, and a best token score being the best probability of a token amongst a number of tokens used for speech recognition purposes,
program code for determining whether recognition result determined from received speech data is stabilized, and
program code for determining, in response to the recognition result being stabilized, on the basis of the processed values of the best state scores and best token scores, whether end of utterance is detected or not.
Dependent claims: 32
33. An apparatus comprising a processor and a memory, the apparatus being configured to:
receive frames of speech data;
determine whether recognition result determined from the received speech data is stabilized;
process values of best state scores and best token scores associated with frames of received speech data for end of utterance detection purposes, the process comprising calculating values of state scores and token scores associated with frames of received speech data, determining best state scores and best token scores, a best state score being a score of a state having the best probability amongst a number of states in a state model for speech recognition purposes, and a best token score being the best probability of a token amongst a number of tokens used for speech recognition purposes; and
determine, in response to the recognition result being stabilized, on the basis of the processed values of the best state scores and best token scores, whether end of utterance is detected or not.
Dependent claims: 34
35. An apparatus comprising:
means for receiving frames of speech data;
means for determining whether a recognition result determined from the received speech data is stabilized;
means for processing values of best state scores and best token scores associated with frames of received speech data for end of utterance detection purposes, the processing comprising means for calculating values of state scores and token scores associated with frames of received speech data, means for determining best state scores and best token scores, a best state score being a score of a state having the best probability amongst a number of states in a state model for speech recognition purposes, and a best token score being the best probability of a token amongst a number of tokens used for speech recognition purposes; and
means for determining, in response to the recognition result being stabilized, on the basis of the processed values of the best state scores and best token scores, whether end of utterance is detected or not.
Dependent claims: 36
Specification