System and terminal for presenting recommended utterance candidates
Abstract
[Object] To provide an easy-to-use speech processing system that attains higher speech recognition accuracy.
[Solution] On receiving a speech utterance, the speech processing system performs speech recognition and displays the recognition result as text. In accordance with its settings, the system also translates the recognition result into text in another language, displays the translation, and synthesizes speech from it. In addition, using the outputs of various sensors at the time of the utterance, a pre-trained utterance sequence model, and the translation and speech recognition scores of utterance candidates, the system selects utterance candidates that are likely to be uttered next and that have high translation and speech recognition scores, and recommends them in the form of an utterance candidate recommendation list. The user can use the utterances in the list as hints when deciding what to say next.
14 Claims
1. A speech processing system, comprising:
utterance input means for receiving an input of utterance information including a speech signal representing an utterance and prescribed environmental information representing an environment in which the utterance is made;
speech recognition means for performing speech recognition on the speech signal in the utterance information received by said utterance input means and for outputting a recognition result as a text;
data processing means for executing a prescribed data processing on the text output by said speech recognition means;
utterance sequence model storage means for storing an utterance sequence model statistically trained such that upon reception of a text of an utterance and said prescribed environmental information, a probability of an utterance in a prescribed set of utterances to be uttered successively following the utterance represented by said text can be calculated;
utterance storage means for storing utterances in said prescribed set of utterances and degree of confidence of data processing when each of said utterances in said set of utterances is processed by said data processing means; and
utterance candidate recommendation means, for scoring, in said set of utterances, candidates of utterances to be recommended to a user who made the utterance recognized by said speech recognition means, based on an evaluation score obtained by combining, in a prescribed form, a probability calculated for each utterance in said prescribed set by said utterance sequence model stored in said utterance sequence model storage means, using the result of recognition by said speech recognition means of the utterance information received by said utterance input means and the environmental information included in the speech information, and the degree of confidence of said data processing on each utterance in said prescribed set of utterances, and for recommending an utterance candidate to the user based on the scores.
Dependent claims: 2, 3, 4, 5, 6
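As an illustration of the utterance sequence model recited in claim 1, the following is a minimal sketch of one way such a model could be statistically trained and queried. The claim does not specify the model form; the count-based estimate with additive smoothing, the class name `UtteranceSequenceModel`, the single environment label, and the sample dialogues are all assumptions made for this example.

```python
from collections import Counter, defaultdict


class UtteranceSequenceModel:
    """Toy count-based estimate of P(next utterance | previous text, environment).

    Hypothetical illustration only; the claim does not prescribe this form.
    """

    def __init__(self, candidates, smoothing=1.0):
        self.candidates = list(candidates)  # the prescribed set of utterances
        self.smoothing = smoothing          # additive smoothing constant (assumed)
        # (previous utterance, environment label) -> counts of the following utterance
        self.counts = defaultdict(Counter)

    def train(self, dialogues):
        """dialogues: iterable of (environment_label, [utterance, ...])."""
        for env, utterances in dialogues:
            for prev, nxt in zip(utterances, utterances[1:]):
                if nxt in self.candidates:
                    self.counts[(prev, env)][nxt] += 1

    def probabilities(self, text, env):
        """Return a smoothed probability for every candidate utterance."""
        seen = self.counts.get((text, env), Counter())
        total = sum(seen.values()) + self.smoothing * len(self.candidates)
        return {c: (seen[c] + self.smoothing) / total for c in self.candidates}


candidates = ["How much is it?", "Where is the gate?", "Thank you."]
model = UtteranceSequenceModel(candidates)
model.train([
    ("airport", ["Hello.", "Where is the gate?", "Thank you."]),
    ("airport", ["Hello.", "Where is the gate?"]),
    ("shop", ["Hello.", "How much is it?", "Thank you."]),
])
probs = model.probabilities("Hello.", "airport")
```

In this toy data, the same recognized text ("Hello.") yields different next-utterance probabilities depending on the environment label, which is the role the environmental information plays in the claim.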
7. A terminal, comprising:
a microphone;
a set of sensors for collecting pieces of information related to surrounding environment;
a display device;
a communication device; and
utterance information transmitting means, connected to said microphone, said set of sensors and said communication device, for transmitting utterance information containing a speech signal obtained from a signal output by said microphone upon reception of an utterance and pieces of information obtained from said set of sensors when said speech signal is obtained, to a prescribed speech processing server through said communication device, and for requesting speech recognition and a prescribed data processing on a result of recognition;
further comprising:
process result presenting means, connected to said communication device, for receiving a process result of said data processing transmitted from said speech processing server in response to said request, and for presenting the process result to a user; and
utterance candidate recommendation list display means, receiving an utterance candidate recommendation list recommended as a plurality of utterance candidates from said speech processing server and displaying the list on said display device, and thereby for recommending utterance candidates to said user.
Dependent claims: 8, 9, 10, 11, 12, 13
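The utterance information that the terminal of claim 7 transmits to the speech processing server can be pictured as a small message bundling the speech signal with the sensor readings taken at the same time. The sketch below is one possible encoding, not the format used by the patent: the `UtteranceInfo` class, its field names, the JSON and base64 wire format, and the sample sensor values are all hypothetical.

```python
import base64
import json
from dataclasses import dataclass, field, asdict


@dataclass
class UtteranceInfo:
    """Hypothetical payload sent from the terminal to the server."""
    speech_signal: bytes                                  # audio captured from the microphone
    sensor_readings: dict = field(default_factory=dict)   # environment at capture time
    requested_processing: str = "translation"             # assumed name for the data processing

    def to_json(self):
        """Serialize for transmission; raw audio bytes are base64-encoded."""
        payload = asdict(self)
        payload["speech_signal"] = base64.b64encode(self.speech_signal).decode("ascii")
        return json.dumps(payload, sort_keys=True)

    @classmethod
    def from_json(cls, text):
        """Rebuild the payload on the receiving side."""
        payload = json.loads(text)
        payload["speech_signal"] = base64.b64decode(payload["speech_signal"])
        return cls(**payload)


info = UtteranceInfo(
    speech_signal=b"\x00\x01\x02\x03",
    sensor_readings={"latitude": 35.0, "longitude": 135.0, "hour": 14},
)
wire = info.to_json()                      # what the communication device would send
restored = UtteranceInfo.from_json(wire)   # what the server would reconstruct
```

Keeping the sensor readings in the same message as the speech signal reflects the claim's requirement that the environmental information be obtained when the speech signal is obtained.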
14. A speech processing system, comprising:
a non-transitory computer readable medium storing a prescribed set of utterances; and
at least one processor configured to:
receive utterance information including a speech signal and environmental information, wherein the speech signal represents an utterance made by a user and the environmental information includes measurements of an environment in which the utterance is made;
perform speech recognition on the received speech signal and output a speech recognition result as text;
execute a prescribed data processing on the outputted text, wherein the prescribed data processing has been executed on each of the prescribed set of utterances and indicates for each of the prescribed set of utterances a degree of confidence of the executed prescribed data processing;
for each particular utterance in the prescribed set of utterances stored in the non-transitory computer readable medium, calculate a probability of the particular utterance successively following the utterance represented by said text by applying a statistically trained utterance sequence model to the text and received environmental information;
score said prescribed set of utterances to determine utterance candidates to be recommended to the user that made the utterance recognized by said speech recognition means, wherein the scoring for each of the prescribed set of utterances is based on an evaluation score obtained by combining the calculated probability and the degree of confidence; and
presenting at least one of the utterance candidates to the user, wherein the presented at least one utterance candidate is selected from the utterance candidates having top scores.
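The scoring step of claim 14 combines, for each stored utterance, the calculated probability with the stored degree of confidence and then selects from the top-scoring candidates. The claim does not state how the two values are combined; the sketch below assumes a simple linear interpolation, and the function name `recommend`, the `weight` and `top_n` parameters, and the sample numbers are illustrative only.

```python
def recommend(probabilities, confidences, weight=0.5, top_n=2):
    """Rank candidate utterances by a combined evaluation score.

    probabilities: utterance -> probability of being uttered next
    confidences:   utterance -> degree of confidence of the data processing
    weight:        interpolation weight (assumed; the claim fixes no formula)
    """
    scored = []
    for utterance, prob in probabilities.items():
        conf = confidences.get(utterance, 0.0)
        # Assumed combination: weighted sum of probability and confidence.
        score = weight * prob + (1.0 - weight) * conf
        scored.append((score, utterance))
    # Highest evaluation score first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [utterance for _, utterance in scored[:top_n]]


probabilities = {"Where is the gate?": 0.6, "How much is it?": 0.2, "Thank you.": 0.2}
confidences = {"Where is the gate?": 0.7, "How much is it?": 0.9, "Thank you.": 0.4}
recommended = recommend(probabilities, confidences)
```

With these sample values, "How much is it?" outranks "Thank you." despite having the same next-utterance probability, because its data processing confidence is higher. Other combinations, such as a product of the two values, would fit the claim language equally well.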
Specification