System and terminal for presenting recommended utterance candidates

US 9,824,687 B2
Filed: 07/01/2013
Issued: 11/21/2017
Est. Priority Date: 07/09/2012
Status: Expired due to Fees


Assignments: 1
Petitions: 0

Abstract

[Object] An object is to provide an easy-to-use speech processing system that achieves higher speech recognition accuracy.

[Solution] On receiving a spoken utterance, the speech processing system performs speech recognition and displays the recognition result as text. Depending on its settings, the system also translates the recognition result into text in another language, displays the translation, and synthesizes speech from it. Using the outputs of various sensors at the time of the utterance, a pre-trained utterance sequence model, and the translation and speech recognition scores of stored utterance candidates, the system selects candidates that are likely to be uttered next and that have high translation and speech recognition scores, and presents them to the user as an utterance candidate recommendation list. The user can use the utterances in the list as hints when deciding what to say next.
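The recommendation step combines two numbers for each stored utterance: the probability, from the utterance sequence model, that it follows the recognized utterance given the environmental information, and a stored degree of confidence for processing it (for example, a translation score). The sketch below is a minimal illustration under stated assumptions: it uses the linear sum with positive weights that claim 4 describes, reduces the sensor information to a single environment label, and replaces the statistically trained model with a toy lookup table. The names `SequenceModel`, `recommend`, `alpha`, `beta`, and the sample utterances are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceModel:
    """Hypothetical stand-in for the trained utterance sequence model.

    Maps (previous utterance text, environment label) to a probability
    distribution over next utterances. Contexts never seen in training
    fall back to a uniform distribution over the candidate set.
    """

    table: dict[tuple[str, str], dict[str, float]] = field(default_factory=dict)

    def prob(self, prev_text: str, env: str, candidate: str, n_candidates: int) -> float:
        dist = self.table.get((prev_text, env))
        if dist is None:
            return 1.0 / n_candidates  # uniform fallback for unseen contexts
        return dist.get(candidate, 0.0)


def recommend(
    model: SequenceModel,
    prev_text: str,
    env: str,
    confidences: dict[str, float],
    alpha: float = 0.7,
    beta: float = 0.3,
    top_k: int = 3,
) -> list[str]:
    """Rank stored utterances by alpha * P(next | prev, env) + beta * confidence.

    Both weights must be positive, as in the linear sum of claim 4.
    `confidences` maps each stored utterance to its processing confidence
    (e.g. a translation score). Returns the top_k utterances, best first.
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must both be positive")
    n = len(confidences)
    scored = [
        (alpha * model.prob(prev_text, env, utt, n) + beta * conf, utt)
        for utt, conf in confidences.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [utt for _, utt in scored[:top_k]]


# Illustrative data only; not taken from the patent.
model = SequenceModel({
    ("Where is the station?", "outdoors"): {
        "How long does it take to walk there?": 0.5,
        "Is there a taxi stand nearby?": 0.3,
        "Thank you.": 0.2,
    }
})
confidences = {
    "How long does it take to walk there?": 0.6,
    "Is there a taxi stand nearby?": 0.9,
    "Thank you.": 0.95,
    "I would like a window seat.": 0.8,
}
print(recommend(model, "Where is the station?", "outdoors", confidences, top_k=2))
```

With the default weights the sequence model dominates, so the list starts with the two most probable follow-ups. Raising `beta` (for example `alpha=0.1, beta=0.9`) moves utterances that the system processes reliably, such as "Thank you.", to the top. Beyond claim 4's requirement that both coefficients be positive, the record above does not say how the two terms are weighted.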


14 Claims

1. A speech processing system, comprising:
  utterance input means for receiving an input of utterance information including a speech signal representing an utterance and prescribed environmental information representing an environment in which the utterance is made;
  
  speech recognition means for performing speech recognition on the speech signal in the utterance information received by said utterance input means and for outputting a recognition result as a text;
  
  data processing means for executing a prescribed data processing on the text output by said speech recognition means;
  
  utterance sequence model storage means for storing an utterance sequence model statistically trained such that upon reception of a text of an utterance and said prescribed environmental information, a probability of an utterance in a prescribed set of utterances to be uttered successively following the utterance represented by said text can be calculated;
  
  utterance storage means for storing utterances in said prescribed set of utterances and degree of confidence of data processing when each of said utterances in said set of utterances is processed by said data processing means; and
  
  utterance candidate recommendation means, for scoring, in said set of utterances, candidates of utterances to be recommended to a user who made the utterance recognized by said speech recognition means, based on an evaluation score obtained by combining, in a prescribed form, a probability calculated for each utterance in said prescribed set by said utterance sequence model stored in said utterance sequence model storage means, using the result of recognition by said speech recognition means of the utterance information received by said utterance input means and the environmental information included in the speech information, and the degree of confidence of said data processing on each utterance in said prescribed set of utterances, and for recommending an utterance candidate to the user based on the scores.

2. The speech processing system according to claim 1, wherein said data processing means includes automatic translation means receiving a result of recognition output from said speech recognition means of a given utterance, for automatically translating the result of recognition to a language different from the language of said given utterance and for outputting the translated result as a text; and

  said degree of confidence is likelihood of the translated result by said automatic translation means being a translation of said given utterance in said different language.

3. The speech processing system according to claim 2, wherein said data processing means further includes speech synthesizing means for synthesizing, based on the text in said different language output from said automatic translation means, a speech signal of said different language.

4. The speech processing system according to claim 1, wherein said utterance candidate recommendation means includes means for estimating, in said set of utterances, a candidate of an utterance that successively follows the utterance speech-recognized by said speech recognition means, based on an evaluation in terms of a linear sum of the probability calculated by said utterance sequence model for each utterance in said prescribed set and the degree of confidence of each utterance in said prescribed set stored in said utterance storage means; and

  in said linear sum, coefficients of said degree of confidence and said probability are both positive.

5. The speech processing system according to claim 1, further comprising utterance candidate presenting means for presenting to the user an utterance candidate recommended by said utterance candidate recommendation means.

6. The speech processing system according to claim 1, further comprising utterance text information input means, receiving utterance text information including a text representing an utterance and said prescribed environmental information, for applying the text in said utterance text information to said utterance candidate recommendation means and said data processing means, in place of the output of said speech recognition means.

7. A terminal, comprising:
  a microphone;
  
  a set of sensors for collecting pieces of information related to surrounding environment;
  
  a display device;
  
  a communication device; and
  
  utterance information transmitting means, connected to said microphone, said set of sensors and said communication device, for transmitting utterance information containing a speech signal obtained from a signal output by said microphone upon reception of an utterance and pieces of information obtained from said set of sensors when said speech signal is obtained, to a prescribed speech processing server through said communication device, and for requesting speech recognition and a prescribed data processing on a result of recognition;
  
  further comprising:
  
  process result presenting means, connected to said communication device, for receiving a process result of said data processing transmitted from said speech processing server in response to said request, and for presenting the process result to a user; and
  
  utterance candidate recommendation list display means, receiving an utterance candidate recommendation list recommended as a plurality of utterance candidates from said speech processing server and displaying the list on said display device, and thereby for recommending utterance candidates to said user.

8. The terminal according to claim 7, wherein said prescribed data processing performed by said speech processing server on the result of said speech recognition is a process of automatically translating said utterance to a language different from the language of said utterance and further synthesizing a speech of a result of the automatic translation;

  the process result of said data processing transmitted from said speech processing server is a speech signal representing the speech synthesized by said speech processing server; and

  said process result presenting means includes a speaker, and means for driving said speaker with the speech signal representing the speech synthesized by said speech processing server.

9. The terminal according to claim 7, further comprising:

  selecting means operable by a user for selecting any of the utterance candidates displayed by said utterance candidate recommendation list; and

  utterance text information transmitting means, responsive to selection of any of the utterance candidates in said utterance candidate recommendation list by said selecting means, for transmitting utterance text information including a text of the selected utterance candidate and pieces of information obtained from said set of sensors to a prescribed speech processing server through said communication device, and requesting said prescribed data processing on said utterance text information.

10. The terminal according to claim 7, wherein:

  said display device has a plurality of display areas including a first area, a second area, and a third area;

  said terminal further comprising means for causing said display device to show said result of recognition on said first area, said prescribed data processing including performing automatic translation of said utterance into a language different from a language of said utterance and synthesizing a speech sound signal representing a result of said translation, said process result of said data processing including the speech sound signal synthesized by said speech processing server;

  said terminal further comprising:

  translation result display means connected to receive said process result of said data processing, for causing said display device to show at least said result of said translation in said result of said data processing on said second area; and

  utterance candidate recommend list display means for causing said display device to show said utterance candidate recommend list on said third area.

11. The terminal according to claim 10, wherein said display area further includes a fourth area;

  said prescribed data processing further includes performing reverse translation of said result of said translation into an original language of said utterance;

  said process result of said data processing includes a speech sound signal representing a result of said reverse translation; and

  said terminal further includes reverse translation result display means for causing said display device to show said result of said reverse translation on said fourth area.

12. The terminal according to claim 11, further including:

  selecting means operable by a user to select any one of the utterance candidates displayed on said third area; and

  means in response to said selecting means selecting one of utterance candidates in said utterance candidate recommend list, for causing said display device to show said utterance candidate selected on said first area.

13. The terminal according to claim 10, further including:

  selecting means operable by a user to select any one of the utterance candidates displayed on said third area; and

  means in response to said selecting means selecting one of utterance candidates in said utterance candidate recommend list, for causing said display device to show said utterance candidate selected on said first area of said display device.

14. A speech processing system, comprising:
  a non-transitory computer readable medium storing a prescribed set of utterances; and
  
  at least one processor configured to:
  
  receive utterance information including a speech signal and environmental information, wherein the speech signal represents an utterance made by a user and the environmental information includes measurements of an environment in which the utterance is made;
  
  perform speech recognition on the received speech signal and output a speech recognition result as text;
  
  execute a prescribed data processing on the outputted text, wherein the prescribed data processing has been executed on each of the prescribed set of utterances and indicates for each of the prescribed set of utterances a degree of confidence of the executed prescribed data processing;
  
  for each particular utterance in the prescribed set of utterances stored in the non-transitory computer readable medium, calculate a probability of the particular utterance successively following the utterance represented by said text by applying a statistically trained utterance sequence model to the text and received environmental information;
  
  score said prescribed set of utterances to determine utterance candidates to be recommended to the user that made the utterance recognized by said speech recognition means, wherein the scoring for each of the prescribed set of utterances is based on an evaluation score obtained by combining the calculated probability and the degree of confidence; and
  
  present at least one of the utterance candidates to the user, wherein the presented at least one utterance candidate is selected from the utterance candidates having top scores.
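Claims 7 through 13 describe a thin terminal: it sends the speech signal together with sensor readings, receives the recognition result, translation, reverse translation, synthesized speech, and recommendation list from the server, shows them in separate display areas, and resends a selected candidate as text. The sketch below models that exchange as two message types. It is an illustration under assumptions, not the patented implementation: the class and function names, the JSON field names, the `area1` to `area4` keys, and the base64 audio encoding are all hypothetical.

```python
import base64
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UtteranceRequest:
    """Hypothetical terminal-to-server message (claims 7 and 9).

    Exactly one of `speech` (audio bytes from the microphone) or `text`
    (a recommended candidate the user selected) is set. `sensors` holds
    readings taken when the utterance was obtained.
    """

    sensors: dict[str, float]
    speech: Optional[bytes] = None
    text: Optional[str] = None

    def to_json(self) -> str:
        if (self.speech is None) == (self.text is None):
            raise ValueError("set exactly one of speech or text")
        body: dict[str, object] = {"sensors": self.sensors}
        if self.speech is not None:
            # JSON cannot carry raw bytes, so the audio is base64-encoded.
            body["speech_b64"] = base64.b64encode(self.speech).decode("ascii")
        else:
            body["text"] = self.text
        return json.dumps(body, sort_keys=True)


@dataclass
class ProcessResult:
    """Hypothetical server-to-terminal message (claims 8, 10 and 11)."""

    recognition: str
    translation: str
    reverse_translation: str
    synthesized_speech: bytes  # would drive the speaker (claim 8)
    recommendations: list[str] = field(default_factory=list)


def layout(result: ProcessResult) -> dict[str, object]:
    """Map each part of the response to a display area (claims 10 and 11)."""
    return {
        "area1": result.recognition,          # first area: recognition result
        "area2": result.translation,          # second area: translation result
        "area3": result.recommendations,      # third area: recommendation list
        "area4": result.reverse_translation,  # fourth area: reverse translation
    }


def select_candidate(
    result: ProcessResult, index: int, sensors: dict[str, float]
) -> tuple[str, UtteranceRequest]:
    """Handle selection of a recommended utterance.

    Returns the text to show on the first area (claims 12 and 13) and a
    text-based request to resend to the server (claim 9).
    """
    chosen = result.recommendations[index]
    return chosen, UtteranceRequest(sensors=sensors, text=chosen)
```

A selected candidate thus travels the same path as recognized speech, except that the server can skip recognition, which corresponds to the text-input alternative of claim 6.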


Current Assignee: National Institute of Information and Communications Technology
Original Assignee: National Institute of Information and Communications Technology
Inventors: Sugiura, Komei; Okuma, Hideo; Kimura, Noriyuki; Shiga, Yoshinori; Hayashi, Teruaki; Mizukami, Etsuo
Primary Examiner: MCFADDEN, SUSAN IRIS
Application Number: US 14/406,015
Publication Number: US 20170148436 A1
Time in Patent Office: 1,604 days
Field of Search: 704/2

CPC Class Codes:
G06F 3/0482   Interaction with lists of s...
G06F 3/167   Audio in a user interface, ...
G06F 40/58   Use of machine translation,...
G10L 13/08   Text analysis or generation...
G10L 15/14   using statistical models, e...
G10L 15/22   Procedures used during a sp...
G10L 15/30   Distributed recognition, e....
G10L 25/69   for evaluating synthetic or...
