Speech recognition dependent on text message content
First Claim
1. A method of automatic speech recognition, comprising the steps of:
a) receiving a text message at a speech recognition client device;
b) processing the text message with conversational context-specific language models and emotional context-specific language models stored on the client device using at least one processor of the client device to identify a conversational context and an emotional context corresponding to the text message;
c) synthesizing speech from the text message;
d) communicating the synthesized speech via a loudspeaker of the client device to a user of the client device;
e) receiving a reply utterance in response to the text message from the user via a microphone of the client device that converts the reply utterance into a speech signal;
f) pre-processing the speech signal using the at least one processor to extract acoustic data from the received speech signal;
g) communicating the extracted acoustic data, the identified conversational context, and identified emotional context to a speech recognition server;
h) identifying an acoustic model of a plurality of acoustic models stored at the server to be used for decoding the acoustic data based on the identified conversational context, the identified emotional context, or both;
i) decoding the acoustic data using the identified acoustic model to produce a plurality of hypotheses for the reply utterance; and
j) post-processing the plurality of hypotheses to identify one of the hypotheses as the reply utterance;
k) presenting the identified hypothesis to the user;
l) seeking confirmation from the user that the identified hypothesis is correct;
m) outputting the identified hypothesis as at least part of a reply text message if the user confirms that the identified hypothesis is correct;
otherwise n) using the emotional context to improve identification of the acoustic model, and repeating steps e) through m).
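The claim-1 flow above amounts to a client that tags an incoming message with a conversational context and an emotional context, and a server that selects a context-matched acoustic model to decode the spoken reply. The following is a minimal sketch only, assuming keyword-set stand-ins for the context-specific language models and string labels for the acoustic models; all names and model tables are hypothetical, not the patent's actual implementation:

```python
# Hypothetical context-specific "language models": keyword sets that
# score how well an incoming text message matches each context (step b).
CONVERSATIONAL_LMS = {
    "scheduling": {"meet", "lunch", "tomorrow", "time"},
    "smalltalk":  {"hey", "how", "weekend"},
}
EMOTIONAL_LMS = {
    "urgent":  {"now", "asap", "immediately"},
    "neutral": {"ok", "fine", "sure"},
}

def identify_context(text, models):
    """Step b): pick the context whose model best matches the message."""
    words = set(text.lower().split())
    return max(models, key=lambda ctx: len(words & models[ctx]))

# Hypothetical server-side acoustic models keyed by the identified
# (conversational, emotional) context pair (step h).
ACOUSTIC_MODELS = {
    ("scheduling", "urgent"):  "am-sched-urgent",
    ("scheduling", "neutral"): "am-sched-neutral",
    ("smalltalk", "urgent"):   "am-small-urgent",
    ("smalltalk", "neutral"):  "am-small-neutral",
}

def server_decode(acoustic_data, conv_ctx, emo_ctx):
    """Steps h)-j): select an acoustic model by context and decode."""
    model = ACOUSTIC_MODELS[(conv_ctx, emo_ctx)]
    # Real decoding would rank hypotheses; stubbed here for illustration.
    hypotheses = [f"decoded-with-{model}", "alternate"]
    return hypotheses[0]  # post-processing picks the top hypothesis

def handle_text_message(text, reply_audio):
    conv = identify_context(text, CONVERSATIONAL_LMS)  # step b)
    emo = identify_context(text, EMOTIONAL_LMS)        # step b)
    # Steps c)-f): TTS playback, reply capture, and feature
    # extraction are stubbed as a simple string transform.
    acoustic_data = f"features({reply_audio})"
    return server_decode(acoustic_data, conv, emo)     # steps g)-j)
```

Steps k) through n), the confirmation loop, would wrap `handle_text_message` in a retry that re-runs recognition when the user rejects the presented hypothesis.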
Abstract
A method of automatic speech recognition. An utterance is received from a user in reply to a text message, via a microphone that converts the reply utterance into a speech signal. The speech signal is processed using at least one processor to extract acoustic data from the speech signal. An acoustic model is identified from a plurality of acoustic models to decode the acoustic data, using a conversational context associated with the text message. The acoustic data is decoded using the identified acoustic model to produce a plurality of hypotheses for the reply utterance.
2 Claims
2. A method of automatic speech recognition, comprising the steps of:
a) receiving a text message at a speech recognition client device;
b) processing the text message with conversational context-specific language models and emotional context-specific language models stored on the client device using at least one processor of the client device to identify a conversational context and emotional context corresponding to the text message;
c) synthesizing speech from the text message;
d) communicating the synthesized speech via a loudspeaker of the client device to a user of the client device;
e) receiving a reply utterance in response to the text message from the user via a microphone of the client device that converts the reply utterance into a speech signal;
f) pre-processing the speech signal using the at least one processor to extract acoustic data from the received speech signal;
g) identifying an acoustic model of a plurality of acoustic models to decode the acoustic data, using the identified conversational context and emotional context associated with the text message;
h) decoding the acoustic data using the identified acoustic model to produce a plurality of hypotheses for the reply utterance;
i) determining whether a confidence value associated with at least one of the plurality of hypotheses for the reply utterance is greater or less than a confidence threshold;
j) communicating the extracted acoustic data, the conversational context, and the emotional context to a speech recognition server if the confidence value is determined to be less than the confidence threshold, otherwise post-processing the plurality of hypotheses to identify one of the hypotheses as the reply utterance, and outputting from the client device the identified hypothesis as at least part of a reply text message;
k) identifying at the server an acoustic model of a plurality of acoustic models stored at the server to decode the acoustic data, using the identified conversational context, the emotional context, or both;
l) decoding the acoustic data using the acoustic model identified at the server to produce a plurality of hypotheses for the reply utterance;
m) post-processing the plurality of hypotheses to identify one of the hypotheses as the reply utterance; and
n) outputting from the server the identified hypothesis as at least part of a reply text message.
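Claim 2 differs from claim 1 in its confidence-gated fallback: the client decodes locally first and escalates to the server only when its best hypothesis falls below a confidence threshold. A hedged sketch of that gating logic, where the threshold value, the confidence scoring, and both decoders are hypothetical stand-ins rather than anything the claim specifies:

```python
CONFIDENCE_THRESHOLD = 0.7  # assumed value; the claim does not fix one

def client_decode(acoustic_data):
    """Steps g)-h): local decode returning (hypothesis, confidence) pairs."""
    # Stub: confidence falls as input "difficulty" (length here) grows.
    conf = max(0.0, 1.0 - 0.05 * len(acoustic_data))
    return [("local-hypothesis", conf)]

def server_decode(acoustic_data, conv_ctx, emo_ctx):
    """Steps k)-n): server-side decode with a context-selected model."""
    return f"server-hypothesis[{conv_ctx}/{emo_ctx}]"

def recognize_reply(acoustic_data, conv_ctx, emo_ctx):
    hyps = client_decode(acoustic_data)
    best, conf = max(hyps, key=lambda h: h[1])  # step i): best confidence
    if conf < CONFIDENCE_THRESHOLD:             # step j): escalate to server
        return server_decode(acoustic_data, conv_ctx, emo_ctx)
    return best                                 # output locally from client
```

The design point of the claim is latency and bandwidth: easy utterances never leave the device, while the acoustic data and both contexts travel to the server only for low-confidence cases.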
Specification