SPEECH RECOGNITION DEPENDENT ON TEXT MESSAGE CONTENT

US 20120245934A1
Filed: 03/25/2011
Published: 09/27/2012
Est. Priority Date: 03/25/2011
Status: Active Grant



Abstract

A method of automatic speech recognition. An utterance is received from a user in reply to a text message, via a microphone that converts the reply utterance into a speech signal. The speech signal is processed using at least one processor to extract acoustic data from the speech signal. An acoustic model is identified from a plurality of acoustic models to decode the acoustic data, and using a conversational context associated with the text message. The acoustic data is decoded using the identified acoustic model to produce a plurality of hypotheses for the reply utterance.
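
As a rough illustration of the flow in the abstract, the sketch below selects one acoustic model from several by conversational context and decodes to a ranked list of hypotheses. The context labels, the `AcousticModel` class, the `identify_model` helper, and the canned hypotheses are all hypothetical stand-ins; the patent does not specify them, and a real decoder would score acoustic feature vectors rather than return fixed results.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Hypothesis = Tuple[str, float]  # (text, confidence in [0, 1])


@dataclass
class AcousticModel:
    """Toy stand-in for a context-specific acoustic model (hypothetical)."""
    name: str
    canned: List[Hypothesis]

    def decode(self, acoustic_data: List[float]) -> List[Hypothesis]:
        # A real decoder would search over the feature vectors; this stub
        # ignores them and returns fixed hypotheses, best first.
        return sorted(self.canned, key=lambda h: h[1], reverse=True)


def identify_model(models: Dict[str, AcousticModel], context: str,
                   default: str = "general") -> AcousticModel:
    """Pick the model keyed by conversational context, else a fallback."""
    return models.get(context, models[default])


# Assumed context labels and hypotheses, for illustration only.
models = {
    "general": AcousticModel("general", [("okay", 0.5), ("oh hey", 0.3)]),
    "scheduling": AcousticModel(
        "scheduling", [("see you at nine", 0.6), ("see you at five", 0.8)]
    ),
}

chosen = identify_model(models, "scheduling")
n_best = chosen.decode([0.1, 0.2, 0.3])  # plurality of hypotheses
```

Keying the models by context label keeps the selection step a simple lookup; how the context label itself is derived from the text message is a separate step.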


17 Claims

1. A method of automatic speech recognition, comprising the steps of:
- a) receiving from a user, an utterance in reply to a text message via a microphone that converts the reply utterance into a speech signal;
  
  b) pre-processing the speech signal using at least one processor to extract acoustic data from the speech signal;
  
  c) identifying an acoustic model of a plurality of acoustic models to decode the acoustic data, using a conversational context associated with the text message; and
  
  d) decoding the acoustic data using the identified acoustic model to produce a plurality of hypotheses for the reply utterance.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10):
  - 2. The method of claim 1, further comprising the step of:
    - e) post-processing the plurality of hypotheses to identify one of the hypotheses as the reply utterance.
  - 3. The method of claim 2, further comprising the steps of:
    - f) presenting the identified hypothesis to the user;
      
      g) seeking confirmation from the user that the identified hypothesis is correct; and
      
      h) outputting the identified hypothesis as at least part of a reply text message if the user confirms that the identified hypothesis is correct.
  - 4. The method of claim 3, further comprising the steps of:
    - i) processing the text message with conversational context-specific language models to identify the conversational context corresponding to the text message, and with emotional context-specific language models to identify an emotional context corresponding to the text message, wherein the language models are stored on a client device; and
      
      j) using the emotional context to improve identification of the acoustic model.
  - 5. The method of claim 2, further comprising the step of:
    - f) adapting the plurality of acoustic models with the identified hypothesis for improved speech recognition performance over time.
  - 6. The method of claim 5, wherein steps a) and b) are carried out on a speech recognition client device and steps c) through f) are carried out on a speech recognition server.
  - 7. The method of claim 6, wherein the adapting step f) also includes adapting a plurality of context-specific language models stored on the server with the identified hypothesis, and communicating the plurality of context-specific language models from the server to the client device to update language models stored on the client device for improved text message context classification over time.
  - 8. The method of claim 6, further comprising the steps of:
    - receiving the text message at the speech recognition client device;
      
      processing the text message with conversational context-specific language models to identify the conversational context corresponding to the text message, and with emotional context-specific language models to identify an emotional context corresponding to the text message, wherein the language models are stored on the client device; and
      
      communicating the identified conversational and emotional contexts to the speech recognition server.
  - 9. The method of claim 1 wherein the identifying and decoding steps c) and d) are initially carried out using a speech recognition client device.
  - 10. The method of claim 1 further comprising the steps of:
    - determining whether a confidence value associated with at least one of the plurality of hypotheses for the reply utterance is greater than a confidence threshold; and
      
      communicating the extracted acoustic data and the conversational context to a speech recognition server, if the confidence value is determined to be less than the confidence threshold;
      
      otherwise post-processing the plurality of hypotheses to identify one of the hypotheses as the reply utterance; and
      
      outputting from the client device the identified hypothesis as at least part of a reply text message.
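
The confidence test in claim 10 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `recognize_reply` function, the `send_to_server` callback, the 0.7 threshold, and the choice to post-process by simply taking the top-scoring hypothesis are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

Hypothesis = Tuple[str, float]  # (text, confidence)


def recognize_reply(
    local_hypotheses: List[Hypothesis],
    acoustic_data: List[float],
    context: str,
    send_to_server: Callable[[List[float], str], str],
    threshold: float = 0.7,  # assumed value; the claim names no number
) -> str:
    """Return reply text, deferring to the server when confidence is low.

    Assumes local_hypotheses is non-empty.
    """
    best_text, best_conf = max(local_hypotheses, key=lambda h: h[1])
    if best_conf < threshold:
        # Low confidence: pass the acoustic data and context to the server.
        return send_to_server(acoustic_data, context)
    # Otherwise post-process on the client; here that is just the top pick.
    return best_text


def fake_server(acoustic_data: List[float], context: str) -> str:
    """Hypothetical server stub that echoes which context it received."""
    return "server:" + context


confident = recognize_reply([("yes", 0.9), ("yet", 0.4)], [0.1], "scheduling", fake_server)
uncertain = recognize_reply([("yes", 0.5)], [0.1], "scheduling", fake_server)
```

Passing the server call in as a callback keeps the client logic testable without a network connection.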

11. A method of automatic speech recognition, comprising the steps of:
- a) receiving a text message at a speech recognition client device;
  
  b) processing the text message with conversational context-specific language models stored on the client device using at least one processor of the client device to identify a conversational context corresponding to the text message;
  
  c) synthesizing speech from the text message;
  
  d) communicating the synthesized speech via a loudspeaker of the client device to a user of the client device;
  
  e) receiving a reply utterance from the user via a microphone of the client device that converts the reply utterance into a speech signal;
  
  f) pre-processing the speech signal using the at least one processor to extract acoustic data from the received speech signal;
  
  g) communicating the extracted acoustic data and the identified conversational context to a speech recognition server;
  
  h) identifying an acoustic model of a plurality of acoustic models stored at the server to decode the acoustic data, using the identified conversational context;
  
  i) decoding the acoustic data using the identified acoustic model to produce a plurality of hypotheses for the reply utterance; and
  
  j) post-processing the plurality of hypotheses to identify one of the hypotheses as the reply utterance.
- Dependent claims (12, 13, 14, 15, 16):
  - 12. The method of claim 11, further comprising the step of:
    - k) adapting the plurality of acoustic models with the identified hypotheses for improved speech recognition performance over time.
  - 13. The method of claim 12, wherein the adapting step also includes adapting a plurality of context-specific language models stored on the server with the identified hypothesis, and communicating the plurality of context-specific language models from the server to the client device to update the language models stored on the client device for improved text message context classification over time.
  - 14. The method of claim 12, further comprising the steps of:
    - l) processing the text message with emotional context-specific language models stored on the client device using at least one processor of the client device to identify an emotional context corresponding to the text message; and
      
      m) communicating the identified emotional context to the speech recognition server.
  - 15. The method of claim 14, wherein the identifying step is also carried out using the identified emotional context to improve identification of the acoustic model.
  - 16. The method of claim 14, further comprising the steps of:
    - n) presenting the identified hypothesis to the user;
      
      o) seeking confirmation from the user that the identified hypothesis is correct;
      
      p) outputting the identified hypothesis as at least part of a reply text message if the user confirms that the identified hypothesis is correct;
      
      otherwise q) using the emotional context to improve identification of the acoustic model, and repeating steps e) through p).
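
Claims 11 and 14 process the text message with context-specific language models to identify a conversational and an emotional context. One way this could look, assuming simple add-one smoothed unigram models and picking the context whose model scores the message highest, is sketched below. The claims do not state the model type; the training sentences, context labels, and function names here are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def train_unigram(sentences: List[str]) -> Counter:
    """Count words across the example sentences for one context."""
    return Counter(word for s in sentences for word in s.lower().split())


def log_likelihood(model: Counter, text: str, vocab_size: int) -> float:
    """Add-one smoothed unigram log-probability of the text under the model."""
    total = sum(model.values())
    return sum(
        math.log((model[w] + 1) / (total + vocab_size))
        for w in text.lower().split()
    )


def classify(models: Dict[str, Counter], text: str) -> str:
    """Return the context label whose model scores the text highest."""
    vocab = set().union(*models.values())  # words seen in any context
    size = len(vocab) + 1  # one extra slot for unseen words
    return max(models, key=lambda c: log_likelihood(models[c], text, size))


# Assumed toy training data, for illustration only.
conversational = {
    "scheduling": train_unigram(["what time is the meeting", "are you free at five"]),
    "dining": train_unigram(["where should we eat dinner", "pizza or sushi tonight"]),
}
emotional = {
    "positive": train_unigram(["great news", "so happy for you"]),
    "negative": train_unigram(["i am upset", "this is terrible news"]),
}

conv_context = classify(conversational, "what time are you free")
emo_context = classify(emotional, "so happy about the great news")
```

The two label sets are classified independently, so the conversational and emotional contexts can be sent to the server as separate fields.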

17. A method of automatic speech recognition, comprising the steps of:
- a) receiving a text message at a speech recognition client device;
  
  b) processing the text message with conversational context-specific language models stored on the client device using at least one processor of the client device to identify a conversational context corresponding to the text message;
  
  c) synthesizing speech from the text message;
  
  d) communicating the synthesized speech via a loudspeaker of the client device to a user of the client device;
  
  e) receiving a reply utterance from the user via a microphone of the client device that converts the reply utterance into a speech signal;
  
  f) pre-processing the speech signal using the at least one processor to extract acoustic data from the received speech signal;
  
  g) identifying an acoustic model of a plurality of acoustic models to decode the acoustic data, using the identified conversational context associated with the text message;
  
  h) decoding the acoustic data using the identified acoustic model to produce a plurality of hypotheses for the reply utterance;
  
  i) determining whether a confidence value associated with at least one of the plurality of hypotheses for the reply utterance is greater or less than a confidence threshold;
  
  j) communicating the extracted acoustic data and the conversational context to a speech recognition server, if the confidence value is determined to be less than the confidence threshold, otherwise post-processing the plurality of hypotheses to identify one of the hypotheses as the reply utterance, and outputting from the client device the identified hypothesis as at least part of a reply text message;
  
  k) identifying at the server, an acoustic model of a plurality of acoustic models stored at the server to decode the acoustic data, using the identified conversational context;
  
  l) decoding the acoustic data using the acoustic model identified at the server to produce a plurality of hypotheses for the reply utterance;
  
  m) post-processing the plurality of hypotheses to identify one of the hypotheses as the reply utterance; and
  
  n) outputting from the server the identified hypothesis as at least part of a reply text message.


Current Assignee: General Motors LLC (General Motors Company)
Original Assignee: General Motors LLC (General Motors Company)
Inventors: Talwar, Gaurav; Zhao, Xufang

Granted Patent: US 9,202,465 B2
US Class Current: 704/235
CPC Class Codes

G10L 15/22   Procedures used during a sp...

G10L 15/30   Distributed recognition, e....

G10L 2015/227   of the speaker; Human-fact...

G10L 2015/228   of application context
