Localized speech recognition with offload

US 8,880,398 B1
Filed: 01/21/2013
Issued: 11/04/2014
Est. Priority Date: 07/13/2012
Status: Active Grant

2 Assignments

0 Petitions

Abstract

A local computing device may receive an utterance from a user device. In response to receiving the utterance, the local computing device may obtain a text string transcription of the utterance, and determine a response mode for the utterance. If the response mode is a text-based mode, the local computing device may provide the text string transcription to a target device. If the response mode is a non-text-based mode, the local computing device may convert the text string transcription into one or more commands from a command set supported by the target device, and provide the one or more commands to the target device.
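The two response paths in the abstract can be sketched as follows. This is an illustration, not the patent's implementation. It assumes the transcript has already been produced by the ASR step, and it models conversion to commands as simple phrase matching against a hypothetical `command_set` dictionary that maps lowercase phrases to integer command codes. `TargetDevice`, `to_commands` and `respond` are likewise hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Union


class ResponseMode(Enum):
    TEXT = "text"
    COMMAND = "command"  # stands in for the "non-text-based" mode


@dataclass
class TargetDevice:
    name: str
    # Hypothetical command set: lowercase spoken phrase -> device-executable command code.
    command_set: Dict[str, int]


def to_commands(transcript: str, device: TargetDevice) -> List[int]:
    """Map a transcript onto the target device's supported commands.

    Naive phrase matching: each known phrase found in the transcript yields
    its command code, ordered by where the phrase first appears.
    """
    text = transcript.lower()
    hits = []
    for phrase, code in device.command_set.items():
        pos = text.find(phrase)
        if pos >= 0:
            hits.append((pos, code))
    return [code for _, code in sorted(hits)]


def respond(transcript: str, mode: ResponseMode,
            device: TargetDevice) -> Union[str, List[int]]:
    """Return what the local computing device would send to the target device."""
    if mode is ResponseMode.TEXT:
        return transcript  # text-based mode: forward the transcription unchanged
    return to_commands(transcript, device)  # non-text mode: send commands only


tv = TargetDevice("tv", {"volume up": 0x10, "mute": 0x11, "channel up": 0x20})
print(respond("Volume up then mute", ResponseMode.TEXT, tv))     # Volume up then mute
print(respond("Volume up then mute", ResponseMode.COMMAND, tv))  # [16, 17]
```

A real command set would more likely be produced by a grammar or intent parser than by substring search; phrase matching is used here only to keep the two branches visible.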

152 Citations

12 Claims

1. A method comprising:
- receiving, by a local computing device, an utterance from a user device, wherein the user device and the local computing device are part of a local network;
  
  in response to receiving the utterance, obtaining a text string transcription of the utterance from an automatic speech recognition (ASR) module of the local computing device, and selecting a response mode for the utterance from among a text-based response mode and a non-text-based response mode, wherein obtaining the text string transcription of the utterance comprises transcribing, by the ASR module of the local computing device, the utterance into the text string transcription, wherein the text string transcription includes a representation of the utterance, and wherein transcribing the utterance into the text string transcription comprises determining that the utterance matches a speaker adaptation profile, applying speaker adaptation parameters to the utterance, and updating the speaker adaptation parameters based at least on characteristics of the utterance, wherein the speaker adaptation parameters are associated with the speaker adaptation profile, and wherein the text string transcription is based on the speaker adaptation parameters;
  
  if the selected response mode is the text-based response mode, providing, by the local computing device, the text string transcription to a target device;
  
  if the selected response mode is the non-text-based response mode, (i) converting the text string transcription into one or more non-text, device-executable commands from a non-text, device-executable command set supported by the target device, and (ii) providing, by the local computing device, the one or more non-text, device-executable commands to the target device;
  
  receiving, by the local computing device, a second utterance from the user device; and
  
  in response to receiving the second utterance, obtaining a second text string transcription of the second utterance, wherein the second text string transcription is based on the updated speaker adaptation profile.
- Dependent claims (2, 3, 4, 5):
  - 2. The method of claim 1, wherein the target device is the user device.
  - 3. The method of claim 1, wherein the non-text-based response mode is a non-text, device-executable command-based response mode.
  - 4. The method of claim 1, wherein selecting the response mode for the utterance comprises:
    - looking up a profile of the target device in a database accessible to the local computing device; and
      
      selecting the response mode for the utterance based on the profile of the target device.
  - 5. The method of claim 1, wherein selecting the response mode for the utterance comprises selecting the response mode for the utterance based on a message sent by the user device, the message containing at least part of the utterance.
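Claim 1 requires the ASR module to match an utterance to a speaker adaptation profile, apply that profile's parameters, and then update them so that the second utterance is transcribed with the updated values. The claims do not say what the parameters are. The sketch below assumes, purely for illustration, that they are a per-speaker feature mean used for mean normalization, that matching is nearest-centroid within a distance threshold, and that updating is an exponential moving average. `SpeakerProfile`, `adapt` and the other names are hypothetical, and the decoder itself is omitted.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vector = List[float]


def frame_mean(frames: List[Vector]) -> Vector:
    """Per-dimension mean over an utterance's feature frames."""
    return [sum(col) / len(frames) for col in zip(*frames)]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


@dataclass
class SpeakerProfile:
    speaker_id: str
    # Hypothetical adaptation parameters: a running estimate of the speaker's feature mean.
    feature_mean: Vector
    update_rate: float = 0.5

    def apply(self, frames: List[Vector]) -> List[Vector]:
        """Subtract the stored speaker mean from every frame (mean normalization)."""
        return [[x - m for x, m in zip(f, self.feature_mean)] for f in frames]

    def update(self, frames: List[Vector]) -> None:
        """Move the stored mean toward this utterance's mean (exponential moving average)."""
        utt = frame_mean(frames)
        r = self.update_rate
        self.feature_mean = [(1 - r) * m + r * u for m, u in zip(self.feature_mean, utt)]


def match_profile(frames: List[Vector], profiles: List[SpeakerProfile],
                  max_dist: float) -> Optional[SpeakerProfile]:
    """Return the profile whose mean is closest to the utterance, if within max_dist."""
    utt = frame_mean(frames)
    best = min(profiles, key=lambda p: distance(utt, p.feature_mean), default=None)
    if best is not None and distance(utt, best.feature_mean) <= max_dist:
        return best
    return None


def adapt(frames: List[Vector], profiles: List[SpeakerProfile],
          max_dist: float = 2.0) -> Tuple[List[Vector], Optional[SpeakerProfile]]:
    """Match a profile, apply its parameters, then update them.

    The returned frames are what an ASR decoder would consume; the update
    only affects the next utterance matched to the same profile.
    """
    profile = match_profile(frames, profiles, max_dist)
    if profile is None:
        return frames, None
    adapted = profile.apply(frames)
    profile.update(frames)
    return adapted, profile


alice = SpeakerProfile("alice", [1.0, 1.0])
bob = SpeakerProfile("bob", [5.0, 5.0])
utterance = [[2.0, 2.0], [2.0, 2.0]]
adapt(utterance, [alice, bob])  # first utterance: matched to alice, mean -> [1.5, 1.5]
adapt(utterance, [alice, bob])  # second utterance: adapted with the updated mean
print(alice.feature_mean)  # [1.75, 1.75]
```

Applying the parameters before updating them mirrors the order recited in the claim; an implementation could equally update first, which the claim language does not appear to exclude.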

6. An article of manufacture including a computer-readable storage medium, having stored thereon program instructions that, upon execution by a local computing device, cause the local computing device to perform operations comprising:
- receiving an utterance from a user device, wherein the user device and the local computing device are part of a local network;
  
  in response to receiving the utterance, obtaining a text string transcription of the utterance from an automatic speech recognition (ASR) module of the local computing device, and selecting a response mode for the utterance from among a text-based response mode and a non-text-based response mode, wherein obtaining the text string transcription of the utterance comprises transcribing, by the ASR module of the local computing device, the utterance into the text string transcription, wherein the text string transcription includes a representation of the utterance, and wherein transcribing the utterance into the text string transcription comprises determining that the utterance matches a speaker adaptation profile, applying speaker adaptation parameters to the utterance, and updating the speaker adaptation parameters based at least on characteristics of the utterance, wherein the speaker adaptation parameters are associated with the speaker adaptation profile, and wherein the text string transcription is based on the speaker adaptation parameters;
  
  if the selected response mode is the text-based response mode, providing, by the local computing device, the text string transcription to a target device; and
  
  if the selected response mode is the non-text-based response mode, (i) converting the text string transcription into one or more non-text, device-executable commands from a non-text, device-executable command set supported by the target device, and (ii) providing, by the local computing device, the one or more non-text, device-executable commands to the target device.
- Dependent claims (7, 8, 9, 10):
  - 7. The article of manufacture of claim 6, wherein the target device is the user device.
  - 8. The article of manufacture of claim 6, wherein the non-text-based response mode is a non-text, device-executable command-based response mode.
  - 9. The article of manufacture of claim 6, wherein selecting the response mode for the utterance comprises:
    - looking up a profile of the target device in a database accessible to the local computing device; and
      
      selecting the response mode for the utterance based on the profile of the target device.
  - 10. The article of manufacture of claim 6, wherein selecting the response mode for the utterance comprises selecting the response mode for the utterance based on a message sent by the user device, the message containing at least part of the utterance.
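Claims 4 and 9 select the response mode from a target-device profile looked up in a database, while claims 5 and 10 select it from a message sent by the user device. A minimal sketch, assuming a plain dictionary as the hypothetical database and a hypothetical `response_mode` field in the message. Letting the message override the stored profile, and the fallback default, are assumptions of this sketch; the claims present the two sources as alternatives.

```python
from enum import Enum
from typing import Dict, Mapping, Optional


class ResponseMode(Enum):
    TEXT = "text"
    COMMAND = "command"  # the claims' non-text, device-executable command mode


# Hypothetical stand-in for the device-profile database of claims 4 and 9.
DEVICE_PROFILES: Dict[str, Dict[str, str]] = {
    "phone-1": {"accepts": "text"},
    "thermostat-1": {"accepts": "command"},
}


def select_mode(target_id: str,
                message: Optional[Mapping[str, object]] = None,
                default: ResponseMode = ResponseMode.TEXT) -> ResponseMode:
    """Pick a response mode for an utterance bound for target_id."""
    # Claims 5/10: a hint carried in the user device's message (assumed field name).
    if message is not None and "response_mode" in message:
        return ResponseMode(message["response_mode"])
    # Claims 4/9: consult the target device's stored profile.
    profile = DEVICE_PROFILES.get(target_id)
    if profile is not None:
        return ResponseMode(profile["accepts"])
    # Unknown device and no hint: assumed default.
    return default


msg = {"audio": b"\x00\x01", "response_mode": "text"}
print(select_mode("thermostat-1"))       # ResponseMode.COMMAND
print(select_mode("thermostat-1", msg))  # ResponseMode.TEXT
```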

11. A method comprising:
- receiving, by a local computing device, a first utterance from a first user device;
  
  in response to receiving the first utterance, obtaining a first text string transcription of the first utterance from an automatic speech recognition (ASR) module of the local computing device, wherein obtaining the first text string transcription of the first utterance comprises transcribing, by the ASR module of the local computing device, the first utterance into the first text string transcription, wherein the first text string transcription includes a representation of the first utterance, and wherein transcribing the first utterance into the first text string transcription comprises determining that the first utterance matches a speaker adaptation profile, applying speaker adaptation parameters to the utterance, and updating the speaker adaptation parameters based at least on characteristics of the first utterance, wherein the speaker adaptation parameters are associated with the speaker adaptation profile, and wherein the first text string transcription is based on the speaker adaptation parameters;
  
  determining that the first user device seeks a textual representation of the first utterance;
  
  in response to determining that the first user device seeks the textual representation of the first utterance, transmitting, by the local computing device, the first text string transcription to the first user device;
  
  receiving, by the local computing device, a second utterance from a second user device, wherein the first user device, the second user device, and the local computing device are part of a local network;
  
  in response to receiving the second utterance, obtaining a second text string transcription of the second utterance from the ASR module of the local computing device, wherein obtaining the second text string transcription of the second utterance comprises transcribing, by the ASR module of the local computing device, the second utterance into the second text string transcription, wherein the second text string transcription includes a representation of the second utterance, and wherein transcribing the second utterance into the second text string transcription comprises determining that the second utterance matches the speaker adaptation profile, applying the updated speaker adaptation parameters to the second utterance, and further updating the speaker adaptation parameters based at least on characteristics of the second utterance, wherein the second text string transcription is based on the updated speaker adaptation parameters;
  
  determining that the second user device seeks a non-text, device-executable command-based representation of the second utterance; and
  
  in response to determining that the second user device seeks the non-text, device-executable command-based representation of the second utterance, (i) converting the second text string transcription into one or more non-text, device-executable commands in a non-text, device-executable command set supported by the second user device, and (ii) transmitting, by the local computing device, the one or more non-text, device-executable commands to the second user device.
- Dependent claim (12):
  - 12. The method of claim 11, further comprising:
    - determining that the first user device has speech recognition capability; and
      
      based on the first user device having speech recognition capability, providing the updated speaker adaptation parameters to the first user device.
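Claim 12 adds that, when the first user device has its own speech recognition capability, the local computing device provides it with the updated speaker adaptation parameters. A sketch, assuming a hypothetical `UserDevice` record with a `has_local_asr` flag and an in-memory parameter cache; `share_updated_params` checks every device passed to it rather than only the first user device, which is a generalization beyond claim 12.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UserDevice:
    device_id: str
    has_local_asr: bool  # hypothetical capability flag
    # Copies of speaker adaptation parameters received from the local computing device.
    cached_params: Dict[str, List[float]] = field(default_factory=dict)


def share_updated_params(speaker_id: str, params: List[float],
                         devices: List[UserDevice]) -> List[str]:
    """Send updated parameters to each device that can run speech recognition.

    Returns the IDs of the devices that received a copy.
    """
    sent = []
    for dev in devices:
        if dev.has_local_asr:
            dev.cached_params[speaker_id] = list(params)  # copy, not a shared list
            sent.append(dev.device_id)
    return sent


phone = UserDevice("phone", has_local_asr=True)
tv = UserDevice("tv", has_local_asr=False)
print(share_updated_params("alice", [1.75, 1.75], [phone, tv]))  # ['phone']
```

Copying the list means a later update on the local computing device does not silently change what a user device already holds; each device sees new parameters only when they are pushed again.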

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Aleksic, Petar; Lei, Xin
Primary Examiner(s): Dorvil, Richemond
Assistant Examiner(s): SHIN, SEONG-AH A
Application Number: US13/746,039
Time in Patent Office: 652 Days
Field of Search: 704/235, 704/257, 704/275
US Class Current: 704/235

CPC Class Codes

G10L 15/07   to the speaker

G10L 15/30   Distributed recognition, e....

G10L 2015/223   Execution procedure of a sp...

G10L 21/00   Speech or voice signal proc...