Speech interface device with caching component

US 10,777,203 B1
Filed: 03/23/2018
Issued: 09/15/2020
Est. Priority Date: 03/23/2018
Status: Active Grant

1 Assignment

0 Petitions

Abstract

A speech interface device is configured to receive response data from a remote speech processing system for responding to user speech. This response data may be enhanced with information such as a remote ASR result(s) and a remote NLU result(s). The response data from the remote speech processing system may include one or more cacheable status indicators associated with the NLU result(s) and/or remote directive data, which indicate whether the remote NLU result(s) and/or the remote directive data are individually cacheable. A caching component of the speech interface device allows for caching at least some of this cacheable remote speech processing information, and using the cached information locally on the speech interface device when responding to user speech in the future. This allows for responding to user speech, even when the speech interface device is unable to communicate with a remote speech processing system over a wide area network.
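
As a rough illustration of the per-item cacheable status indicators described in the abstract, the Python sketch below filters a remote response down to the parts that may be written to a local cache. The `RemoteResponse` and `cacheable_items` names, the dictionary layout, and the choice to treat the remote ASR text as always cacheable are assumptions for illustration; the patent does not define a data format.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class RemoteResponse:
    """Hypothetical shape of the enhanced response data described in the abstract."""
    asr_text: str                      # remote ASR result
    nlu_result: Dict[str, Any]         # remote NLU result (intent and slots)
    directive: Dict[str, Any]          # remote directive data
    nlu_cacheable: bool = False        # cacheable status indicator for the NLU result
    directive_cacheable: bool = False  # cacheable status indicator for the directive


def cacheable_items(response: RemoteResponse) -> Dict[str, Any]:
    """Return the parts of a remote response that may be written to the local cache.

    Assumption of this sketch: the remote ASR text is always cacheable, because
    the abstract attaches indicators only to the NLU result and the directive.
    """
    items: Dict[str, Any] = {"asr_text": response.asr_text}
    if response.nlu_cacheable:
        items["nlu_result"] = response.nlu_result
    if response.directive_cacheable:
        items["directive"] = response.directive
    return items


# Usage: a directive marked non-cacheable is left out of the cache.
resp = RemoteResponse(
    asr_text="turn on the kitchen light",
    nlu_result={"intent": "TurnOn", "slots": {"device": "kitchen light"}},
    directive={"type": "TurnOn", "endpoint": "kitchen-light"},
    nlu_cacheable=True,
    directive_cacheable=False,
)
print(sorted(cacheable_items(resp)))  # ['asr_text', 'nlu_result']
```

Keeping one indicator per item lets the remote system allow an NLU result to be reused offline while still forcing time-sensitive directives to be fetched fresh.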

108 Citations

20 Claims

1. A speech interface device comprising:

one or more processors; and

memory storing computer-executable instructions that, when executed by the one or more processors, cause the speech interface device to:

receive audio data that represents user speech;

perform, using a local speech processing component executing on the speech interface device, automatic speech recognition (ASR) processing on the audio data to generate first local text data;

perform, using the local speech processing component, natural language understanding (NLU) processing on the first local text data to generate first local intent data;

send the audio data to a remote speech processing component executing on a remote speech processing system;

receive response data from the remote speech processing system, the response data from the remote speech processing system including:

remote text data corresponding to the audio data,
a remote intent data corresponding to the remote text data, and
remote directive data;

determine that the first local text data differs from the remote text data;

perform, using the local speech processing component, NLU processing on the remote text data to generate second local intent data;

determine that the second local intent data matches the remote intent data;

store, in the memory:

the remote text data, and
association data that associates the remote text data with the first local text data; and

perform an action based at least in part on the remote directive data.

2. The speech interface device of claim 1, wherein the computer-executable instructions, when executed by the one or more processors, further cause the speech interface device to:

receive second audio data that represents second user speech;

perform, by the local speech processing component, the ASR processing on the second audio data to generate second local text data;

determine, by accessing the association data in the memory of the speech interface device, that the second local text data matches the first local text data;

retrieve the remote text data from the memory of the speech interface device as retrieved remote text data; and

perform, by the local speech processing component, the NLU processing on the retrieved remote text data to generate the second local intent data.

3. The speech interface device of claim 2, wherein:

the response data from the remote speech processing system further includes a cacheable status indicator associated with the remote directive data indicating that the remote directive data is not cacheable; and

the computer-executable instructions, when executed by the one or more processors, further cause the speech interface device to:

generate, by the local speech processing component, local directive data based at least in part on the second local intent data; and

perform, for a second time, the action based at least in part on the local directive data.
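
The flow of claims 1 to 3 can be sketched as follows, assuming a toy exact-match NLU and dictionary-shaped intents. `AsrCorrectionCache`, `toy_local_nlu`, and `build_local_directive` are hypothetical names; this is a minimal illustration of the claimed checks (local and remote text differ, and local NLU run on the remote text reproduces the remote intent), not the patentee's implementation.

```python
from typing import Callable, Dict, Optional

Intent = Dict[str, object]


def toy_local_nlu(text: str) -> Optional[Intent]:
    """Stand-in for the on-device NLU: exact-phrase lookup only (hypothetical)."""
    table = {"play jazz": {"intent": "PlayMusic", "genre": "jazz"}}
    return table.get(text)


class AsrCorrectionCache:
    """Maps a local ASR hypothesis to the remote ASR text produced for the same
    audio, i.e. the "association data" of claims 1 and 2 (names are hypothetical)."""

    def __init__(self, local_nlu: Callable[[str], Optional[Intent]]) -> None:
        self._local_nlu = local_nlu
        self._remote_text_by_local_text: Dict[str, str] = {}

    def learn(self, local_text: str, remote_text: str, remote_intent: Intent) -> bool:
        """Claim 1: cache only when local ASR differed from remote ASR and local NLU,
        run on the remote text, reproduces the remote intent."""
        if local_text == remote_text:
            return False
        second_local_intent = self._local_nlu(remote_text)
        if second_local_intent != remote_intent:
            return False
        self._remote_text_by_local_text[local_text] = remote_text
        return True

    def resolve(self, local_text: str) -> Optional[Intent]:
        """Claim 2: on a later utterance whose local ASR matches a cached key, run
        local NLU on the retrieved remote text instead of the local text."""
        remote_text = self._remote_text_by_local_text.get(local_text)
        if remote_text is None:
            return None
        return self._local_nlu(remote_text)


def build_local_directive(intent: Intent) -> Dict[str, object]:
    """Claim 3: when the remote directive was marked non-cacheable, build one on-device."""
    return {"type": "Execute", "intent": intent["intent"]}


# Usage: local ASR misheard "play jazz" as "play chess" the first time.
cache = AsrCorrectionCache(toy_local_nlu)
stored = cache.learn("play chess", "play jazz", {"intent": "PlayMusic", "genre": "jazz"})
later = cache.resolve("play chess")
print(stored, later)  # True {'intent': 'PlayMusic', 'genre': 'jazz'}
if later is not None:
    print(build_local_directive(later))  # {'type': 'Execute', 'intent': 'PlayMusic'}
```

The match check before caching guards against storing a correction that the local NLU cannot actually turn into the intended action.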

4. A method, comprising:

generating, by a speech processing component executing on a speech interface device and using audio data that represents user speech, first local automatic speech recognition (ASR) data;

generating, by the speech interface device, first local natural language understanding (NLU) data;

sending, by the speech interface device, the audio data to a remote speech processing system;

receiving, by the speech interface device, response data from the remote speech processing system, the response data from the remote speech processing system including:

remote ASR data corresponding to the audio data,
remote NLU data corresponding to the remote ASR data, and
remote directive data;

determining, by the speech interface device, that the first local ASR data differs from the remote ASR data;

performing, by the speech processing component, NLU processing on the remote ASR data to generate second local NLU data;

determining, by the speech interface device, that the second local NLU data matches the remote NLU data;

storing, in memory of the speech interface device:

the remote ASR data, and
association data that associates the remote ASR data with the first local ASR data; and

causing the speech interface device to perform an action based at least in part on the remote directive data.

5. The method of claim 4, further comprising:

receiving, by the speech interface device, second audio data that represents second user speech;

generating, by the speech interface device, second local ASR data based at least in part on the second audio data;

determining, by accessing the association data in the memory of the speech interface device, that the second local ASR data matches the first local ASR data;

retrieving the remote ASR data from the memory of the speech interface device as retrieved remote ASR data; and

generating, by the speech interface device, the second local NLU data based at least in part on the retrieved remote ASR data.

6. The method of claim 5, further comprising:

generating, by the speech processing component, local directive data based at least in part on the second local NLU data; and

causing, for a second time, the speech interface device to perform the action based at least in part on the local directive data.

7. The method of claim 5, further comprising:

determining, by the speech interface device, that the second local NLU data generated based on the retrieved remote ASR data is not executable on the speech interface device;

sending, by the speech interface device, the second local NLU data to the remote speech processing system;

receiving, by the speech interface device, second remote directive data from the remote speech processing system; and

causing, for a second time, the speech interface device to perform the action based at least in part on the second remote directive data.

8. The method of claim 5, wherein the response data from the remote speech processing system further includes a cacheable status indicator associated with the remote directive data indicating that the remote directive data is cacheable, the method further comprising:

storing, in the memory of the speech interface device, the remote directive data, wherein the association data further associates the remote directive data with at least one of the remote ASR data or the first local ASR data;

in response to determining that the second local ASR data matches the first local ASR data, retrieving the remote directive data from the memory of the speech interface device as retrieved remote directive data; and

causing, for a second time, the speech interface device to perform the action based at least in part on the retrieved remote directive data.

9. The method of claim 8, wherein:

the retrieved remote directive data includes content; and

the action comprises outputting the content via an output device of the speech interface device.

10. The method of claim 4, further comprising determining, by the speech interface device, that the first local NLU data differs from the remote NLU data, wherein the performing, by the speech processing component, the NLU processing on the remote ASR data is in response to the determining that the first local NLU data differs from the remote NLU data.
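
A similar sketch for the method claims adds the cacheable directive of claim 8, the local-directive path of claim 6, the remote fallback of claim 7 for intents that cannot run on the device, and the claim 10 precondition. All names, and the callback-based interface standing in for the device's NLU, speechlets, and network calls, are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

Nlu = Dict[str, object]
Directive = Dict[str, object]


@dataclass
class CacheEntry:
    remote_asr: str
    directive: Optional[Directive] = None  # kept only if marked cacheable (claim 8)


@dataclass
class SpeechCache:
    # Association data: first local ASR text -> cached remote results.
    entries: Dict[str, CacheEntry] = field(default_factory=dict)

    def learn(self, local_asr: str, first_local_nlu: Optional[Nlu], remote_asr: str,
              remote_nlu: Nlu, remote_directive: Directive, directive_cacheable: bool,
              local_nlu: Callable[[str], Optional[Nlu]]) -> bool:
        if local_asr == remote_asr:
            return False  # claim 4 requires the ASR results to differ
        if first_local_nlu == remote_nlu:
            return False  # claim 10: re-run NLU only when the first local NLU differs
        if local_nlu(remote_asr) != remote_nlu:
            return False  # second local NLU must match the remote NLU
        cached = remote_directive if directive_cacheable else None
        self.entries[local_asr] = CacheEntry(remote_asr, cached)
        return True

    def respond(self, local_asr: str, local_nlu: Callable[[str], Optional[Nlu]],
                executable_locally: Callable[[Nlu], bool],
                build_local_directive: Callable[[Nlu], Directive],
                ask_remote: Callable[[Nlu], Directive]) -> Optional[Directive]:
        entry = self.entries.get(local_asr)
        if entry is None:
            return None  # no association for this local ASR result
        if entry.directive is not None:
            return entry.directive  # claim 8: replay the cached remote directive
        nlu = local_nlu(entry.remote_asr)  # claim 5: NLU on the retrieved remote ASR text
        if nlu is None:
            return None
        if executable_locally(nlu):
            return build_local_directive(nlu)  # claim 6: local directive data
        return ask_remote(nlu)  # claim 7: send NLU data to the remote system


# Usage with toy callbacks (all hypothetical).
def nlu(text: str) -> Optional[Nlu]:
    return {"what is the weather": {"intent": "GetWeather"},
            "turn off the fan": {"intent": "TurnOff", "device": "fan"}}.get(text)


def local_only(n: Nlu) -> bool:
    return n["intent"] == "TurnOff"


def make_local(n: Nlu) -> Directive:
    return {"type": "Local"}


def from_remote(n: Nlu) -> Directive:
    return {"type": "FromRemote"}


cache = SpeechCache()
cache.learn("what is the whether", {"intent": "Unknown"}, "what is the weather",
            {"intent": "GetWeather"}, {"type": "Speak"}, False, nlu)
cache.learn("turn of the fan", None, "turn off the fan",
            {"intent": "TurnOff", "device": "fan"}, {"type": "TurnOff", "endpoint": "fan"}, True, nlu)
print(cache.respond("turn of the fan", nlu, local_only, make_local, from_remote))
# {'type': 'TurnOff', 'endpoint': 'fan'}
print(cache.respond("what is the whether", nlu, local_only, make_local, from_remote))
# {'type': 'FromRemote'}
```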

11. A speech interface device comprising:

one or more processors; and

memory storing computer-executable instructions that, when executed by the one or more processors, cause the speech interface device to:

generate, by a speech processing component executing on the speech interface device and using audio data that represents user speech, first local automatic speech recognition (ASR) data;

generate first local natural language understanding (NLU) data;

send the audio data to a remote speech processing system;

receive response data from the remote speech processing system, the response data from the remote speech processing system including:

remote ASR data corresponding to the audio data,
remote NLU data corresponding to the remote ASR data, and
remote directive data;

determine that the first local ASR data matches the remote ASR data;

determine at least one of: (i) that the first local NLU data represents a failure to recognize an intent, or (ii) that the first local NLU data differs from the remote NLU data;

store, in the memory of the speech interface device:

the remote NLU data, and
association data that associates the remote NLU data with the first local ASR data; and

perform an action based at least in part on the remote directive data.

12. The speech interface device of claim 11, wherein the computer-executable instructions, when executed by the one or more processors, further cause the speech interface device to:

receive second audio data that represents second user speech;

generate, by the speech processing component, second local ASR data based at least in part on the second audio data;

determine, by accessing the association data in the memory of the speech interface device, that the second local ASR data matches the first local ASR data;

retrieve the remote NLU data from the memory of the speech interface device as retrieved remote NLU data; and

perform, for a second time, the action based at least in part on the retrieved remote NLU data.

13. The speech interface device of claim 12, wherein receiving the second audio data occurs at a time when the remote speech processing system is unavailable to the speech interface device.

14. The speech interface device of claim 12, wherein the computer-executable instructions, when executed by the one or more processors, further cause the speech interface device to:

determine that the retrieved remote NLU data is not executable on the speech interface device;

send the retrieved remote NLU data to the remote speech processing system; and

receive second remote directive data from the remote speech processing system, wherein performing, for the second time, the action is further based on the second remote directive data.

15. The speech interface device of claim 12, wherein the response data from the remote speech processing system further includes a cacheable status indicator associated with the remote directive data indicating that the remote directive data is cacheable, and wherein the computer-executable instructions, when executed by the one or more processors, further cause the speech interface device to:

store, in the memory of the speech interface device, the remote directive data, wherein the association data further associates the remote directive data with at least one of the remote NLU data or the first local ASR data; and

in response to determining that the second local ASR data matches the first local ASR data, retrieve the remote directive data from the memory of the speech interface device as retrieved remote directive data, wherein performing, for the second time, the action is further based on the retrieved remote directive data.

16. The speech interface device of claim 15, wherein the computer-executable instructions, when executed by the one or more processors, further cause the speech interface device to:

determine that the audio data corresponds to an utterance that has been detected more than a threshold number of times, or above a threshold frequency, for a user account that is associated with the speech interface device, wherein storing, in the memory of the speech interface device, the remote directive data is based at least in part on the utterance having been detected more than the threshold number of times, or above the threshold frequency.

17. The speech interface device of claim 15, wherein the audio data corresponds to an utterance, and wherein the computer-executable instructions, when executed by the one or more processors, further cause the speech interface device to:

determine that a probability that the utterance will be detected in the future for a user account that is associated with the speech interface device meets or exceeds a threshold probability, wherein storing, in the memory of the speech interface device, the remote directive data is based at least in part on the probability meeting or exceeding the threshold probability.

18. The speech interface device of claim 12, wherein:

the remote NLU data includes at least one slot that is resolved with text;

the response data from the remote speech processing system further includes a cacheable status indicator associated with the at least one slot indicating that the text of the at least one slot is not cacheable;

storing, in the memory of the speech interface device, the remote NLU data comprises storing the at least one slot with an unresolved label;

the computer-executable instructions, when executed by the one or more processors, further cause the speech interface device to, in response to retrieving the remote NLU data from the memory, perform, by the speech processing component, slot resolution to resolve the at least one slot with second text; and

performing, for the second time, the action is based on the retrieved remote NLU data with the at least one slot that is resolved with the second text.

19. The speech interface device of claim 11, wherein:

the computer-executable instructions, when executed by the one or more processors, further cause the speech interface device to determine that the remote NLU data is executable using a local speechlet on the speech interface device; and

storing, in the memory of the speech interface device, the remote NLU data is based at least in part on determining that the remote NLU data is executable using the local speechlet.

20. The speech interface device of claim 11, wherein:

the response data from the remote speech processing system further includes a cacheable status indicator associated with the remote NLU data indicating that the remote NLU data is cacheable; and

storing, in the memory of the speech interface device, the remote NLU data is based at least in part on the cacheable status indicator indicating that the remote NLU data is cacheable.
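
Claims 11 to 20 cache remote NLU data keyed by the local ASR text, rather than remote ASR text. The sketch below folds several dependent-claim conditions into one class purely for illustration: the cacheable indicator (claim 20), a local speechlet check (claim 19), a detection-count threshold for directives (claim 16, count variant only), and an unresolved slot label (claim 18). `NluCache`, the `<unresolved>` label, and the default threshold are hypothetical; the claims present these conditions as alternatives, not as one required pipeline.

```python
import copy
from collections import Counter
from typing import Callable, Dict, Optional, Set

UNRESOLVED = "<unresolved>"  # hypothetical label for a non-cacheable slot (claim 18)


class NluCache:
    """Caches remote NLU results keyed by the first local ASR text (claim 11)."""

    def __init__(self, local_speechlets: Set[str], count_threshold: int = 1) -> None:
        self.local_speechlets = local_speechlets  # intents a local speechlet can run
        self.count_threshold = count_threshold    # assumed value for claim 16
        self.utterance_counts: Counter = Counter()
        self.nlu_by_local_asr: Dict[str, dict] = {}
        self.directive_by_local_asr: Dict[str, dict] = {}

    def learn(self, local_asr: str, remote_asr: str, first_local_nlu: Optional[dict],
              remote_nlu: dict, remote_directive: dict, nlu_cacheable: bool,
              directive_cacheable: bool, uncacheable_slots: Set[str]) -> bool:
        self.utterance_counts[local_asr] += 1
        if local_asr != remote_asr:
            return False  # claim 11 covers the case where the ASR results match
        if first_local_nlu is not None and first_local_nlu == remote_nlu:
            return False  # neither an intent failure (None) nor a mismatch
        if not nlu_cacheable or remote_nlu["intent"] not in self.local_speechlets:
            return False  # claims 20 and 19
        entry = copy.deepcopy(remote_nlu)  # never mutate the caller's NLU data
        for slot in uncacheable_slots:
            if slot in entry["slots"]:
                entry["slots"][slot] = UNRESOLVED  # store the slot without its text
        self.nlu_by_local_asr[local_asr] = entry
        # Claims 15-16: cache the directive only if it is cacheable and the utterance
        # has been detected more than the threshold number of times.
        if directive_cacheable and self.utterance_counts[local_asr] > self.count_threshold:
            self.directive_by_local_asr[local_asr] = remote_directive
        return True

    def lookup(self, local_asr: str, resolve_slot: Callable[[str, str], str]) -> Optional[dict]:
        """Claim 12: retrieve cached NLU data, which works while the remote system is
        unreachable (claim 13), resolving unresolved slots locally (claim 18)."""
        entry = self.nlu_by_local_asr.get(local_asr)
        if entry is None:
            return None
        nlu = copy.deepcopy(entry)
        for name, value in list(nlu["slots"].items()):
            if value == UNRESOLVED:
                nlu["slots"][name] = resolve_slot(name, local_asr)
        return nlu


# Usage: the same utterance is heard twice, so the directive passes the threshold.
cache = NluCache(local_speechlets={"CallContact"})
remote_nlu = {"intent": "CallContact", "slots": {"contact": "Mom"}}
for _ in range(2):
    cache.learn("call mom", "call mom", None, remote_nlu, {"type": "Dial"},
                nlu_cacheable=True, directive_cacheable=True, uncacheable_slots={"contact"})
print(cache.nlu_by_local_asr["call mom"])
# {'intent': 'CallContact', 'slots': {'contact': '<unresolved>'}}
print(cache.lookup("call mom", lambda slot, text: "contact-42"))
# {'intent': 'CallContact', 'slots': {'contact': 'contact-42'}}
print("call mom" in cache.directive_by_local_asr)  # True
```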

Current Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors
Pasko, Stanislaw Ignacy
Primary Examiner(s)
Islam, Mohammad K

Application Number

US15/934,761
Time in Patent Office

907 Days
CPC Class Codes

G06F 3/167   Audio in a user interface, ...

G06F 40/30   Semantic analysis

G10L 15/18   using natural language mode...

G10L 15/1822   Parsing for meaning underst...

G10L 15/30   Distributed recognition, e....

G10L 15/32   Multiple recognisers used i...

H04L 67/568   Storing data temporarily at...

H04L 67/5683   Storage of data provided by...
