Speech interface device with caching component
First Claim
1. A speech interface device comprising:
one or more processors; and
memory storing computer-executable instructions that, when executed by the one or more processors, cause the speech interface device to:
receive audio data that represents user speech;
perform, using a local speech processing component executing on the speech interface device, automatic speech recognition (ASR) processing on the audio data to generate first local text data;
perform, using the local speech processing component, natural language understanding (NLU) processing on the first local text data to generate first local intent data;
send the audio data to a remote speech processing component executing on a remote speech processing system;
receive response data from the remote speech processing system, the response data from the remote speech processing system including:
remote text data corresponding to the audio data,
a remote intent data corresponding to the remote text data, and
remote directive data;
determine that the first local text data differs from the remote text data;
perform, using the local speech processing component, NLU processing on the remote text data to generate second local intent data;
determine that the second local intent data matches the remote intent data;
store, in the memory:
the remote text data, and
association data that associates the remote text data with the first local text data; and
perform an action based at least in part on the remote directive data.
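The caching condition recited in this claim can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the names `RemoteResponse`, `AsrCorrectionCache`, `maybe_store`, and `toy_local_nlu` are hypothetical, intents are reduced to plain strings, and the keyword-matching NLU function merely stands in for a real local NLU component.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class RemoteResponse:
    """Hypothetical container for the response data named in the claim."""
    text: str       # remote text data (remote ASR result)
    intent: str     # remote intent data (remote NLU result)
    directive: str  # remote directive data


@dataclass
class AsrCorrectionCache:
    """Association data: first local text data -> remote text data."""
    associations: Dict[str, str] = field(default_factory=dict)

    def maybe_store(self, local_text: str, remote: RemoteResponse,
                    local_nlu: Callable[[str], str]) -> bool:
        # Nothing to correct when local and remote ASR agree.
        if local_text == remote.text:
            return False
        # Run local NLU on the remote text ("second local intent data").
        second_local_intent = local_nlu(remote.text)
        # Store only if local NLU reproduces the remote intent from that text.
        if second_local_intent != remote.intent:
            return False
        self.associations[local_text] = remote.text
        return True


def toy_local_nlu(text: str) -> str:
    """Stand-in for the local NLU component (assumption: keyword match)."""
    return "TurnOffIntent" if "turn off" in text else "Unknown"


cache = AsrCorrectionCache()
remote = RemoteResponse(
    text="turn off the kitchen lights",
    intent="TurnOffIntent",
    directive="power_off:kitchen_lights",
)
# Local ASR misheard "off" as "of"; the remote text corrects it.
stored = cache.maybe_store("turn of the kitchen lights", remote, toy_local_nlu)
```

The second NLU pass is what gates the store: the association is only worth keeping if the local NLU component, given the corrected text, arrives at the same intent as the remote system.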
Abstract
A speech interface device is configured to receive response data from a remote speech processing system for responding to user speech. This response data may be enhanced with information such as a remote ASR result(s) and a remote NLU result(s). The response data from the remote speech processing system may include one or more cacheable status indicators associated with the NLU result(s) and/or remote directive data, which indicate whether the remote NLU result(s) and/or the remote directive data are individually cacheable. A caching component of the speech interface device allows for caching at least some of this cacheable remote speech processing information, and using the cached information locally on the speech interface device when responding to user speech in the future. This allows for responding to user speech, even when the speech interface device is unable to communicate with a remote speech processing system over a wide area network.
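One way to picture the per-item cacheable status indicators is the sketch below. It assumes a hypothetical response layout (a dict with `nlu` and `directive` parts, each carrying a `cacheable` flag) and hypothetical helper names (`CacheEntry`, `cache_response`, `respond_offline`); the abstract does not specify a wire format, so none of these should be read as the actual one.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CacheEntry:
    """What survives caching for one utterance; either part may be absent."""
    intent: Optional[str] = None
    directive: Optional[str] = None


def cache_response(cache: Dict[str, CacheEntry], text: str, response: dict) -> None:
    """Store only the parts of a remote response that are flagged cacheable."""
    entry = CacheEntry()
    nlu = response.get("nlu", {})
    directive = response.get("directive", {})
    if nlu.get("cacheable"):
        entry.intent = nlu["intent"]
    if directive.get("cacheable"):
        entry.directive = directive["payload"]
    # Skip the write entirely when nothing was cacheable.
    if entry.intent is not None or entry.directive is not None:
        cache[text] = entry


def respond_offline(cache: Dict[str, CacheEntry], text: str) -> Optional[str]:
    """Return a cached directive for this text, or None if there is none."""
    entry = cache.get(text)
    return entry.directive if entry is not None else None


cache: Dict[str, CacheEntry] = {}
cache_response(cache, "turn off the kitchen lights", {
    "nlu": {"intent": "TurnOffIntent", "cacheable": True},
    "directive": {"payload": "power_off:kitchen_lights", "cacheable": True},
})
# Assumed example: the intent is reusable, but the spoken answer is not.
cache_response(cache, "what is the weather", {
    "nlu": {"intent": "GetWeatherIntent", "cacheable": True},
    "directive": {"payload": "speak:sunny, 21 degrees", "cacheable": False},
})
```

With this cache, `respond_offline(cache, "turn off the kitchen lights")` yields the stored directive, while the weather utterance yields `None` because only its intent was marked cacheable.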
108 Citations
20 Claims
1. A speech interface device comprising:
one or more processors; and
memory storing computer-executable instructions that, when executed by the one or more processors, cause the speech interface device to:
receive audio data that represents user speech;
perform, using a local speech processing component executing on the speech interface device, automatic speech recognition (ASR) processing on the audio data to generate first local text data;
perform, using the local speech processing component, natural language understanding (NLU) processing on the first local text data to generate first local intent data;
send the audio data to a remote speech processing component executing on a remote speech processing system;
receive response data from the remote speech processing system, the response data from the remote speech processing system including:
remote text data corresponding to the audio data,
a remote intent data corresponding to the remote text data, and
remote directive data;
determine that the first local text data differs from the remote text data;
perform, using the local speech processing component, NLU processing on the remote text data to generate second local intent data;
determine that the second local intent data matches the remote intent data;
store, in the memory:
the remote text data, and
association data that associates the remote text data with the first local text data; and
perform an action based at least in part on the remote directive data.
Dependent claims: 2, 3
4. A method, comprising:
generating, by a speech processing component executing on a speech interface device and using audio data that represents user speech, first local automatic speech recognition (ASR) data;
generating, by the speech interface device, first local natural language understanding (NLU) data;
sending, by the speech interface device, the audio data to a remote speech processing system;
receiving, by the speech interface device, response data from the remote speech processing system, the response data from the remote speech processing system including:
remote ASR data corresponding to the audio data,
remote NLU data corresponding to the remote ASR data, and
remote directive data;
determining, by the speech interface device, that the first local ASR data differs from the remote ASR data;
performing, by the speech processing component, NLU processing on the remote ASR data to generate second local NLU data;
determining, by the speech interface device, that the second local NLU data matches the remote NLU data;
storing, in memory of the speech interface device:
the remote ASR data, and
association data that associates the remote ASR data with the first local ASR data; and
causing the speech interface device to perform an action based at least in part on the remote directive data.
Dependent claims: 5, 6, 7, 8, 9, 10
11. A speech interface device comprising:
one or more processors; and
memory storing computer-executable instructions that, when executed by the one or more processors, cause the speech interface device to:
generate, by a speech processing component executing on the speech interface device and using audio data that represents user speech, first local automatic speech recognition (ASR) data;
generate first local natural language understanding (NLU) data;
send the audio data to a remote speech processing system;
receive response data from the remote speech processing system, the response data from the remote speech processing system including:
remote ASR data corresponding to the audio data,
remote NLU data corresponding to the remote ASR data, and
remote directive data;
determine that the first local ASR data matches the remote ASR data;
determine at least one of:
(i) that the first local NLU data represents a failure to recognize an intent, or
(ii) that the first local NLU data differs from the remote NLU data;
store, in the memory of the speech interface device:
the remote NLU data, and
association data that associates the remote NLU data with the first local ASR data; and
perform an action based at least in part on the remote directive data.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
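Claim 11 covers the complementary case, where local and remote ASR agree but local NLU either fails or disagrees with the remote result. A rough sketch of that decision follows; the function name `maybe_cache_remote_nlu` is hypothetical, ASR and NLU data are simplified to strings, and `None` is assumed to represent a failure to recognize an intent.

```python
from typing import Dict, Optional


def maybe_cache_remote_nlu(cache: Dict[str, str],
                           local_asr: str,
                           local_nlu: Optional[str],
                           remote_asr: str,
                           remote_nlu: str) -> bool:
    """Store remote NLU data keyed by the first local ASR data when warranted."""
    # This path applies only when local and remote ASR match.
    if local_asr != remote_asr:
        return False
    failed = local_nlu is None           # condition (i): no intent recognized
    differs = local_nlu != remote_nlu    # condition (ii): local and remote disagree
    if not (failed or differs):
        return False
    # Association data: first local ASR data -> remote NLU data.
    cache[local_asr] = remote_nlu
    return True


nlu_cache: Dict[str, str] = {}
# Local ASR got the words right, but local NLU recognized no intent.
cached = maybe_cache_remote_nlu(
    nlu_cache,
    local_asr="play my jazz mix",
    local_nlu=None,
    remote_asr="play my jazz mix",
    remote_nlu="PlayMusicIntent",
)
```

Keying the stored NLU data by the local ASR text means a later utterance that produces the same local transcription can be resolved from the cache without rerunning remote NLU.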
Specification