Techniques for client-side speech domain detection using gyroscopic data and a system using the same
First Claim
1. A device comprising:
a memory;
a first processor coupled to the memory, wherein the first processor includes a low power mode, wherein while in the low power mode the first processor being configured to:
receive a plurality of audio samples;
identify at least one context characteristic associated with the received plurality of audio samples, wherein the at least one context characteristic includes linguistic characteristics associated with the received plurality of audio samples, wherein the at least one context characteristic includes a keyword or key phrase; and
a second processor to remotely host a voice recognition engine to analyze speech, and in response to establishing communication with the first processor, receive a session initialization message by the voice recognition engine, wherein the session initialization message includes the at least one context characteristic, wherein the session initialization message to cause the voice recognition engine to load one or more models into a memory based at least in part on the at least one identified context characteristic, wherein the voice recognition engine analyzes speech from at least part of the plurality of audio samples with the one or more models;
wherein the session initialization message includes sensor data from a gyroscope.
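As an illustration of the client side of this claim, the following is a minimal sketch of how a device might assemble a session initialization message that carries a detected keyword together with gyroscope readings. The names `GyroSample`, `SessionInitMessage`, `spot_keywords`, and `build_session_init`, the JSON layout, and the use of already-decoded tokens in place of real low-power keyword spotting are all assumptions made for the example; the claim does not specify a message format or a detection algorithm.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class GyroSample:
    # Hypothetical angular-rate reading; axes and units are assumed.
    x: float
    y: float
    z: float


@dataclass
class SessionInitMessage:
    # Context characteristic(s) identified on the client (keywords or key phrases).
    keywords: list
    # Sensor data from a gyroscope, included alongside the context characteristic.
    gyroscope: list = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self))


def spot_keywords(tokens, vocabulary):
    """Stand-in for keyword spotting: return vocabulary words found in tokens.

    A real low-power detector would operate on audio samples; decoded tokens
    are assumed here only to keep the sketch self-contained.
    """
    found = []
    for token in tokens:
        word = token.lower()
        if word in vocabulary and word not in found:
            found.append(word)
    return found


def build_session_init(tokens, vocabulary, gyro_readings):
    """Build the JSON message sent when the connection to the service is established."""
    samples = [GyroSample(*reading) for reading in gyro_readings]
    message = SessionInitMessage(
        keywords=spot_keywords(tokens, vocabulary),
        gyroscope=samples,
    )
    return message.to_json()


if __name__ == "__main__":
    print(build_session_init(
        ["Navigate", "to", "home"],
        {"navigate", "call"},
        [(0.1, 0.0, -0.2)],
    ))
```

Sending the hints in the first message, rather than as a later request, is what would let the remote engine begin loading models before it has processed any audio.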
Abstract
Techniques are disclosed for client-side analysis of audio samples to identify one or more characteristics associated with captured audio. The client-side analysis may then allow a user device, e.g., a smart phone, laptop computer, in-car infotainment system, and so on, to provide the one or more identified characteristics as configuration data to a voice recognition service at or shortly after connection with the same. In turn, the voice recognition service may load one or more recognition components, e.g., language models and/or application modules/engines, based on the received configuration data. Thus, latency may be reduced based on the voice recognition engine having “hints” that allow components to be loaded without necessarily having to process audio samples first. The reduction of latency may reduce processing time relative to other approaches to voice recognition systems that exclusively perform server-side context recognition/classification.
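The service-side behavior described in the abstract can be sketched as follows: on receiving the configuration data, the service selects recognition components from the hints before any audio has been analyzed. This is a minimal sketch under stated assumptions; `MODEL_INDEX`, `DEFAULT_MODELS`, `RecognitionSession`, the model names, and the JSON field `keywords` are hypothetical and are not taken from the patent.

```python
import json

# Hypothetical mapping from keyword hints to recognition components.
MODEL_INDEX = {
    "navigate": ["navigation_lm"],
    "call": ["contacts_lm", "dialer_module"],
}

# Assumed fallback when the message carries no usable hint.
DEFAULT_MODELS = ["general_lm"]


class RecognitionSession:
    """Tracks which components are loaded for one client connection."""

    def __init__(self):
        self.loaded = []

    def on_session_init(self, raw_message: str):
        """Choose components from the hints in the session initialization message."""
        message = json.loads(raw_message)
        wanted = []
        for keyword in message.get("keywords", []):
            for name in MODEL_INDEX.get(keyword, []):
                if name not in wanted:
                    wanted.append(name)
        # With no recognized hint, fall back to a general-purpose model.
        self.loaded = wanted or list(DEFAULT_MODELS)
        return self.loaded


if __name__ == "__main__":
    session = RecognitionSession()
    print(session.on_session_init('{"keywords": ["call"]}'))
```

In this sketch "loading" is only bookkeeping; an actual service would read model files into memory at this point, which is the step the hints are meant to start early.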
19 Claims
1. A device comprising:
a memory;
a first processor coupled to the memory, wherein the first processor includes a low power mode, wherein while in the low power mode the first processor being configured to:
receive a plurality of audio samples;
identify at least one context characteristic associated with the received plurality of audio samples, wherein the at least one context characteristic includes linguistic characteristics associated with the received plurality of audio samples, wherein the at least one context characteristic includes a keyword or key phrase; and
a second processor to remotely host a voice recognition engine to analyze speech, and in response to establishing communication with the first processor, receive a session initialization message by the voice recognition engine, wherein the session initialization message includes the at least one context characteristic, wherein the session initialization message to cause the voice recognition engine to load one or more models into a memory based at least in part on the at least one identified context characteristic, wherein the voice recognition engine analyzes speech from at least part of the plurality of audio samples with the one or more models;
wherein the session initialization message includes sensor data from a gyroscope.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10)
11. A computer-implemented method for performing client-side domain detection on a plurality of audio samples, the method comprising:
receiving, with a first processor, a plurality of audio samples, wherein the first processor includes a low power mode, wherein receiving the plurality of audio samples includes receiving the plurality of audio samples while the first processor operates in the low power mode;
identifying, with the first processor, at least one context characteristic associated with the plurality of audio samples, wherein the at least one context characteristic includes linguistic characteristics associated with the received plurality of audio samples, wherein the at least one context characteristic includes a keyword or a key phrase; and
hosting, with a second processor that is remotely coupled to the first processor, a voice recognition engine to analyze speech, and in response to establishing a connection with the first processor, receiving a session initiation message, the session initiation message including at least one configuration parameter based on the at least one identified context characteristic and/or the at least one context characteristic, and wherein the session initiation message is configured to cause the voice recognition engine to load one or more recognition components, wherein the voice recognition engine analyzes speech from at least part of the plurality of audio samples with the one or more recognition components;
wherein the session initialization message includes sensor data from a gyroscope.
(Dependent claims: 12, 13, 14, 15)
16. A non-transitory computer-readable medium having a plurality of instructions encoded thereon that when executed by at least one processor cause a process to be carried out, the process being configured to:
receive, with a first processor, a plurality of audio samples, wherein the first processor includes a low power mode, wherein receive the plurality of audio samples includes receive the plurality of audio samples while the first processor operates in the low power mode;
identify, with the first processor, at least one context characteristic associated with the plurality of audio samples, wherein the at least one context characteristic includes linguistic characteristics associated with the received plurality of audio samples, wherein the at least one context characteristic includes a keyword or a key phrase; and
host, with a second processor that is remotely coupled to the first processor, a voice recognition engine to analyze speech, and in response to establishing a connection with the first processor, receive a session initiation message, the session initiation message including at least one configuration parameter based on the at least one identified context characteristic and/or the at least one context characteristic, and wherein the session initiation message is configured to cause the voice recognition engine to load one or more recognition components, wherein the voice recognition engine analyzes speech from at least part of the plurality of audio samples with the one or more recognition components;
wherein the session initialization message includes sensor data from a gyroscope.
(Dependent claims: 17, 18, 19)
Specification