Techniques for client-side speech domain detection using gyroscopic data and a system using the same

US 10,692,492 B2
Filed: 09/29/2017
Issued: 06/23/2020
Est. Priority Date: 09/29/2017
Status: Active Grant

First Claim

Patent Images

1. A device comprising:

a memory;

a first processor coupled to the memory, wherein the first processor includes a low power mode, wherein while in the low power mode the first processor being configured to;

receive a plurality of audio samples;

identify at least one context characteristic associated with the received plurality of audio samples, wherein the at least one context characteristic includes linguistic characteristics associated with the received plurality of audio samples, wherein the at least one context characteristic includes a keyword or key phrase; and

a second processor to remotely host a voice recognition engine to analyze speech, and in response to establishing communication with the first processor, receive a session initialization message by the voice recognition engine, wherein the session initialization message includes the at least one context characteristic, wherein the session initialization message to cause the voice recognition engine to load one or more models into a memory based at least in part on the at least one identified context characteristic, wherein the voice recognition engine analyzes speech from at least part of the plurality of audio samples with the one or more models;

Wherein the session initialization message includes sensor data from a gyroscope.

View all claims

2 Assignments

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

Techniques are disclosed for client-side analysis of audio samples to identify one or more characteristics associated with captured audio. The client-side analysis may then allow a user device, e.g., a smart phone, laptop computer, in-car infotainment system, and so on, to provide the one or more identified characteristics as configuration data to a voice recognition service at or shortly after connection with the same. In turn, the voice recognition service may load one or more recognition components, e.g., language models and/or application modules/engines, based on the received configuration data. Thus, latency may be reduced based on the voice recognition engine having “hints” that allow components to be loaded without necessarily having to process audio samples first. The reduction of latency may reduce processing time relative to other approaches to voice recognitions systems that exclusively perform server-side context recognition/classification.

26 Citations

View as Search Results

19 Claims

1. A device comprising:
- a memory;
  
  a first processor coupled to the memory, wherein the first processor includes a low power mode, wherein while in the low power mode the first processor being configured to;
  
  receive a plurality of audio samples;
  
  identify at least one context characteristic associated with the received plurality of audio samples, wherein the at least one context characteristic includes linguistic characteristics associated with the received plurality of audio samples, wherein the at least one context characteristic includes a keyword or key phrase; and
  
  a second processor to remotely host a voice recognition engine to analyze speech, and in response to establishing communication with the first processor, receive a session initialization message by the voice recognition engine, wherein the session initialization message includes the at least one context characteristic, wherein the session initialization message to cause the voice recognition engine to load one or more models into a memory based at least in part on the at least one identified context characteristic, wherein the voice recognition engine analyzes speech from at least part of the plurality of audio samples with the one or more models;
  
  Wherein the session initialization message includes sensor data from a gyroscope.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
- - 2. The device of claim 1, further comprising one or more sensors, the one or more sensors including at least one or more of a global positioning system (GPS) sensor, an accelerometer, and/or a gyroscope.
  - 3. The device of claim 2, wherein the session initialization message includes sensor data from the one or more sensors.
  - 4. The device of claim 1, wherein the at least one identified context characteristic comprises at least one of a signal quality indicator, a speaker identity, a speaker gender, a speaker mood, a language identifier, and/or a domain classification.
  - 5. The device of claim 1, wherein the first processor is further configured to send the session initialization message to the voice recognition engine prior to sending the received plurality of audio samples to the voice recognition engine.
  - 6. The device of claim 1, wherein the first processor is further configured to receive a session established message, and in response to receiving the session established message, send the received plurality of audio samples to the voice recognition engine, and wherein the session established message indicates that the voice recognition engine has loaded one or more speech recognition components based at least in part on the at least one identified context characteristic.
  - 7. The device of claim 1, wherein the first processor establishes the connection with the voice recognition engine based on detecting a wakeup voice command spoken by a user.
  - 8. The device of claim 1, wherein the second processor is further configured to instantiate a context recognition engine (CRE), the CRE including a plurality of processing stages for context recognition, and wherein the second processor is further configured to identify the one or more context characteristics based on the CRE.
  - 9. The device of claim 1, wherein the first processor is further configured to receive a result message from the voice recognition engine, and in response to receiving the result message, perform one or more functions based on the received result message.
  - 10. A plurality of networked devices implemented as the device of claim 1.

11. A computer-implemented method for performing client-side domain detection on a plurality of audio samples, the method comprising:
- receiving, with a first processor, a plurality of audio samples, wherein the first processor includes a low power mode, wherein receiving the plurality of audio samples includes receiving the plurality of audio samples while the first processor operates in the low power mode;
  
  identifying, with the first processor, at least one context characteristic associated with the plurality of audio samples, wherein the at least one context characteristic includes linguistic characteristics associated with the received plurality of audio samples, wherein the at least one context characteristic includes a keyword or a key phrase; and
  
  hosting, with a second processor that is remotely coupled to the first processor, a voice recognition engine to analyze speech, and in response to establishing a connection with first processor, receiving a session initiation message, the session initiation message including at least one configuration parameter based on the at least one identified context characteristic and/or the at least one context characteristic, and wherein the session initiation message is configured to cause the voice recognition engine to load one or more recognition components, wherein the voice recognition engine analyzes speech from at least part of the plurality of audio samples with the one or more recognition components;
  
  Wherein the session initialization message includes sensor data from a gyroscope.
- View Dependent Claims (12, 13, 14, 15)
- - 12. The computer-implemented method of claim 11, further comprising:
    - receiving, by the first processor, a session established message from the voice recognition engine; and
      
      in response to receiving the session established message, sending by the first processor the received plurality of audio samples to the voice recognition engine.
  - 13. The computer-implemented method of claim 11, wherein the at least one identified context characteristic comprises at least one of a signal quality indicator, a speaker identity, a language identifier, and/or a domain classification.
  - 14. The computer-implemented method of claim 11, further comprising detecting, by the first processor, a user-input event, and wherein establishing the connection with the voice recognition engine is based on the detected user-input event.
  - 15. The computer-implemented method of claim 11, further comprising receiving, by the first processor, a result message from the voice recognition engine, and in response to receiving the result message, performing one or more functions based on the received message.

16. A non-transitory computer-readable medium having a plurality of instructions encoded thereon that when executed by at least one processor cause a process to be carried out, the process being configured to:
- receive, with a first processor, a plurality of audio samples, wherein the first processor includes a low power mode, wherein receive the plurality of audio samples includes receive the plurality of audio samples while the first processor operates in the low power mode;
  
  identify, with the first processor, at least one context characteristic associated with the plurality of audio samples, wherein the at least one context characteristic includes linguistic characteristics associated with the received plurality of audio samples, wherein the at least one context characteristic includes a keyword or a key phrase; and
  
  host, with a second processor that is remotely coupled to the first processor, a voice recognition engine to analyze speech, and in response to establishing a connection with first processor, receive a session initiation message, the session initiation message including at least one configuration parameter based on the at least one identified context characteristic and/or the at least one context characteristic, and wherein the session initiation message is configured to cause the voice recognition engine to load one or more recognition components, wherein the voice recognition engine analyzes speech from at least part of the plurality of audio samples with the one or more recognition components;
  
  Wherein the session initialization message includes sensor data from a gyroscope.
- View Dependent Claims (17, 18, 19)
- - 17. The computer-readable medium of claim 16, wherein the process is further configured to:
    - receive a session established message from the voice recognition engine; and
      
      in response to receiving the session established message, send the received plurality of audio samples to the voice recognition engine.
  - 18. The computer-readable medium of claim 16, wherein the at least one identified context characteristic comprises at least one of a signal quality indicator, a speaker identity, a language identifier, and/or a domain classification.
  - 19. The computer-readable medium of claim 16, wherein the process is further configured to receive a result message from the voice recognition engine, and in response to receiving the result message, performing one or more functions based on the received message.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
Intel Corporation
Original Assignee
Intel IP Corporation (Intel Corporation)
Inventors
Rozen, Piotr, Bocklet, Tobias, Nowicki, Jakub, Georges, Munir
Primary Examiner(s)
Kazeminezhad, Farzad

Application Number

US15/721,486
Publication Number

US 20190103100A1
Time in Patent Office

998 Days
Field of Search

704 70, 704249, 704252
US Class Current
CPC Class Codes

G06F 3/1423   controlling a plurality of ...

G06F 3/167   Audio in a user interface, ...

G10L 15/005   Language recognition

G10L 15/01   Assessment or evaluation of...

G10L 15/08   Speech classification or se...

G10L 15/183   using context dependencies,...

G10L 15/22   Procedures used during a sp...

G10L 15/30   Distributed recognition, e....

G10L 17/00   Speaker identification or v...

G10L 2015/088   Word spotting

G10L 2015/223   Execution procedure of a sp...

G10L 2015/227   of the speaker; Human-fact...

G10L 25/60   for measuring the quality o...

G10L 25/63   for estimating an emotional...

H04W 76/10   Connection setup

Techniques for client-side speech domain detection using gyroscopic data and a system using the same

First Claim

2 Assignments

0 Petitions

Accused Products

Abstract

26 Citations

19 Claims

Specification

Solutions

Use Cases

Quick Links

Techniques for client-side speech domain detection using gyroscopic data and a system using the same

First Claim

2 Assignments

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

26 Citations

19 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links