Selective speech recognition for chat and digital personal assistant systems

US 9,865,264 B2
Filed: 05/14/2016
Issued: 01/09/2018
Est. Priority Date: 03/15/2013
Status: Active Grant

Abstract

Disclosed are computer-implemented methods and systems for dynamic selection of speech recognition systems for use in Chat Information Systems (CIS) based on multiple criteria and the context of human-machine interaction. Specifically, once a first user audio input is received, it is analyzed to locate specific triggers, determine the context of the interaction, or predict the subsequent user audio inputs. Based on at least one of these criteria, one of a free-dictation recognizer, pattern-based recognizer, address book based recognizer, or dynamically created recognizer is selected for recognizing the subsequent user audio input. The methods described herein increase the accuracy of automatic recognition of user voice commands, thereby enhancing the overall user experience of using CIS, chat agents, and similar digital personal assistant systems.

6 Claims

1. A method for speech recognition in a Chat Information System (CIS), the method comprising:
- receiving, by a processor operatively coupled with a memory, a first audio input, the first audio input captured by a microphone of a device of a user;
  
  recognizing, by a first speech recognizer of a plurality of speech recognizers, at least a part of the first audio input to generate a first recognized input, wherein each of the plurality of speech recognizers is configured to generate a plurality of outputs provided with corresponding confidence levels, the processor being configured to select an output from the plurality of outputs based on the confidence levels;
  
  providing, by the processor, a response to the first recognized input utilizing the CIS, the response being provided for presentation to the user via the device of the user;
  
  determining, by the processor, a response type of the response provided utilizing the CIS, the response type predicting a type of input of the user that will follow the response;
  
  receiving, by the processor, a second audio input that follows the response;
  
  based on the determined response type of the response provided utilizing the CIS, selecting, by the processor, a second speech recognizer, of the plurality of speech recognizers, for use in recognizing the second audio input that follows the response;
  
  recognizing, by the second speech recognizer, at least a part of the second audio input to generate a second recognized input; and
  
  providing, by the processor, a second response based on the second recognized input utilizing the CIS, the second response being provided for presentation to the user via the device.
- 2. The method of claim 1, wherein the selecting of the second speech recognizer includes selecting, by the processor, a free-dictation recognizer, when the response type predicts that the type of the input of the user that will follow the response includes a free speech of the user.
- 3. The method of claim 1, wherein the selecting of the second speech recognizer includes selecting, by the processor, a pattern-based recognizer, when the response type predicts that the type of the input of the user that will follow the response includes a pattern-based speech of the user.
- 4. The method of claim 1, wherein the selecting of the second speech recognizer includes selecting, by the processor, an address book based recognizer, when the response type predicts that the type of the input of the user that will follow the response includes a name or nickname from a digital address book.
- 5. The method of claim 1, wherein the selecting of the second speech recognizer includes selecting, by the processor, a dynamically created recognizer, when the response type predicts that the type of the input of the user that will follow the response includes an item from a list storing items of the same type.

6. A Chat Information System (CIS), the CIS comprising:
- a machine-readable medium storing instructions;
  
  one or more hardware processors executing the stored instructions to:
  
  receive a first audio input, the first audio input captured by a microphone;
  
  recognize, using a first speech recognizer of a plurality of speech recognizers, at least a part of the first audio input to generate a first recognized input, wherein each of the plurality of speech recognizers is configured to generate a plurality of outputs provided with corresponding confidence levels, one or more of the processors being configured to select an output from the plurality of outputs based on the confidence levels;
  
  provide a response to the first recognized input utilizing the CIS, the response being provided for presentation to the user via the device of the user;
  
  determine a response type of the response provided utilizing the CIS, the response type predicting a type of input of the user that will follow the response;
  
  receive a second audio input that follows the response;
  
  based on the determined response type of the response provided utilizing the CIS, select a second speech recognizer, of the plurality of speech recognizers, for use in recognizing the second audio input that follows the response;
  
  recognize, using the second speech recognizer, at least a part of the second audio input to generate a second recognized input; and
  
  provide a second response based on the second recognized input utilizing the CIS, the second response being provided for presentation to the user via the device.
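Both independent claims require each recognizer to produce several outputs with confidence levels, from which the processor selects one. The sketch below illustrates one plausible reading, in which the highest-confidence output wins; the claims say only that selection is "based on the confidence levels", so the arg-max rule is an assumption. `Hypothesis`, `best_hypothesis`, `recognize`, and the stub recognizer with its made-up scores are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Hypothesis:
    """One recognizer output paired with its confidence level."""
    text: str
    confidence: float  # assumed scale: higher means more confident


# Assumed interface: a recognizer maps raw audio to several scored outputs.
Recognizer = Callable[[bytes], Sequence[Hypothesis]]


def best_hypothesis(hypotheses: Sequence[Hypothesis]) -> Hypothesis:
    """Select the output with the highest confidence (assumed selection rule)."""
    if not hypotheses:
        raise ValueError("recognizer returned no hypotheses")
    return max(hypotheses, key=lambda h: h.confidence)


def recognize(recognizer: Recognizer, audio: bytes) -> str:
    """Run one recognizer on the audio and return the text of its selected output."""
    return best_hypothesis(recognizer(audio)).text


def stub_address_book_recognizer(audio: bytes) -> Sequence[Hypothesis]:
    """Stand-in for a real recognizer; ignores the audio and returns fixed scores."""
    return [Hypothesis("Anna", 0.62), Hypothesis("Hannah", 0.87)]
```

With the stub above, `recognize(stub_address_book_recognizer, b"")` returns `"Hannah"`, since that output carries the higher confidence level.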

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google LLC (Alphabet Inc.)
Inventors: Gelfenbeyn, Ilya Genadevich; Goncharuk, Artem; Platonov, Ilya Andreevich; Sirotin, Pavel Aleksandrovich; Gelfenbeyn, Olga Aleksandrovna
Primary Examiner(s): Mishra, Richa
Application Number: US15/154,944
Publication Number: US 20160260434A1
Time in Patent Office: 605 Days
Field of Search: None

CPC Class Codes
G10L 15/02   Feature extraction for spee...
G10L 15/07   to the speaker
G10L 15/22   Procedures used during a sp...
G10L 15/32   Multiple recognisers used i...
G10L 2015/088   Word spotting
G10L 2015/223   Execution procedure of a sp...
G10L 2015/228   of application context
