Selective speech recognition for chat and digital personal assistant systems

US 9,875,741 B2
Filed: 03/15/2013
Issued: 01/23/2018
Est. Priority Date: 03/15/2013
Status: Active Grant

3 Assignments

0 Petitions

Abstract

Disclosed are computer-implemented methods and systems for dynamic selection of speech recognition systems for use in Chat Information Systems (CIS) based on multiple criteria and the context of human-machine interaction. Specifically, once a first user audio input is received, it is analyzed so as to locate specific triggers, determine the context of the interaction, or predict the subsequent user audio inputs. Based on at least one of these criteria, one of a free-dictation recognizer, pattern-based recognizer, address book based recognizer, or dynamically created recognizer is selected for recognizing the subsequent user audio input. The methods described herein increase the accuracy of automatic recognition of user voice commands, thereby enhancing the overall user experience of using CIS, chat agents, and similar digital personal assistant systems.
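
The selection step described in the abstract can be illustrated with a minimal sketch. It assumes a simple keyword table as the trigger mechanism; the trigger words, the `TRIGGER_TO_KIND` table, the `select_recognizer` function, and the free-dictation fallback are all hypothetical and are not taken from the patent, which does not specify how triggers are located.

```python
from enum import Enum


class RecognizerKind(Enum):
    """The four recognizer kinds named in the abstract."""
    FREE_DICTATION = "free-dictation"
    PATTERN_BASED = "pattern-based"
    ADDRESS_BOOK = "address-book"
    DYNAMIC = "dynamically-created"


# Hypothetical trigger table: the words and their mapping are illustrative
# assumptions, not taken from the patent.
TRIGGER_TO_KIND = {
    "call": RecognizerKind.ADDRESS_BOOK,
    "text": RecognizerKind.ADDRESS_BOOK,
    "dial": RecognizerKind.PATTERN_BASED,
    "note": RecognizerKind.FREE_DICTATION,
}


def select_recognizer(recognized_text: str) -> RecognizerKind:
    """Pick a recognizer kind for the next input from triggers in this one."""
    for word in recognized_text.lower().split():
        kind = TRIGGER_TO_KIND.get(word)
        if kind is not None:
            return kind
    # Assumed fallback when no trigger is found.
    return RecognizerKind.FREE_DICTATION
```

In this sketch, an input recognized as "call" would route the following input to an address book based recognizer, on the assumption that a contact name is likely to come next.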

77 Citations

12 Claims

1. A method for speech recognition in a chat information system (CIS), the method comprising:
- receiving, by a processor operatively coupled to a memory, an audio input;
- separating, by the processor, the audio input into a plurality of parts having at least a first part of the audio input and a second part of the audio input;
- selecting, from a plurality of speech recognizers, a specific first speech recognizer to recognize the first part of the audio input, wherein selecting of the specific first speech recognizer to recognize the first part of the audio input is by the processor and is based on predetermined criteria, wherein each of the plurality of speech recognizers, from which the specific first speech recognizer is selected to recognize the first part of the audio input, is configured to generate, based on a corresponding audio input, a plurality of outputs provided with corresponding confidence levels;
- recognizing, by the specific first speech recognizer of a plurality of speech recognizers, the first part of the audio input to generate a first recognized input;
- analyzing, by the processor, the first recognized input associated with the first part of the audio input to identify at least one first trigger in the first recognized input;
- predicting, by the processor, a type of the second part of the audio input based at least in part on the at least one first trigger;
- based on the prediction of the type of the second part of the audio input, selecting, by the processor, a specific second speech recognizer from the plurality of speech recognizers;
- recognizing, by the specific second speech recognizer, the second part of the audio input to generate a second recognized input;
- analyzing, by the processor, the second recognized input to identify at least one second trigger in the second recognized input;
- predicting, by the processor, types of further parts of the audio input based at least in part on triggers identified in recognized inputs;
- selecting, from the plurality of speech recognizers, further specific speech recognizers based on the predicted types of the further parts of the audio input, the further specific speech recognizers being in addition to the first speech recognizer and the second speech recognizer; and
- recognizing, by the further specific speech recognizers, the further parts of the audio input until all parts of the audio input are recognized.
- Dependent claims:
  - 2. The method of claim 1, wherein the separating of the audio input comprises recognizing, by one of the plurality of speech recognizers, at least a beginning part of the audio input to generate a recognized input.
  - 3. The method of claim 2, further comprising selecting, by the processor, the specific first speech recognizer based at least in part on the recognized input.
  - 4. The method of claim 1, wherein the at least one first trigger includes a type of the audio input identified based at least in part on the first recognized input.
  - 5. The method of claim 4, wherein the type of the audio input includes a free speech input or a pattern-based speech input.
  - 6. The method of claim 5, wherein the pattern-based speech input includes at least one of the following:
    - a name, a nickname, a title, an address, and a number.
  - 7. The method of claim 1, wherein the specific first speech recognizer or the specific second speech recognizer includes a pattern-based speech recognizer.
  - 8. The method of claim 1, wherein the specific first speech recognizer or the specific second speech recognizer includes a free-dictation recognizer.
  - 9. The method of claim 1, wherein the specific first speech recognizer or the specific second speech recognizer includes an address book based recognizer.
  - 10. The method of claim 1, wherein the specific first speech recognizer or the specific second speech recognizer includes a dynamically created recognizer.
  - 11. The method of claim 1, further comprising combining, by the processor, the first recognized input and the second recognized input.
  - 12. The method of claim 1, further comprising generating, by the CIS, a response based at least in part on the first recognized input or the second recognized input.
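
The part-by-part loop recited in claim 1, together with the combining step of claim 11, can be sketched as follows. This is only an illustration under stated assumptions: the audio is taken to be already separated into parts, each part is stood in for by a `bytes` value that the toy recognizers simply decode, and every name here (`recognize_all`, `free_dictation`, `address_book`, `predict_next_kind`, the confidence values) is hypothetical rather than drawn from the patent.

```python
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[str, float]                  # (text, confidence level)
Recognizer = Callable[[bytes], List[Hypothesis]]


def free_dictation(part: bytes) -> List[Hypothesis]:
    """Toy stand-in for a free-dictation recognizer (assumption)."""
    text = part.decode()
    return [(text, 0.6), ("<unk>", 0.2)]


def address_book(part: bytes) -> List[Hypothesis]:
    """Toy stand-in for an address book based recognizer (assumption)."""
    text = part.decode()
    return [(text.title(), 0.8), (text, 0.3)]


def predict_next_kind(recognized_text: str) -> str:
    """Predict the next part's type from a trigger in the recognized text.

    The single trigger word "call" is an illustrative assumption.
    """
    if "call" in recognized_text.lower().split():
        return "address_book"
    return "free_dictation"


def recognize_all(
    parts: List[bytes],
    recognizers: Dict[str, Recognizer],
    first_kind: str,
    predict: Callable[[str], str],
) -> str:
    """Recognize each part in order, re-selecting the recognizer as we go."""
    kind = first_kind                           # chosen by predetermined criteria
    recognized: List[str] = []
    for part in parts:
        hypotheses = recognizers[kind](part)    # several outputs with confidences
        best_text, _confidence = max(hypotheses, key=lambda h: h[1])
        recognized.append(best_text)
        kind = predict(best_text)               # trigger -> type of the next part
    return " ".join(recognized)                 # combine recognized inputs


RECOGNIZERS: Dict[str, Recognizer] = {
    "free_dictation": free_dictation,
    "address_book": address_book,
}

result = recognize_all(
    [b"call", b"alice", b"now"], RECOGNIZERS, "free_dictation", predict_next_kind
)
print(result)  # prints "call Alice now"
```

Choosing the highest-confidence output is one plausible use of the confidence levels the claim requires each recognizer to produce; the claim itself does not say how those levels are used.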

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google LLC (Alphabet Inc.)
Inventors
Gelfenbeyn, Ilya Genadevich; Goncharuk, Artem; Platonov, Ilya Andreevich; Sirotin, Pavel Aleksandrovich; Gelfenbeyn, Olga Aleksandrovna
Primary Examiner(s)
Mishra, Richa

Application Number

US14/775,729
Publication Number

US 20160027440A1
Time in Patent Office

1,775 Days
Field of Search

None
US Class Current
CPC Class Codes

G10L 15/02   Feature extraction for spee...

G10L 15/07   to the speaker

G10L 15/22   Procedures used during a sp...

G10L 15/32   Multiple recognisers used i...

G10L 2015/088   Word spotting

G10L 2015/223   Execution procedure of a sp...

G10L 2015/228   of application context
