Selective speech recognition for chat and digital personal assistant systems
First Claim
1. A method for speech recognition in a chat information system (CIS), the method comprising:
receiving, by a processor operatively coupled to a memory, an audio input;
separating, by the processor, the audio input into a plurality of parts having at least a first part of the audio input and a second part of the audio input;
selecting, from a plurality of speech recognizers, a specific first speech recognizer to recognize the first part of the audio input, wherein selecting of the specific first speech recognizer to recognize the first part of the audio input is by the processor and is based on predetermined criteria, wherein each of the plurality of speech recognizers, from which the specific first speech recognizer is selected to recognize the first part of the audio input, is configured to generate, based on a corresponding audio input, a plurality of outputs provided with corresponding confidence levels;
recognizing, by the specific first speech recognizer of a plurality of speech recognizers, the first part of the audio input to generate a first recognized input;
analyzing, by the processor, the first recognized input associated with the first part of the audio input to identify at least one first trigger in the first recognized input;
predicting, by the processor, a type of the second part of the audio input based at least in part on the at least one first trigger;
based on the prediction of the type of the second part of the audio input, selecting, by the processor, a specific second speech recognizer from the plurality of speech recognizers;
recognizing, by the specific second speech recognizer, the second part of the audio input to generate a second recognized input;
analyzing, by the processor, the second recognized input to identify at least one second trigger in the second recognized input;
predicting, by the processor, types of further parts of the audio input based at least in part on triggers identified in recognized inputs;
selecting, from the plurality of speech recognizers, further specific speech recognizers based on the predicted types of the further parts of the audio input, the further specific speech recognizers being in addition to the first speech recognizer and the second speech recognizer; and
recognizing, by the further specific speech recognizers, the further parts of the audio input until all parts of the audio input are recognized.
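The claimed loop (recognize a part, look for a trigger, predict the type of the next part, pick the next recognizer) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: audio parts are stood in for by plain strings, the two toy recognizers, the contact list, and the `TRIGGERS` table are hypothetical, and the audio is assumed to be already separated into parts.

```python
from typing import Callable, Dict, List, Optional, Tuple

# One recognizer output: (text, confidence level in [0, 1]).
Hypothesis = Tuple[str, float]
# A recognizer maps one audio part to several scored outputs.
Recognizer = Callable[[str], List[Hypothesis]]


def free_diction(part: str) -> List[Hypothesis]:
    # Stand-in recognizer: treats the "audio" string as its own transcript.
    return [(part, 0.8), (part.upper(), 0.2)]


def address_book(part: str) -> List[Hypothesis]:
    contacts = ["john smith", "jane doe"]  # hypothetical contact list
    scored = [(name, 0.9 if name == part else 0.1) for name in contacts]
    return sorted(scored, key=lambda h: h[1], reverse=True)


RECOGNIZERS: Dict[str, Recognizer] = {"free": free_diction, "contact": address_book}
# Hypothetical trigger table: trigger word -> predicted type of the next part.
TRIGGERS: Dict[str, str] = {"call": "contact", "text": "contact"}


def find_trigger(text: str) -> Optional[str]:
    """Return the first trigger word found in the recognized text, if any."""
    for word in text.split():
        if word in TRIGGERS:
            return word
    return None


def recognize_all(parts: List[str], first: str = "free") -> List[str]:
    """Recognize every part, re-selecting the recognizer after each one."""
    recognized = []
    kind = first  # first recognizer chosen by a predetermined criterion
    for part in parts:
        hypotheses = RECOGNIZERS[kind](part)
        best_text, _ = max(hypotheses, key=lambda h: h[1])
        recognized.append(best_text)
        trigger = find_trigger(best_text)
        # Predict the next part's type from the trigger; otherwise fall back.
        kind = TRIGGERS[trigger] if trigger else "free"
    return recognized
```

In this sketch, `recognize_all(["call", "john smith"])` recognizes "call" with the free-diction stand-in, treats it as a trigger, and routes the second part to the address-book stand-in. Taking the highest-confidence output is one possible policy; the claim only requires that each recognizer produce several outputs with confidence levels.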
Abstract
Disclosed are computer-implemented methods and systems for dynamic selection of speech recognition systems for use in Chat Information Systems (CIS) based on multiple criteria and the context of human-machine interaction. Specifically, once a first user audio input is received, it is analyzed to locate specific triggers, determine the context of the interaction, or predict subsequent user audio inputs. Based on at least one of these criteria, one of a free-diction recognizer, a pattern-based recognizer, an address-book-based recognizer, or a dynamically created recognizer is selected for recognizing the subsequent user audio input. The methods described herein increase the accuracy of automatic recognition of user voice commands, thereby enhancing the overall user experience of CIS, chat agents, and similar digital personal assistant systems.
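As a rough illustration of the selection step the abstract describes, the sketch below maps the three kinds of criteria (a trigger, the interaction context, a prediction about the next input) to one of the four recognizer kinds. The abstract names the four kinds but does not give the rules, so the specific rules, the trigger words, the `"menu"` context value, and the function name `select_recognizer` are all assumptions invented for this example.

```python
from enum import Enum
from typing import List, Optional


class RecognizerKind(Enum):
    FREE_DICTION = "free-diction"
    PATTERN_BASED = "pattern-based"
    ADDRESS_BOOK = "address-book"
    DYNAMIC = "dynamically-created"


# Hypothetical trigger words that suggest a contact name comes next.
CONTACT_TRIGGERS = {"call", "email"}


def select_recognizer(
    trigger: Optional[str] = None,
    context: Optional[str] = None,
    expected_words: Optional[List[str]] = None,
) -> RecognizerKind:
    """Pick a recognizer kind for the next audio input (illustrative rules only)."""
    if trigger in CONTACT_TRIGGERS:
        # A contact-related trigger suggests the next input is a name.
        return RecognizerKind.ADDRESS_BOOK
    if expected_words:
        # A predicted vocabulary could seed a recognizer built on the fly.
        return RecognizerKind.DYNAMIC
    if context == "menu":
        # A constrained dialog state suits a fixed set of command patterns.
        return RecognizerKind.PATTERN_BASED
    # No usable criterion: fall back to unconstrained dictation.
    return RecognizerKind.FREE_DICTION
```

For example, `select_recognizer(trigger="call")` returns `RecognizerKind.ADDRESS_BOOK`, while a call with no arguments falls back to `RecognizerKind.FREE_DICTION`. The order of the checks is a design choice of this sketch, not something the abstract specifies.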
12 Claims
1. A method for speech recognition in a chat information system (CIS), as recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.
Specification