Passive training for automatic speech recognition
Abstract
Provided are methods and systems for passive training for automatic speech recognition. An example method includes utilizing a first, speaker-independent model to detect a spoken keyword or a key phrase in spoken utterances. While utilizing the first model, a second model is passively trained to detect the spoken keyword or the key phrase in the spoken utterances using at least partially the spoken utterances. The second, speaker-dependent model may utilize deep neural network (DNN) or convolutional neural network (CNN) techniques. In response to completion of the training, a switch is made from utilizing the first model to utilizing the second model to detect the spoken keyword or the key phrase in spoken utterances. While utilizing the second model, parameters associated therewith are updated using the spoken utterances in response to detecting the keyword or the key phrase in the spoken utterances. User authentication functionality may be provided.
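The flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class names `SecondModel` and `KeywordSpotter`, the `required_examples` threshold, the use of strings as stand-ins for audio, and the choice to train only on utterances in which the first model detected the keyword are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Utterance = str  # stand-in for audio; a real system would use feature frames


@dataclass
class SecondModel:
    """Toy speaker-dependent model: it only remembers its training utterances."""
    examples: List[Utterance] = field(default_factory=list)

    def train_on(self, utterance: Utterance) -> None:
        self.examples.append(utterance)

    def detect(self, utterance: Utterance) -> bool:
        # Toy rule (assumption): match anything seen during training.
        return utterance in self.examples


@dataclass
class KeywordSpotter:
    """Hypothetical controller that switches from the first to the second model."""
    first_model: Callable[[Utterance], bool]  # speaker-independent detector
    second_model: SecondModel = field(default_factory=SecondModel)
    required_examples: int = 3                # assumed completion threshold
    active: str = "first"

    def process(self, utterance: Utterance) -> bool:
        if self.active == "first":
            detected = self.first_model(utterance)
            if detected:
                # Passive training: no enrollment prompt, the utterance is reused.
                self.second_model.train_on(utterance)
                if len(self.second_model.examples) >= self.required_examples:
                    self.active = "second"  # switch once training completes
            return detected
        detected = self.second_model.detect(utterance)
        if detected:
            # After the switch, keep updating the second model on detections.
            self.second_model.train_on(utterance)
        return detected
```

In this sketch only one model is consulted for any given utterance, which mirrors the "utilizing only the first model" and "utilizing only the second model" language of the claims.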
16 Claims
1. A method for passive training for automatic speech recognition, the method comprising:
maintaining both a first model and a second model that are configured to detect a spoken keyword or a key phrase in spoken utterances, wherein the spoken utterances comprise words and/or phrases spoken by a user;
utilizing only the first model, the first model being a speaker-independent model, to detect the spoken keyword or the key phrase in spoken utterances until the second model has been trained;
passively training the second model, using the spoken utterances comprising words and/or phrases spoken by the user, to detect the spoken keyword or the key phrase in the spoken utterances; and
in response to completion of the passive training, wherein completion requires performing the passive training with a plurality of the spoken utterances satisfying a predetermined set of criteria, switching from utilizing only the first, speaker-independent model to utilizing only the second, passively trained model to detect the spoken keyword or the key phrase in the spoken utterances,
wherein passively training the second model includes selecting at least one utterance from the spoken utterances comprising words and/or phrases spoken by the user according to at least one predetermined criterion, and
wherein the at least one pre-determined criterion includes one or more of a determination that a signal-to-noise ratio level of the selected utterance is below a pre-determined level and a determination that a duration of the selected utterance is below a pre-determined time period.
9. A system for passive training for automatic speech recognition, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor, the memory storing instructions which when executed by the at least one processor perform a method comprising:
maintaining both a first model and a second model that are configured to detect a spoken keyword or a key phrase in spoken utterances, wherein the spoken utterances comprise words and/or phrases spoken by a user;
utilizing only the first model, the first model being a speaker-independent model, to detect the spoken keyword or the key phrase in spoken utterances until the second model has been trained;
passively training the second model, using the spoken utterances comprising words and/or phrases spoken by the user, to detect the spoken keyword or the key phrase in the spoken utterances; and
in response to completion of the passive training, wherein completion requires performing the passive training with a plurality of the spoken utterances satisfying a predetermined set of criteria, switching from utilizing only the first, speaker-independent model to utilizing only the second, passively trained model to detect the spoken keyword or the key phrase in the spoken utterances,
wherein passively training the second model includes selecting at least one utterance from the spoken utterances comprising words and/or phrases spoken by the user according to at least one predetermined criterion, and
wherein the at least one pre-determined criterion includes one or more of a determination that a signal-to-noise ratio level of the selected utterance is below a pre-determined level and a determination that a duration of the selected utterance is below a pre-determined time period.
16. A non-transitory processor-readable medium having embodied thereon a program being executable by at least one processor to perform a method for passive training for automatic speech recognition, the method comprising:
maintaining both a first model and a second model that are configured to detect a spoken keyword or a key phrase in spoken utterances, wherein the spoken utterances comprise words and/or phrases spoken by a user;
utilizing only the first model, the first model being a speaker-independent model, to detect the spoken keyword or the key phrase in spoken utterances until the second model has been trained;
passively training the second model, using the spoken utterances comprising words and/or phrases spoken by the user, to detect the spoken keyword or the key phrase in the spoken utterances; and
in response to completion of the passive training, wherein completion requires performing the passive training with a plurality of the spoken utterances satisfying a predetermined set of criteria, switching from utilizing only the first, speaker-independent model to utilizing only the second, passively trained model to detect the spoken keyword or the key phrase in the spoken utterances,
wherein passively training the second model includes selecting at least one utterance from the spoken utterances comprising words and/or phrases spoken by the user according to at least one predetermined criterion, and
wherein the at least one pre-determined criterion includes one or more of a determination that a signal-to-noise ratio level of the selected utterance is below a pre-determined level and a determination that a duration of the selected utterance is below a pre-determined time period.
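The utterance-selection step recited in the independent claims can be sketched as a simple filter. This is an illustration only: `UtteranceStats`, `select_for_training`, and the threshold parameters are hypothetical names, the signal and noise powers are assumed to be estimated elsewhere, and requiring every configured check to pass is one possible reading of "one or more of". The comparisons follow the claim language as written, selecting utterances whose measured values fall below the pre-determined limits.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class UtteranceStats:
    """Hypothetical per-utterance measurements, assumed to be computed upstream."""
    signal_power: float
    noise_power: float
    duration_s: float

    @property
    def snr_db(self) -> float:
        # Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise).
        return 10.0 * math.log10(self.signal_power / self.noise_power)


def select_for_training(
    utterances: Iterable[UtteranceStats],
    snr_limit_db: Optional[float] = None,
    duration_limit_s: Optional[float] = None,
) -> List[UtteranceStats]:
    """Keep utterances that pass every configured check (assumed interpretation)."""
    if snr_limit_db is None and duration_limit_s is None:
        raise ValueError("at least one criterion must be configured")
    selected = []
    for u in utterances:
        if snr_limit_db is not None and not u.snr_db < snr_limit_db:
            continue  # SNR is not below the pre-determined level
        if duration_limit_s is not None and not u.duration_s < duration_limit_s:
            continue  # duration is not below the pre-determined time period
        selected.append(u)
    return selected
```

For example, `select_for_training(stats, snr_limit_db=20.0, duration_limit_s=2.0)` would keep only the utterances whose SNR is under 20 dB and whose duration is under 2 seconds; both limit values are arbitrary placeholders.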
Specification