Passive training for automatic speech recognition

US 9,953,634 B1
Filed: 12/17/2014
Issued: 04/24/2018
Est. Priority Date: 12/17/2013
Status: Active Grant

Abstract

Provided are methods and systems for passive training for automatic speech recognition. An example method includes utilizing a first, speaker-independent model to detect a spoken keyword or a key phrase in spoken utterances. While utilizing the first model, a second model is passively trained to detect the spoken keyword or the key phrase in the spoken utterances using at least partially the spoken utterances. The second, speaker dependent model may utilize deep neural network (DNN) or convolutional neural network (CNN) techniques. In response to completion of the training, a switch is made from utilizing the first model to utilizing the second model to detect the spoken keyword or the key phrase in spoken utterances. While utilizing the second model, parameters associated therewith are updated using the spoken utterances in response to detecting the keyword or the key phrase in the spoken utterances. User authentication functionality may be provided.
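
The switch-over described in the abstract can be sketched as a small wrapper around two detectors. This is only an illustration under stated assumptions, not the patented implementation: the names `PassiveTrainingDetector`, `train_second`, and `required_utterances` are hypothetical, the completion criterion is reduced to a simple count, and collecting only utterances in which the first model detected the keyword is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

Utterance = List[float]                 # hypothetical stand-in for audio features
Detector = Callable[[Utterance], bool]  # returns True when the keyword is detected


@dataclass
class PassiveTrainingDetector:
    first_model: Detector                                # speaker-independent model
    train_second: Callable[[List[Utterance]], Detector]  # hypothetical trainer
    required_utterances: int = 3                         # assumed completion criterion
    _collected: List[Utterance] = field(default_factory=list, init=False)
    _second_model: Optional[Detector] = field(default=None, init=False)

    @property
    def active_model(self) -> str:
        return "second" if self._second_model is not None else "first"

    def detect(self, utterance: Utterance) -> bool:
        # After the switch, only the second model is consulted.
        if self._second_model is not None:
            return self._second_model(utterance)
        # Before the switch, only the first model decides.
        hit = self.first_model(utterance)
        if hit:
            # Passive training: the user is never asked to enroll;
            # ordinary detections are reused as training material.
            self._collected.append(utterance)
            if len(self._collected) >= self.required_utterances:
                self._second_model = self.train_second(self._collected)
        return hit
```

In this sketch the caller only ever invokes `detect`; which model answers is an internal detail that changes once enough utterances have been gathered.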

16 Claims

1. A method for passive training for automatic speech recognition, the method comprising:
- maintaining both a first model and a second model that are configured to detect a spoken keyword or a key phrase in spoken utterances, wherein the spoken utterances comprise words and/or phrases spoken by a user;
  
  utilizing only the first model, the first model being a speaker-independent model, to detect the spoken keyword or the key phrase in spoken utterances until the second model has been trained;
  
  passively training the second model, using the spoken utterances comprising words and/or phrases spoken by the user, to detect the spoken keyword or the key phrase in the spoken utterances; and
  
  in response to completion of the passive training, wherein completion requires performing the passive training with a plurality of the spoken utterances satisfying a predetermined set of criteria, switching from utilizing only the first, speaker-independent model to utilizing only the second, passively trained model to detect the spoken keyword or the key phrase in the spoken utterances,

  wherein passively training the second model includes selecting at least one utterance from the spoken utterances comprising words and/or phrases spoken by the user according to at least one predetermined criterion, and wherein the at least one pre-determined criterion includes one or more of a determination that a signal-to-noise ratio level of the selected utterance is below a pre-determined level and a determination that a duration of the selected utterance is below a pre-determined time period.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The method of claim 1, wherein the second model is a speaker-dependent model for the user.
  - 3. The method of claim 1, wherein the second model is operable to provide at least one additional functionality different from detecting the spoken keyword or the key phrase.
  - 4. The method of claim 3, wherein the at least one additional functionality includes an authentication of the user.
  - 5. The method of claim 1, wherein the second model includes one or more of the following: a deep neural network (DNN) and a convolutional neural network (CNN).
  - 6. The method of claim 1, wherein a threshold of detecting the keyword or the key phrase by the second model is narrower than a threshold of detecting the keyword or the key phrase by the first model, such that the second model provides substantially more sensitive keyword detection compared to the first model.
  - 7. The method of claim 1, wherein the passive training of the second model completes upon collecting a pre-determined number of the selected at least one utterance.
  - 8. The method of claim 1, further comprising updating parameters associated with the second model using the spoken utterances in response to detecting the keyword or the key phrase in the spoken utterances.
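
The utterance selection recited in claim 1 and the completion condition of claim 7 can be illustrated with a short sketch. It follows the claim wording literally (both checks are "below" a predetermined value). The `Utterance` fields, the function names, and the way SNR is computed from power values are assumptions made for this example and are not taken from the patent.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Utterance:
    """Hypothetical per-utterance measurements; not defined by the patent."""
    signal_power: float
    noise_power: float
    duration_s: float

    @property
    def snr_db(self) -> float:
        # Standard power-ratio definition of SNR in decibels.
        return 10.0 * math.log10(self.signal_power / self.noise_power)


def select_utterances(
    utterances: Sequence[Utterance],
    snr_limit_db: Optional[float] = None,
    max_duration_s: Optional[float] = None,
) -> List[Utterance]:
    """Keep utterances that pass every enabled check ("one or more of")."""
    if snr_limit_db is None and max_duration_s is None:
        raise ValueError("enable at least one criterion")
    selected = []
    for u in utterances:
        if snr_limit_db is not None and not (u.snr_db < snr_limit_db):
            continue  # SNR is not below the predetermined level
        if max_duration_s is not None and not (u.duration_s < max_duration_s):
            continue  # duration is not below the predetermined period
        selected.append(u)
    return selected


def training_complete(selected: Sequence[Utterance], required_count: int) -> bool:
    """Claim 7 style completion: a predetermined number of selected utterances."""
    return len(selected) >= required_count
```

Leaving a limit as `None` disables that check, which is one possible reading of "one or more of" in the claim; other readings are equally plausible.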

9. A system for passive training for automatic speech recognition, comprising:
- at least one processor; and
  
  a memory communicatively coupled to the at least one processor, the memory storing instructions which when executed by the at least one processor perform a method comprising:
  
  maintaining both a first model and a second model that are configured to detect a spoken keyword or a key phrase in spoken utterances, wherein the spoken utterances comprise words and/or phrases spoken by a user;
  
  utilizing only the first model, the first model being a speaker-independent model, to detect the spoken keyword or the key phrase in spoken utterances until the second model has been trained;
  
  passively training the second model, using the spoken utterances comprising words and/or phrases spoken by the user, to detect the spoken keyword or the key phrase in the spoken utterances; and
  
  in response to completion of the passive training, wherein completion requires performing the passive training with a plurality of the spoken utterances satisfying a predetermined set of criteria, switching from utilizing only the first, speaker-independent model to utilizing only the second, passively trained model to detect the spoken keyword or the key phrase in the spoken utterances,

  wherein passively training the second model includes selecting at least one utterance from the spoken utterances comprising words and/or phrases spoken by the user according to at least one predetermined criterion, and wherein the at least one pre-determined criterion includes one or more of a determination that a signal-to-noise ratio level of the selected utterance is below a pre-determined level and a determination that a duration of the selected utterance is below a pre-determined time period.
- Dependent claims (10, 11, 12, 13, 14, 15):
  - 10. The system of claim 9, wherein the second model is a speaker-dependent model for the user.
  - 11. The system of claim 9, wherein the second model includes one or more of the following: a deep neural network (DNN) and a convolutional neural network (CNN).
  - 12. The system of claim 9, wherein the second model is operable to provide at least one additional functionality different from detecting the spoken keyword or the key phrase.
  - 13. The system of claim 12, wherein the at least one additional functionality includes an authentication of the user.
  - 14. The system of claim 9, wherein a threshold of detecting the keyword or the key phrase by the second model is narrower than a threshold of detecting the keyword or the key phrase by the first model.
  - 15. The system of claim 9, further comprising updating parameters associated with the second model using the spoken utterances in response to detecting the keyword or the key phrase in the spoken utterances.
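
Claims 8 and 15 recite updating the second model's parameters whenever the keyword or key phrase is detected. The patent text quoted here does not say how the update is performed, so the following is only a sketch of one possible scheme: an exponential moving average over a feature template, scored by cosine similarity. The class name `AdaptiveTemplateDetector`, the `rate` parameter, and the template representation are all assumptions of this example.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    norm = math.hypot(*a) * math.hypot(*b)
    if norm == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / norm


class AdaptiveTemplateDetector:
    """Hypothetical second model whose parameters drift toward the user."""

    def __init__(self, template: List[float], threshold: float, rate: float = 0.1):
        self.template = list(template)  # the "parameters" being updated
        self.threshold = threshold      # detection threshold on the score
        self.rate = rate                # assumed adaptation rate in [0, 1]

    def detect(self, features: List[float]) -> bool:
        hit = cosine(self.template, features) >= self.threshold
        if hit:
            # Update parameters only in response to a detection,
            # blending the new utterance into the stored template.
            self.template = [
                (1.0 - self.rate) * t + self.rate * f
                for t, f in zip(self.template, features)
            ]
        return hit
```

Updating only on detections keeps rejected audio from pulling the template away from the user's voice, at the cost of never correcting for missed detections.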

16. A non-transitory processor-readable medium having embodied thereon a program being executable by at least one processor to perform a method for passive training for automatic speech recognition, the method comprising:
- maintaining both a first model and a second model that are configured to detect a spoken keyword or a key phrase in spoken utterances, wherein the spoken utterances comprise words and/or phrases spoken by a user;
  
  utilizing only the first model, the first model being a speaker-independent model, to detect the spoken keyword or the key phrase in spoken utterances until the second model has been trained;
  
  passively training the second model, using the spoken utterances comprising words and/or phrases spoken by the user, to detect the spoken keyword or the key phrase in the spoken utterances; and
  
  in response to completion of the passive training, wherein completion requires performing the passive training with a plurality of the spoken utterances satisfying a predetermined set of criteria, switching from utilizing only the first, speaker-independent model to utilizing only the second, passively trained model to detect the spoken keyword or the key phrase in the spoken utterances,

  wherein passively training the second model includes selecting at least one utterance from the spoken utterances comprising words and/or phrases spoken by the user according to at least one predetermined criterion, and wherein the at least one pre-determined criterion includes one or more of a determination that a signal-to-noise ratio level of the selected utterance is below a pre-determined level and a determination that a duration of the selected utterance is below a pre-determined time period.

Current Assignee: Samsung Electronics Co. Ltd.
Original Assignee: Knowles Electronics LLC (Knowles Corporation)
Inventors: Pearce, David; Clark, Brian
Primary Examiner(s): Yang, Qian
Application Number: US14/573,846
Time in Patent Office: 1,224 Days
CPC Class Codes: G10L 15/063 Training
