Method and apparatus for evaluating trigger phrase enrollment
Abstract
An electronic device includes a microphone that receives an audio signal that includes a spoken trigger phrase, and a processor that is electrically coupled to the microphone. The processor measures characteristics of the audio signal, and determines, based on the measured characteristics, whether the spoken trigger phrase is acceptable for trigger phrase model training. If the spoken trigger phrase is determined not to be acceptable for trigger phrase model training, the processor rejects the trigger phrase for trigger phrase model training.
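The accept-or-reject decision described in the abstract can be sketched as follows, using the noise-variability characteristic recited in claim 1. This is a minimal illustration, not the patented implementation: the function names are hypothetical, the per-frame variability values are assumed to be computed elsewhere, and "satisfies the frame number threshold" is assumed to mean "count is greater than or equal to the threshold".

```python
from typing import Sequence


def count_variable_noise_frames(
    frame_noise_variability: Sequence[float],
    noise_variability_threshold: float,
) -> int:
    """Count frames whose noise-variability measure exceeds the threshold."""
    return sum(
        1 for value in frame_noise_variability
        if value > noise_variability_threshold
    )


def is_acceptable_for_training(
    frame_noise_variability: Sequence[float],
    noise_variability_threshold: float,
    frame_number_threshold: int,
) -> bool:
    """Return True when the recording may be used to train the model.

    Assumption: the recording is rejected when the number of
    high-variability frames reaches the frame number threshold.
    """
    noisy_frames = count_variable_noise_frames(
        frame_noise_variability, noise_variability_threshold
    )
    return noisy_frames < frame_number_threshold


# Example: four frames, two of which exceed a variability threshold of 0.5.
recording = [0.1, 0.9, 0.5, 0.8]
accepted = is_acceptable_for_training(recording, 0.5, frame_number_threshold=3)
```

Keeping the frame count separate from the decision makes it easy to swap in a different comparison if "satisfies" is defined differently in a given embodiment.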
132 Citations
20 Claims
1. A computer-implemented method comprising:
- during a trigger phrase enrollment process:
receiving, at a speech recognition-enabled electronic device, a first audio signal corresponding to a user of the speech recognition-enabled electronic device speaking a trigger phrase, the first audio signal comprising a first number of frames having a measure of noise variability of background noise exceeding a noise variability threshold;
when a count of the first number of frames in the first audio signal satisfies a frame number threshold, prompting, by the speech recognition-enabled electronic device, the user to speak the trigger phrase again;
receiving, by the speech recognition-enabled electronic device, a second audio signal corresponding to the user speaking the trigger phrase again, the second audio signal comprising a second number of frames having the measure of noise variability of background noise exceeding the noise variability threshold; and
when a count of the second number of frames in the second audio signal dissatisfies the frame number threshold, training, by the speech recognition-enabled electronic device, a trigger phrase model with the second audio signal corresponding to the user speaking the trigger phrase again; and
- after the trigger phrase enrollment process:
receiving, at the speech recognition-enabled electronic device and while the speech recognition-enabled electronic device is in a sleep mode, a third audio signal including an utterance of the trigger phrase spoken by the user; and
detecting, by the speech recognition-enabled electronic device and using the trigger phrase model trained during the trigger phrase enrollment process, the utterance of the trigger phrase in the third audio signal, the trigger phrase when detected in the third audio signal causing the speech recognition-enabled electronic device to wake from the sleep mode, the sleep mode comprising a power-saving mode of operation in which one or more parts of the speech recognition-enabled electronic device are in a low-power state or powered off.
Dependent claims: 2, 3, 4, 5, 6, 7
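The enrollment phase of claim 1 can be sketched as a retry loop. The sketch below is an illustration under stated assumptions, not the claimed implementation: `Attempt`, `enroll_trigger_phrase`, the callbacks, and `max_attempts` are all hypothetical names, the per-frame variability values are assumed to be supplied with each recording, and "satisfies" is assumed to mean "greater than or equal to".

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class Attempt:
    """One recording of the trigger phrase (hypothetical container)."""
    audio: bytes
    frame_noise_variability: List[float]


def enroll_trigger_phrase(
    record_attempt: Callable[[], Attempt],
    prompt_retry: Callable[[], None],
    train_model: Callable[[bytes], Any],
    noise_variability_threshold: float,
    frame_number_threshold: int,
    max_attempts: int = 3,  # assumption: the claim itself recites two attempts
) -> Optional[Any]:
    """Record until an attempt passes the noise check, then train on it."""
    for _ in range(max_attempts):
        attempt = record_attempt()
        noisy_frames = sum(
            1 for value in attempt.frame_noise_variability
            if value > noise_variability_threshold
        )
        if noisy_frames >= frame_number_threshold:
            # Too many frames with variable background noise:
            # ask the user to speak the trigger phrase again.
            prompt_retry()
            continue
        # Count does not satisfy the threshold: train on this recording.
        return train_model(attempt.audio)
    return None  # no acceptable recording within max_attempts


# Usage with stand-in callbacks: the first attempt is noisy, the second is not.
recordings = iter([
    Attempt(b"first", [0.9, 0.8, 0.1]),
    Attempt(b"second", [0.1, 0.2, 0.9]),
])
prompts: List[str] = []
model = enroll_trigger_phrase(
    record_attempt=lambda: next(recordings),
    prompt_retry=lambda: prompts.append("speak again"),
    train_model=lambda audio: {"trained_on": audio},
    noise_variability_threshold=0.5,
    frame_number_threshold=2,
)
```

Passing the recorder, prompt, and trainer as callbacks keeps the control flow of the claim visible without committing to any particular audio or model library.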
8. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
- during a trigger phrase enrollment process:
receiving a first audio signal corresponding to a user of a speech recognition-enabled electronic device speaking a trigger phrase into the speech recognition-enabled electronic device, the first audio signal comprising a first number of frames having a measure of noise variability of background noise exceeding a noise variability threshold;
when a count of the first number of frames in the first audio signal satisfies a frame number threshold, prompting the user to speak the trigger phrase into the speech recognition-enabled electronic device again;
receiving a second audio signal corresponding to the user speaking the trigger phrase again, the second audio signal comprising a second number of frames having the measure of noise variability of background noise exceeding the noise variability threshold; and
when a count of the second number of frames in the second audio signal dissatisfies the frame number threshold, training a trigger phrase model with the second audio signal corresponding to the user speaking the trigger phrase again; and
- after the trigger phrase enrollment process:
receiving, while the speech recognition-enabled electronic device is in a sleep mode, a third audio signal including an utterance of the trigger phrase spoken by the user; and
detecting, using the trigger phrase model trained during the trigger phrase enrollment process, the utterance of the trigger phrase in the third audio signal, the trigger phrase when detected in the third audio signal causing the speech recognition-enabled electronic device to wake from the sleep mode, the sleep mode comprising a power-saving mode of operation in which one or more parts of the speech recognition-enabled electronic device are in a low-power state or powered off.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
- during a trigger phrase enrollment process:
receiving a first audio signal corresponding to a user of a speech recognition-enabled electronic device speaking a trigger phrase into the speech recognition-enabled electronic device, the first audio signal comprising a first number of frames having a measure of noise variability of background noise exceeding a noise variability threshold;
when a count of the first number of frames in the first audio signal satisfies a frame number threshold, prompting the user to speak the trigger phrase into the speech recognition-enabled electronic device again;
receiving a second audio signal corresponding to the user speaking the trigger phrase again, the second audio signal comprising a second number of frames having the measure of noise variability of background noise exceeding the noise variability threshold; and
when a count of the second number of frames in the second audio signal dissatisfies the frame number threshold, training a trigger phrase model with the second audio signal corresponding to the user speaking the trigger phrase again; and
- after the trigger phrase enrollment process:
receiving, while the speech recognition-enabled electronic device is in a sleep mode, a third audio signal including an utterance of the trigger phrase spoken by the user; and
detecting, using the trigger phrase model trained during the trigger phrase enrollment process, the utterance of the trigger phrase in the third audio signal, the trigger phrase when detected in the third audio signal causing the speech recognition-enabled electronic device to wake from the sleep mode, the sleep mode comprising a power-saving mode of operation in which one or more parts of the speech recognition-enabled electronic device are in a low-power state or powered off.
Dependent claims: 16, 17, 18, 19, 20
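The post-enrollment steps shared by the independent claims (receive audio while asleep, detect the trigger phrase with the trained model, wake) can be pictured as a small state machine. This is only a sketch under assumptions: `PowerState`, `Device`, and the `detector` callable are hypothetical, and the detector stands in for whatever trigger phrase model was trained during enrollment.

```python
from enum import Enum
from typing import Callable


class PowerState(Enum):
    SLEEP = "sleep"   # power-saving mode: parts are low-power or powered off
    AWAKE = "awake"


class Device:
    """Hypothetical speech recognition-enabled device."""

    def __init__(self, detector: Callable[[bytes], bool]) -> None:
        # The detector stands in for the trained trigger phrase model:
        # it returns True when the audio contains the trigger phrase.
        self._detector = detector
        self.state = PowerState.SLEEP

    def on_audio(self, audio: bytes) -> PowerState:
        """Handle an incoming audio signal and return the resulting state."""
        if self.state is PowerState.SLEEP and self._detector(audio):
            # Trigger phrase detected while asleep: wake the device.
            self.state = PowerState.AWAKE
        return self.state


# Usage with a stand-in detector that matches one exact byte string.
device = Device(detector=lambda audio: audio == b"trigger phrase")
state_after_noise = device.on_audio(b"background chatter")
state_after_trigger = device.on_audio(b"trigger phrase")
```

In this sketch only the trigger check runs while the device is asleep, which mirrors the idea that other parts of the device stay in a low-power state until the phrase is detected.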
Specification