Method and system for controlling home assistant devices
First Claim
1. A method of controlling a home assistant device, comprising:
at a computing system having one or more processors and memory:
receiving an audio input;
performing speaker recognition on the audio input;
in accordance with a determination from performing speaker recognition that the audio input includes a voice input from a first user that is authorized to control the home assistant device:
performing, using speech recognition, speech-to-text conversion on the audio input to obtain a textual string;
searching for a predefined trigger word for activating the home assistant device in the textual string;
selecting, from a plurality of task domains of the home assistant device, one or more first task domains that the first user is authorized to control, to perform intent deduction on the textual string; and
forgoing using one or more second task domains among the plurality of task domains that the first user is not authorized to control to process the textual string; and
in accordance with a determination from performing speaker recognition that the audio input includes a voice input from the home assistant device:
forgoing performance of speech-to-text conversion on the audio input; and
forgoing search for the predefined trigger word, so that the home assistant device avoids being triggered by the home assistant device's own speech or a speech output of a neighboring home assistant device, wherein the speaker recognition uses less resources than the speech recognition.
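The claimed ordering can be sketched as a small Python class. This is a hedged illustration, not the patentee's implementation. `HomeAssistantGate`, its fields, the trigger word `"hey assistant"`, and the callables standing in for the speaker-recognition and speech-recognition models are all hypothetical. The sketch shows that the inexpensive speaker check runs first. Speech from the device itself or from a neighboring device returns before any speech-to-text work is done. The claim does not say how input from an unrecognized speaker is handled, so this sketch simply ignores it; that choice is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set


@dataclass
class HomeAssistantGate:
    """Hypothetical gate: cheap speaker recognition runs before costly ASR."""

    recognize_speaker: Callable[[bytes], Optional[str]]  # assumed: returns a speaker ID or None
    speech_to_text: Callable[[bytes], str]               # assumed: the expensive ASR step
    device_voice_ids: Set[str]                           # voice IDs of this device and its neighbors
    user_domains: Dict[str, Set[str]]                    # authorized user ID -> permitted task domains
    trigger_word: str = "hey assistant"                  # hypothetical trigger word

    def handle(self, audio: bytes) -> Optional[dict]:
        speaker = self.recognize_speaker(audio)
        if speaker in self.device_voice_ids:
            # Own or neighboring device speech: forgo ASR and the trigger-word search.
            return None
        if speaker not in self.user_domains:
            # Not an authorized user. The claim does not address this case,
            # so the sketch ignores the input (an assumption).
            return None
        text = self.speech_to_text(audio)
        if self.trigger_word not in text.lower():
            return None
        # Pass the text on for intent deduction, limited to the user's domains.
        return {"speaker": speaker, "text": text, "domains": self.user_domains[speaker]}


# Usage with stand-in models: the "audio" is just b"<speaker>:<words>".
stt_calls = []


def fake_stt(audio: bytes) -> str:
    stt_calls.append(audio)  # record every ASR invocation
    return audio.decode().split(":", 1)[1]


gate = HomeAssistantGate(
    recognize_speaker=lambda audio: audio.decode().split(":", 1)[0],
    speech_to_text=fake_stt,
    device_voice_ids={"self", "kitchen-speaker"},
    user_domains={"alice": {"lighting", "thermostat"}},
)
print(gate.handle(b"self:Hey assistant, the lights are on"))  # None
print(gate.handle(b"alice:Hey assistant, turn on the lights"))
print(len(stt_calls))  # 1
```

Because `fake_stt` records each call, the final count confirms that the device's own utterance never reached the speech-to-text step.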
Abstract
System and method for controlling a home assistant device include: receiving an audio input; performing speaker recognition on the audio input; in accordance with a determination that the audio input includes a voice input from a first user that is authorized to control the home assistant device: performing speech-to-text conversion on the audio input to obtain a textual string; and searching for a predefined trigger word for activating the home assistant device in the textual string; and in accordance with a determination that the audio input includes a voice input from the home assistant device: forgoing performance of speech-to-text conversion on the audio input; and forgoing search for the predefined trigger word.
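The claims require only that speaker recognition use fewer resources than speech recognition. They do not say how speaker recognition is performed. One common, inexpensive approach compares a fixed-length voice embedding against enrolled profiles by cosine similarity. That approach is assumed here and is not taken from the patent. The embeddings, the profile names, and the 0.8 threshold below are hypothetical, and the model that would produce the embeddings is not shown.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Treat a zero vector as matching nothing.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recognize_speaker(
    embedding: Sequence[float],
    profiles: Dict[str, Sequence[float]],
    threshold: float = 0.8,  # hypothetical acceptance threshold
) -> Optional[str]:
    """Return the enrolled ID most similar to `embedding`, or None if none reaches the threshold."""
    best_id: Optional[str] = None
    best_score = threshold
    for speaker_id, reference in profiles.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_id, best_score = speaker_id, score
    return best_id


profiles = {
    "alice": [1.0, 0.0, 0.0],  # enrolled authorized user (hypothetical embedding)
    "self": [0.0, 1.0, 0.0],   # this device's own synthesized voice (hypothetical)
}
print(recognize_speaker([0.9, 0.1, 0.0], profiles))  # alice
print(recognize_speaker([0.1, 1.0, 0.0], profiles))  # self
print(recognize_speaker([0.0, 0.0, 1.0], profiles))  # None
```

Each check costs one dot product per enrolled profile, which is far cheaper than full speech-to-text decoding. Enrolling the device's own voice, and a neighboring device's voice, as profiles is what would let such a check recognize that speech and skip it.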
17 Claims
7. A system for controlling a home assistant device, comprising:
one or more processors; and
memory storing instructions, the instructions, when executed by the processors, cause the processors to perform operations comprising:
receiving an audio input;
performing speaker recognition on the audio input;
in accordance with a determination from performing speaker recognition that the audio input includes a voice input from a first user that is authorized to control the home assistant device:
performing, using speech recognition, speech-to-text conversion on the audio input to obtain a textual string;
searching for a predefined trigger word for activating the home assistant device in the textual string;
selecting, from a plurality of task domains of the home assistant device, one or more first task domains that the first user is authorized to control, to perform intent deduction on the textual string; and
forgoing using one or more second task domains among the plurality of task domains that the first user is not authorized to control to process the textual string; and
in accordance with a determination from performing speaker recognition that the audio input includes a voice input from the home assistant device:
forgoing performance of speech-to-text conversion on the audio input; and
forgoing search for the predefined trigger word, so that the home assistant device avoids being triggered by the home assistant device's own speech or a speech output of a neighboring home assistant device, wherein the speaker recognition uses less resources than the speech recognition.
Dependent claims: 8, 9, 10, 11, 12.
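The task-domain selection recited in the claims can be illustrated as filtering before scoring. In this sketch, the domain names, the keyword scorers, the permission table, and the `min_score` threshold are all hypothetical; real intent deduction would use trained models. Unauthorized ("second") task domains are excluded before any scorer runs, so they never process the textual string. This follows the "forgoing using" language, as opposed to running every domain and discarding results afterward.

```python
from typing import Callable, Dict, Optional, Set, Tuple

# Hypothetical per-domain intent scorer: maps text to a confidence in [0, 1].
DomainScorer = Callable[[str], float]


def deduce_intent(
    text: str,
    user: str,
    scorers: Dict[str, DomainScorer],
    permissions: Dict[str, Set[str]],
    min_score: float = 0.5,  # hypothetical confidence cut-off
) -> Optional[Tuple[str, float]]:
    """Pick the best-scoring task domain among those the user may control."""
    allowed = permissions.get(user, set())
    # Only permitted ("first") domains are scored; the other ("second")
    # domains never see the text.
    scores = {d: scorer(text) for d, scorer in scorers.items() if d in allowed}
    if not scores:
        return None
    best = max(scores, key=lambda d: scores[d])
    return (best, scores[best]) if scores[best] >= min_score else None


scorers = {
    "lighting": lambda t: 1.0 if "light" in t.lower() else 0.0,
    "door_lock": lambda t: 1.0 if "door" in t.lower() else 0.0,
}
permissions = {"parent": {"lighting", "door_lock"}, "child": {"lighting"}}

print(deduce_intent("Unlock the front door", "parent", scorers, permissions))  # ('door_lock', 1.0)
print(deduce_intent("Unlock the front door", "child", scorers, permissions))   # None
```

For the child, the door-lock scorer is never invoked, and the only permitted domain falls below the threshold, so no intent is produced.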
13. A non-transitory computer-readable storage medium storing instructions, the instructions, when executed by one or more processors, cause the processors to perform operations comprising:
receiving an audio input;
performing speaker recognition on the audio input;
in accordance with a determination from performing speaker recognition that the audio input includes a voice input from a first user that is authorized to control a home assistant device:
performing, using speech recognition, speech-to-text conversion on the audio input to obtain a textual string;
searching for a predefined trigger word for activating the home assistant device in the textual string;
selecting, from a plurality of task domains of the home assistant device, one or more first task domains that the first user is authorized to control, to perform intent deduction on the textual string; and
forgoing using one or more second task domains among the plurality of task domains that the first user is not authorized to control to process the textual string; and
in accordance with a determination from performing speaker recognition that the audio input includes a voice input from the home assistant device:
forgoing performance of speech-to-text conversion on the audio input; and
forgoing search for the predefined trigger word, so that the home assistant device avoids being triggered by the home assistant device's own speech or a speech output of a neighboring home assistant device, wherein the speaker recognition uses less resources than the speech recognition.
Dependent claims: 14, 15, 16, 17.
Specification