Speaker identification and unsupervised speaker adaptation techniques
First Claim
1. A method for operating a virtual assistant, the method comprising:
at an electronic device:
receiving, at the electronic device, an audio input comprising user speech, wherein the audio input is associated with a contextual data;
determining whether the user speech contains one or more predetermined words;
in response to determining that the user speech contains one or more predetermined words:
determining whether a speaker of the user speech is a predetermined user based at least in part on a speaker profile for the predetermined user; and
in accordance with a determination that the speaker of the user speech is the predetermined user, adding the audio input comprising user speech to the speaker profile for the predetermined user, wherein adding the audio input comprising user speech to the speaker profile includes annotating the audio input in the speaker profile with the contextual data;
receiving a second audio input comprising a second user speech;
determining whether a second contextual data associated with the second audio input matches the contextual data;
in accordance with a determination that the second contextual data associated with the second audio input matches the contextual data:
determining whether a speaker of the second user speech is the predetermined user based at least in part on the audio input added to the speaker profile; and
in accordance with a determination that the speaker of the second user speech is the predetermined user, activating the virtual assistant and processing a spoken command received subsequent to the second user speech.
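The claimed flow (trigger-word check, speaker verification, context-annotated enrollment, then a context-matched second verification) can be sketched as follows. This is a minimal illustration only: the function names, the set-overlap similarity stand-in, the trigger words, and the 0.8 threshold are all assumptions for demonstration, not the patent's actual method.

```python
# Hypothetical sketch of the claim 1 flow; every name and score here is
# an illustrative assumption, not the patented implementation.

TRIGGER_WORDS = {"hey", "assistant"}   # stand-in "predetermined words"
MATCH_THRESHOLD = 0.8                  # stand-in similarity threshold


def contains_trigger(words, transcript):
    # "determining whether the user speech contains one or more
    # predetermined words"
    return any(w in transcript.lower().split() for w in words)


def similarity(a, b):
    # Toy stand-in for a speaker-embedding comparison (Jaccard overlap
    # of characters); a real system would compare voice features.
    return len(set(a) & set(b)) / max(len(set(a) | set(b)), 1)


def enroll_if_match(profile, audio, transcript, context):
    """First utterance: verify the speaker against the existing profile,
    then add the audio annotated with its contextual data."""
    if not contains_trigger(TRIGGER_WORDS, transcript):
        return False
    if similarity(audio, profile["reference"]) < MATCH_THRESHOLD:
        return False
    profile["samples"].append({"audio": audio, "context": context})
    return True


def try_activate(profile, audio2, context2):
    """Second utterance: compare only against enrolled samples whose
    contextual data matches, per the claim's context-matching step."""
    for sample in profile["samples"]:
        if sample["context"] == context2 and \
           similarity(audio2, sample["audio"]) >= MATCH_THRESHOLD:
            return True   # activate the assistant, process the command
    return False
```

The context annotation is what makes the second verification cheaper and more reliable in this sketch: only samples recorded under matching conditions are consulted.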
Abstract
Systems and processes for generating a speaker profile for use in performing speaker identification for a virtual assistant are provided. One example process can include receiving an audio input including user speech and determining whether a speaker of the user speech is a predetermined user based on a speaker profile for the predetermined user. In response to determining that the speaker of the user speech is the predetermined user, the user speech can be added to the speaker profile and operation of the virtual assistant can be triggered. In response to determining that the speaker of the user speech is not the predetermined user, the user speech can be added to an alternate speaker profile and operation of the virtual assistant may not be triggered. In some examples, contextual information can be used to verify results produced by the speaker identification process.
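The abstract's accept/reject branching, where matched speech grows the primary profile and triggers the assistant while unmatched speech grows an alternate profile without triggering, can be sketched as below. The `verify` stub and the list-based profiles are assumptions for illustration.

```python
# Illustrative sketch of the abstract's branching; the profile layout
# and the verify() callable are assumptions, not the patented design.

def handle_utterance(audio, primary, alternate, verify):
    """Route an utterance: matched speech is added to the primary
    speaker profile and triggers the assistant; unmatched speech is
    added to the alternate profile and the assistant stays idle."""
    if verify(audio, primary):
        primary.append(audio)
        return True    # trigger the virtual assistant
    alternate.append(audio)
    return False       # do not trigger; remember the other speaker
```

Keeping an alternate profile lets both branches improve future identification: the system accumulates examples of the non-matching speaker as well.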
3662 Citations
39 Claims
1. A method for operating a virtual assistant, set out in full under "First Claim" above. Dependent claims: 2–13.
14. A system comprising:
one or more processors;
memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for:
receiving an audio input comprising user speech, wherein the audio input is associated with a contextual data;
determining whether the user speech contains one or more predetermined words;
in response to determining that the user speech contains one or more predetermined words:
determining whether a speaker of the user speech is a predetermined user based at least in part on a speaker profile for the predetermined user; and
in accordance with a determination that the speaker of the user speech is the predetermined user, adding the audio input comprising user speech to the speaker profile for the predetermined user, wherein adding the audio input comprising user speech to the speaker profile includes annotating the audio input in the speaker profile with the contextual data;
receiving a second audio input comprising a second user speech;
determining whether a second contextual data associated with the second audio input matches the contextual data;
in accordance with a determination that the second contextual data associated with the second audio input matches the contextual data:
determining whether a speaker of the second user speech is the predetermined user based at least in part on the audio input added to the speaker profile; and
in accordance with a determination that the speaker of the second user speech is the predetermined user, activating the virtual assistant and processing a spoken command received subsequent to the second user speech.
Dependent claims: 15–26.
27. A non-transitory computer-readable storage medium comprising instructions for:
receiving an audio input comprising user speech, wherein the audio input is associated with a contextual data;
determining whether the user speech contains one or more predetermined words;
in response to determining that the user speech contains one or more predetermined words:
determining whether a speaker of the user speech is a predetermined user based at least in part on a speaker profile for the predetermined user; and
in accordance with a determination that the speaker of the user speech is the predetermined user, adding the audio input comprising user speech to the speaker profile for the predetermined user, wherein adding the audio input comprising user speech to the speaker profile includes annotating the audio input in the speaker profile with the contextual data;
receiving a second audio input comprising a second user speech;
determining whether a second contextual data associated with the second audio input matches the contextual data;
in accordance with a determination that the second contextual data associated with the second audio input matches the contextual data:
determining whether a speaker of the second user speech is the predetermined user based at least in part on the audio input added to the speaker profile; and
in accordance with a determination that the speaker of the second user speech is the predetermined user, activating the virtual assistant and processing a spoken command received subsequent to the second user speech.
Dependent claims: 28–39.
Specification