Speaker identification and unsupervised speaker adaptation techniques
Abstract
Systems and processes for generating a speaker profile for use in performing speaker identification for a virtual assistant are provided. One example process can include receiving an audio input including user speech and determining whether a speaker of the user speech is a predetermined user based on a speaker profile for the predetermined user. In response to determining that the speaker of the user speech is the predetermined user, the user speech can be added to the speaker profile and operation of the virtual assistant can be triggered. In response to determining that the speaker of the user speech is not the predetermined user, the user speech can be added to an alternate speaker profile and operation of the virtual assistant may not be triggered. In some examples, contextual information can be used to verify results produced by the speaker identification process.
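The two-profile flow described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `SpeakerProfile` and `handle_speech`, the use of fixed-length vectors as voiceprints, mean cosine similarity as the score, and the 0.8 threshold are all assumptions introduced here.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class SpeakerProfile:
    """Hypothetical profile: a growing list of voiceprint vectors."""
    voiceprints: List[List[float]] = field(default_factory=list)

    def score(self, voiceprint: Sequence[float]) -> float:
        """Mean similarity to the stored voiceprints (0.0 for an empty profile)."""
        if not self.voiceprints:
            return 0.0
        total = sum(cosine_similarity(voiceprint, v) for v in self.voiceprints)
        return total / len(self.voiceprints)


def handle_speech(
    voiceprint: Sequence[float],
    user_profile: SpeakerProfile,
    alternate_profile: SpeakerProfile,
    threshold: float = 0.8,  # assumed value, not taken from the patent
) -> bool:
    """Return True if the virtual assistant should be triggered."""
    if user_profile.score(voiceprint) >= threshold:
        # Speaker judged to be the predetermined user: adapt the profile and trigger.
        user_profile.voiceprints.append(list(voiceprint))
        return True
    # Otherwise keep the sample in the alternate profile and do not trigger.
    alternate_profile.voiceprints.append(list(voiceprint))
    return False
```

Storing rejected samples in a separate profile, rather than discarding them, follows the abstract's description; how the alternate profile is later used is not shown here.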
2785 Citations
33 Claims
1. A non-transitory computer-readable storage medium storing one or more programs, the one or more programs comprising instructions for operating a virtual assistant, which when executed by one or more processors of an electronic device, cause the device to:
receive current user speech for activating the virtual assistant, wherein the current user speech is associated with current contextual data;
select, based on the current contextual data, a first set of stored voiceprints from a plurality of sets of stored voiceprints in a speaker profile of the device, wherein the first set of stored voiceprints is annotated to indicate first contextual data;
determine whether a current voiceprint derived from the current user speech matches the first set of stored voiceprints within a predetermined threshold; and
in accordance with a determination that the current voiceprint matches the first set of stored voiceprints within the predetermined threshold:
add the current voiceprint to the first set of stored voiceprints in the speaker profile;
annotate the current voiceprint to indicate the first contextual data; and
activate the virtual assistant to process a spoken command received subsequent to the user speech.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 31
11. A method for operating a virtual assistant, the method comprising:
at an electronic device having a processor and memory:
receiving current user speech for activating the virtual assistant, wherein the current user speech is associated with current contextual data;
selecting, based on the current contextual data, a first set of stored voiceprints from a plurality of sets of stored voiceprints in a speaker profile of the device, wherein the first set of stored voiceprints is annotated to indicate first contextual data;
determining whether a current voiceprint derived from the current user speech matches the first set of stored voiceprints within a predetermined threshold; and
in accordance with a determination that the current voiceprint matches the first set of stored voiceprints within the predetermined threshold:
adding the current voiceprint to the first set of stored voiceprints in the speaker profile;
annotating the current voiceprint to indicate the first contextual data; and
activating the virtual assistant to process a spoken command received subsequent to the user speech.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20, 32
21. An electronic device, comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for:
receiving current user speech for activating the virtual assistant, wherein the current user speech is associated with current contextual data;
selecting, based on the current contextual data, a first set of stored voiceprints from a plurality of sets of stored voiceprints in a speaker profile of the device, wherein the first set of stored voiceprints is annotated to indicate first contextual data;
determining whether a current voiceprint derived from the current user speech matches the first set of stored voiceprints within a predetermined threshold; and
in accordance with a determination that the current voiceprint matches the first set of stored voiceprints within the predetermined threshold:
adding the current voiceprint to the first set of stored voiceprints in the speaker profile;
annotating the current voiceprint to indicate the first contextual data; and
activating the virtual assistant to process a spoken command received subsequent to the user speech.
Dependent claims: 22, 23, 24, 25, 26, 27, 28, 29, 30, 33
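The steps shared by the three independent claims (select a voiceprint set by context, match within a threshold, then add, annotate, and activate) can be sketched as follows. This is only an illustration under stated assumptions. The claims do not say how contextual data is represented or how a match is scored, so the string context labels, the `Voiceprint` and `ContextualSpeakerProfile` types, the `try_activate` function, and the Euclidean distance to the set's centroid are all hypothetical choices made for this example.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class Voiceprint:
    vector: List[float]
    context: str  # annotation, e.g. "car" or "home" (hypothetical labels)


@dataclass
class ContextualSpeakerProfile:
    """Hypothetical profile: one set of stored voiceprints per context label."""
    sets: Dict[str, List[Voiceprint]] = field(default_factory=dict)


def centroid(voiceprints: Sequence[Voiceprint]) -> List[float]:
    """Component-wise mean of the stored vectors."""
    n = len(voiceprints)
    return [sum(col) / n for col in zip(*(vp.vector for vp in voiceprints))]


def try_activate(
    profile: ContextualSpeakerProfile,
    vector: Sequence[float],
    context: str,
    threshold: float = 0.5,  # assumed distance threshold
) -> bool:
    """Return True if the assistant should be activated for this speech."""
    # Select the stored set annotated with the current context.
    stored = profile.sets.get(context, [])
    if not stored:
        return False  # no set for this context; fallback behavior is out of scope here
    # Determine whether the current voiceprint matches within the threshold.
    if math.dist(vector, centroid(stored)) > threshold:
        return False
    # Add the current voiceprint to the selected set, annotated with its context.
    stored.append(Voiceprint(list(vector), context))
    # The caller would now activate the assistant and process the spoken command.
    return True
```

Keeping one set per context means a sample recorded in a noisy car is only compared against other car samples, which is one plausible reading of why the claims annotate voiceprints with contextual data.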
Specification