Individualized hotword detection models
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for presenting notifications in an enterprise system. In one aspect, a method includes actions of obtaining enrollment acoustic data representing an enrollment utterance spoken by a user, obtaining a set of candidate acoustic data representing utterances spoken by other users, determining, for each candidate acoustic data of the set of candidate acoustic data, a similarity score that represents a similarity between the enrollment acoustic data and the candidate acoustic data, selecting a subset of candidate acoustic data from the set of candidate acoustic data based at least on the similarity scores, generating a detection model based on the subset of candidate acoustic data, and providing the detection model for use in detecting an utterance spoken by the user.
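The scoring and selection steps described in the abstract can be illustrated with a minimal sketch. The abstract does not say how acoustic data is represented or how similarity is measured, so this example assumes that each utterance has already been reduced to a fixed-length feature vector and uses cosine similarity as the score. The function names `cosine_similarity` and `select_similar_candidates` and the `top_k` parameter are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity score between two feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def select_similar_candidates(
    enrollment: Sequence[float],
    candidates: Sequence[Sequence[float]],
    top_k: int,
) -> List[int]:
    """Return indices of the top_k candidates most similar to the enrollment.

    Selecting a fixed number of candidates is an assumption; a score
    threshold would be another way to pick the subset.
    """
    scored = [
        (cosine_similarity(enrollment, cand), idx)
        for idx, cand in enumerate(candidates)
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:top_k]]


if __name__ == "__main__":
    enrollment_vec = [1.0, 0.0]
    candidate_vecs = [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0], [1.0, 0.05]]
    print(select_similar_candidates(enrollment_vec, candidate_vecs, top_k=2))
```

In a real system the feature vectors would come from an acoustic front end, which is outside the scope of this sketch.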
32 Citations
20 Claims
1. A computer-implemented method comprising:
- obtaining enrollment acoustic data for a user representing an utterance of a particular, predefined hotword that was spoken by the user during an enrollment process associated with a mobile device;
- obtaining a set of candidate acoustic data representing utterances of the same, particular predefined hotword that were previously-spoken by other users;
- after receiving the enrollment acoustic data, selecting, from the set of candidate acoustic data, a subset of the candidate acoustic data that is acoustically similar to the enrollment acoustic data;
- training a neural network-based, hotword detection model to generate a neural network-based, hotword detection model that is customized for the user, wherein the training uses (1) the enrollment acoustic data for the user and (2) the selected subset of the candidate acoustic data that is acoustically similar to the enrollment acoustic data as examples of acceptable utterances of the particular, predefined hotword for the user; and
- providing the neural network-based, hotword detection model that is customized for the user for use in detecting an utterance of the particular, predefined hotword that is subsequently spoken by the user.

Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A system comprising:
- one or more computers; and
- one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
- obtaining enrollment acoustic data for a user representing an utterance of a particular, predefined hotword that was spoken by the user during an enrollment process associated with a mobile device;
- obtaining a set of candidate acoustic data representing utterances of the same, particular predefined hotword that were previously-spoken by other users;
- after receiving the enrollment acoustic data, selecting, from the set of candidate acoustic data, a subset of the candidate acoustic data that is acoustically similar to the enrollment acoustic data;
- training a neural network-based, hotword detection model to generate a neural network-based, hotword detection model that is customized for the user, wherein the training uses (1) the enrollment acoustic data for the user and (2) the selected subset of the candidate acoustic data that is acoustically similar to the enrollment acoustic data as examples of acceptable utterances of the particular, predefined hotword for the user; and
- providing the neural network-based, hotword detection model that is customized for the user for use in detecting an utterance of the particular, predefined hotword that is subsequently spoken by the user.

Dependent claims: 13, 14, 15, 16, 17
18. One or more non-transitory computer-readable media storing instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
- obtaining enrollment acoustic data for a user representing an utterance of a particular, predefined hotword that was spoken by the user during an enrollment process associated with a mobile device;
- obtaining a set of candidate acoustic data representing utterances of the same, particular predefined hotword that were previously-spoken by other users;
- after receiving the enrollment acoustic data, selecting, from the set of candidate acoustic data, a subset of the candidate acoustic data that is acoustically similar to the enrollment acoustic data;
- training a neural network-based, hotword detection model to generate a neural network-based, hotword detection model that is customized for the user, wherein the training uses (1) the enrollment acoustic data for the user and (2) the selected subset of the candidate acoustic data that is acoustically similar to the enrollment acoustic data as examples of acceptable utterances of the particular, predefined hotword for the user; and
- providing the neural network-based, hotword detection model that is customized for the user for use in detecting an utterance of the particular, predefined hotword that is subsequently spoken by the user.

Dependent claims: 19, 20
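The training step recited in the independent claims can be sketched in toy form as follows. This is not the patented implementation. A single logistic unit stands in for the neural network, utterances are assumed to be fixed-length feature vectors, and the `negatives` argument (examples that are not acceptable utterances) is an assumption added so that the toy classifier has something to discriminate against; the claims only describe the positive examples. All names here (`train_custom_detector`, `detect`, `epochs`, `lr`) are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_custom_detector(
    enrollment: Sequence[Sequence[float]],
    similar_subset: Sequence[Sequence[float]],
    negatives: Sequence[Sequence[float]],
    epochs: int = 200,
    lr: float = 0.5,
    seed: int = 0,
) -> Callable[[Sequence[float]], float]:
    """Train a toy per-user detector and return a scoring function.

    Positives: the user's enrollment data plus the acoustically similar
    subset of other users' data, as in the claims.
    Negatives: an assumption of this sketch, not taken from the claims.
    """
    positives = list(enrollment) + list(similar_subset)
    data = [(x, 1.0) for x in positives] + [(x, 0.0) for x in negatives]
    dim = len(data[0][0])
    weights: List[float] = [0.0] * dim
    bias = 0.0
    rng = random.Random(seed)

    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss w.r.t. the pre-activation
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err

    def detect(features: Sequence[float]) -> float:
        """Score in (0, 1); higher means more likely the user's hotword."""
        return sigmoid(sum(w * xi for w, xi in zip(weights, features)) + bias)

    return detect


if __name__ == "__main__":
    detect = train_custom_detector(
        enrollment=[[1.0, 0.0]],
        similar_subset=[[0.9, 0.1], [1.0, 0.05]],
        negatives=[[0.0, 1.0], [-1.0, 0.2]],
    )
    print(detect([0.95, 0.05]) > 0.5, detect([0.0, 1.0]) > 0.5)
```

The returned `detect` function plays the role of the model that is provided for later use: it scores a new utterance's features, and a caller would compare that score against a threshold of its choosing.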
Specification