INDIVIDUALIZED HOTWORD DETECTION MODELS
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for presenting notifications in an enterprise system. In one aspect, a method includes the actions of obtaining enrollment acoustic data representing an enrollment utterance spoken by a user, obtaining a set of candidate acoustic data representing utterances spoken by other users, determining, for each candidate acoustic data of the set of candidate acoustic data, a similarity score that represents a similarity between the enrollment acoustic data and the candidate acoustic data, selecting a subset of candidate acoustic data from the set of candidate acoustic data based at least on the similarity scores, generating a detection model based on the subset of candidate acoustic data, and providing the detection model for use in detecting an utterance spoken by the user.
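The scoring and selection actions in the abstract can be illustrated with a short sketch. It is not the patented implementation: it assumes each piece of acoustic data has already been reduced to a fixed-length feature vector, it uses cosine similarity as the similarity score (the source does not name a metric), and it keeps the `k` highest-scoring candidates as one way of selecting "based at least on the similarity scores". The function names `cosine_similarity`, `score_candidates`, and `select_top_k` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def score_candidates(enrollment, candidates):
    """Return one similarity score per candidate, in candidate order."""
    return [cosine_similarity(enrollment, c) for c in candidates]


def select_top_k(candidates, scores, k):
    """Return the k candidates with the highest similarity scores."""
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in order[:k]]


# Toy feature vectors standing in for real acoustic features.
enrollment = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = score_candidates(enrollment, candidates)
subset = select_top_k(candidates, scores, k=2)
```

Ranking by score keeps the subset size fixed regardless of how the scores are distributed; the claims instead compare each score against a threshold, which lets the subset size vary.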
20 Claims
1. A computer-implemented method comprising:
obtaining enrollment acoustic data representing an utterance of a particular, predefined hotword that was spoken by a user during an enrollment process associated with a mobile device;
obtaining a set of candidate acoustic data representing utterances that were previously-spoken by other users, wherein the utterances are of the same, particular, predefined hotword that was spoken by the user during the enrollment process associated with the mobile device;
determining, for each candidate acoustic data of the set of candidate acoustic data, a similarity score that represents a similarity between the enrollment acoustic data and the candidate acoustic data, wherein the similarity score is associated with the candidate acoustic data;
determining, for each candidate acoustic data of the set of candidate acoustic data, whether the similarity score associated with the candidate acoustic data satisfies a threshold similarity score;
selecting a subset of candidate acoustic data from the set of candidate acoustic data, in response to determining that the similarity score associated with the candidate acoustic data satisfies the threshold similarity score;
generating a neural network-based, hotword detection model using the enrollment acoustic data and the selected subset of candidate acoustic data; and
providing the neural network-based, hotword detection model for use in detecting an utterance of the particular, predefined hotword that is subsequently spoken by the user.
Dependent claims: 2, 3, 4, 5, 6, 8.
7. (canceled)
9. A system comprising:
one or more computers; and
one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
obtaining enrollment acoustic data representing an utterance of a particular, predefined hotword that was spoken by a user during an enrollment process associated with a mobile device;
obtaining a set of candidate acoustic data representing utterances that were previously-spoken by other users, wherein the utterances are of the same, particular, predefined hotword that was spoken by the user during the enrollment process associated with the mobile device;
determining, for each candidate acoustic data of the set of candidate acoustic data, a similarity score that represents a similarity between the enrollment acoustic data and the candidate acoustic data, wherein the similarity score is associated with the candidate acoustic data;
determining, for each candidate acoustic data of the set of candidate acoustic data, whether the similarity score associated with the candidate acoustic data satisfies a threshold similarity score;
selecting a subset of candidate acoustic data from the set of candidate acoustic data in response to determining that the similarity score associated with the candidate acoustic data satisfies the threshold similarity score;
generating a neural network-based, hotword detection model using the enrollment acoustic data and the selected subset of candidate acoustic data; and
providing the neural network-based detection model for use in detecting an utterance of the particular, predefined hotword that is subsequently spoken by the user.
Dependent claims: 10, 11, 12, 13, 14, 16.
15. (canceled)
17. A non-transitory computer-readable medium storing instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
obtaining enrollment acoustic data representing an utterance of a particular, predefined hotword that was spoken by a user during an enrollment process associated with a mobile device;
obtaining a set of candidate acoustic data representing utterances that were previously-spoken by other users, wherein the utterances are of the same, particular, predefined hotword that was spoken by the user during the enrollment process associated with the mobile device;
determining, for each candidate acoustic data of the set of candidate acoustic data, a similarity score that represents a similarity between the enrollment acoustic data and the candidate acoustic data, wherein the similarity score is associated with the candidate acoustic data;
selecting a subset of candidate acoustic data from the set of candidate acoustic data in response to determining that the similarity score associated with the candidate acoustic data satisfies the threshold similarity score;
generating a neural network-based, hotword detection model using the enrollment acoustic data and the selected subset of candidate acoustic data; and
providing the neural network-based, hotword detection model for use in detecting an utterance of the particular, predefined hotword that is subsequently spoken by the user.
Dependent claims: 18, 19, 20.
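The thresholding, selection, and model-generation steps recited in the independent claims can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: acoustic data is assumed to be fixed-length feature vectors, similarity scores are assumed to be precomputed, the "neural network" is reduced to a single sigmoid unit trained by batch gradient descent, and the negative (non-hotword) examples are an assumption added so the unit has something to discriminate against; the claims do not mention them. The names `select_by_threshold`, `train_detector`, and `detect` are hypothetical.

```python
import math


def select_by_threshold(candidates, scores, threshold):
    """Keep each candidate whose similarity score satisfies the threshold."""
    return [c for c, s in zip(candidates, scores) if s >= threshold]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_detector(positives, negatives, epochs=500, lr=0.5):
    """Fit a single sigmoid unit (the smallest possible network).

    positives: enrollment data plus the selected candidate subset.
    negatives: non-hotword examples (an assumption, not in the claims).
    Returns (weights, bias).
    """
    data = [(x, 1.0) for x in positives] + [(x, 0.0) for x in negatives]
    dim = len(data[0][0])
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            err = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            for i, v in enumerate(x):
                grad_w[i] += err * v
            grad_b += err
        # Average the gradient over the batch before stepping.
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(data)
    return weights, bias


def detect(model, features):
    """Return the model's estimated probability that features are the hotword."""
    weights, bias = model
    return sigmoid(sum(w * v for w, v in zip(weights, features)) + bias)


# Toy data: scores are assumed to come from an earlier scoring step.
enrollment = [1.0, 0.0]
candidates = [[0.9, 0.1], [0.0, 1.0], [0.8, 0.3]]
scores = [0.99, 0.0, 0.94]
subset = select_by_threshold(candidates, scores, threshold=0.9)
negatives = [[0.0, 1.0], [0.1, 0.9]]
model = train_detector([enrollment] + subset, negatives)
```

Training on the enrollment data together with the selected subset mirrors the "generating" step; the returned `model` is what the "providing" step would hand to the device for later detection.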
Specification