INDIVIDUALIZED HOTWORD DETECTION MODELS
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for presenting notifications in an enterprise system. In one aspect, a method includes the actions of obtaining enrollment acoustic data representing an enrollment utterance spoken by a user, obtaining a set of candidate acoustic data representing utterances spoken by other users, determining, for each candidate acoustic data of the set of candidate acoustic data, a similarity score that represents a similarity between the enrollment acoustic data and the candidate acoustic data, selecting a subset of candidate acoustic data from the set of candidate acoustic data based at least on the similarity scores, generating a detection model based on the subset of candidate acoustic data, and providing the detection model for use in detecting an utterance spoken by the user.
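The sequence of actions in the abstract (score candidates against the enrollment utterance, keep the most similar ones, build a model from them) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: acoustic data is represented as plain feature vectors, similarity is cosine similarity, the subset is the top-k candidates, and the "model" is a toy centroid with a fixed threshold. The function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_candidates(enrollment, candidates, top_k):
    """Score every candidate against the enrollment vector and keep the top_k.

    Assumption: 'based at least on the similarity scores' is modeled here as
    a simple top-k selection; a score threshold would work equally well.
    """
    scored = [(cosine_similarity(enrollment, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]


def build_detection_model(enrollment, subset, threshold=0.9):
    """Toy stand-in for model generation: centroid of enrollment + subset."""
    vectors = [enrollment] + subset
    dim = len(enrollment)
    centroid = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return {"centroid": centroid, "threshold": threshold}


def detect(model, features):
    """Return True if the features are close enough to the model centroid."""
    return cosine_similarity(model["centroid"], features) >= model["threshold"]


if __name__ == "__main__":
    enrollment = [1.0, 0.0]
    candidates = [[0.9, 0.1], [0.0, 1.0], [1.0, 0.05]]
    subset = select_candidates(enrollment, candidates, top_k=2)
    model = build_detection_model(enrollment, subset)
    print(subset, detect(model, [1.0, 0.0]), detect(model, [0.0, 1.0]))
```

A real system would presumably train a classifier on the selected utterances rather than average them; the centroid is used here only to keep the sketch short and self-contained.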
17 Citations
21 Claims
1. (canceled)
2. A computer-implemented method comprising:
during an enrollment process, prompting, by a client device, a user to speak a particular hotword, and receiving, by the client device, audio data corresponding to only a single utterance of the particular hotword by the user;
in response to receiving the audio data corresponding to only the single utterance of the particular hotword by the user, obtaining, by the client device, a personalized hotword detection model, wherein the personalized hotword detection model is trained to detect a likely utterance of the particular hotword by the user using the audio data corresponding to only the single utterance of the particular hotword by the user and without requiring the user to speak additional utterances of the particular hotword; and
after obtaining the personalized hotword detection model that is trained to detect when the user speaks the particular hotword using the audio data corresponding to only the single utterance of the particular hotword by the user and without requiring the user to speak the additional utterances of the particular hotword, detecting, by the client device, the likely utterance of the particular hotword by the user in subsequently received audio data using the personalized hotword detection model.
Dependent claims: 3, 4, 5, 6, 7, 8
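The three steps recited in claim 2 (single-utterance enrollment, obtaining a personalized model, later detection) can be pictured as a small client-side flow. The sketch below is illustrative only and makes several assumptions that the claim does not state: audio is reduced to a feature vector, the model is a toy template matcher, and `ClientDevice`, `PersonalizedModel`, and the `obtain_model` callback are hypothetical names. The callback stands in for whatever component (local or remote) actually produces the model.

```python
import math
from typing import Callable, List, Optional

Features = List[float]


class PersonalizedModel:
    """Toy model: compares incoming features to a stored template."""

    def __init__(self, template: Features, threshold: float = 0.9):
        self.template = template
        self.threshold = threshold

    def score(self, features: Features) -> float:
        # Cosine similarity between the template and the incoming features.
        dot = sum(x * y for x, y in zip(self.template, features))
        norm = math.sqrt(sum(x * x for x in self.template)) * math.sqrt(
            sum(y * y for y in features)
        )
        return dot / norm if norm else 0.0

    def matches(self, features: Features) -> bool:
        return self.score(features) >= self.threshold


class ClientDevice:
    """Hypothetical client that enrolls once and then detects the hotword."""

    def __init__(self, obtain_model: Callable[[Features], PersonalizedModel]):
        self._obtain_model = obtain_model
        self.model: Optional[PersonalizedModel] = None

    def enroll(self, single_utterance: Features) -> None:
        # Exactly one utterance is passed in; no further samples are requested.
        self.model = self._obtain_model(single_utterance)

    def detect(self, audio: Features) -> bool:
        if self.model is None:
            raise RuntimeError("enroll() must be called before detect()")
        return self.model.matches(audio)


if __name__ == "__main__":
    device = ClientDevice(lambda feats: PersonalizedModel(feats))
    device.enroll([0.6, 0.8])
    print(device.detect([0.6, 0.8]), device.detect([0.8, -0.6]))
```

Passing the model source in as a callback keeps the sketch neutral about where the model is trained, which the claim language leaves open.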
9. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
during an enrollment process, prompting, by a client device, a user to speak a particular hotword, and receiving, by the client device, audio data corresponding to only a single utterance of the particular hotword by the user;
in response to receiving the audio data corresponding to only the single utterance of the particular hotword by the user, obtaining, by the client device, a personalized hotword detection model, wherein the personalized hotword detection model is trained to detect a likely utterance of the particular hotword by the user using the audio data corresponding to only the single utterance of the particular hotword by the user and without requiring the user to speak additional utterances of the particular hotword; and
after obtaining the personalized hotword detection model that is trained to detect when the user speaks the particular hotword using the audio data corresponding to only the single utterance of the particular hotword by the user and without requiring the user to speak the additional utterances of the particular hotword, detecting, by the client device, the likely utterance of the particular hotword by the user in subsequently received audio data using the hotword detection model.
Dependent claims: 10, 11, 12, 13, 14, 15
16. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
during an enrollment process, prompting, by a client device, a user to speak a particular hotword, and receiving, by the client device, audio data corresponding to only a single utterance of the particular hotword by the user;
in response to receiving the audio data corresponding to only the single utterance of the particular hotword by the user, obtaining, by the client device, a personalized hotword detection model, wherein the personalized hotword detection model is trained to detect a likely utterance of the particular hotword by the user using the audio data corresponding to only the single utterance of the particular hotword by the user and without requiring the user to speak additional utterances of the particular hotword; and
after obtaining the personalized hotword detection model that is trained to detect when the user speaks the particular hotword using the audio data corresponding to only the single utterance of the particular hotword by the user and without requiring the user to speak the additional utterances of the particular hotword, detecting, by the client device, the likely utterance of the particular hotword by the user in subsequently received audio data using the hotword detection model.
Dependent claims: 17, 18, 19, 20, 21
Specification