End of query detection
First Claim
1. A computer-implemented method comprising:
receiving audio data that corresponds to an utterance spoken by a user;
applying, to the audio data, an end of query model that (i) is configured to determine a confidence score that reflects a likelihood that the utterance is a complete utterance and (ii) was trained using audio data from complete utterances and from incomplete utterances;
based on applying the end of query model that (i) is configured to determine the confidence score that reflects the likelihood that the utterance is a complete utterance and (ii) was trained using the audio data from the complete utterances and from the incomplete utterances, determining the confidence score that reflects a likelihood that the utterance is a complete utterance;
comparing the confidence score that reflects the likelihood that the utterance is a complete utterance to a confidence score threshold;
based on comparing the confidence score that reflects the likelihood that the utterance is a complete utterance to the confidence score threshold, determining whether the utterance is likely complete or likely incomplete; and
based on determining whether the utterance is likely complete or likely incomplete, providing, for output, an instruction to (i) maintain a microphone that is receiving the utterance in an active state or (ii) deactivate the microphone that is receiving the utterance.
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for detecting an end of a query are disclosed. In one aspect, a method includes the actions of receiving audio data that corresponds to an utterance spoken by a user. The actions further include applying, to the audio data, an end of query model. The actions further include determining the confidence score that reflects a likelihood that the utterance is a complete utterance. The actions further include comparing the confidence score that reflects the likelihood that the utterance is a complete utterance to a confidence score threshold. The actions further include determining whether the utterance is likely complete or likely incomplete. The actions further include providing, for output, an instruction to (i) maintain a microphone that is receiving the utterance in an active state or (ii) deactivate the microphone that is receiving the utterance.
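The decision flow described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the specification: the names `MicInstruction`, `decide_microphone_instruction`, and `toy_model`, the default threshold of 0.8, the use of `>=` for the comparison, and the mapping from "likely complete" to "deactivate the microphone" are all assumptions made for illustration. The toy scorer is only a placeholder for a trained end of query model.

```python
from enum import Enum
from typing import Callable, Sequence


class MicInstruction(Enum):
    """The two instructions named in the abstract (enum names are hypothetical)."""
    MAINTAIN_ACTIVE = "maintain_active"
    DEACTIVATE = "deactivate"


# Hypothetical stand-in for the end of query model: any callable that maps
# audio samples to a confidence score in [0, 1].
EndOfQueryModel = Callable[[Sequence[float]], float]


def toy_model(audio: Sequence[float]) -> float:
    """Toy scorer (assumption): share of the last 10 samples that are near silent.

    A real end of query model would be trained on audio from complete and
    incomplete utterances; this function only mimics its interface.
    """
    tail = list(audio)[-10:]
    if not tail:
        return 0.0
    quiet = sum(1 for sample in tail if abs(sample) < 0.01)
    return quiet / len(tail)


def decide_microphone_instruction(
    audio: Sequence[float],
    model: EndOfQueryModel,
    threshold: float = 0.8,  # assumed value; the source gives no number
) -> MicInstruction:
    """Score the audio, compare to the threshold, and pick an instruction."""
    confidence = model(audio)
    # Assumption: a score at or above the threshold means "likely complete".
    likely_complete = confidence >= threshold
    # Assumption: a likely complete utterance leads to deactivating the microphone.
    if likely_complete:
        return MicInstruction.DEACTIVATE
    return MicInstruction.MAINTAIN_ACTIVE


if __name__ == "__main__":
    speech_then_silence = [0.5] * 20 + [0.0] * 10
    still_speaking = [0.5] * 30
    print(decide_microphone_instruction(speech_then_silence, toy_model))
    print(decide_microphone_instruction(still_speaking, toy_model))
```

Passing the model in as a callable keeps the threshold comparison separate from however the confidence score is produced, so the scorer can be swapped without touching the decision logic.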
20 Claims
1. A computer-implemented method comprising:
receiving audio data that corresponds to an utterance spoken by a user;
applying, to the audio data, an end of query model that (i) is configured to determine a confidence score that reflects a likelihood that the utterance is a complete utterance and (ii) was trained using audio data from complete utterances and from incomplete utterances;
based on applying the end of query model that (i) is configured to determine the confidence score that reflects the likelihood that the utterance is a complete utterance and (ii) was trained using the audio data from the complete utterances and from the incomplete utterances, determining the confidence score that reflects a likelihood that the utterance is a complete utterance;
comparing the confidence score that reflects the likelihood that the utterance is a complete utterance to a confidence score threshold;
based on comparing the confidence score that reflects the likelihood that the utterance is a complete utterance to the confidence score threshold, determining whether the utterance is likely complete or likely incomplete; and
based on determining whether the utterance is likely complete or likely incomplete, providing, for output, an instruction to (i) maintain a microphone that is receiving the utterance in an active state or (ii) deactivate the microphone that is receiving the utterance.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A system comprising:
one or more computers; and
one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
receiving audio data that corresponds to an utterance spoken by a user;
applying, to the audio data, an end of query model that (i) is configured to determine a confidence score that reflects a likelihood that the utterance is a complete utterance and (ii) was trained using audio data from complete utterances and from incomplete utterances;
based on applying the end of query model that (i) is configured to determine the confidence score that reflects the likelihood that the utterance is a complete utterance and (ii) was trained using the audio data from the complete utterances and from the incomplete utterances, determining the confidence score that reflects a likelihood that the utterance is a complete utterance;
comparing the confidence score that reflects the likelihood that the utterance is a complete utterance to a confidence score threshold;
based on comparing the confidence score that reflects the likelihood that the utterance is a complete utterance to the confidence score threshold, determining whether the utterance is likely complete or likely incomplete; and
based on determining whether the utterance is likely complete or likely incomplete, providing, for output, an instruction to (i) maintain a microphone that is receiving the utterance in an active state or (ii) deactivate the microphone that is receiving the utterance.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
19. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
receiving audio data that corresponds to an utterance spoken by a user;
applying, to the audio data, an end of query model that (i) is configured to determine a confidence score that reflects a likelihood that the utterance is a complete utterance and (ii) was trained using audio data from complete utterances and from incomplete utterances;
based on applying the end of query model that (i) is configured to determine the confidence score that reflects the likelihood that the utterance is a complete utterance and (ii) was trained using the audio data from the complete utterances and from the incomplete utterances, determining the confidence score that reflects a likelihood that the utterance is a complete utterance;
comparing the confidence score that reflects the likelihood that the utterance is a complete utterance to a confidence score threshold;
based on comparing the confidence score that reflects the likelihood that the utterance is a complete utterance to the confidence score threshold, determining whether the utterance is likely complete or likely incomplete; and
based on determining whether the utterance is likely complete or likely incomplete, providing, for output, an instruction to (i) maintain a microphone that is receiving the utterance in an active state or (ii) deactivate the microphone that is receiving the utterance.
Dependent claims: 20
Specification