Speech endpointing based on word comparisons
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech endpointing based on word comparisons are described. In one aspect, a method includes the actions of obtaining a transcription of an utterance. The actions further include determining, as a first value, a quantity of text samples in a collection of text samples that (i) include terms that match the transcription, and (ii) do not include any additional terms. The actions further include determining, as a second value, a quantity of text samples in the collection of text samples that (i) include terms that match the transcription, and (ii) include one or more additional terms. The actions further include classifying the utterance as a likely incomplete utterance or not a likely incomplete utterance based at least on comparing the first value and the second value.
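The word-comparison classification in the abstract can be sketched in Python as shown below. This is an illustrative sketch only, and it rests on several assumptions that the abstract does not state. Text is lowercased and split on whitespace. A sample "matches" the transcription when its leading terms equal the transcription's terms, which is a prefix match. The comparison flags an utterance as likely incomplete when samples with additional terms outnumber exact matches, scaled by a hypothetical `ratio` parameter. The function names `count_matches` and `is_likely_incomplete` are also hypothetical.

```python
from typing import Iterable, List, Tuple


def _terms(text: str) -> List[str]:
    """Lowercased, whitespace-separated terms (a simplifying assumption)."""
    return text.lower().split()


def count_matches(transcription: str, samples: Iterable[str]) -> Tuple[int, int]:
    """Return (first_value, second_value) for a transcription.

    first_value:  samples whose terms are exactly the transcription's terms.
    second_value: samples that begin with the transcription's terms and then
                  continue with one or more additional terms.
    Prefix matching is an assumption made for this sketch.
    """
    target = _terms(transcription)
    n = len(target)
    first_value = 0
    second_value = 0
    for sample in samples:
        terms = _terms(sample)
        if terms[:n] != target:
            continue  # sample does not match the transcription at all
        if len(terms) == n:
            first_value += 1   # matches and has no additional terms
        else:
            second_value += 1  # matches and has one or more additional terms
    return first_value, second_value


def is_likely_incomplete(
    transcription: str, samples: Iterable[str], ratio: float = 1.0
) -> bool:
    """Classify as likely incomplete when extended samples outnumber exact ones.

    `ratio` is a hypothetical tuning parameter, not taken from the source.
    """
    first_value, second_value = count_matches(transcription, samples)
    return second_value > ratio * first_value
```

For example, with the samples `["call mom", "call mom", "call mom at work", "set a timer for ten minutes", "set a timer for five minutes", "set a timer"]`, the transcription "call mom" yields (2, 1) and is not classified as likely incomplete. By contrast, "set a timer for" yields (0, 2) and is classified as likely incomplete. A production system would also need to handle normalization, punctuation, and sample weighting, none of which this sketch attempts.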
20 Claims
1. A computer-implemented method comprising:
receiving, by a computing device, audio that includes an utterance spoken by a user;
determining, by the computing device, that an energy level of the audio that includes the utterance is above a threshold energy level;
while the computing device receives the audio that includes the utterance and while the energy level of the audio that includes the utterance remains above the threshold energy level, determining, by the computing device, to delay designating an endpoint of the utterance spoken by the user until the energy level of the audio that includes the utterance is below the threshold energy level;
while the computing device receives the audio that includes the utterance and while the energy level of the audio that includes the utterance remains above the threshold energy level, obtaining, by the computing device, a transcription of the utterance spoken by the user; and
while the computing device receives the audio that includes the utterance, while the energy level of the audio that includes the utterance remains above the threshold energy level, and based on the transcription of the utterance, overriding, by the computing device, the determination to delay designating an endpoint of the utterance spoken by the user and designating an endpoint of the utterance spoken by the user.
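Claim 1 combines two rules. An energy-based rule delays the endpoint while the audio stays loud, and a transcription-based override can designate the endpoint before the energy drops. The minimal Python sketch below models that interaction under stated assumptions. It represents the audio as a sequence of frames, and each frame carries an energy value and the partial transcription available at that point. `Frame`, `find_endpoint`, and the `transcript_is_complete` predicate are hypothetical names. The sketch does not model how a device actually measures energy or obtains the transcription.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Frame:
    energy: float            # energy level of this audio frame (units assumed)
    partial_transcript: str  # transcription so far (hypothetical recognizer output)


def find_endpoint(
    frames: Sequence[Frame],
    threshold: float,
    transcript_is_complete: Callable[[str], bool],
) -> Optional[int]:
    """Return the index of the frame at which the endpoint is designated.

    Energy rule: while energy stays above `threshold`, delay the endpoint.
    Override: if the transcription is judged complete while energy is still
    above `threshold`, designate the endpoint immediately.
    Returns None if no endpoint is designated within the given frames.
    """
    speech_started = False
    for i, frame in enumerate(frames):
        if frame.energy > threshold:
            speech_started = True
            # The default decision is to keep delaying while energy is high;
            # the transcription can override that delay.
            if transcript_is_complete(frame.partial_transcript):
                return i
        elif speech_started:
            # Energy fell below the threshold after speech: the delayed endpoint fires.
            return i
    return None
```

As an example, take frames with energies 0.1, 0.8, 0.9, 0.9, 0.05, a threshold of 0.5, and a predicate that accepts only "call mom". The endpoint is designated at frame 2, as soon as "call mom" is transcribed, even though energy is still high. With a predicate that never accepts, the endpoint falls back to frame 4, where energy first drops below the threshold.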
8. A system comprising:
one or more computers; and
one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
receiving, by a computing device, audio that includes an utterance spoken by a user;
determining, by the computing device, that an energy level of the audio that includes the utterance is above a threshold energy level;
while the computing device receives the audio that includes the utterance and while the energy level of the audio that includes the utterance remains above the threshold energy level, determining, by the computing device, to delay designating an endpoint of the utterance spoken by the user until the energy level of the audio that includes the utterance is below the threshold energy level;
while the computing device receives the audio that includes the utterance and while the energy level of the audio that includes the utterance remains above the threshold energy level, obtaining, by the computing device, a transcription of the utterance spoken by the user; and
while the computing device receives the audio that includes the utterance, while the energy level of the audio that includes the utterance remains above the threshold energy level, and based on the transcription of the utterance, overriding, by the computing device, the determination to delay designating an endpoint of the utterance spoken by the user and designating an endpoint of the utterance spoken by the user.
15. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
receiving, by a computing device, audio that includes an utterance spoken by a user;
determining, by the computing device, that an energy level of the audio that includes the utterance is above a threshold energy level;
while the computing device receives the audio that includes the utterance and while the energy level of the audio that includes the utterance remains above the threshold energy level, determining, by the computing device, to delay designating an endpoint of the utterance spoken by the user until the energy level of the audio that includes the utterance is below the threshold energy level;
while the computing device receives the audio that includes the utterance and while the energy level of the audio that includes the utterance remains above the threshold energy level, obtaining, by the computing device, a transcription of the utterance spoken by the user; and
while the computing device receives the audio that includes the utterance, while the energy level of the audio that includes the utterance remains above the threshold energy level, and based on the transcription of the utterance, overriding, by the computing device, the determination to delay designating an endpoint of the utterance spoken by the user and designating an endpoint of the utterance spoken by the user.
Specification