Speech endpointing based on word comparisons
First Claim
1. A computer-implemented method comprising:
receiving, from a given user and by a microphone of a mobile device that includes (i) the microphone, (ii) an automated speech recognition system, and (iii) an end of utterance detector that is configured to identify an endpoint of an utterance spoken by a user in response to determining that a speaker has stopped speaking for a fixed duration, a first utterance;
determining, by the end of utterance detector, that the given user has stopped speaking for the fixed duration after the first utterance;
generating, by the automated speech recognition system, a first transcription of the first utterance;
based on the first transcription of the first utterance, maintaining the microphone in an active state without endpointing the first utterance;
after the given user has stopped speaking for at least the fixed duration after the first utterance, receiving, by the microphone and from the given user, a second utterance;
generating, by the automated speech recognition system, a second transcription of the second utterance;
based on both the first transcription and the second transcription, deactivating the microphone and endpointing the second utterance;
in response to endpointing the second utterance, submitting, by the mobile device, a single search query that includes both the first transcription and the second transcription;
receiving, by the mobile device, search results in response to the single search query that includes both the first transcription and the second transcription; and
providing, for output by the mobile device, the search results.
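The claimed flow can be sketched as follows. This is an illustrative reading of the claim, not the patented implementation: after each fixed-duration pause the device endpoints only if the accumulated transcription looks complete; otherwise the microphone stays active and a later utterance is merged into a single search query. All helper names and the completeness test are hypothetical.

```python
# Hypothetical sketch of the claimed endpointing flow. The completeness
# check here (membership in a set of known full queries) stands in for
# whatever classifier the device actually uses.

FIXED_PAUSE_SECONDS = 1.0  # example value; the claim requires only "a fixed duration"

def looks_complete(transcription: str, complete_queries: set[str]) -> bool:
    """Hypothetical completeness check: treat known full queries as complete."""
    return transcription in complete_queries

def endpoint_session(utterances: list[str], complete_queries: set[str]) -> str:
    """Accumulate transcriptions across fixed-duration pauses until the combined
    text looks complete, then endpoint and return the single combined query."""
    collected: list[str] = []
    for transcription in utterances:  # each item arrives after a fixed-duration pause
        collected.append(transcription)
        combined = " ".join(collected)
        if looks_complete(combined, complete_queries):
            # Endpoint: deactivate the microphone and submit one search query
            # that includes every transcription received so far.
            return combined
    return " ".join(collected)  # no confident endpoint; return what was collected

# Example: "what is" alone is not a known complete query, so the microphone
# stays active; the second utterance completes the query and triggers endpointing.
query = endpoint_session(["what is", "the weather"], {"what is the weather"})
print(query)
```

In this reading, the decision to keep the microphone active after the first pause is driven by the first transcription alone, while the endpointing decision is driven by the first and second transcriptions together, matching the two "based on" limitations in the claim.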
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech endpointing based on word comparisons are described. In one aspect, a method includes the actions of obtaining a transcription of an utterance. The actions further include determining, as a first value, a quantity of text samples in a collection of text samples that (i) include terms that match the transcription, and (ii) do not include any additional terms. The actions further include determining, as a second value, a quantity of text samples in the collection of text samples that (i) include terms that match the transcription, and (ii) include one or more additional terms. The actions further include classifying the utterance as a likely incomplete utterance or not a likely incomplete utterance based at least on comparing the first value and the second value.
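The abstract's classification step can be sketched directly. This is a minimal sketch under assumptions: "terms that match the transcription" is read as a whitespace-token prefix match, and the comparison rule (more continuations than exact matches means likely incomplete) is one plausible choice, since the abstract only says the two values are compared.

```python
# Minimal sketch of the abstract's two-count classification, assuming
# whitespace tokenization and a simple majority comparison rule.

def classify_incomplete(transcription: str, text_samples: list[str]) -> bool:
    """Return True if the utterance is classified as likely incomplete."""
    terms = transcription.split()
    exact_matches = 0         # first value: samples matching with no additional terms
    continuation_matches = 0  # second value: samples matching plus one or more additional terms
    for sample in text_samples:
        sample_terms = sample.split()
        if sample_terms[:len(terms)] == terms:
            if len(sample_terms) == len(terms):
                exact_matches += 1
            else:
                continuation_matches += 1
    # Assumed comparison rule; the abstract states only that the values are compared.
    return continuation_matches > exact_matches

samples = ["what is the weather", "what is the time", "what is"]
print(classify_incomplete("what is", samples))           # continuations dominate
print(classify_incomplete("what is the time", samples))  # only an exact match
```

Intuitively, if users who said these words usually went on to say more, the current utterance is probably unfinished and endpointing should wait.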
23 Claims
1. A computer-implemented method comprising:
receiving, from a given user and by a microphone of a mobile device that includes (i) the microphone, (ii) an automated speech recognition system, and (iii) an end of utterance detector that is configured to identify an endpoint of an utterance spoken by a user in response to determining that a speaker has stopped speaking for a fixed duration, a first utterance;
determining, by the end of utterance detector, that the given user has stopped speaking for the fixed duration after the first utterance;
generating, by the automated speech recognition system, a first transcription of the first utterance;
based on the first transcription of the first utterance, maintaining the microphone in an active state without endpointing the first utterance;
after the given user has stopped speaking for at least the fixed duration after the first utterance, receiving, by the microphone and from the given user, a second utterance;
generating, by the automated speech recognition system, a second transcription of the second utterance;
based on both the first transcription and the second transcription, deactivating the microphone and endpointing the second utterance;
in response to endpointing the second utterance, submitting, by the mobile device, a single search query that includes both the first transcription and the second transcription;
receiving, by the mobile device, search results in response to the single search query that includes both the first transcription and the second transcription; and
providing, for output by the mobile device, the search results.
(Dependent claims: 2-19)
20. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
receiving, from a given user and by a microphone of a mobile device that includes (i) the microphone, (ii) an automated speech recognition system, and (iii) an end of utterance detector that is configured to identify an endpoint of an utterance spoken by a user in response to determining that a speaker has stopped speaking for a fixed duration, a first utterance;
determining, by the end of utterance detector, that the given user has stopped speaking for the fixed duration after the first utterance;
generating, by the automated speech recognition system, a first transcription of the first utterance;
based on the first transcription of the first utterance, maintaining the microphone in an active state without endpointing the first utterance;
after the given user has stopped speaking for at least the fixed duration after the first utterance, receiving, by the microphone and from the given user, a second utterance;
generating, by the automated speech recognition system, a second transcription of the second utterance;
based on both the first transcription and the second transcription, deactivating the microphone and endpointing the second utterance;
in response to endpointing the second utterance, submitting, by the mobile device, a single search query that includes both the first transcription and the second transcription;
receiving, by the mobile device, search results in response to the single search query that includes both the first transcription and the second transcription; and
providing, for output by the mobile device, the search results.
(Dependent claims: 21 and 22)
23. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
receiving, from a given user and by a microphone of a mobile device that includes (i) the microphone, (ii) an automated speech recognition system, and (iii) an end of utterance detector that is configured to identify an endpoint of an utterance spoken by a user in response to determining that a speaker has stopped speaking for a fixed duration, a first utterance;
determining, by the end of utterance detector, that the given user has stopped speaking for the fixed duration after the first utterance;
generating, by the automated speech recognition system, a first transcription of the first utterance;
based on the first transcription of the first utterance, maintaining the microphone in an active state without endpointing the first utterance;
after the given user has stopped speaking for at least the fixed duration after the first utterance, receiving, by the microphone and from the given user, a second utterance;
generating, by the automated speech recognition system, a second transcription of the second utterance;
based on both the first transcription and the second transcription, deactivating the microphone and endpointing the second utterance;
in response to endpointing the second utterance, submitting, by the mobile device, a single search query that includes both the first transcription and the second transcription;
receiving, by the mobile device, search results in response to the single search query that includes both the first transcription and the second transcription; and
providing, for output by the mobile device, the search results.
Specification