Enhanced speech endpointing
First Claim
1. A computer implemented method, comprising:
storing, by a computing device that is configured to set an end of speech condition after a user has stopped speaking for a period of time, (i) a context identifier in association with one or more expected speech recognition results for a first context and (ii) an additional context identifier in association with one or more additional expected speech recognition results for a second context;
after storing the context identifier in association with the one or more expected speech recognition results for the first context and the additional context identifier in association with the one or more additional expected speech recognition results for the second context, receiving, by the computing device, audio data corresponding to an utterance spoken by the user of the client device;
while receiving the audio data corresponding to the utterance spoken, receiving, by the computing device, the context identifier that indicates a context associated with (i) the client device or (ii) the user of the client device;
accessing, by the computing device and from among the one or more expected speech recognition results and the one or more additional expected speech recognition results, the one or more expected speech recognition results based on the one or more expected speech recognition results being stored in association with the context identifier;
before an automated speech recognizer provides a final speech recognition result for the audio data for output, comparing, by the computing device, an intermediate speech recognition result generated for the audio data by the automated speech recognizer to each of the one or more expected speech recognition results with the context identifier;
based at least on comparing the intermediate speech recognition result generated for the audio data by the automated speech recognizer to the one or more expected speech recognition results associated with the context identifier, determining, by the computing device, that the intermediate speech recognition result generated for the audio data by the automated speech recognizer matches one of the one or more expected speech recognition results associated with the context identifier; and
based on determining that the intermediate speech recognition result generated for the audio data by the automated speech recognizer matches the one of the one or more expected speech recognition results associated with the context identifier and before the period of time has elapsed after the user stopped speaking, setting the end of speech condition and providing, for output by the computing device, the intermediate speech recognition result that matches the one of the one or more expected speech recognition results as the final speech recognition result based on the audio data.
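The steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation, only one possible reading of the claim language. The names `Endpointer`, `normalize`, and `silence_timeout`, the 0.8 second timeout, the case-insensitive matching rule, and the representation of recognizer output as `(silence_seconds, intermediate_result)` pairs are all assumptions made for illustration; no real speech recognizer is involved.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple


def normalize(text: str) -> str:
    """Assumed matching rule: case-insensitive, whitespace-collapsed equality."""
    return " ".join(text.lower().split())


@dataclass
class Endpointer:
    # Default silence (seconds) before end of speech is set. Assumed value.
    silence_timeout: float = 0.8
    # Context identifier -> expected recognition results for that context.
    expected_by_context: Dict[str, Set[str]] = field(default_factory=dict)

    def store(self, context_id: str, expected: Iterable[str]) -> None:
        """Store expected results in association with a context identifier."""
        self.expected_by_context[context_id] = {normalize(e) for e in expected}

    def run(
        self, context_id: str, events: Iterable[Tuple[float, str]]
    ) -> Tuple[str, str]:
        """Consume (silence_seconds, intermediate_result) pairs in order.

        Returns (final_result, reason), where reason is "early_match",
        "timeout", or "stream_ended".
        """
        # Access only the expected results stored for the received context.
        expected = self.expected_by_context.get(context_id, set())
        latest = ""
        for silence, hypothesis in events:
            latest = hypothesis
            if silence >= self.silence_timeout:
                # Default behavior: the silence period has fully elapsed.
                return hypothesis, "timeout"
            if normalize(hypothesis) in expected:
                # Match found before the silence period elapsed:
                # set end of speech now and emit the result as final.
                return hypothesis, "early_match"
        return latest, "stream_ended"


# Hypothetical usage with two stored contexts.
ep = Endpointer()
ep.store("call_prompt", ["call mom", "call dad"])
ep.store("yes_no_prompt", ["yes", "no"])

events = [(0.0, "call"), (0.0, "call mom"), (0.9, "call mom")]
matched = ep.run("call_prompt", events)
unmatched = ep.run("yes_no_prompt", events)
```

With the same stream of intermediate results, the `call_prompt` context ends the utterance as soon as "call mom" appears, while the `yes_no_prompt` context falls back to the silence timeout because nothing in its expected set matches.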
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving audio data including an utterance, obtaining context data that indicates one or more expected speech recognition results, determining an expected speech recognition result based on the context data, receiving an intermediate speech recognition result generated by a speech recognition engine, comparing the intermediate speech recognition result to the expected speech recognition result for the audio data based on the context data, determining whether the intermediate speech recognition result corresponds to the expected speech recognition result for the audio data based on the context data, and setting an end of speech condition and providing a final speech recognition result in response to determining the intermediate speech recognition result matches the expected speech recognition result, the final speech recognition result including the one or more expected speech recognition results indicated by the context data.
119 Citations
11 Claims
1. A computer implemented method, comprising:
storing, by a computing device that is configured to set an end of speech condition after a user has stopped speaking for a period of time, (i) a context identifier in association with one or more expected speech recognition results for a first context and (ii) an additional context identifier in association with one or more additional expected speech recognition results for a second context;
after storing the context identifier in association with the one or more expected speech recognition results for the first context and the additional context identifier in association with the one or more additional expected speech recognition results for the second context, receiving, by the computing device, audio data corresponding to an utterance spoken by the user of the client device;
while receiving the audio data corresponding to the utterance spoken, receiving, by the computing device, the context identifier that indicates a context associated with (i) the client device or (ii) the user of the client device;
accessing, by the computing device and from among the one or more expected speech recognition results and the one or more additional expected speech recognition results, the one or more expected speech recognition results based on the one or more expected speech recognition results being stored in association with the context identifier;
before an automated speech recognizer provides a final speech recognition result for the audio data for output, comparing, by the computing device, an intermediate speech recognition result generated for the audio data by the automated speech recognizer to each of the one or more expected speech recognition results with the context identifier;
based at least on comparing the intermediate speech recognition result generated for the audio data by the automated speech recognizer to the one or more expected speech recognition results associated with the context identifier, determining, by the computing device, that the intermediate speech recognition result generated for the audio data by the automated speech recognizer matches one of the one or more expected speech recognition results associated with the context identifier; and
based on determining that the intermediate speech recognition result generated for the audio data by the automated speech recognizer matches the one of the one or more expected speech recognition results associated with the context identifier and before the period of time has elapsed after the user stopped speaking, setting the end of speech condition and providing, for output by the computing device, the intermediate speech recognition result that matches the one of the one or more expected speech recognition results as the final speech recognition result based on the audio data.
Dependent Claims: 2, 3, 9, 10, 11
4. A system comprising one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
storing, by a computing device that is configured to set an end of speech condition after a user has stopped speaking for a period of time, (i) a context identifier in association with one or more expected speech recognition results for a first context and (ii) an additional context identifier in association with one or more additional expected speech recognition results for a second context;
after storing the context identifier in association with the one or more expected speech recognition results for the first context and the additional context identifier in association with the one or more additional expected speech recognition results for the second context, receiving, by the computing device, audio data corresponding to an utterance spoken by the user of the client device;
while receiving the audio data corresponding to the utterance spoken, receiving, by the computing device, the context identifier that indicates a context associated with (i) the client device or (ii) the user of the client device;
accessing, by the computing device and from among the one or more expected speech recognition results and the one or more additional expected speech recognition results, the one or more expected speech recognition results based on the one or more expected speech recognition results being stored in association with the context identifier;
before an automated speech recognizer provides a final speech recognition result for the audio data for output, comparing, by the computing device, an intermediate speech recognition result generated for the audio data by the automated speech recognizer to each of the one or more expected speech recognition results with the context identifier;
based at least on comparing the intermediate speech recognition result generated for the audio data by the automated speech recognizer to the one or more expected speech recognition results associated with the context identifier, determining, by the computing device, that the intermediate speech recognition result generated for the audio data by the automated speech recognizer matches one of the one or more expected speech recognition results associated with the context identifier; and
based on determining that the intermediate speech recognition result generated for the audio data by the automated speech recognizer matches the one of the one or more expected speech recognition results associated with the context identifier and before the period of time has elapsed after the user stopped speaking, setting the end of speech condition and providing, for output by the computing device, the intermediate speech recognition result that matches the one of the one or more expected speech recognition results as the final speech recognition result based on the audio data.
Dependent Claims: 5, 6
7. A computer-readable storage device storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
storing, by a computing device that is configured to set an end of speech condition after a user has stopped speaking for a period of time, (i) a context identifier in association with one or more expected speech recognition results for a first context and (ii) an additional context identifier in association with one or more additional expected speech recognition results for a second context;
after storing the context identifier in association with the one or more expected speech recognition results for the first context and the additional context identifier in association with the one or more additional expected speech recognition results for the second context, receiving, by the computing device, audio data corresponding to an utterance spoken by the user of the client device;
while receiving the audio data corresponding to the utterance spoken, receiving, by the computing device, the context identifier that indicates a context associated with (i) the client device or (ii) the user of the client device;
accessing, by the computing device and from among the one or more expected speech recognition results and the one or more additional expected speech recognition results, the one or more expected speech recognition results based on the one or more expected speech recognition results being stored in association with the context identifier;
before an automated speech recognizer provides a final speech recognition result for the audio data for output, comparing, by the computing device, an intermediate speech recognition result generated for the audio data by the automated speech recognizer to each of the one or more expected speech recognition results with the context identifier;
based at least on comparing the intermediate speech recognition result generated for the audio data by the automated speech recognizer to the one or more expected speech recognition results associated with the context identifier, determining, by the computing device, that the intermediate speech recognition result generated for the audio data by the automated speech recognizer matches one of the one or more expected speech recognition results associated with the context identifier; and
based on determining that the intermediate speech recognition result generated for the audio data by the automated speech recognizer matches the one of the one or more expected speech recognition results associated with the context identifier and before the period of time has elapsed after the user stopped speaking, setting the end of speech condition and providing, for output by the computing device, the intermediate speech recognition result that matches the one of the one or more expected speech recognition results as the final speech recognition result based on the audio data.
Dependent Claims: 8
Specification