Hybridized client-server speech recognition
First Claim
1. A method comprising:
receiving, at a recipient computing device, a speech utterance;
dynamically determining a confidence threshold value and an audio quality threshold value based on environmental conditions at which the recipient computing device is located, the environmental conditions comprising one or more of: a type of environment in which the recipient computing device is located, availability of noise cancelling devices at the recipient computing device, and number of microphones used by the recipient computing device;
segmenting the speech utterance into two or more speech utterance segments, including performing an initial analysis on the speech utterance, to determine where to perform speech recognition processing for each of the two or more speech utterance segments, by applying to the speech utterance a dynamically adaptable acoustic model implemented at the recipient computing device, with the dynamically adaptable acoustic model adjusted based on locally available data at the recipient computing device, including a user location and time, to determine a confidence score and an audio quality metric for each of the two or more speech utterance segments;
assigning, based on the initial analysis performed by the adaptable acoustic model generating the determined confidence score and audio quality metric for the each of the two or more speech utterance segments, and based on the dynamically determined confidence threshold and the audio quality threshold, a first segment from the two or more speech utterance segments to a first speech recognizer implemented on a separate computing device than the recipient computing device, and a second segment from the two or more speech utterance segments to a second speech recognizer implemented on the recipient computing device;
sending the first segment from the recipient computing device to the separate computing device for processing;
receiving first segment processing results back from the separate computing device, the sending and the receiving occurring via a data network;
processing the second segment at the recipient computing device to generate second segment processing results; and
returning a completed speech recognition result assembled from the first segment processing results and the second segment processing results.
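The threshold and assignment steps of the claim can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the claim gives no numeric values, no environment labels, and does not say in which direction a score above or below a threshold routes a segment. The names Environment, Thresholds, Segment, determine_thresholds and assign are hypothetical, and the policy shown (keep a segment on the device only when both of its scores meet the thresholds) is one plausible reading, not the patent's stated rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    kind: str               # hypothetical labels such as "car", "office", "street"
    noise_cancelling: bool  # whether a noise cancelling device is available
    microphone_count: int   # number of microphones in use


@dataclass(frozen=True)
class Thresholds:
    confidence: float
    audio_quality: float


@dataclass(frozen=True)
class Segment:
    index: int            # position of the segment within the utterance
    confidence: float     # confidence score from the on-device initial analysis
    audio_quality: float  # audio quality metric from the same analysis


def determine_thresholds(env: Environment) -> Thresholds:
    """Derive both thresholds from environmental conditions (assumed values)."""
    confidence, quality = 0.70, 0.50   # assumed baselines, not from the claim
    if env.kind in {"car", "street"}:  # assumed "noisy" environment types
        confidence += 0.10
        quality += 0.10
    if env.noise_cancelling:           # cleaner input expected, relax quality bar
        quality -= 0.10
    if env.microphone_count >= 2:      # multi-microphone capture, relax further
        quality -= 0.05
    return Thresholds(round(confidence, 2), round(quality, 2))


def assign(segment: Segment, thresholds: Thresholds) -> str:
    """Return "local" or "remote" for one segment.

    Assumed policy: a segment stays on the recipient device only if both
    scores meet their thresholds; otherwise it goes to the separate device.
    """
    meets_confidence = segment.confidence >= thresholds.confidence
    meets_quality = segment.audio_quality >= thresholds.audio_quality
    return "local" if meets_confidence and meets_quality else "remote"


thresholds = determine_thresholds(Environment("car", True, 2))
routes = [assign(s, thresholds)
          for s in (Segment(0, 0.90, 0.60), Segment(1, 0.50, 0.60))]
print(thresholds, routes)
```

Because the thresholds are recomputed from the environment rather than fixed, the same segment scores could route differently in a quiet office than in a moving car, which is the behavior the "dynamically determining" language appears to describe.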
Abstract
A recipient computing device can receive a speech utterance to be processed by speech recognition and segment the speech utterance into two or more speech utterance segments, each of which can be assigned to one of a plurality of available speech recognizers. A first one of the plurality of available speech recognizers can be implemented on a separate computing device accessible via a data network. A first segment can be processed by the first recognizer and the results of the processing returned to the recipient computing device, and a second segment can be processed by a second recognizer implemented at the recipient computing device.
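The flow described in the abstract (dispatch each segment to a local or remote recognizer, then combine the results) might be sketched as follows. This is an illustration under assumptions, not the patented implementation: recognize_hybrid, fake_local and fake_remote are hypothetical names, the recognizers are stand-in functions rather than real speech engines, the network round trip to the separate computing device is represented by an ordinary function call, and joining results in utterance order with spaces is an assumed assembly rule.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

# A recognizer takes the audio bytes of one segment and returns text.
Recognizer = Callable[[bytes], str]


def recognize_hybrid(
    segments: Sequence[tuple[bytes, str]],
    local: Recognizer,
    remote: Recognizer,
) -> str:
    """Process (audio, route) pairs given in utterance order.

    route is "local" or "remote". Segments are submitted concurrently so a
    slow remote call does not block on-device processing of other segments.
    """
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(remote if route == "remote" else local, audio)
            for audio, route in segments
        ]
        # Collect in submission order so the assembled text follows the utterance.
        texts = [future.result() for future in futures]
    return " ".join(text for text in texts if text)


# Stand-in recognizers used only to make the sketch runnable.
def fake_local(audio: bytes) -> str:
    return audio.decode().lower()


def fake_remote(audio: bytes) -> str:
    return audio.decode().upper()


result = recognize_hybrid(
    [(b"call", "local"), (b"Doctor Nakamura", "remote")],
    fake_local,
    fake_remote,
)
print(result)
```

Collecting futures in submission order, rather than in completion order, is what keeps the assembled result aligned with the original utterance even when the remote results arrive late.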
20 Claims
1. A method comprising:
receiving, at a recipient computing device, a speech utterance;
dynamically determining a confidence threshold value and an audio quality threshold value based on environmental conditions at which the recipient computing device is located, the environmental conditions comprising one or more of: a type of environment in which the recipient computing device is located, availability of noise cancelling devices at the recipient computing device, and number of microphones used by the recipient computing device;
segmenting the speech utterance into two or more speech utterance segments, including performing an initial analysis on the speech utterance, to determine where to perform speech recognition processing for each of the two or more speech utterance segments, by applying to the speech utterance a dynamically adaptable acoustic model implemented at the recipient computing device, with the dynamically adaptable acoustic model adjusted based on locally available data at the recipient computing device, including a user location and time, to determine a confidence score and an audio quality metric for each of the two or more speech utterance segments;
assigning, based on the initial analysis performed by the adaptable acoustic model generating the determined confidence score and audio quality metric for the each of the two or more speech utterance segments, and based on the dynamically determined confidence threshold and the audio quality threshold, a first segment from the two or more speech utterance segments to a first speech recognizer implemented on a separate computing device than the recipient computing device, and a second segment from the two or more speech utterance segments to a second speech recognizer implemented on the recipient computing device;
sending the first segment from the recipient computing device to the separate computing device for processing;
receiving first segment processing results back from the separate computing device, the sending and the receiving occurring via a data network;
processing the second segment at the recipient computing device to generate second segment processing results; and
returning a completed speech recognition result assembled from the first segment processing results and the second segment processing results.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A recipient computing device comprising:
at least one programmable processor;
a communication unit to communicate with remote computing devices; and
a computer-readable storage medium, coupled to the at least one processor and the communication unit, storing instructions that, when executed by the at least one processor, cause the at least one programmable processor to:
receive a speech utterance;
dynamically determine a confidence threshold value and an audio quality threshold value based on environmental conditions at which the recipient computing device is located, the environmental conditions comprising one or more of: a type of environment in which the recipient computing device is located, availability of noise cancelling devices at the recipient computing device, and number of microphones used by the recipient computing device;
segment the speech utterance into two or more speech utterance segments, including to perform an initial analysis on the speech utterance, in order to determine where to perform speech recognition processing for each of the two or more speech utterance segments, so as to apply to the speech utterance a dynamically adaptable acoustic model implemented at the recipient computing device, with the dynamically adaptable acoustic model adjusted based on locally available data at the recipient computing device, including a user location and time, to determine a confidence score and an audio quality metric for each of the two or more speech utterance segments;
assign, based on the initial analysis performed by the adaptable acoustic model generating the determined confidence score and audio quality metric for the each of the two or more speech utterance segments, and based on the dynamically determined confidence threshold and the audio quality threshold, a first segment from the two or more speech utterance segments to a first speech recognizer implemented on a separate computing device than the recipient computing device, and a second segment from the two or more speech utterance segments to a second speech recognizer implemented on the recipient computing device;
send the first segment from the recipient computing device to the separate computing device for processing;
receive first segment processing results back from the separate computing device, the sending and the receiving occurring via a data network;
process the second segment at the recipient computing device to generate second segment processing results; and
return a completed speech recognition result assembled from the first segment processing results and the second segment processing results.
Dependent claims: 12, 13, 14, 15, 16
17. A computer program product comprising a non-transitory computer-readable storage medium storing instructions that, when executed by a computing system comprising at least one programmable processor, cause the computing system to perform operations comprising:
receiving, at a recipient computing device, a speech utterance;
dynamically determining a confidence threshold value and an audio quality threshold value based on environmental conditions at which the recipient computing device is located, the environmental conditions comprising one or more of: a type of environment in which the recipient computing device is located, availability of noise cancelling devices at the recipient computing device, and number of microphones used by the recipient computing device;
segmenting the speech utterance into two or more speech utterance segments, including performing an initial analysis on the speech utterance, to determine where to perform speech recognition processing for each of the two or more speech utterance segments, by applying to the speech utterance a dynamically adaptable acoustic model implemented at the recipient computing device, with the dynamically adaptable acoustic model adjusted based on locally available data at the recipient computing device, including a user location and time, to determine a confidence score and an audio quality metric for each of the two or more speech utterance segments;
assigning, based on the initial analysis performed by the adaptable acoustic model generating the determined confidence score and audio quality metric for the each of the two or more speech utterance segments, and based on the dynamically determined confidence threshold and the audio quality threshold, a first segment from the two or more speech utterance segments to a first speech recognizer implemented on a separate computing device than the recipient computing device, and a second segment from the two or more speech utterance segments to a second speech recognizer implemented on the recipient computing device;
sending the first segment from the recipient computing device to the separate computing device for processing;
receiving first segment processing results back from the separate computing device, the sending and the receiving occurring via a data network;
processing the second segment at the recipient computing device to generate second segment processing results; and
returning a completed speech recognition result assembled from the first segment processing results and the second segment processing results.
Dependent claims: 18, 19, 20
Specification