System and method for performing dual mode speech recognition
First Claim
1. A method for performing dual mode speech recognition, comprising:
receiving a spoken query from a user;
processing the spoken query, including;
sending the spoken query to a local recognition system on a mobile device;
transmitting the spoken query to a remote recognition system via a communications link; and
setting a latency timer period to a preset timeout value;
in the event that the spoken query is not recognized by either the local recognition system or the remote recognition system within the latency timer period, choosing recognition failure as a final result;
in the event that the spoken query is recognized by both the local recognition system and the remote recognition system within the latency timer period, obtaining a recognition result and associated recognition score from both the local recognition system and the remote recognition system and choosing the recognition result associated with the higher recognition score as the final result;
in the event that the spoken query is recognized by only the local recognition within the latency timer period, obtaining a recognition result from the local recognition system, and choosing the local recognition result as the final result;
in the event that the spoken query is recognized by only the remote recognition system within the latency timer period, obtaining a recognition result from the remote recognition system, and choosing the remote recognition result as the final result;
taking action on behalf of the user based on the final result.
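The four outcomes recited in the claim can be sketched as a small selection function. This is only an illustrative sketch, not the patented implementation: the `Recognition` type, its field names, and the function name `choose_final_result` are hypothetical. A recognizer that fails or misses the latency timer period is represented here as `None`. The claim does not say what happens when both scores are equal, so preferring the local result on a tie is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Recognition:
    """Hypothetical container for one recognizer's output."""
    text: str      # transcription of the spoken query
    score: float   # recognition score (higher means more confident)
    source: str    # "local" or "remote"


def choose_final_result(
    local: Optional[Recognition],
    remote: Optional[Recognition],
) -> Optional[Recognition]:
    """Pick the final result; None stands for recognition failure.

    A None argument means that recognizer did not produce a result
    within the latency timer period.
    """
    if local is None and remote is None:
        return None            # neither recognized: recognition failure
    if local is None:
        return remote          # only the remote system recognized
    if remote is None:
        return local           # only the local system recognized
    # Both recognized: keep the higher score.
    # Assumption: ties go to the local result (not specified in the claim).
    return remote if remote.score > local.score else local
```

For example, a local result scored 0.6 and a remote result scored 0.9 would yield the remote result, while a missing remote result would yield the local one regardless of its score.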
Abstract
A system and method are presented for performing dual mode speech recognition, employing a local recognition module on a mobile device and a remote recognition engine on a server device. The system accepts a spoken query from a user, and both the local recognition module and the remote recognition engine perform speech recognition operations on the query, returning a transcription and confidence score, subject to a latency cutoff time. If both sources successfully transcribe the query, then the system accepts the result having the higher confidence score. If only one source succeeds, then that result is accepted. In either case, if the remote recognition engine does succeed in transcribing the query, then a client vocabulary is updated if the remote system result includes information not present in the client vocabulary.
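One way to picture the latency cutoff described in the abstract is to run both recognizers concurrently and discard whatever has not finished when the timer expires. The following is a minimal sketch under stated assumptions: the recognizers are modeled as hypothetical async callables that return a `(transcription, confidence)` tuple or `None` on failure, and `recognize_dual` and `timeout_s` are illustrative names that do not come from the patent. The client vocabulary update is outside the scope of this sketch.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Tuple

Result = Tuple[str, float]  # (transcription, confidence score)
Recognizer = Callable[[bytes], Awaitable[Optional[Result]]]


async def recognize_dual(
    audio: bytes,
    local: Recognizer,
    remote: Recognizer,
    timeout_s: float,
) -> Optional[Result]:
    """Run both recognizers in parallel, subject to a latency cutoff.

    Returns the result with the higher confidence among those that
    finished in time, or None if neither produced a result.
    """
    tasks = [
        asyncio.ensure_future(local(audio)),
        asyncio.ensure_future(remote(audio)),
    ]
    # Wait until both finish or the latency timer expires.
    done, pending = await asyncio.wait(tasks, timeout=timeout_s)
    for task in pending:
        task.cancel()  # too late: ignore this recognizer

    # Walk the tasks in a fixed order (local, then remote) and keep
    # only those that completed without error and returned a result.
    results = [
        t.result()
        for t in tasks
        if t in done and t.exception() is None and t.result() is not None
    ]
    if not results:
        return None  # recognition failure
    return max(results, key=lambda r: r[1])
```

With a fast local recognizer and a remote recognizer that exceeds `timeout_s`, this sketch returns the local result even if the remote result would have scored higher, which matches the "only one source succeeds" case in the abstract.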
18 Citations
17 Claims
1. A method for performing dual mode speech recognition, comprising:
receiving a spoken query from a user;
processing the spoken query, including;
sending the spoken query to a local recognition system on a mobile device;
transmitting the spoken query to a remote recognition system via a communications link; and
setting a latency timer period to a preset timeout value;
in the event that the spoken query is not recognized by either the local recognition system or the remote recognition system within the latency timer period, choosing recognition failure as a final result;
in the event that the spoken query is recognized by both the local recognition system and the remote recognition system within the latency timer period, obtaining a recognition result and associated recognition score from both the local recognition system and the remote recognition system and choosing the recognition result associated with the higher recognition score as the final result;
in the event that the spoken query is recognized by only the local recognition within the latency timer period, obtaining a recognition result from the local recognition system, and choosing the local recognition result as the final result;
in the event that the spoken query is recognized by only the remote recognition system within the latency timer period, obtaining a recognition result from the remote recognition system, and choosing the remote recognition result as the final result;
taking action on behalf of the user based on the final result.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A system for dual mode speech recognition, comprising:
a local recognition system housed in a mobile device, including;
a communication module programmed to communicate with a user and other devices and for receiving a spoken query;
a recognition module programmed to recognize and transcribe audio content;
a control module; and
a client vocabulary programmed to describe words or phrases available to the recognition module;
a remote recognition system housed in a server, including;
a recognition engine programmed to recognize and transcribe audio content;
a vocabulary download module programmed to provide updates to the vocabulary update module;
a latency timer;
wherein the control module of the local recognition system is programmed to;
set a latency timer period to a preset timeout value; and
in the event that the spoken query is recognized by both the local recognition system and the remote recognition system within the latency timer period, obtain a recognition result and associated recognition score from both the local recognition system and the remote recognition system, and choosing the final result as the recognition result associated with the higher recognition score.
Dependent claims: 10, 11, 12, 13, 14, 15
16. A system for dual mode speech recognition, comprising:
a latency timer;
a local recognition system housed in a mobile device, including;
a communication module programmed to communicate with a user and other devices and to receive a spoken query;
a recognition module programmed to recognize and transcribe audio content;
a control module;
a client vocabulary programmed to describe words or phrases available to the recognition module; and
a remote recognition system housed in a server, including;
a recognition engine programmed to recognize and transcribe audio content;
a vocabulary download module programmed to provide updates to the vocabulary update module;
wherein the control module of the local recognition system is programmed to;
set a latency timer period to a predefined value;
in the event that the spoken query is recognized by both the local recognition system and the remote recognition system within the latency timer period, obtain a recognition result and associated recognition score from both the local recognition system and the remote recognition system, and choosing the final result as the recognition result associated with the higher recognition score;
in the event that the spoken query is recognized by only the local recognition within the latency timer period, obtaining a recognition result and associated score from the local recognition system; and
choosing the local recognition result as the final result; and
in the event that the spoken query is recognized by only the remote recognition system within the latency timer period, obtaining a recognition result and associated score from the remote recognition system; and
choosing the remote recognition result as the final result.
17. A method for performing dual mode speech recognition, comprising:
receiving a spoken query from a user;
processing the spoken query, including;
sending the spoken query to a local recognition system on a mobile device;
transmitting the spoken query to a remote recognition system via a communications link; and
setting a latency timer period to a preset timeout value;
in the event that the spoken query is recognized by both the local recognition system and the remote recognition system within the latency timer period, obtaining a recognition result and associated recognition score from both the local recognition system and the remote recognition system, and choosing the final result as the recognition result associated with the higher recognition score; and
in the event that the spoken query is recognized by the remote recognition system within the latency timer period, upon determining that the remote recognition result contains vocabulary information not contained within a client vocabulary maintained within the local recognition system, requesting that the remote recognition system update the client vocabulary.
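The vocabulary check in the final element of claim 17 could look roughly like the sketch below. It rests on several assumptions that the claim does not state: the client vocabulary is modeled as a plain collection of words, the remote result is compared word by word after lowercasing and whitespace splitting, and `request_update` is a hypothetical callback standing in for whatever request is sent to the remote recognition system.

```python
from typing import Callable, Iterable, List, Set


def missing_vocabulary(remote_text: str, client_vocabulary: Iterable[str]) -> Set[str]:
    """Return words in the remote result that the client vocabulary lacks.

    Assumption: vocabulary entries are single words compared
    case-insensitively; a real system may use phrases or other units.
    """
    known = {word.lower() for word in client_vocabulary}
    return {word for word in remote_text.lower().split() if word not in known}


def maybe_request_update(
    remote_text: str,
    client_vocabulary: Iterable[str],
    request_update: Callable[[List[str]], None],
) -> bool:
    """Ask for a vocabulary update only if something is missing.

    request_update is a hypothetical hook; it receives the missing
    words in sorted order. Returns True if a request was made.
    """
    missing = missing_vocabulary(remote_text, client_vocabulary)
    if missing:
        request_update(sorted(missing))
    return bool(missing)
```

Under these assumptions, a remote result containing a word absent from the client vocabulary triggers exactly one update request, and a result made up entirely of known words triggers none.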
Specification