Speech Recognition Based on Context and Multiple Recognition Engines
First Claim
1. A method of speech recognition, comprising steps of:
receiving audio input to a device;
sending said audio input via a network node to a plurality of speech recognition engines;
receiving back via said network node a plurality of text versions of said audio input, wherein each of said plurality of speech recognition engines provided one of said plurality of text versions of said audio input;
determining characteristics of at least some data exhibited on a display of said device at a time of said audio input;
selecting a text version of said plurality of text versions which most closely corresponds with options available for operation of said device, based on said characteristics determined for said at least some data exhibited on said display;
operating said device based on said selected text version.
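The selecting step of the claim can be illustrated with a minimal sketch. The claim does not say how "most closely corresponds" is measured, so the string-similarity ratio from Python's standard `difflib` module is used here purely as an assumption. The names `similarity`, `select_text_version`, and the sample data are hypothetical and do not come from the patent.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def select_text_version(text_versions, display_options):
    """Return (text, option, score) for the text version that best matches
    any option currently shown on the display.

    Assumption: closeness is a plain string-similarity score; the claim
    itself leaves the measure open.
    """
    if not text_versions or not display_options:
        raise ValueError("need at least one text version and one option")
    best = None
    for text in text_versions:
        for option in display_options:
            score = similarity(text, option)
            # Strictly greater, so the earliest best match wins ties.
            if best is None or score > best[2]:
                best = (text, option, score)
    return best


# Hypothetical outputs from three engines for the same audio input.
versions = ["play nest song", "play next song", "lay necks on"]
# Hypothetical options exhibited on the display at the time of the audio input.
options = ["Play next song", "Pause", "Previous song"]

text, option, score = select_text_version(versions, options)
```

In this example the second engine's output is selected because it matches a displayed option exactly once case is ignored; the device would then be operated on that option.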
Abstract
Using multiple speech recognition engines, the best result can be selected each time a spoken command is sent to a device to be interpreted and carried out. Depending on the context, a different one of the results received from the speech recognition engines is chosen. The context is determined from window history, including rendered webpages, represented by URLs, previously displayed on the device, or windows resulting from code executed on the computing device. In this manner, the operation of the computer is improved because the most accurate of the several text versions produced from the received audio is used.
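As a rough illustration of deriving context from window history, the sketch below maps each history entry to a set of available options. The lookup table `OPTIONS_BY_CONTEXT`, the `app:` naming for non-web windows, and the example hosts are all hypothetical; the abstract states only that context comes from previously displayed URLs or windows of executed code.

```python
from urllib.parse import urlparse

# Hypothetical table of options known for each context (assumption).
OPTIONS_BY_CONTEXT = {
    "music.example.com": ["play next song", "pause", "previous song"],
    "mail.example.com": ["compose", "reply", "delete"],
    "app:calculator": ["clear", "equals"],
}


def context_key(window: str) -> str:
    """Map a history entry to a lookup key: the hostname for a URL,
    otherwise the entry itself (for example a local application window)."""
    host = urlparse(window).hostname
    return host if host else window


def options_from_history(history):
    """Return the options for the most recent window that has known options,
    or an empty list if no entry in the history is recognized."""
    for window in reversed(history):
        options = OPTIONS_BY_CONTEXT.get(context_key(window))
        if options:
            return options
    return []


# Oldest entry first, most recent last.
history = [
    "https://mail.example.com/inbox",
    "app:calculator",
    "https://music.example.com/player",
]
current_options = options_from_history(history)
```

The resulting list could then serve as the set of options against which the text versions from the engines are compared.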
4 Citations
20 Claims
1. A method of speech recognition (recited in full under First Claim). Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 19, 20.
11. A device carrying out an operation based on speech recognition, comprising:
an audio input device;
a network interface comprising:
a network transmitter over which a version of audio received by said audio input device is sent to a plurality of speech recognition engines;
a network receiver receiving back via said network node a plurality of text versions of said audio input, wherein each of said plurality of speech recognition engines provided one of said plurality of text versions of said audio;
a display exhibiting output of code executed on said device, said output comprising distinct characteristics with options displayed for interacting with the device;
a processor selecting a text version of said plurality of text versions which most closely corresponds with options available for operation of said device, based on said options displayed on said device;
operating said device based on said selected text version.
Dependent claims: 12, 13, 14, 15, 16, 17, 18.
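The transmitter and receiver roles in the device claim amount to a fan-out and gather: one audio input goes out to several engines and one text version comes back from each. A minimal sketch follows, assuming each engine can be modeled as a callable that takes audio bytes and returns text. The stub engines and the name `transcribe_with_all` are hypothetical stand-ins for real network calls, which the claim does not specify.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for remote speech recognition engines.
def engine_a(audio: bytes) -> str:
    return "play nest song"


def engine_b(audio: bytes) -> str:
    return "play next song"


def engine_c(audio: bytes) -> str:
    return "lay necks on"


def transcribe_with_all(audio: bytes, engines):
    """Send the same audio to every engine concurrently and return one
    text version per engine, in the same order as `engines`."""
    if not engines:
        return []
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        # Executor.map preserves input order regardless of completion order.
        return list(pool.map(lambda engine: engine(audio), engines))


text_versions = transcribe_with_all(b"\x00\x01", [engine_a, engine_b, engine_c])
```

Running the requests concurrently keeps the total wait close to that of the slowest engine rather than the sum of all of them, which matters when the selection step has to wait for every text version.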
Specification