Speech recognition based on context and multiple recognition engines
First Claim
1. A method of speech recognition, comprising steps of:
receiving audio input to a device;
sending said audio input via a network node to a plurality of speech recognition engines;
receiving back via said network node a plurality of text versions of said audio input, wherein each of said plurality of speech recognition engines provided one of said plurality of text versions of said audio input;
determining characteristics of at least some data exhibited on a display of said device at a time of said audio input;
selecting a text version of said plurality of text versions which most closely corresponds with options available for operation of said device, based on said characteristics determined for said at least some data exhibited on said display;
operating said device based on said selected text version;
determining characteristics of at least some data exhibited on said display of said device before said time of said audio input;
wherein said selecting a text version is based on said at least some data exhibited on said display both at said time of said audio input and before said time of said audio input, wherein said before said time of said audio input includes a recorded and ordered list of window names opened on said display;
wherein said text version, which most closely corresponds with operations available for operation of said device, is based on actions previously taken after said ordered list of window names was opened as well as said output of said plurality of speech recognition engines;
wherein said ordered list of window names and resulting said operating of said device is based in part on said plurality of text versions received, and in part on crowd-sourced operations of other devices after said ordered list of window names was produced based on actions carried out by other users of said other devices; and
wherein said device receives said crowd-sourced operations of other devices and carries out an operation of said device based on prior operations of said device, based on said ordered list of window names produced based on prior actions of said device, when such an operation conflicts with said crowd-sourced operations.
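The selecting step recited above, choosing the text version that most closely corresponds with the options available on the display, can be illustrated with a minimal sketch. The claim does not say how closeness is measured; the string-similarity ratio from Python's `difflib` is an assumption made here for illustration, and the function names, engine outputs, and on-screen options are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def select_text_version(text_versions, displayed_options):
    """Return (text, option, score) for the best-matching pair.

    text_versions: one transcription per speech recognition engine.
    displayed_options: labels of the options currently shown on the display.
    The similarity measure is an illustrative assumption, not the claimed method.
    """
    if not text_versions or not displayed_options:
        raise ValueError("need at least one text version and one displayed option")
    best_text, best_option, best_score = None, None, -1.0
    for text in text_versions:
        for option in displayed_options:
            score = similarity(text, option)
            if score > best_score:
                best_text, best_option, best_score = text, option, score
    return best_text, best_option, best_score


# Hypothetical outputs from three engines for the same audio input.
candidates = ["clothes window", "close window", "close win dough"]
# Hypothetical options exhibited on the display at the time of the audio input.
options = ["Close Window", "Open File", "Save As"]

text, option, score = select_text_version(candidates, options)
```

In this sketch the second engine's output matches an on-screen option exactly once case is ignored, so it is selected and the device would be operated on "Close Window".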
Abstract
Using multiple speech recognition engines, one can select which engine's result is best each time a command is sent to a device to be interpreted and carried out. Depending on the context, a different one of the results received from the speech recognition engines is chosen. The context is determined from window history, including rendered webpages, represented by URLs, previously displayed on the device, or windows resulting from code executed on the computing device. In this manner, the operation of the computer is improved, because a more accurate result is used, chosen from several conversions of the received audio to text.
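One way to picture context derived from window history is a record that maps an ordered list of window names or URLs to the actions taken after that list was opened. The sketch below is a hedged illustration only: the class `WindowHistory`, its methods, and the example window names are hypothetical, and falling back to the first engine's result when the context has never been seen is an assumption not stated in the source.

```python
from collections import Counter, defaultdict


class WindowHistory:
    """Ordered list of window names plus the actions taken after that list was opened."""

    def __init__(self):
        self.windows = []  # ordered window names or URLs
        self.actions_after = defaultdict(Counter)  # tuple(windows) -> action counts

    def open_window(self, name):
        self.windows.append(name)

    def record_action(self, action):
        """Remember an action carried out in the current window context."""
        self.actions_after[tuple(self.windows)][action.lower()] += 1

    def choose(self, text_versions):
        """Pick the text version most often acted on in this context.

        Ties, including a context with no recorded actions, resolve to the
        earliest entry in text_versions (an illustrative assumption).
        """
        counts = self.actions_after.get(tuple(self.windows), Counter())
        return max(text_versions, key=lambda t: counts[t.lower()])


history = WindowHistory()
history.open_window("https://example.com/inbox")  # hypothetical URL
history.open_window("Compose Message")            # hypothetical window name
history.record_action("send")
history.record_action("send")
history.record_action("save draft")

# Hypothetical results from three engines for the same spoken command.
chosen = history.choose(["sand", "send", "sent"])
```

Here the same audio yields three different transcriptions, and the one that matches what was previously done after this sequence of windows is preferred.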
31 Citations
12 Claims
1. A method of speech recognition, comprising steps of:
receiving audio input to a device;
sending said audio input via a network node to a plurality of speech recognition engines;
receiving back via said network node a plurality of text versions of said audio input, wherein each of said plurality of speech recognition engines provided one of said plurality of text versions of said audio input;
determining characteristics of at least some data exhibited on a display of said device at a time of said audio input;
selecting a text version of said plurality of text versions which most closely corresponds with options available for operation of said device, based on said characteristics determined for said at least some data exhibited on said display;
operating said device based on said selected text version;
determining characteristics of at least some data exhibited on said display of said device before said time of said audio input;
wherein said selecting a text version is based on said at least some data exhibited on said display both at said time of said audio input and before said time of said audio input, wherein said before said time of said audio input includes a recorded and ordered list of window names opened on said display;
wherein said text version, which most closely corresponds with operations available for operation of said device, is based on actions previously taken after said ordered list of window names was opened as well as said output of said plurality of speech recognition engines;
wherein said ordered list of window names and resulting said operating of said device is based in part on said plurality of text versions received, and in part on crowd-sourced operations of other devices after said ordered list of window names was produced based on actions carried out by other users of said other devices; and
wherein said device receives said crowd-sourced operations of other devices and carries out an operation of said device based on prior operations of said device, based on said ordered list of window names produced based on prior actions of said device, when such an operation conflicts with said crowd-sourced operations.
Dependent claims: 2, 3, 4, 5
6. A device carrying out an operation based on speech recognition, comprising:
an audio input device;
a network interface comprising:
a network transmitter over which a version of audio received by said audio input device is sent to a plurality of speech recognition engines;
a network receiver receiving back via said network node a plurality of text versions of said audio input, wherein each of said plurality of speech recognition engines provided one of said plurality of text versions of said audio;
a display exhibiting output of code executed on said device, said output comprising distinct characteristics with options displayed for interacting with the device;
a processor selecting a text version of said plurality of text versions which most closely corresponds with options available for operation of said device, based on said options displayed on said device;
operating said device based on said selected text version;
wherein said text version which most closely corresponds with operations available for operation of said device is based on actions previously taken after said ordered list of window names was opened as well as said output of said plurality of speech recognition engines;
wherein said ordered list of window names and resulting said operating of said device is based in part on said plurality of text versions received and in part on crowd-sourced operations of other devices after said ordered list of window names was produced based on actions carried out by other users of said other devices; and
wherein said device receives said crowd-sourced operations of other devices and carries out an operation of said device based on prior operations of said device based on said ordered list of window names produced based on prior actions of said device when such an operation conflicts with said crowd-sourced operations.
Dependent claims: 7, 8, 9, 10, 11, 12
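The final limitation, under which the device follows its own prior operations when they conflict with crowd-sourced operations, can be sketched as a simple precedence rule. This is an illustration under stated assumptions: the function names, the dictionary layout keyed by the ordered list of window names, and the fallback to the first engine's output when neither history applies are all hypothetical and are not taken from the claims.

```python
def best_supported(text_versions, counts):
    """Return the text version with the highest positive count, or None."""
    supported = [t for t in text_versions if counts.get(t, 0) > 0]
    if not supported:
        return None
    return max(supported, key=lambda t: counts[t])


def resolve_operation(window_list, text_versions, device_history, crowd_history):
    """Return (operation, source), letting the device's own history win conflicts.

    device_history / crowd_history: dicts mapping a tuple of window names to
    a dict of {operation: times carried out after that window list}.
    """
    key = tuple(window_list)
    device_pick = best_supported(text_versions, device_history.get(key, {}))
    crowd_pick = best_supported(text_versions, crowd_history.get(key, {}))
    if device_pick is not None:
        # Prior operations of this device take precedence over the crowd.
        return device_pick, "device"
    if crowd_pick is not None:
        return crowd_pick, "crowd"
    # Assumed fallback: no history applies, so use the first engine's output.
    return text_versions[0], "engine"


# Hypothetical ordered window list and engine outputs.
windows = ["Music Player", "Playlist"]
versions = ["play next", "play text", "delete next"]
device_history = {("Music Player", "Playlist"): {"play next": 3}}
crowd_history = {("Music Player", "Playlist"): {"delete next": 40}}

result = resolve_operation(windows, versions, device_history, crowd_history)
crowd_only = resolve_operation(windows, versions, {}, crowd_history)
```

With both histories present, the device's own prior operation is chosen even though the crowd-sourced count is larger; with no device history, the crowd-sourced operation is used.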
Specification