Speech recognition based on context and multiple recognition engines

US 10,360,914 B2
Filed: 01/26/2017
Issued: 07/23/2019
Est. Priority Date: 01/26/2017
Status: Active Grant

1 Assignment

0 Petitions

Abstract

When a spoken command is sent to several speech recognition engines, the device can choose, for each command, which engine's result to interpret and carry out. Which of the returned results is chosen depends on context. The context is determined from window history, including rendered webpages (identified by the URLs previously displayed on the device) and windows produced by code executed on the device. Because the same audio is converted to text several times and the result that best fits the context is used, the device acts on a more accurate transcription.
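The abstract describes choosing, per utterance, whichever engine's transcript best fits the current display context. This is a minimal sketch under stated assumptions. It assumes the context has already been reduced to a list of command phrases offered by the foreground window. It uses `difflib.SequenceMatcher` similarity as a stand-in matching metric, because the patent does not specify one. The names `select_transcript` and `best_option_score` are hypothetical.

```python
from difflib import SequenceMatcher


def best_option_score(text: str, options: list[str]) -> tuple[float, str]:
    """Score `text` against each on-screen option; return the best (score, option)."""
    normalized = text.strip().lower()
    return max(
        (SequenceMatcher(None, normalized, option.lower()).ratio(), option)
        for option in options
    )


def select_transcript(transcripts: list[str], options: list[str]) -> tuple[str, str]:
    """Return (transcript, option) for the engine output that best fits the display.

    Hypothetical helper: each engine contributes one transcript, and `options`
    are the command phrases currently offered by the foreground window
    (an assumed summary of the display context).
    """
    if not transcripts or not options:
        raise ValueError("need at least one transcript and one option")
    best_score, best_text, best_option = -1.0, transcripts[0], options[0]
    for text in transcripts:
        score, option = best_option_score(text, options)
        # Strict '>' keeps the earlier engine's result when scores tie.
        if score > best_score:
            best_score, best_text, best_option = score, text, option
    return best_text, best_option


if __name__ == "__main__":
    # Three engines returned slightly different text for the same audio.
    transcripts = ["open the cart", "open the card", "oven the cart"]
    options = ["Open cart", "Checkout", "Search products"]
    print(select_transcript(transcripts, options))
```

A string-similarity score is only one possible way to decide how closely a transcript corresponds to the options available. An implementation could equally use exact matching, intent classification, or per-engine confidence scores.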

31 Citations

12 Claims

1. A method of speech recognition, comprising steps of:

receiving audio input to a device;

sending said audio input via a network node to a plurality of speech recognition engines;

receiving back via said network node a plurality of text versions of said audio input, wherein each of said plurality of speech recognition engines provided one of said plurality of text versions of said audio input;

determining characteristics of at least some data exhibited on a display of said device at a time of said audio input;

selecting a text version of said plurality of text versions which most closely corresponds with options available for operation of said device, based on said characteristics determined for said at least some data exhibited on said display;

operating said device based on said selected text version;

determining characteristics of at least some data exhibited on said display of said device before said time of said audio input;

wherein said selecting a text version is based on said at least some data exhibited on said display both at said time of said audio input and before said time of said audio input, wherein said before said time of said audio input includes a recorded and ordered list of window names opened on said display;

wherein said text version, which most closely corresponds with operations available for operation of said device, is based on actions previously taken after said ordered list of window names was opened as well as said output of said plurality of speech recognition engines;

wherein said ordered list of window names and resulting said operating of said device is based in part on said plurality of text versions received, and in part on crowd-sourced operations of other devices after said ordered list of window names was produced based on actions carried out by other users of said other devices; and

wherein said device receives said crowd-sourced operations of other devices and carries out an operation of said device based on prior operations of said device, based on said ordered list of window names produced based on prior actions of said device, when such an operation conflicts with said crowd-sourced operations.

2. The method of speech recognition of claim 1, wherein said characteristic is at least one of a title of a foreground window, one element within a window, or a uniform resource locator for a website displayed in a foreground window.

3. The method of speech recognition of claim 1, wherein said actions were carried out on said device and said ordered list of window names was produced based on actions of said device.

4. The method of speech recognition of claim 1, wherein a statistical probability of a next said operating of said device is based on:

a) said characteristics of said at least some data exhibited on said display at a time of said audio input, said characteristics comprised at least of a window name or uniform resource locator; and

b) prior window names and uniform resource locators displayed on said device;

and said text version selected in said step of selecting is one which matches one of a top five statistically most probable operation represented by said audio input.

5. The method of speech recognition of claim 4, wherein said step of selecting is selection of an operation which matches a single most probable operation when output of at least one said speech recognition engine matches said single most probable operation.

6. A device carrying out an operation based on speech recognition, comprising:

an audio input device;

a network interface comprising:

a network transmitter over which a version of audio received by said audio input device is sent to a plurality of speech recognition engines;

a network receiver receiving back via said network node a plurality of text versions of said audio input, wherein each of said plurality of speech recognition engines provided one of said plurality of text versions of said audio;

a display exhibiting output of code executed on said device, said output comprising distinct characteristics with options displayed for interacting with the device;

a processor selecting a text version of said plurality of text versions which most closely corresponds with options available for operation of said device, based on said options displayed on said device;

operating said device based on said selected text version;

wherein said text version which most closely corresponds with operations available for operation of said device is based on actions previously taken after said ordered list of window names was opened as well as said output of said plurality of speech recognition engines;

wherein said ordered list of window names and resulting said operating of said device is based in part on said plurality of text versions received and in part on crowd-sourced operations of other devices after said ordered list of window names was produced based on actions carried out by other users of said other devices; and

wherein said device receives said crowd-sourced operations of other devices and carries out an operation of said device based on prior operations of said device based on said ordered list of window names produced based on prior actions of said device when such an operation conflicts with said crowd-sourced operations.

7. The device of claim 6, wherein said characteristics are of at least some data exhibited on said display of said device before said time of said audio being received; and wherein said selecting a text version is on both said at least some data exhibited on said display at said time of said audio being inputted and before said time of said audio input.

8. The device of claim 7, wherein a characteristic of said characteristics used to determine said options available for operation of said device is at least one of a title of a foreground window or a uniform resource locator for a website displayed in a foreground window.

9. The device of claim 7, wherein said before said time of said audio input includes a recorded and ordered list of window names opened on said display and stored on said device.

10. The device of claim 6, wherein said actions were carried out on said device and said ordered list of window names was produced based on actions of said device.

11. The device of claim 6, wherein a statistical probability of a next said operating of said device is based on:

a) said characteristics of said at least some data exhibited on said display at a time of said audio input, said characteristics comprising at least of a window name or uniform resource locator; and

b) prior window names and uniform resource locators displayed on said device;

and said text version selected in said step of selecting is one which matches one of a top five statistically most probable operation represented by said audio input.

12. The device of claim 11, wherein said step of selecting is selection of an operation which matches a single most probable operation when output of at least one said speech recognition engine matches said single most probable operation.
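Claims 1, 4, 5, 11, and 12 add a history-based ranking with four parts:

- The likely next operation is estimated from the current and prior window names or URLs.
- A transcript is accepted if it matches one of the five most probable operations.
- The single most probable operation wins whenever any engine produced it.
- The device's own history takes precedence over conflicting crowd-sourced data.

The sketch below is illustrative only and rests on several assumptions. It uses a simple frequency table keyed on the last few window names. It matches transcripts to operation labels exactly, ignoring case. For conflicts it orders results "local first, crowd second", which is one possible reading of the conflict rule. `OperationModel` and `select_operation` are hypothetical names.

```python
from collections import Counter, defaultdict
from typing import Optional


class OperationModel:
    """Hypothetical frequency model of which operation follows a window history."""

    def __init__(self, history_len: int = 2) -> None:
        self.history_len = history_len
        self.local = defaultdict(Counter)  # this device's own past operations
        self.crowd = defaultdict(Counter)  # operations reported by other devices

    def _key(self, windows: list[str]) -> tuple[str, ...]:
        # Assumed context: the last few window names/URLs, oldest first.
        return tuple(windows[-self.history_len:])

    def record(self, windows: list[str], operation: str, crowd: bool = False) -> None:
        """Count `operation` as having followed this window history."""
        table = self.crowd if crowd else self.local
        table[self._key(windows)][operation] += 1

    def ranked(self, windows: list[str]) -> list[str]:
        """Operations ordered by likelihood; local history outranks crowd data."""
        key = self._key(windows)
        local = [op for op, _ in self.local.get(key, Counter()).most_common()]
        crowd = [op for op, _ in self.crowd.get(key, Counter()).most_common()]
        # Assumed conflict rule: crowd-only operations are appended after local ones.
        return local + [op for op in crowd if op not in local]


def select_operation(
    transcripts: list[str], ranked_ops: list[str], top_n: int = 5
) -> Optional[str]:
    """Pick the most probable operation that some engine's transcript matches.

    Walking the top-N list in rank order means the single most probable
    operation wins whenever any engine produced it (claims 5 and 12);
    otherwise the best-ranked match within the top five is used (claims 4 and 11).
    """
    heard = {t.strip().lower() for t in transcripts}
    for op in ranked_ops[:top_n]:
        if op.lower() in heard:
            return op
    return None  # no engine output matched a likely operation


if __name__ == "__main__":
    model = OperationModel()
    history = ["Inbox - Mail", "Compose - Mail"]
    model.record(history, "send")  # this device's own past action
    for _ in range(3):
        model.record(history, "save draft", crowd=True)  # other users' actions
    ops = model.ranked(history)
    print(ops)
    print(select_operation(["sent", "send"], ops))
```

Here the device has chosen "send" once and other users chose "save draft" three times. Under the assumed conflict rule, "send" still ranks first for this window history.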

Current Assignee
Essence, Inc.
Original Assignee
Essence, Inc.
Inventors
Corcoran, Holly R.; Klein, Barry; Morake, Llewellyn Q.
Primary Examiner(s)
Shah, Paras D
Assistant Examiner(s)
Ogunbiyi, Oluwadamilola M

Application Number
US15/416,398
Publication Number
US 20180211669A1
Time in Patent Office
908 Days
Field of Search
704/235
CPC Class Codes

G10L 15/30   Distributed recognition, e....

G10L 15/32   Multiple recognisers used i...

G10L 2015/228   of application context
