Methods and apparatus for voice-enabling a web application
First Claim
1. A method of determining a collective set of supported voice interactions for a plurality of frames of a web page displayed in a window of a web browser, wherein each of the plurality of frames includes content for a different web application, wherein the content for each of the plurality of frames is displayed simultaneously in the window of the web browser, wherein the plurality of frames includes a first frame and a second frame, wherein the first frame displays content for a first web application rendered by the web browser and the second frame displays content for a second web application rendered by the web browser, wherein the first web application is different from the second web application, the method comprising:
identifying a first data structure that includes information identifying a plurality of contexts of the first web application and supported voice interactions for the first web application in each of the plurality of contexts of the first web application;
determining a first current context of the first web application, wherein determining the first current context comprises analyzing whether a particular marker is present in the content displayed in the first frame;
determining based, at least in part, on the first current context of the first web application and the information included in the first data structure, a first set of supported voice interactions available for the first frame;
identifying a second data structure that includes information identifying a plurality of contexts of the second web application and supported voice interactions for the second web application in each of the plurality of contexts of the second web application;
determining based, at least in part, on a second current context of the second web application and the information included in the second data structure, a second set of supported voice interactions available for the second frame;
determining the collective set of supported voice interactions based on the first set of supported voice interactions and the second set of voice interactions; and
instructing an external speech engine to recognize voice input corresponding to the collective set of voice interactions.
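The claimed method amounts to: per frame, look up a context-to-interactions table (the claim's "data structure"), detect the current context by checking for a marker in the frame's content, take the union of the per-frame interaction sets, and hand that collective set to a speech engine. A minimal sketch, assuming hypothetical names throughout (`determine_context`, `collective_voice_interactions`, the frame dictionary layout) — none of these identifiers come from the patent:

```python
# Illustrative sketch of the claimed method; all names and the frame
# layout are assumptions, not taken from the patent.

def determine_context(frame_content: str, markers: dict) -> str:
    """Determine a web application's current context by analyzing
    whether a particular marker is present in the frame's content."""
    for context, marker in markers.items():
        if marker in frame_content:
            return context
    return "default"

def collective_voice_interactions(frames: list) -> set:
    """Union the per-frame supported voice interactions.

    Each frame is a dict with:
      'content' - rendered content of the frame
      'markers' - context name -> marker string to look for
      'table'   - context name -> set of supported voice interactions
                  (the claim's per-application "data structure")
    """
    collective = set()
    for frame in frames:
        context = determine_context(frame["content"], frame["markers"])
        collective |= frame["table"].get(context, set())
    return collective

# Example: two frames hosting two different web applications.
frames = [
    {
        "content": "<div id='player-controls'>...</div>",
        "markers": {"playback": "player-controls"},
        "table": {"playback": {"play", "pause", "stop"},
                  "default": {"open"}},
    },
    {
        "content": "<form class='search-box'>...</form>",
        "markers": {"search": "search-box"},
        "table": {"search": {"search for", "clear"},
                  "default": set()},
    },
]

grammar = collective_voice_interactions(frames)
# An external speech engine would then be instructed to recognize
# only this collective set, e.g. speech_engine.set_grammar(grammar).
```

The union step is what makes the set "collective": a single grammar covers both simultaneously displayed applications, so the speech engine need not know which frame a spoken command targets.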
Abstract
Methods and apparatus for voice-enabling a web application, wherein the web application includes one or more web pages rendered by a web browser on a computer. At least one information source external to the web application is queried to determine whether information describing a set of one or more supported voice interactions for the web application is available, and in response to determining that the information is available, the information is retrieved from the at least one information source. Voice input for the web application is then enabled based on the retrieved information.
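The abstract's flow can be sketched as: query an information source external to the web application, and enable voice input only when interaction information comes back. A hedged illustration; the registry, URL, and function names below are invented for the example and are not part of the patent:

```python
# Illustrative sketch of the abstract's flow; the registry contents,
# URL, and function signatures are assumptions, not from the patent.

SUPPORTED_INTERACTIONS_REGISTRY = {
    # URL of a web application -> its supported voice interactions
    "https://example.com/mail": {"compose", "reply", "delete"},
}

def query_external_source(app_url: str):
    """Query an information source external to the web application for
    its supported voice interactions; return None if unavailable."""
    return SUPPORTED_INTERACTIONS_REGISTRY.get(app_url)

def enable_voice_input(app_url: str) -> set:
    """Enable voice input only when interaction info is available."""
    info = query_external_source(app_url)
    if info is None:
        return set()   # no info retrieved: voice input stays disabled
    return set(info)   # grammar the browser would activate
```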
Claims (20)
1. (Independent claim; full text reproduced above under "First Claim".) Dependent claims: 2, 3, 4, 5, 6, 7.
8. A non-transitory computer-readable storage medium encoded with a plurality of instructions that, when executed by a computer, perform a method of determining a collective set of supported voice interactions for a plurality of frames of a web page displayed in a window of a web browser, wherein each of the plurality of frames includes content for a different web application, wherein the content for each of the plurality of frames is displayed simultaneously in the window of the web browser, wherein the plurality of frames includes a first frame and a second frame, wherein the first frame displays content for a first web application rendered by the web browser and the second frame displays content for a second web application rendered by the web browser, wherein the first web application is different from the second web application, the method comprising:
identifying a first data structure that includes information identifying a plurality of contexts of the first web application and supported voice interactions for the first web application in each of the plurality of contexts of the first web application;
determining a first current context of the first web application, wherein determining the first current context comprises analyzing whether a particular marker is present in the content displayed in the first frame;
determining based, at least in part, on the first current context of the first web application and the information included in the first data structure, a first set of supported voice interactions available for the first frame;
identifying a second data structure that includes information identifying a plurality of contexts of the second web application and supported voice interactions for the second web application in each of the plurality of contexts of the second web application;
determining based, at least in part, on a second current context of the second web application and the information included in the second data structure, a second set of supported voice interactions available for the second frame;
determining the collective set of supported voice interactions based on the first set of supported voice interactions and the second set of voice interactions; and
instructing an external speech engine to recognize voice input corresponding to the collective set of voice interactions.
Dependent claims: 9, 10, 11, 12, 13, 14.
15. A computer for determining a collective set of supported voice interactions for a plurality of frames of a web page displayed in a window of a web browser, wherein each of the plurality of frames includes content for a different web application, wherein the content for each of the plurality of frames is displayed simultaneously in the window of the web browser, wherein the plurality of frames includes a first frame and a second frame, wherein the first frame displays content for a first web application rendered by the web browser and the second frame displays content for a second web application rendered by the web browser, wherein the first web application is different from the second web application, the computer comprising:
at least one processor programmed to:
identify a first data structure that includes information identifying a plurality of contexts of the first web application and supported voice interactions for the first web application in each of the plurality of contexts of the first web application;
determine a first current context of the first web application, wherein determining the first current context comprises analyzing whether a particular marker is present in the content displayed in the first frame;
determine based, at least in part, on the first current context of the first web application and the information included in the first data structure, a first set of supported voice interactions available for the first frame;
identify a second data structure that includes information identifying a plurality of contexts of the second web application and supported voice interactions for the second web application in each of the plurality of contexts of the second web application;
determine based, at least in part, on a second current context of the second web application and the information included in the second data structure, a second set of supported voice interactions available for the second frame;
determine the collective set of supported voice interactions based on the first set of supported voice interactions and the second set of voice interactions; and
instruct an external speech engine to recognize voice input corresponding to the collective set of voice interactions.
Dependent claims: 16, 17, 18, 19, 20.
Specification