Methods and apparatus for voice-enabling a web application

US 10,157,612 B2
Filed: 08/02/2012
Issued: 12/18/2018
Est. Priority Date: 08/02/2012
Status: Active Grant

Abstract

Methods and apparatus for voice-enabling a web application, wherein the web application includes one or more web pages rendered by a web browser on a computer. At least one information source external to the web application is queried to determine whether information describing a set of one or more supported voice interactions for the web application is available, and in response to determining that the information is available, the information is retrieved from the at least one information source. Voice input for the web application is then enabled based on the retrieved information.
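
The lookup described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the `InfoSource` type, the `commands` key, and the two stand-in sources are all hypothetical names introduced here. Each source is modeled as a function that returns a description of supported voice interactions for an application, or `None` when it has none.

```python
from typing import Callable, Optional

# Hypothetical type: an information source maps an application identifier to
# a description of its supported voice interactions, or None if unavailable.
InfoSource = Callable[[str], Optional[dict]]


def find_voice_info(app_id: str, sources: list[InfoSource]) -> Optional[dict]:
    """Query each external source in order and return the first description found."""
    for source in sources:
        info = source(app_id)
        if info is not None:
            return info
    return None


def enable_voice_input(app_id: str, sources: list[InfoSource]) -> list[str]:
    """Return the voice commands to enable, or an empty list if no information exists."""
    info = find_voice_info(app_id, sources)
    if info is None:
        return []
    return list(info.get("commands", []))


def local_cache(app_id: str) -> Optional[dict]:
    # Stand-in source that never has an entry.
    return None


def remote_catalog(app_id: str) -> Optional[dict]:
    # Stand-in source that knows exactly one (made-up) application.
    if app_id == "webmail.example":
        return {"commands": ["compose message", "search mail"]}
    return None


print(enable_voice_input("webmail.example", [local_cache, remote_catalog]))
print(enable_voice_input("unknown.example", [local_cache, remote_catalog]))
```

Querying the sources in order lets a cheap local source be consulted before a remote one. That ordering is a design choice of this sketch, not something the abstract specifies.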

20 Claims

1. A method of enabling voice interaction for at least one capability of a web application, wherein the web application includes a plurality of web pages rendered by a web browser, the method comprising:
- executing, with at least one computer processor, an agent for the web application, wherein the agent is configured to determine an identity of the web application;
  
  determining, by the agent, whether the web application is in a first context or a second context by using Document Object Model (DOM) events in the web browser to identify at least one marker on a web page of the web application that identifies the web application as being in the first context or the second context, wherein the first context corresponds to a first state of the web application in which a first set of user interface elements is displayed on a first web page of the plurality of web pages of the web application and the second context corresponds to a second state of the web application in which a second set of user interface elements is displayed on a second web page of the plurality of web pages of the web application;
  
  receiving first voice input;
  
  enabling, when it is determined that the web application is in the first context, voice interaction for the at least one capability of the web application, wherein the at least one capability is not exposed by the web browser;
  
  recognizing, by a voice application, one or more voice commands in the received first voice input when the voice interaction for the at least one capability of the web application is enabled, wherein the one or more voice commands are associated with the first context; and
  
  performing at least one first action based, at least in part, on the one or more recognized voice commands.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The method of claim 1, wherein executing the agent comprises launching the agent in the web browser in response to detecting a page load event for one of the plurality of web pages of the web application.
  - 3. The method of claim 1, further comprising:
    - determining, by the agent, the identity of the web application; and
      
      injecting a script into the web browser based, at least in part, on the determined identity of the web application.
  - 4. The method of claim 1, wherein the at least one marker comprises a user interface element included in the first set of user interface elements or the second set of user interface elements.
  - 5. The method of claim 1, further comprising:
    - determining that the web application has changed from the first context to the second context;
      
      receiving second voice input; and
      
      recognizing, in response to determining that the web application has changed from the first context to the second context, one or more voice commands in the second voice input, wherein the one or more voice commands are associated with the second context.
  - 6. The method of claim 5, wherein the one or more voice commands associated with the second context are specified in a data structure accessible to the agent.
  - 7. The method of claim 5, further comprising:
    - receiving information from the voice application that a user has input one of the one or more voice commands associated with the second context; and
      
      performing at least one second action based, at least in part, on the information received from the voice application.
  - 8. The method of claim 7, further comprising:
    - determining the at least one second action to perform based, at least in part, on information in a data structure specifying a link between voice input and the at least one second action; and
      
      wherein performing the at least one second action comprises performing the at least one second action based, at least in part, on the information in the data structure.
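
The context handling recited in claims 1 and 4 through 8 can be modeled in a few lines. The sketch below is an illustration under stated assumptions, not the claimed implementation: the page is reduced to a set of element ids, `on_dom_event` stands in for a real DOM event handler in the browser, and the `Context` and `Agent` classes, marker ids, and command phrases are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Context:
    """One application state, identified by a marker element on the page."""
    name: str
    marker: str  # id of a UI element whose presence identifies this context
    # Data structure linking each voice command to the action it triggers.
    commands: dict[str, Callable[[], str]] = field(default_factory=dict)


class Agent:
    """Tracks the current context and dispatches recognized voice commands."""

    def __init__(self, contexts: list[Context]) -> None:
        self.contexts = contexts
        self.current: Optional[Context] = None

    def on_dom_event(self, element_ids: set[str]) -> None:
        # Stand-in for a DOM event handler: receives the ids of the elements
        # currently on the page and looks for a known marker among them.
        self.current = next(
            (c for c in self.contexts if c.marker in element_ids), None
        )

    def active_commands(self) -> list[str]:
        # Only commands associated with the current context are enabled.
        return sorted(self.current.commands) if self.current else []

    def handle(self, recognized: str) -> Optional[str]:
        # Perform the action linked to a recognized command, if it is enabled.
        if self.current is None:
            return None
        action = self.current.commands.get(recognized)
        return action() if action else None


inbox = Context("inbox", "message-list",
                {"open first message": lambda: "opened message 1"})
compose = Context("compose", "send-button",
                  {"send": lambda: "message sent"})
agent = Agent([inbox, compose])

agent.on_dom_event({"nav-bar", "message-list"})  # first context detected
print(agent.active_commands(), agent.handle("send"))
agent.on_dom_event({"nav-bar", "send-button"})   # change to second context
print(agent.active_commands(), agent.handle("send"))
```

In the first context the command "send" is ignored because it belongs only to the second context. After the marker for the second context appears, the same spoken command triggers its linked action.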

9. A non-transitory computer-readable storage medium encoded with a plurality of instructions that, when executed by a computer, perform a method of enabling voice interaction for at least one capability of a web application, wherein the web application includes a plurality of web pages rendered by a web browser, the method comprising:
- executing an agent for the web application, wherein the agent is configured to determine an identity of the web application;
  
  determining, by the agent, whether the web application is in a first context or a second context by using Document Object Model (DOM) events in the web browser to identify at least one marker on a web page of the web application that identifies the web application as being in the first context or the second context, wherein the first context corresponds to a first state of the web application in which a first set of user interface elements is displayed on a first web page of the plurality of web pages of the web application and the second context corresponds to a second state of the web application in which a second set of user interface elements is displayed on a second web page of the plurality of web pages of the web application;
  
  receiving first voice input;
  
  enabling, when it is determined that the web application is in the first context, voice interaction for the at least one capability of the web application, wherein the at least one capability is not exposed by the web browser;
  
  recognizing, by a voice application, one or more voice commands in the received first voice input when the voice interaction for the at least one capability is enabled, wherein the one or more voice commands are associated with the first context; and
  
  performing at least one first action based, at least in part, on the one or more recognized voice commands.
- Dependent claims (10, 11, 12, 13, 14, 15):
  - 10. The computer-readable storage medium of claim 9, wherein executing the agent comprises launching the agent in the web browser in response to detecting a page load event for one of the plurality of web pages for the web application.
  - 11. The computer-readable storage medium of claim 9, wherein the method further comprises:
    - determining, by the agent, the identity of the web application; and
      
      injecting a script into the web browser based, at least in part, on the determined identity of the web application.
  - 12. The computer-readable storage medium of claim 9, wherein the method further comprises:
    - determining that the web application has changed from the first context to the second context;
      
      receiving second voice input; and
      
      recognizing, in response to determining that the web application has changed from the first context to the second context, one or more voice commands in the second voice input, wherein the one or more voice commands are associated with the second context.
  - 13. The computer-readable storage medium of claim 12, wherein the one or more voice commands associated with the second context are specified in a data structure accessible to the agent.
  - 14. The computer-readable storage medium of claim 13, wherein the method further comprises:
    - receiving information from the voice application that a user has input one of the one or more voice commands associated with the second context; and
      
      performing at least one second action based, at least in part, on the information received from the voice application.
  - 15. The computer-readable storage medium of claim 14, wherein the method further comprises:
    - determining the at least one second action to perform based, at least in part, on information in a data structure specifying a link between voice input and the at least one second action; and
      
      wherein performing the at least one second action comprises performing the at least one second action based, at least in part, on the information in the data structure.
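
Claims 2 and 3 (and their counterparts 10 and 11) describe launching the agent on a page load event, determining the identity of the web application, and injecting a script chosen from that identity. A minimal sketch follows. It assumes the identity is derived from the page URL's hostname, which the claims do not specify, and the registry contents and script names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical registry: application identity (hostname) -> script to inject.
SCRIPTS = {
    "mail.example.com": "voice-mail.js",
    "docs.example.com": "voice-docs.js",
}


def identify_application(url: str) -> Optional[str]:
    """Return the identity of a known web application, or None if unrecognized."""
    # Assumption of this sketch: the hostname alone identifies the application.
    host = urlparse(url).hostname
    return host if host in SCRIPTS else None


def on_page_load(url: str) -> Optional[str]:
    """Stand-in for a page load handler: pick the script to inject, if any."""
    app = identify_application(url)
    return SCRIPTS[app] if app is not None else None


print(on_page_load("https://mail.example.com/inbox?id=3"))
print(on_page_load("https://other.example.org/"))
```

Returning `None` for an unrecognized application means nothing is injected into pages the agent does not know about.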

16. A computer for enabling voice interaction for at least one capability of a web application, wherein the web application includes a plurality of web pages rendered by a web browser, the computer comprising:
- a voice interface configured to receive first voice input; and
  
  at least one processor programmed to:
  
  execute an agent for the web application, wherein the agent is configured to determine an identity of the web application;
  
  determine, by the agent, whether the web application is in the first context or the second context by using Document Object Model (DOM) events in the web browser to identify at least one marker on a web page of the web application that identifies the web application as being in the first context or the second context, wherein the first context corresponds to a first state of the web application in which a first set of user interface elements is displayed on a first web page of the plurality of web pages of the web application and the second context corresponds to a second state of the web application in which a second set of user interface elements is displayed on a second web page of the plurality of web pages of the web application;
  
  enable, when it is determined that the web application is in the first context, voice interaction for the at least one capability of the web application, wherein the at least one capability is not exposed by the web browser;
  
  recognize, by a voice application, one or more voice commands in the received first voice input when the voice interaction for the at least one capability is enabled, wherein the one or more voice commands are associated with the first context; and
  
  perform at least one first action based, at least in part, on the one or more recognized voice commands.
- Dependent claims (17, 18, 19, 20):
  - 17. The computer of claim 16, wherein executing the agent comprises launching the agent in the web browser in response to detecting a page load event for one of the plurality of web pages of the web application.
  - 18. The computer of claim 16, wherein the at least one processor is further programmed to:
    - determine that the web application has changed from the first context to the second context; and
      
      recognize, in response to determining that the web application has changed from the first context to the second context, one or more voice commands in second voice input, wherein the one or more voice commands are associated with the second context.
  - 19. The computer of claim 18, wherein the one or more voice commands associated with the second context are specified in a data structure accessible to the agent.
  - 20. The computer of claim 19, wherein the at least one processor is further programmed to:
    - receive information from the voice application that a user has input one of the one or more voice commands associated with the second context;
      
      perform at least one second action based, at least in part, on the information received from the voice application; and
      
      determine the at least one second action to perform based, at least in part, on information in a data structure specifying a link between voice input and the at least one second action, wherein performing the at least one second action comprises performing the at least one second action based, at least in part, on the information in the data structure.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Reich, David E.; Hardy, Christopher
Primary Examiner(s): ORTIZ SANCHEZ, MICHAEL
Application Number: US13/565,234
Publication Number: US 20140039898A1
Time in Patent Office: 2,329 Days
Field of Search: 704/275
CPC Class Codes:
G10L 15/22   Procedures used during a sp...
G10L 2015/228   of application context
H04L 67/02   based on web technology, e....
