Synchronizing visual and speech events in a multimodal application
Abstract
Exemplary methods, systems, and products are disclosed for synchronizing visual and speech events in a multimodal application, including receiving speech from a user; determining a semantic interpretation of the speech; calling a global application update handler; identifying, by the global application update handler, an additional processing function in dependence upon the semantic interpretation; and executing the additional function. Typical embodiments may include updating a visual element after executing the additional function. Typical embodiments may include updating a voice form after executing the additional function. Typical embodiments also may include updating a state table after updating the voice form. Typical embodiments also may include restarting the voice form after executing the additional function.
18 Claims
1. A method for synchronizing visual and speech events in a multimodal application, the method comprising:
- calling a voice form of the multimodal application, wherein the multimodal application is run using at least one computer processor, wherein the multimodal application provides a multimodal web page to a client device over a network;
- receiving speech from a user;
- determining a semantic interpretation of the speech;
- calling a global application update handler of the multimodal application including exiting a voice form;
- identifying, by the global application update handler, an additional processing function in dependence upon the semantic interpretation;
- executing the additional processing function;
- updating a visual element after executing the additional processing function;
- updating a voice form after executing the additional processing function; and
- restarting the voice form after executing the additional processing function,
wherein determining a semantic interpretation of the speech comprises determining a plurality of semantic interpretations of the speech, and wherein identifying, by the global application update handler, an additional processing function in dependence upon the semantic interpretation comprises identifying, by the global application update handler, an additional processing function for each of the plurality of semantic interpretations.

Dependent claims: 2, 3, 4, 5, 6.
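The control flow recited in the claim can be sketched in code. The sketch below is a hypothetical illustration only; all class, function, and interpretation names are invented for this example, and the patent does not prescribe any particular implementation, language, or API.

```python
# Hypothetical sketch of the claimed synchronization flow.
# Names (MultimodalApp, recognize, the interpretation strings) are
# invented for illustration and are not drawn from the patent.

def recognize(speech):
    """Stand-in for a recognizer that returns a plurality of
    semantic interpretations of one utterance."""
    return ["drink=coffee", "size=large"]

class MultimodalApp:
    def __init__(self):
        # Maps each semantic interpretation to an additional
        # processing function.
        self.handlers = {
            "drink=coffee": lambda: print("visual: highlight coffee"),
            "size=large": lambda: print("visual: select large size"),
        }
        self.voice_form_active = False

    def call_voice_form(self):
        self.voice_form_active = True   # form begins collecting speech

    def exit_voice_form(self):
        self.voice_form_active = False  # form is exited before updates

    def update_visual_element(self):
        print("visual element updated")

    def update_voice_form(self):
        print("voice form updated")

    def global_update_handler(self, interpretations):
        # Calling the handler includes exiting the voice form; the handler
        # then identifies one additional processing function per semantic
        # interpretation, executes it, updates the visual element and the
        # voice form, and restarts the voice form.
        self.exit_voice_form()
        for interp in interpretations:
            fn = self.handlers.get(interp)
            if fn:
                fn()                    # execute the additional function
        self.update_visual_element()
        self.update_voice_form()
        self.call_voice_form()          # restart the voice form

app = MultimodalApp()
app.call_voice_form()                   # call the voice form
speech = "large coffee"                 # speech received from the user
app.global_update_handler(recognize(speech))
```

The point of routing every interpretation through one global handler, rather than per-field handlers, is that a single utterance carrying several interpretations can update several visual elements and the voice form in one pass before the form restarts.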
7. A system for synchronizing visual and speech events in a multimodal application, the system comprising at least one computer processor, at least one computer memory operatively coupled to the computer processor, and computer program instructions disposed within the computer memory configured for:
- calling a voice form of the multimodal application, wherein the multimodal application provides a multimodal web page to a client device over a network;
- receiving speech from a user;
- determining a semantic interpretation of the speech;
- calling a global application update handler of the multimodal application including exiting a voice form;
- identifying, by the global application update handler, an additional processing function in dependence upon the semantic interpretation;
- executing the additional processing function;
- updating a visual element after executing the additional processing function;
- updating a voice form after executing the additional processing function; and
- restarting the voice form after executing the additional processing function,
wherein determining a semantic interpretation of the speech comprises determining a plurality of semantic interpretations of the speech, and wherein identifying, by the global application update handler, an additional processing function in dependence upon the semantic interpretation comprises identifying, by the global application update handler, an additional processing function for each of the plurality of semantic interpretations.

Dependent claims: 8, 9, 10, 11, 12.
13. A non-transitory computer-readable storage medium comprising computer program instructions that, when executed on at least one processor in a computer, perform a method of synchronizing visual and speech events in a multimodal application, the method comprising:
- calling a voice form of the multimodal application, wherein the multimodal application provides a multimodal web page to a client device over a network;
- receiving speech from a user;
- determining a semantic interpretation of the speech;
- calling a global application update handler of the multimodal application including exiting a voice form;
- identifying, by the global application update handler, an additional processing function in dependence upon the semantic interpretation;
- executing the additional processing function;
- updating a visual element after executing the additional processing function;
- updating a voice form after executing the additional processing function; and
- restarting the voice form after executing the additional processing function,
wherein determining a semantic interpretation of the speech comprises determining a plurality of semantic interpretations of the speech, and wherein identifying, by the global application update handler, an additional processing function in dependence upon the semantic interpretation comprises identifying, by the global application update handler, an additional processing function for each of the plurality of semantic interpretations.

Dependent claims: 14, 15, 16, 17, 18.
Specification