System and method for generating and presenting multi-modal applications from intent-based markup scripts
Abstract
Systems and methods are provided for rendering modality-independent scripts (e.g., intent-based markup scripts) in a multi-modal environment and, in particular, for providing a multi-modal user interface to an application, whereby the user can interact with the application through a plurality of modalities (e.g., speech and GUI). The multi-modal interface automatically synchronizes I/O events across the presented modalities. In one aspect, the system provides immediate, synchronized rendering of the modality-independent document in each of the supported modalities. In another aspect, the system provides deferred rendering and presentation of intent-based scripts to an end user: the system comprises a transcoder that generates a speech markup language script (such as a VoiceXML document) from the modality-independent script, which is then rendered (via, e.g., a VoiceXML browser) at a later time.
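The deferred-rendering aspect of the abstract can be illustrated with a minimal transcoder sketch. This is not the patent's implementation: the input format, element names, and mapping below are assumptions chosen only to show the idea of deriving a VoiceXML-style field list from a modality-independent form description.

```python
# Illustrative sketch, not the patented transcoder: map each <input>
# of a hypothetical modality-independent form to a VoiceXML-like
# <field> with a spoken prompt, for rendering at a later time.
import xml.etree.ElementTree as ET

INTENT_DOC = """<form>
  <input name="city" label="Destination city"/>
  <input name="date" label="Travel date"/>
</form>"""

def transcode_to_voicexml(intent_xml: str) -> str:
    """Derive a speech markup script from the modality-independent doc."""
    intent = ET.fromstring(intent_xml)
    vxml = ET.Element("vxml", version="2.1")
    form = ET.SubElement(vxml, "form")
    for node in intent.iter("input"):
        field = ET.SubElement(form, "field", name=node.get("name"))
        prompt = ET.SubElement(field, "prompt")
        prompt.text = node.get("label")
    return ET.tostring(vxml, encoding="unicode")

vxml_doc = transcode_to_voicexml(INTENT_DOC)
```

A real transcoder would also emit grammars and event handlers; the point here is only the one-to-many mapping from a single intent-based source to a modality-specific script.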
112 Citations
33 Claims
1. A method for presenting an application in a plurality of modalities, comprising the steps of:
retrieving a modality-independent document from one of local and remote storage;
parsing the modality-independent document using parsing rules obtained from one of local or remote storage;
converting the modality-independent document to a first intermediate representation that can be rendered by a speech user interface modality;
converting the modality-independent document to a second intermediate representation that can be rendered by a GUI (graphical user interface) modality;
building a cross-reference table by which the speech user interface can access components comprising the second intermediate representation;
rendering the first and second intermediate representations in their respective modality; and
receiving a user input in one of the GUI and speech user interface modalities to enable multi-modal interaction and control the document presentation. (Dependent claims 2-13.)
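The steps of claim 1 can be sketched in a few lines. All names and the document format below are illustrative assumptions, not the patent's code: one modality-independent document is parsed once, two intermediate representations are derived, and a cross-reference table lets the speech side locate the matching GUI component.

```python
# Hedged sketch of the claim-1 flow: parse a modality-independent
# document, build a speech IR and a GUI IR, and a cross-reference
# table keyed by component id into the GUI representation.
import xml.etree.ElementTree as ET

DOC = """<dialog>
  <field id="name" prompt="What is your name?" label="Name"/>
  <field id="age"  prompt="How old are you?"  label="Age"/>
</dialog>"""

def build_representations(doc_xml: str):
    root = ET.fromstring(doc_xml)
    speech_ir, gui_ir, xref = [], [], {}
    for i, node in enumerate(root.iter("field")):
        speech_ir.append({"id": node.get("id"), "prompt": node.get("prompt")})
        gui_ir.append({"id": node.get("id"), "label": node.get("label")})
        # The speech UI reaches the matching GUI component through xref.
        xref[node.get("id")] = i  # index into gui_ir
    return speech_ir, gui_ir, xref

speech_ir, gui_ir, xref = build_representations(DOC)
```

Rendering each IR in its own modality and routing user input back through the shared component ids is what makes the interaction multi-modal.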
14. A method for providing global help information when presenting a modality-independent document, the method comprising the steps of:
preparing an internal representation of a structure and component attributes of the modality-independent document;
building a grammar comprising rules for resolving specific spoken requests;
processing a spoken request utilizing the grammar rules; and
presenting an aural description of the modality-independent document in response to the spoken request. (Dependent claims 15-16.)
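Claim 14's global-help flow can be sketched as follows. The grammar rules, component list, and phrasing are my own illustrative assumptions: a rule set resolves a spoken request, and the internal representation supplies an aural overview of the document.

```python
# Sketch of global help (claim 14), with an invented grammar: resolve
# a spoken request against simple rules, then describe the document's
# structure aurally.
import re

COMPONENTS = [("name", "text field"), ("submit", "button")]

GRAMMAR = [
    (re.compile(r"\b(help|what can i say)\b", re.IGNORECASE), "overview"),
]

def resolve(utterance: str):
    for pattern, intent in GRAMMAR:
        if pattern.search(utterance):
            return intent
    return None

def aural_overview() -> str:
    parts = [f"a {kind} named {name}" for name, kind in COMPONENTS]
    return "This form contains " + " and ".join(parts) + "."

intent = resolve("Help, please")
description = aural_overview() if intent == "overview" else ""
```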
17. A method for providing contextual help information when presenting a modality-independent document, the method comprising the steps of:
preparing an internal representation of a structure and component attributes of the modality-independent document;
building a grammar comprising rules for resolving specific spoken requests;
processing a spoken request utilizing the grammar rules; and
presenting an aural description of the components, attributes, and methods of interaction of the modality-independent document in response to the spoken request. (Dependent claims 18-19.)
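Contextual help (claim 17) differs from global help in that the request names a specific component, and the answer covers that component's attributes and methods of interaction. The grammar, data model, and wording below are illustrative assumptions, not the patent's.

```python
# Sketch of contextual help (claim 17): a grammar rule captures which
# component was asked about; the internal representation supplies its
# attributes and how to interact with it.
import re

INTERNAL_REP = {
    "age": {"type": "spin box", "range": "0 to 120",
            "how": "say a number, or say 'up' or 'down'"},
}

REQUEST = re.compile(r"how do i use (the )?(?P<comp>\w+)", re.IGNORECASE)

def contextual_help(utterance: str) -> str:
    m = REQUEST.search(utterance)
    if not m or m.group("comp") not in INTERNAL_REP:
        return "Sorry, I don't know that component."
    c = INTERNAL_REP[m.group("comp")]
    return f"{m.group('comp')} is a {c['type']}, {c['range']}; {c['how']}."

reply = contextual_help("How do I use the age")
```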
20. A method for providing feedback information when presenting a modality-independent document, the method comprising the steps of:
preparing an internal representation of the structure and component attributes of the modality-independent document;
building a grammar comprising rules for resolving specific spoken requests;
processing a spoken request and resolving the spoken request utilizing the grammar rules;
obtaining state and value information regarding specified components of the document from the internal representation of the document; and
presenting an aural description of the content values associated with document components in response to the spoken request. (Dependent claims 21-22.)
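Claim 20's feedback steps add a state-and-value lookup to the help pattern. The query grammar and state model below are invented for illustration: the spoken request is resolved, the component's state and value are fetched from the internal representation, and the answer is phrased aurally.

```python
# Sketch of feedback (claim 20), with an assumed state model: answer
# "what did I enter for X" from the internal representation's state
# and value information.
import re

STATE = {"city": {"filled": True, "value": "Oslo"},
         "date": {"filled": False, "value": None}}

QUERY = re.compile(r"what (did i enter|is) (for |in )?(?P<comp>\w+)",
                   re.IGNORECASE)

def feedback(utterance: str) -> str:
    m = QUERY.search(utterance)
    if not m or m.group("comp") not in STATE:
        return "I could not find that field."
    s = STATE[m.group("comp")]
    if not s["filled"]:
        return f"{m.group('comp')} is still empty."
    return f"{m.group('comp')} contains {s['value']}."

answer = feedback("What did I enter for city")
```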
23. A method for aurally spelling out content values associated with components of a modality-independent document, the method comprising the steps of:
preparing an internal representation of a structure and component attributes of the modality-independent document;
building a grammar comprising rules for resolving specific spoken requests;
processing a spoken request utilizing the grammar rules;
obtaining state and content value information regarding specified components of the document from the internal representation of the document; and
presenting each character of the content value information requested in response to the spoken request. (Dependent claims 24-26.)
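The spell-out step of claim 23 reduces to fetching the stored value and emitting it one character at a time, as a speech renderer would speak it. The value store below is an illustrative assumption.

```python
# Sketch of aural spell-out (claim 23): return the requested value as
# a sequence of single characters for the speech renderer to speak.
VALUES = {"confirmation": "XK42"}

def spell_out(component: str):
    value = VALUES.get(component, "")
    # One utterance per character; letters and digits are spoken alone.
    return [ch for ch in value]

letters = spell_out("confirmation")
```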
27. A system for presenting an application in a plurality of modalities, comprising:
a multi-modal manager for parsing a modality-independent document to generate a traversal model that maps components of the modality-independent document to at least a first and second modality-specific representation;
a speech user interface manager for rendering and presenting the first modality-specific representation in a speech modality;
a GUI (graphical user interface) manager for rendering and presenting the second modality-specific representation in a GUI modality;
an event queue monitor for detecting GUI events;
an event queue for storing captured GUI events; and
a plurality of methods, that are called by the speech user interface manager, for synchronizing I/O (input/output) events across the speech and GUI modalities. (Dependent claims 28-33.)
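The system of claim 27 can be sketched architecturally. The class names and event schema are mine, not the patent's: an event queue stores GUI events captured by the monitor, and the speech UI manager drains it to keep both modalities synchronized on the same component.

```python
# Architectural sketch of claim 27 (names invented): the event queue
# monitor pushes captured GUI events; the speech UI manager drains the
# queue and mirrors each event in the speech modality.
from collections import deque

class EventQueue:
    def __init__(self):
        self._q = deque()

    def push(self, event):   # called by the event queue monitor
        self._q.append(event)

    def drain(self):         # called by the speech UI manager
        while self._q:
            yield self._q.popleft()

class SpeechUIManager:
    def __init__(self):
        self.focus = None

    def synchronize(self, queue):
        # Mirror each captured GUI event so speech focus tracks GUI focus.
        for event in queue.drain():
            if event["type"] == "focus":
                self.focus = event["component"]

queue = EventQueue()
queue.push({"type": "focus", "component": "city"})
speech = SpeechUIManager()
speech.synchronize(queue)
```

Decoupling capture (monitor) from consumption (speech manager) through the queue is what lets I/O events arrive from either modality without blocking the other.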
Specification