Speech-centric multimodal user interface design in mobile technology
First Claim
1. A computer-implemented interface, comprising:
a set of parsers configured to parse information received from a plurality of sources including a mixed modality of inputs;
a discourse manager configured to:
identify correlations in the information;
interpret the mixed modality of inputs based on environmental data associated with at least one of the mixed modality of inputs;
based on the identified correlations and the interpreted mixed modality of inputs, at least one of determine or infer an intent associated with the information; and
generate a confidence level for the intent as a function of the environmental data; and
a response manager configured to:
evaluate a first input of the mixed modality of inputs, the first input having a first modality initially employed as a primary modality;
based on the generated confidence level, provide feedback to request a second input having a second modality different from the first modality; and
substitute the second modality for the first modality as the primary modality until the environmental data changes.
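The response-manager behavior recited above can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Environment` fields, the linear noise-to-confidence mapping, the 0.5 threshold, and the "speech"/"pen" modality names are all assumptions made for illustration. The sketch requests a second modality when confidence is low, then keeps that modality as primary until the environmental data changes.

```python
from dataclasses import dataclass

# Every name, threshold, and formula here is an illustrative assumption;
# the claim does not specify how confidence is computed.


@dataclass(frozen=True)
class Environment:
    noise_db: float  # hypothetical ambient-noise reading


def speech_confidence(env: Environment) -> float:
    """Map ambient noise to a confidence in [0, 1] (assumed linear falloff)."""
    return max(0.0, min(1.0, (90.0 - env.noise_db) / 60.0))


class ResponseManager:
    def __init__(self, primary="speech", fallback="pen", threshold=0.5):
        self.default_primary = primary
        self.primary = primary
        self.fallback = fallback
        self.threshold = threshold
        self._env_at_switch = None  # environment that triggered the switch

    def evaluate(self, env: Environment):
        """Return a feedback prompt if a second modality is needed, else None."""
        # The substitution lasts only until the environmental data changes.
        if self._env_at_switch is not None and env != self._env_at_switch:
            self.primary = self.default_primary
            self._env_at_switch = None
        low = speech_confidence(env) < self.threshold
        if self.primary == self.default_primary and low:
            self.primary = self.fallback
            self._env_at_switch = env
            return (f"Low confidence in {self.default_primary}; "
                    f"please use {self.fallback}.")
        return None
```

With these assumed numbers, a reading of 80 dB yields a confidence of about 0.17, so the manager asks for pen input and holds pen as the primary modality until a different `Environment` value arrives.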
Abstract
A multi-modal human computer interface (HCI) receives a plurality of available information inputs concurrently, or serially, and employs a subset of the inputs to determine or infer user intent with respect to a communication or information goal. Received inputs are respectively parsed, and the parsed inputs are analyzed and optionally synthesized with respect to one or more of each other. In the event sufficient information is not available to determine user intent or goal, feedback can be provided to the user in order to facilitate clarifying, confirming, or augmenting the information inputs.
19 Claims
12. A computer-readable storage medium storing instructions, the instructions when executed by a computing device causing the computing device to perform operations comprising:
receiving an input in a first modality as a primary modality;
dynamically generating a first confidence level as a function of environmental data associated with the input, the environmental data comprising at least one of: a user state, a device state, a context of a computer-implemented interface session, historical or current extrinsic information about the input or a source of the input, or a device capability;
attributing a first weight to the input as a function of the first confidence level;
based on the first weight, determining that the first modality is insufficient as an input and receiving at least one other input in a second modality different from the first modality as the primary modality;
dynamically generating a second confidence level as a function of updated environmental data associated with the input;
attributing a second weight to the input as a function of the second confidence level;
based on the second weight, determining that the first modality has become sufficient and re-engaging the input in the first modality as the primary modality;
analyzing the input and the at least one other input;
at least one of determining or inferring an intent associated with the input and the at least one other input based on the analyzing; and
performing late fusion on the input and the at least one other input to integrate the input and the at least one other input at a semantic level.
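The final step of this claim, late fusion at a semantic level, can be sketched as follows. This is an assumed illustration rather than the claimed method: each modality is taken to have already been parsed into its own semantic frame (a dict of slot to value), the weight is taken to be the confidence level itself, and the slot names and values are hypothetical.

```python
def late_fusion(frames):
    """Merge per-modality semantic frames slot by slot.

    `frames` is a list of (weight, frame) pairs. Each frame is a dict of
    slot -> value produced independently for one modality. For every slot,
    the value from the highest-weighted frame that fills it is kept.
    """
    fused, best_weight = {}, {}
    for weight, frame in frames:
        for slot, value in frame.items():
            if slot not in best_weight or weight > best_weight[slot]:
                fused[slot] = value
                best_weight[slot] = weight
    return fused


# Hypothetical frames: speech is weighted low (e.g. a noisy environment),
# pen input is weighted high.
speech = (0.3, {"action": "zoom", "target": "Seattle"})
pen = (0.9, {"target": "Bellevue"})
fused = late_fusion([speech, pen])
```

Here `fused` keeps the action from speech, which is the only frame that supplies it, and takes the target from the more heavily weighted pen input. Integration happens on interpreted slots, not on raw signals, which is what distinguishes late fusion from early, feature-level fusion.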
13. A method comprising:
parsing inputs received from a plurality of sources into surface semantics represented in a semantic representation by utilizing a language model, each of the plurality of sources corresponding to a different modality;
providing environmental data associated with at least one of the inputs, the data comprising one or both of current data or historical data;
adapting the language model to enhance accuracy of the parsing by utilizing the environmental data to compute at least one environmentally-specific conditional probability of at least one phrase of the inputs received from the plurality of sources;
utilizing the semantic representation to generate discourse semantics;
utilizing the discourse semantics to synthesize one or more responses to the inputs received from the plurality of sources;
further comprising:
generating, as a function of the environmental data, a confidence level for an intent associated with the inputs received from the plurality of sources;
evaluating a first input of the inputs, the first input having a first modality initially employed as a primary modality;
based on the generated confidence level, providing feedback to request a second input of the inputs having a second modality different from the first modality; and
substituting the second modality for the first modality as the primary modality until the environmental data changes.
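One way to picture the "environmentally-specific conditional probability" step is a count-based estimate of P(phrase | environment). The sketch below is an assumption for illustration only: the class name, the string environment labels, the add-alpha smoothing, and the default vocabulary size are hypothetical and do not come from the claim.

```python
from collections import Counter, defaultdict


class EnvironmentAdaptedModel:
    """Toy phrase model conditioned on a discrete environment label."""

    def __init__(self):
        # environment label -> counts of phrases observed in that environment
        self.counts = defaultdict(Counter)

    def observe(self, environment, phrase):
        """Record one historical occurrence of `phrase` in `environment`."""
        self.counts[environment][phrase] += 1

    def probability(self, phrase, environment, vocabulary_size=1000, alpha=1.0):
        """Estimate P(phrase | environment) with add-alpha smoothing.

        `vocabulary_size` and `alpha` are assumed smoothing parameters.
        """
        seen = self.counts[environment]
        total = sum(seen.values())
        return (seen[phrase] + alpha) / (total + alpha * vocabulary_size)


model = EnvironmentAdaptedModel()
for _ in range(3):
    model.observe("driving", "navigate home")
    model.observe("office", "open calendar")
model.observe("office", "navigate home")
```

After these observations the model assigns "navigate home" a higher probability in the "driving" environment than in the "office" environment, so a parser using it would favor that phrase when the environmental data indicates driving.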
Specification