System and process for controlling electronic components in a ubiquitous computing environment using multimodal integration
Abstract
The present invention is directed toward a system and process that controls a group of networked electronic components using a multimodal integration scheme in which inputs from a speech recognition subsystem, a gesture recognition subsystem employing a wireless pointing device, and a pointing analysis subsystem also employing the pointing device are combined to determine what component a user wants to control and what control action is desired. In this multimodal integration scheme, the desired action concerning an electronic component is decomposed into a command and a referent pair. The referent can be identified by using the pointing device to point at the component or an object associated with it, by using speech recognition, or both. The command may be specified by pressing a button on the pointing device, by a gesture performed with the pointing device, by a speech recognition event, or by any combination of these inputs.
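The command/referent decomposition described above can be sketched as a simple fusion rule: each modality contributes evidence for one or both slots, and an action is taken only once both are resolved. This is an illustrative sketch, not the patented implementation; all identifiers below are hypothetical.

```python
# Hypothetical sketch of the (command, referent) decomposition: the referent
# may come from pointing or speech, the command from a button press, a
# gesture, or a speech event.

def fuse(pointing_target, speech_referent, button_command,
         gesture_command, speech_command):
    """Resolve the referent and command from whichever modalities fired."""
    referent = pointing_target or speech_referent         # pointing or speech
    command = button_command or gesture_command or speech_command
    if referent is None or command is None:
        return None                                       # incomplete request
    return (command, referent)

# e.g. the user points at the lamp and says "turn on"
print(fuse("lamp", None, None, None, "turn_on"))  # ('turn_on', 'lamp')
```

A request with only one slot filled (say, a button press with nothing selected) resolves to no action, which is why the integration step matters.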
381 Citations
Claims (43)
1. A multimodal electronic component control system comprising:
an object selection subsystem;
a gesture recognition subsystem;
a speech control subsystem; and
an integration subsystem into which the object selection, gesture recognition and speech control subsystems provide inputs, said integration subsystem integrating said inputs to arrive at a unified interpretation of what component a user wants to control and what control action is desired, and wherein the integration subsystem comprises a dynamic Bayes network which determines, from the individual inputs of the object selection, gesture recognition and speech control subsystems, the identity of a component the user wants to control (i.e., the referent), a command that the user wishes to implement (i.e., the command), and the appropriate control action to be taken to affect the identified referent in view of the command, said dynamic Bayes network comprising input, referent, command and action nodes, wherein the input nodes include said individual inputs which provide information as to their state to at least one of a referent, command or action node, said inputs determining the states of the referent and command nodes, and wherein the states of the referent and command nodes are fed into an action node whose state indicates the action that is to be implemented to affect the referent, and wherein said referent, command and action node states comprise probability distributions indicating the probability that each possible referent, command and action is the respective referent, command and action. (Dependent claims: 2-9)
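The node structure claimed above can be illustrated with a one-step sketch: input evidence yields probability distributions over referents and commands, and the action node's distribution is derived from both. A real dynamic Bayes network would also carry these distributions across time steps; the names and numbers below are hypothetical.

```python
import itertools

# Illustrative one-step version of the referent/command -> action flow.
REFERENTS = ["lamp", "stereo"]
COMMANDS = ["turn_on", "turn_off"]

def normalize(d):
    total = sum(d.values())
    return {k: v / total for k, v in d.items()}

def action_distribution(p_referent, p_command):
    # Each (referent, command) pair maps to one concrete action whose
    # probability is the product of the pair's probabilities.
    dist = {f"{c}:{r}": p_referent[r] * p_command[c]
            for r, c in itertools.product(REFERENTS, COMMANDS)}
    return normalize(dist)

# Pointing evidence favours the lamp; speech evidence favours "turn on".
p_ref = normalize({"lamp": 0.8, "stereo": 0.2})
p_cmd = normalize({"turn_on": 0.9, "turn_off": 0.1})
p_act = action_distribution(p_ref, p_cmd)
best = max(p_act, key=p_act.get)
print(best)  # most probable action, here "turn_on:lamp"
```

The action node's state is itself a distribution, matching the claim's requirement that referent, command and action states all be probability distributions rather than single values.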
10. A computer-implemented multimodal electronic component control process comprising:
a pointer-based object selection process module;
a gesture recognition process module;
a speech control process module; and
a dynamic Bayes network into which the object selection, gesture recognition and speech control process modules provide inputs, said dynamic Bayes network integrating the inputs to arrive at a unified interpretation of what component a user wants to control and what control action is desired by determining, from the individual inputs of the object selection, gesture recognition and speech control process modules, the identity of a component the user wants to control (i.e., the referent), a command that the user wishes to implement (i.e., the command), and the appropriate control action to be taken to affect the identified referent in view of the command, wherein the dynamic Bayes network has a process flow architecture comprising a series of input nodes including said individual inputs which provide information as to their state to at least one of a referent node, a command node and an action node, said inputs determining the states of the referent and command nodes, and wherein the states of the referent and command nodes are fed into an action node whose state is determined by the input from the referent, command and input nodes, and whose state indicates the action that is to be implemented to affect the referent, and wherein said referent, command and action node states comprise probability distributions indicating the probability that each possible referent, command and action is the respective referent, command and action. (Dependent claims: 11-15)
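What makes the network "dynamic" is that node states persist across time steps. A minimal sketch of that temporal behaviour, with hypothetical names and numbers: the previous referent distribution acts as a prior that is combined with the current step's evidence, so partial requests ("that one ... turn it up") can accumulate over time.

```python
# Hypothetical sketch of carrying a referent distribution across time steps.

def normalize(d):
    total = sum(d.values())
    return {k: v / total for k, v in d.items()}

def update(prior, evidence):
    """Multiply prior and per-referent evidence likelihoods, renormalize."""
    return normalize({r: prior[r] * evidence.get(r, 1.0) for r in prior})

# Step 1: pointing evidence alone slightly favours the lamp.
belief = normalize({"lamp": 0.6, "stereo": 0.4})
# Step 2: a speech event ("the lamp") sharpens the same belief.
belief = update(belief, {"lamp": 0.9, "stereo": 0.1})
print(max(belief, key=belief.get))  # "lamp"
```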
16. A computer-implemented process for controlling a user-selected electronic component within an environment using a pointing device, comprising using a computer to perform the following process actions:
computing a similarity between an input sequence of sensor values, output by the pointing device and recorded over a prescribed period of time, and at least one stored prototype sequence, wherein each prototype sequence represents the sequence of said sensor values that are generated if the user performs a unique gesture representing a different control action for the selected electronic component using the pointing device;
determining if the computed similarity between the input sequence and any prototype sequence exceeds a prescribed similarity threshold; and
whenever it is determined that one of the computed similarities exceeds the similarity threshold, implementing the command represented by the gesture. (Dependent claims: 17-25)
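The matching step above can be sketched as follows. Cosine similarity stands in here for whatever sequence-similarity measure an implementation actually uses; the threshold value and prototype data are illustrative.

```python
import math

# Hedged sketch: compare a recorded sensor sequence against stored gesture
# prototypes and fire the command only if the best similarity clears a
# prescribed threshold.

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def match_gesture(input_seq, prototypes, threshold=0.95):
    """prototypes: {command_name: prototype_sequence}."""
    best_cmd, best_sim = None, -1.0
    for cmd, proto in prototypes.items():
        sim = cosine_similarity(input_seq, proto)
        if sim > best_sim:
            best_cmd, best_sim = cmd, sim
    return best_cmd if best_sim > threshold else None

prototypes = {"volume_up": [0.0, 0.5, 1.0], "volume_down": [1.0, 0.5, 0.0]}
print(match_gesture([0.1, 0.6, 1.1], prototypes))  # "volume_up"
```

Sequences that resemble no prototype fall below the threshold and produce no command, which is the claim's safeguard against acting on accidental movement.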
26. A system for controlling a user-selected electronic component within an environment using a pointing device, comprising:
a general purpose computing device; and
a computer program comprising program modules executable by the computing device, wherein the computing device is directed by the program modules of the computer program to:
input orientation messages transmitted by the pointing device, said orientation messages comprising orientation sensor readings generated by orientation sensors of the pointing device;
generate and store at least one prototype sequence in a training phase, each prototype sequence comprising a prescribed one or ones of the orientation sensor readings that are generated by the pointing device during the time a user moves the pointing device in a gesture representing a particular command for controlling the selected electronic component, wherein each prototype sequence is associated with a different gesture and a different command;
record said prescribed one or ones of the orientation sensor readings from each orientation message inputted for a prescribed period of time to create an input sequence of sensor readings;
for each stored prototype sequence, compute a similarity indicator between the input sequence and the prototype sequence under consideration, wherein the similarity indicator is a measure of the similarity between the sequences;
identify the largest of the computed similarity indicators;
determine if the identified largest computed similarity indicator exceeds a prescribed similarity threshold;
whenever the identified similarity indicator exceeds the prescribed similarity threshold, designate that the user has performed the gesture corresponding to the prototype sequence associated with that similarity indicator; and
implement the command represented by the gesture that the user was designated to have performed. (Dependent claims: 27)
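The two phases above, training that stores one prototype per gesture and a runtime phase that scores an input sequence against every prototype, can be sketched in a small class. The similarity measure and all identifiers are hypothetical stand-ins.

```python
# Illustrative sketch of a train-then-classify gesture recognizer.

class GestureRecognizer:
    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.prototypes = {}          # command -> recorded sensor sequence

    def train(self, command, sensor_readings):
        """Training phase: store one prototype sequence per command."""
        self.prototypes[command] = list(sensor_readings)

    def classify(self, input_sequence):
        """Runtime phase: act only if the largest score clears the threshold."""
        scores = {cmd: self._similarity(input_sequence, proto)
                  for cmd, proto in self.prototypes.items()}
        best = max(scores, key=scores.get)
        return best if scores[best] > self.threshold else None

    @staticmethod
    def _similarity(a, b):
        # Similarity as 1 / (1 + mean absolute difference); any sequence
        # similarity measure could be substituted here.
        diff = sum(abs(x - y) for x, y in zip(a, b)) / max(len(a), 1)
        return 1.0 / (1.0 + diff)

rec = GestureRecognizer(threshold=0.9)
rec.train("mute", [0.0, 0.2, 0.4])
rec.train("unmute", [0.4, 0.2, 0.0])
print(rec.classify([0.0, 0.21, 0.42]))  # "mute"
```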
28. A computer-implemented process for controlling a user-selected electronic component within an environment using a pointing device, comprising using a computer to perform the following process actions:
inputting orientation messages transmitted by the pointing device, said orientation messages comprising orientation sensor readings generated by orientation sensors of the pointing device;
recording a prescribed one or ones of the pointing device sensor outputs taken from an orientation message inputted whenever a user indicates performing a gesture representing a control action for the selected electronic component;
for each gesture threshold definition assigned to the selected electronic component, determining whether the threshold (if just one), or all of the thresholds (if more than one), of the gesture threshold definition under consideration are exceeded by the recorded sensor output associated with the same sensor as the threshold;
whenever it is determined that the threshold (if just one), or all of the thresholds (if more than one), of one of the gesture threshold definitions are exceeded by the recorded sensor outputs associated with the same sensors, designating that the user has performed the gesture associated with that gesture threshold definition; and
implementing the command represented by the gesture that the user was designated to have performed. (Dependent claims: 29-34)
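The threshold test above, where a gesture is defined by one or more per-sensor thresholds that must all be exceeded, can be sketched directly. Sensor names, threshold values, and gesture names below are hypothetical.

```python
# Sketch of per-gesture threshold definitions: a gesture fires only when
# every threshold in its definition is exceeded by the recorded reading
# of the matching sensor.

GESTURE_DEFS = {
    "flick_up":   {"pitch": 30.0},                # single-threshold gesture
    "twist_push": {"roll": 45.0, "accel_z": 1.5}  # all thresholds must pass
}

def detect_gesture(readings, gesture_defs):
    """readings: {sensor_name: recorded value}."""
    for gesture, thresholds in gesture_defs.items():
        if all(readings.get(sensor, 0.0) > limit
               for sensor, limit in thresholds.items()):
            return gesture
    return None

print(detect_gesture({"pitch": 35.0, "roll": 10.0}, GESTURE_DEFS))  # "flick_up"
```

Unlike the prototype-sequence approach of the preceding claims, this variant needs no stored training data, only a table of threshold definitions per component.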
35. A system for controlling electronic components within an environment using a pointing device, comprising:
a pointing device comprising a transceiver and orientation sensors, wherein the outputs of the sensors are periodically packaged as orientation messages and transmitted using the transceiver;
a base station comprising a transceiver which receives the orientation messages transmitted by the pointing device;
a pair of imaging devices, each of which is located so as to capture images of the environment from a different viewpoint; and
a computing device which is in communication with the base station and the imaging devices so as to receive the orientation messages forwarded to it by the base station and the images captured by the imaging devices, and which:
computes the orientation and location of the pointing device from the received orientation messages and captured images;
determines, using the orientation and location of the pointing device, if the pointing device is pointing at an object in the environment which corresponds to, or is associated with, an electronic component that is controllable by the computing device, and if so selects that electronic component; and
affects the selected electronic component in accordance with a command received from the pointing device. (Dependent claims: 36-43)
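The selection step above reduces to a geometric test: given the pointing device's computed 3-D location and orientation (here represented as a unit direction vector), check whether the pointing ray passes close enough to a known object's position. The object table and tolerance are illustrative assumptions.

```python
import math

# Hedged geometric sketch of ray-based object selection.
OBJECTS = {"lamp": (2.0, 0.0, 1.0), "stereo": (0.0, 3.0, 1.0)}

def point_to_ray_distance(origin, direction, point):
    # Distance from `point` to the ray origin + t*direction (t >= 0),
    # assuming `direction` is a unit vector.
    v = [p - o for p, o in zip(point, origin)]
    t = max(0.0, sum(vi * di for vi, di in zip(v, direction)))
    closest = [o + t * d for o, d in zip(origin, direction)]
    return math.dist(point, closest)

def select_component(origin, direction, tolerance=0.25):
    for name, pos in OBJECTS.items():
        if point_to_ray_distance(origin, direction, pos) < tolerance:
            return name
    return None

# Pointer at the origin, aimed straight along +x at lamp height.
print(select_component((0.0, 0.0, 1.0), (1.0, 0.0, 0.0)))  # "lamp"
```

In the claimed system the origin and direction would come from fusing the device's orientation sensors with the stereo pair of images, which supplies the location the orientation sensors alone cannot provide.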
Specification