SYSTEM AND METHOD FOR CONTINUOUS MULTIMODAL SPEECH AND GESTURE INTERACTION
First Claim
1. A method comprising:
continuously monitoring an audio stream associated with a gesture input stream;
detecting a speech event in the audio stream;
identifying a temporal window associated with a time of the speech event;
analyzing, via a processor, data from the gesture input stream within the temporal window to identify a gesture event; and
processing the speech event and the gesture event to produce a multimodal command.
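The claim does not say how a speech event is detected in the continuously monitored audio stream. As a minimal sketch only, the Python below assumes a simple energy-threshold detector over timestamped audio frames. The names `SpeechEvent` and `detect_speech_events`, and the `threshold` and `min_silence` parameters, are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple


@dataclass
class SpeechEvent:
    """Hypothetical record of one detected stretch of speech."""
    start: float  # timestamp of the first voiced frame
    end: float    # timestamp of the last voiced frame


def detect_speech_events(
    frames: Iterable[Tuple[float, float]],
    threshold: float = 0.1,    # assumed energy level that counts as speech
    min_silence: float = 0.3,  # assumed silence duration that ends an event
) -> Iterator[SpeechEvent]:
    """Yield speech events from time-ordered (timestamp, energy) frames.

    The generator consumes frames as they arrive, so it can sit on an
    open-ended stream without any push-to-talk signal.
    """
    start: Optional[float] = None
    last_voiced: Optional[float] = None
    for t, energy in frames:
        if energy >= threshold:
            if start is None:
                start = t          # a new speech event begins
            last_voiced = t
        elif start is not None and t - last_voiced >= min_silence:
            # Enough silence has passed: close the current event.
            yield SpeechEvent(start, last_voiced)
            start = None
            last_voiced = None
    if start is not None:
        # The stream ended while speech was still in progress.
        yield SpeechEvent(start, last_voiced)
```

The start and end times of each yielded event are what the later steps would use to place the temporal window. A production system would more likely use a trained voice activity detector or the speech recognizer's own endpointing.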
Abstract
Disclosed herein are systems, methods, and non-transitory computer-readable storage media for processing multimodal input. A system configured to practice the method continuously monitors an audio stream associated with a gesture input stream, and detects a speech event in the audio stream. Then the system identifies a temporal window associated with a time of the speech event, and analyzes data from the gesture input stream within the temporal window to identify a gesture event. The system processes the speech event and the gesture event to produce a multimodal command. The gesture in the gesture input stream can be directed to a display, but is remote from the display. The system can analyze the data from the gesture input stream by calculating an average of gesture coordinates within the temporal window.
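The windowing and averaging steps described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the window widths (`before`, `after`), the data classes, and the function names are all hypothetical, and the abstract says only that an average of the gesture coordinates within the temporal window can be calculated.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class GestureSample:
    """Hypothetical timestamped pointer position from the gesture stream."""
    t: float
    x: float
    y: float


@dataclass
class MultimodalCommand:
    """Hypothetical fused result: what was said plus where the user pointed."""
    utterance: str
    target: Tuple[float, float]


def temporal_window(speech_time: float, before: float, after: float) -> Tuple[float, float]:
    """Return a (start, end) window around the time of the speech event."""
    return (speech_time - before, speech_time + after)


def average_gesture(
    samples: Sequence[GestureSample], window: Tuple[float, float]
) -> Optional[Tuple[float, float]]:
    """Average the coordinates of all samples whose timestamp is in the window."""
    lo, hi = window
    inside = [s for s in samples if lo <= s.t <= hi]
    if not inside:
        return None  # no gesture data accompanied the speech event
    n = len(inside)
    return (sum(s.x for s in inside) / n, sum(s.y for s in inside) / n)


def fuse(
    utterance: str,
    speech_time: float,
    samples: Sequence[GestureSample],
    before: float = 0.5,  # assumed window extent before the speech event
    after: float = 0.5,   # assumed window extent after the speech event
) -> Optional[MultimodalCommand]:
    """Combine a speech event with the averaged gesture position."""
    target = average_gesture(samples, temporal_window(speech_time, before, after))
    if target is None:
        return None
    return MultimodalCommand(utterance, target)
```

For example, with samples at t = 1.0 and t = 2.0 and a speech event at t = 1.5, calling `fuse("zoom in here", 1.5, samples, before=1.0, after=1.0)` averages those two positions and ignores samples outside the window. Averaging smooths the jitter of a remote pointing gesture, but it would blur a deliberate stroke, so other gesture types may need a different analysis.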
20 Claims
1. A method comprising:
continuously monitoring an audio stream associated with a gesture input stream;
detecting a speech event in the audio stream;
identifying a temporal window associated with a time of the speech event;
analyzing, via a processor, data from the gesture input stream within the temporal window to identify a gesture event; and
processing the speech event and the gesture event to produce a multimodal command.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 16
12. A system comprising:
a processor;
a memory having stored therein instructions for controlling the processor to perform steps comprising:
monitoring an audio stream associated with a gesture input stream;
detecting a speech event in the audio stream;
identifying a temporal window associated with a time of the speech event;
analyzing, via a processor, data from the gesture input stream within the temporal window to identify a gesture event; and
processing the speech event and the gesture event to produce a multimodal command.
Dependent claims: 13, 14, 15
17. A non-transitory computer-readable storage medium having stored therein instructions which, when executed by a computing device, cause the computing device to perform steps comprising:
continuously monitoring an audio stream associated with a gesture input stream;
detecting a speech event in the audio stream;
identifying a temporal window associated with a time of the speech event;
analyzing, via a processor, data from the gesture input stream within the temporal window to identify a gesture event; and
processing the speech event and the gesture event to produce a multimodal command.
Dependent claims: 18, 19, 20
Specification