SYSTEM AND METHOD FOR CONTINUOUS MULTIMODAL SPEECH AND GESTURE INTERACTION
First Claim
1. A method comprising:
monitoring an audio stream associated with a non-tactile gesture input stream;
identifying a speech event in an audio stream;
determining a temporal window associated with a time of the speech event, wherein the temporal window extends forward and backward from the time of the speech event;
analyzing, via a processor, data from the non-tactile gesture input stream within the temporal window to identify, based on the speech event, a non-tactile gesture event; and
processing the speech event and the non-tactile gesture event to produce a multimodal command.
Abstract
Disclosed herein are systems, methods, and non-transitory computer-readable storage media for processing multimodal input. A system configured to practice the method continuously monitors an audio stream associated with a gesture input stream, and detects a speech event in the audio stream. Then the system identifies a temporal window associated with a time of the speech event, and analyzes data from the gesture input stream within the temporal window to identify a gesture event. The system processes the speech event and the gesture event to produce a multimodal command. The gesture in the gesture input stream can be directed to a display, but is remote from the display. The system can analyze the data from the gesture input stream by calculating an average of gesture coordinates within the temporal window.
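The steps summarized above (detect a speech event, take a temporal window around its time, average the gesture coordinates inside that window, and combine the two into one command) can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the disclosed implementation: the names `GestureSample`, `SpeechEvent`, `samples_in_window`, `average_point`, and `fuse`, the 0.5-second window extents, and the dictionary shape of the resulting command are all hypothetical and do not come from the source.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class GestureSample:
    """One timestamped point from a (hypothetical) non-tactile gesture stream."""
    t: float  # seconds
    x: float
    y: float


@dataclass
class SpeechEvent:
    """A recognized utterance and the time it was detected (hypothetical shape)."""
    t: float  # seconds
    text: str


def samples_in_window(samples: List[GestureSample], event_time: float,
                      before: float = 0.5, after: float = 0.5) -> List[GestureSample]:
    """Keep samples inside a window extending backward and forward from event_time.

    The 0.5 s defaults are assumptions chosen for illustration only.
    """
    start, end = event_time - before, event_time + after
    return [s for s in samples if start <= s.t <= end]


def average_point(samples: List[GestureSample]) -> Optional[Tuple[float, float]]:
    """Average the gesture coordinates; return None if there are no samples."""
    if not samples:
        return None
    n = len(samples)
    return (sum(s.x for s in samples) / n, sum(s.y for s in samples) / n)


def fuse(speech: SpeechEvent, samples: List[GestureSample],
         before: float = 0.5, after: float = 0.5) -> Optional[Dict[str, object]]:
    """Combine a speech event with the averaged gesture point into one command."""
    point = average_point(samples_in_window(samples, speech.t, before, after))
    if point is None:
        return None  # no gesture data near the speech event
    return {"action": speech.text, "target": point}


stream = [
    GestureSample(0.0, 0.0, 0.0),
    GestureSample(1.0, 10.0, 20.0),
    GestureSample(1.2, 20.0, 40.0),
    GestureSample(3.0, 100.0, 100.0),
]
command = fuse(SpeechEvent(1.1, "zoom in here"), stream)
print(command)
```

In this example only the samples at 1.0 s and 1.2 s fall inside the window around the speech event at 1.1 s, so the command targets their mean position. A real system would read both streams continuously and could choose the window extents based on the kind of speech event; the sketch uses fixed extents for simplicity.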
Specification