AUGMENTING SPEECH RECOGNITION WITH DEPTH IMAGING
Abstract
Embodiments related to the use of depth imaging to augment speech recognition are disclosed. For example, one disclosed embodiment provides, on a computing device, a method including receiving depth information of a physical space from a depth camera, receiving audio information from one or more microphones, identifying a set of one or more possible spoken words from the audio information, determining a speech input for the computing device based upon comparing the set of one or more possible spoken words from the audio information and the depth information, and taking an action on the computing device based upon the speech input determined.
20 Claims
1. On a computing device, a method for recognizing speech of a user, comprising:
receiving depth information of a physical space from a depth camera;
receiving audio information from one or more microphones;
identifying a set of one or more possible spoken words from the audio information;
determining a speech input for the computing device based upon comparing the set of one or more possible spoken words from the audio information and the depth information; and
taking an action on the computing device based upon the speech input determined.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
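As an illustration only, the comparison step of claim 1 can be sketched as a score fusion: each candidate word from the audio gets a second score from the depth data, and the best combined score wins. The claim does not say how the comparison is performed, so the weighted sum, the `Candidate` type, the `depth_scores` mapping (for example, how well an observed mouth shape matches a word), and the `depth_weight` parameter are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    word: str
    acoustic_score: float  # hypothetical 0..1 confidence from the audio recognizer


def determine_speech_input(candidates, depth_scores, depth_weight=0.5):
    """Return the candidate word with the highest combined score.

    depth_scores is a hypothetical mapping from word to a 0..1 score for how
    well the depth information agrees with that word. Words missing from the
    mapping get a depth score of 0.0. Returns None if there are no candidates.
    """
    if not candidates:
        return None

    def combined(candidate):
        depth = depth_scores.get(candidate.word, 0.0)
        return (1 - depth_weight) * candidate.acoustic_score + depth_weight * depth

    return max(candidates, key=combined).word


# Audio alone slightly prefers "pay"; the assumed depth scores favor "play".
candidates = [Candidate("pay", 0.52), Candidate("play", 0.48)]
depth_scores = {"pay": 0.2, "play": 0.9}
print(determine_speech_input(candidates, depth_scores))  # prints "play"
```

With `depth_weight=0` the function falls back to the audio ranking alone, which makes the contribution of the depth information easy to see.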
9. On a computing device, a method for recognizing speech of a user, comprising:
receiving depth image information of a physical space from a depth camera;
receiving audio information from one or more microphones;
identifying one or more spoken words from the audio information;
identifying one or more contextual elements from the depth image information;
determining whether the one or more spoken words are intended as a user input to the computing system based upon the one or more contextual elements;
performing an action via the computing device if it is determined that the spoken words are intended as a user input; and
not performing the action via the computing device if it is determined that the spoken words are not intended as a user input.
Dependent claims: 10, 11, 12, 13, 14
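The gating logic of claim 9 can be sketched roughly as follows. The claim does not enumerate the contextual elements, so the two used here (whether the user faces the device, and the user's distance from it), the `Context` type, and the `max_distance_m` threshold are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass
class Context:
    """Hypothetical contextual elements derived from the depth image."""
    facing_device: bool
    distance_m: float


def is_intended_input(ctx, max_distance_m=4.0):
    """Treat speech as intended input only if the user faces the device
    and stands within an assumed maximum distance."""
    return ctx.facing_device and ctx.distance_m <= max_distance_m


def handle_speech(words, ctx, action):
    """Perform the action if the words are judged to be intended input;
    otherwise do nothing and return None."""
    if is_intended_input(ctx):
        return action(words)
    return None


def describe(words):
    # Stand-in for a real device action.
    return "performed: " + " ".join(words)


print(handle_speech(["pause"], Context(True, 2.0), describe))   # prints "performed: pause"
print(handle_speech(["pause"], Context(False, 2.0), describe))  # prints "None"
```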
15. A method for recognizing speech of a user, comprising:
receiving depth information of a physical space from a depth camera;
receiving audio information from one or more microphones;
identifying one or more of a mouth, tongue, and throat of the user from the depth information;
identifying one or more of mouth movements, tongue movements, and throat movements of the user;
determining that the user is speaking based on the identified movements;
responsive to the determination that the user is speaking, identifying a speech input from the received audio information; and
taking an action on the computing device in response to identifying the speech input.
Dependent claims: 16, 17, 18, 19, 20
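A minimal sketch of the ordering in claim 15, in which audio is passed to a recognizer only after movement indicates that the user is speaking. The claim does not say how movement is measured, so this example assumes a per-frame mouth-opening measurement in millimetres taken from the depth data and a simple range threshold; `is_speaking`, `recognize_if_speaking`, `min_range_mm`, and the `recognizer` callable are hypothetical names.

```python
def is_speaking(mouth_openings_mm, min_range_mm=3.0):
    """Guess that the user is speaking if the mouth opening varies by at
    least min_range_mm across the observed frames (an assumed heuristic)."""
    if len(mouth_openings_mm) < 2:
        return False
    return max(mouth_openings_mm) - min(mouth_openings_mm) >= min_range_mm


def recognize_if_speaking(mouth_openings_mm, audio, recognizer):
    """Run the recognizer on the audio only when movement suggests speech;
    return None otherwise so that background audio is ignored."""
    if not is_speaking(mouth_openings_mm):
        return None
    return recognizer(audio)


def fake_recognizer(audio):
    # Stand-in for a real speech recognizer.
    return "hello"


moving = [2.0, 9.0, 3.0, 11.0]  # mouth opening changes frame to frame
still = [2.0, 2.5, 2.2]         # mouth nearly closed throughout

print(recognize_if_speaking(moving, b"", fake_recognizer))  # prints "hello"
print(recognize_if_speaking(still, b"", fake_recognizer))   # prints "None"
```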
Specification