Depth based context identification
Abstract
A method or system for selecting or pruning applicable verbal commands associated with speech recognition based on a user's motions detected from a depth camera. Depending on the depth of the user's hand or arm, the context of the verbal command is determined and verbal commands corresponding to the determined context are selected. Speech recognition is then performed on an audio signal using the selected verbal commands. By using an appropriate set of verbal commands, the accuracy of the speech recognition is increased.
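The pruning idea in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and does not come from the patent: the depth thresholds, the context names, the command lists, and the use of fuzzy text matching from the standard library as a stand-in for an actual speech recognizer.

```python
from difflib import get_close_matches

# Hypothetical contexts: (maximum hand distance from the camera in metres,
# context name, verbal commands valid in that context). Illustrative only.
CONTEXTS = [
    (0.4, "near", ["volume up", "volume down", "next track"]),
    (0.8, "mid", ["set temperature", "fan on", "fan off"]),
]
DEFAULT_CONTEXT = ("far", ["call home", "find gas station"])


def select_commands(hand_depth_m):
    """Pick the context, and its command list, from the measured hand depth."""
    for max_depth, name, commands in CONTEXTS:
        if hand_depth_m <= max_depth:
            return name, commands
    return DEFAULT_CONTEXT


def recognize(transcript, commands):
    """Stand-in for speech recognition: match text against the pruned list only."""
    matches = get_close_matches(transcript.lower(), commands, n=1, cutoff=0.6)
    return matches[0] if matches else None


context, commands = select_commands(0.3)
print(context, recognize("volume upp", commands))
```

Because the matcher only ever sees the commands of the selected context, a noisy input has fewer candidates to be confused with, which is the accuracy benefit the abstract describes.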
18 Claims
1. A computer-implemented method of recognizing verbal commands, comprising:
capturing at least one depth image by a depth camera positioned in a vehicle, each of the depth image covering at least part of a user and comprising pixels representing distances from the depth camera to the at least part of the user;
recognizing a pose or gesture of the user based on the captured depth image;
generating gesture information based on the recognized pose or gesture, the gesture information indicating a direction pointed by the user outward of the vehicle towards a point-of-interest outside the vehicle;
determining one or more devices among a plurality of devices that are likely to be targeted by the user for an operation by analyzing the gesture information and without performing speech recognition on an audio signal including an utterance by the user;
selecting a plurality of verbal commands associated with the one or more devices determined as likely being targeted;
receiving the audio signal including the utterance by the user at a time when the at least one depth image is being captured; and
determining a device command for operating the one or more devices likely being targeted by performing speech recognition on the audio signal using the selected plurality of verbal commands, the determined device command representing an action associated with the point-of-interest.
Dependent claims: 2, 3, 4, 5, 6, 7, 8.
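The steps of claim 1 can be walked through in a short Python sketch. It is not the patented implementation. The coordinate convention, the 30-degree elevation rule for "pointing outward", the device names, the command lists, and the representation of recognizer output as scored text hypotheses are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class GestureInfo:
    azimuth_deg: float  # horizontal pointing angle, 0 = straight ahead
    outward: bool       # True when the arm is taken to point out of the vehicle


def gesture_from_arm(elbow, hand):
    """Derive a pointing direction from two 3D points (x right, y up, z forward).

    The points are assumed to have been extracted from the depth image already.
    """
    dx, dy, dz = (h - e for h, e in zip(hand, elbow))
    azimuth = math.degrees(math.atan2(dx, dz))
    elevation = math.degrees(math.atan2(dy, math.hypot(dx, dz)))
    # Assumption: a roughly level arm counts as pointing outward of the vehicle.
    return GestureInfo(azimuth, abs(elevation) < 30.0)


# Hypothetical devices and their verbal commands.
DEVICE_COMMANDS = {
    "navigation": ["navigate there", "what is that building", "save this place"],
    "media": ["volume up", "next track"],
}


def likely_devices(gesture):
    """Choose target devices from the gesture alone, before any speech recognition."""
    return ["navigation"] if gesture.outward else ["media"]


def select_commands(devices):
    return [cmd for dev in devices for cmd in DEVICE_COMMANDS[dev]]


def determine_command(hypotheses, commands):
    """Keep only recognizer hypotheses (text, score) inside the selected vocabulary."""
    allowed = [(text, score) for text, score in hypotheses if text in commands]
    return max(allowed, key=lambda pair: pair[1])[0] if allowed else None


gesture = gesture_from_arm(elbow=(0.0, 0.0, 0.0), hand=(0.3, 0.0, 0.3))
commands = select_commands(likely_devices(gesture))
print(determine_command([("next track", 0.9), ("navigate there", 0.7)], commands))
```

In this usage the higher-scoring hypothesis "next track" is rejected because the gesture selected the navigation vocabulary, so the lower-scoring but in-vocabulary "navigate there" is returned.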
9. A command processing system for recognizing verbal commands, comprising:
a depth camera positioned in a vehicle and configured to capture at least one depth image by a depth camera, each of the depth image covering at least part of a user and comprising pixels representing distances from the depth camera to the at least part of the user;
a gesture recognition module coupled to the depth camera, the gesture recognition module configured to recognize the pose or gesture of the user based on the captured depth image and generate gesture information based on the recognized pose or gesture, the gesture information indicating a direction pointed by the user outward of the vehicle towards a point-of-interest outside the vehicle;
a command extraction module configured to:
determine one or more devices among a plurality of devices that are likely to be targeted by the user for an operation by analyzing the gesture information and without performing speech recognition on an audio signal including an utterance by the user;
select a plurality of verbal commands associated with the one or more devices determined as likely being targeted;
receive the audio signal including the utterance by the user while the depth camera is capturing the at least one depth image; and
determine a device command for operating the one or more devices likely being targeted by performing speech recognition on the audio signal using the selected plurality of verbal commands, the determined device command representing an action associated with the point-of-interest.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17.
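The claim requires the audio signal to be received while the depth camera is capturing. One way a system might pair the two streams is by timestamp. The sketch below is an assumption for illustration, not something the claim specifies: it supposes each depth frame carries a capture time in seconds, that the list is sorted, and that the utterance has known start and end times. The function name is hypothetical.

```python
from bisect import bisect_left, bisect_right


def frames_during_utterance(frame_times, start, end):
    """Return indices of depth frames captured within [start, end].

    frame_times must be sorted in ascending order (seconds).
    """
    first = bisect_left(frame_times, start)   # first frame at or after start
    last = bisect_right(frame_times, end)     # one past the last frame at or before end
    return list(range(first, last))


# Five frames at roughly 30 frames per second; utterance spans 0.03 s to 0.1 s.
frame_times = [0.0, 0.033, 0.066, 0.1, 0.133]
print(frames_during_utterance(frame_times, 0.03, 0.1))
```

Only the gesture seen in the returned frames would then be used to choose the verbal commands for that utterance.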
18. A non-transitory computer readable storage medium for recognizing verbal commands, the computer readable storage medium structured to store instructions, when executed, cause a processor to:
capture at least one depth image by a depth camera positioned in a vehicle, each of the depth image covering at least part of a user and comprising pixels representing distances from the depth camera to the at least part of the user;
recognize a pose or gesture of the user based on the captured depth image;
generate gesture information based on the recognized pose or gesture, the gesture information indicating a direction pointed by the user outward of the vehicle towards a point-of-interest outside the vehicle;
determine one or more devices among a plurality of devices that are likely to be targeted by the user for an operation by analyzing the gesture information and without performing speech recognition on an audio signal including an utterance by the user;
select a plurality of verbal commands associated with the one or more devices determined as likely being targeted;
receive the audio signal including the utterance by the user while the at least one depth image is being captured; and
determine a device command for operating the one or more devices likely being targeted by performing speech recognition on the audio signal using the selected plurality of verbal commands, the determined device command representing an action associated with the point-of-interest.
Specification