Depth based context identification

US 9,092,394 B2
Filed: 06/15/2012
Issued: 07/28/2015
Est. Priority Date: 06/15/2012
Status: Active Grant

Abstract

A method or system for selecting or pruning applicable verbal commands associated with speech recognition based on a user's motions detected from a depth camera. Depending on the depth of the user's hand or arm, the context of the verbal command is determined and verbal commands corresponding to the determined context are selected. Speech recognition is then performed on an audio signal using the selected verbal commands. By using an appropriate set of verbal commands, the accuracy of the speech recognition is increased.
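
The pruning idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the context names, the command lists, and the `select_commands` and `recognize` helpers are hypothetical, and real speech recognition on an audio signal is replaced here by fuzzy string matching on a text hypothesis, using the standard-library `difflib.get_close_matches`.

```python
from difflib import get_close_matches

# Hypothetical command sets keyed by context; names are illustrative only.
COMMANDS_BY_CONTEXT = {
    "navigation": ["set destination", "identify building", "find gas station"],
    "entertainment": ["play radio", "next track", "volume up"],
}


def select_commands(context):
    """Return only the verbal commands that apply to the given context."""
    return COMMANDS_BY_CONTEXT[context]


def recognize(hypothesis, commands, cutoff=0.6):
    """Stand-in for speech recognition: pick the closest allowed command.

    Assumption: the audio has already been turned into a (possibly noisy)
    text hypothesis. Returns None when no command in the pruned set is
    similar enough to the hypothesis.
    """
    matches = get_close_matches(hypothesis.lower(), commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

Under these assumptions, `recognize("set destinaton", select_commands("navigation"))` resolves the misspelled hypothesis to `"set destination"`, because only navigation commands are candidates. Restricting the candidate set is what the abstract credits for the accuracy gain.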

18 Claims

1. A computer-implemented method of recognizing verbal commands, comprising:
- capturing at least one depth image by a depth camera positioned in a vehicle, each of the depth image covering at least part of a user and comprising pixels representing distances from the depth camera to the at least part of the user;
  
  recognizing a pose or gesture of the user based on the captured depth image;
  
  generating gesture information based on the recognized pose or gesture, the gesture information indicating a direction pointed by the user outward of the vehicle towards a point-of-interest outside the vehicle;
  
  determining one or more devices among a plurality of devices that are likely to be targeted by the user for an operation by analyzing the gesture information and without performing speech recognition on an audio signal including an utterance by the user;
  
  selecting a plurality of verbal commands associated with the one or more devices determined as likely being targeted;
  
  receiving the audio signal including the utterance by the user at a time when the at least one depth image is being captured; and
  
  determining a device command for operating the one or more devices likely being targeted by performing speech recognition on the audio signal using the selected plurality of verbal commands, the determined device command representing an action associated with the point-of-interest.
- Dependent claims (2, 3, 4, 5, 6, 7, 8)
  - 2. The method of claim 1, wherein the at least part of the user comprises a hand or a forearm of the user.
  - 3. The method of claim 1, wherein the depth camera is installed in an overhead console in the vehicle, the depth camera overlooking the user.
  - 4. The method of claim 1, wherein the plurality of devices comprise at least a navigation system and an entertainment system in the vehicle.
  - 5. The method of claim 1, wherein the gesture information indicates whether a hand or forearm of the user is located within a distance from the depth camera or beyond the distance from the depth camera, and wherein a first set of verbal commands is selected responsive to the gesture information indicating that the hand or the forearm is located within the distance, and wherein a second set of verbal commands are selected responsive to the gesture information indicating that the hand or the forearm is located beyond the distance.
  - 6. The method of claim 5, wherein the first set of verbal commands is associated with performing navigation operations in the vehicle.
  - 7. The method of claim 6, wherein the first set of verbal commands comprises a command for identifying or setting the point-of-interest for the navigation operations.
  - 8. The method of claim 6, wherein the second set of verbal commands is associated with operating an entertainment system, a climate control system or a diagnostic system.
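
The distance test in claims 5 through 8 can be sketched as follows. This is a hedged illustration, not the claimed method itself: the threshold value, the example commands, the `hand_mask` input (a per-pixel flag marking hand or forearm pixels, assumed to come from a segmentation step not shown), and the use of a median are all assumptions.

```python
from statistics import median

# Hypothetical threshold, in meters, separating the two hand-depth zones.
DISTANCE_THRESHOLD_M = 0.5

# Illustrative command sets. Per claims 6 and 7 the first set relates to
# navigation; per claim 8 the second set relates to entertainment, climate
# control, or diagnostics.
FIRST_SET = ["set destination", "identify that building"]
SECOND_SET = ["play radio", "raise temperature", "check tire pressure"]


def hand_depth(depth_image, hand_mask):
    """Median camera-to-hand distance over pixels flagged as hand/forearm.

    depth_image: rows of distances from the depth camera.
    hand_mask:   rows of booleans with the same shape (assumed input).
    """
    values = [
        d
        for depth_row, mask_row in zip(depth_image, hand_mask)
        for d, is_hand in zip(depth_row, mask_row)
        if is_hand
    ]
    if not values:
        raise ValueError("no hand or forearm pixels in the depth image")
    return median(values)


def select_command_set(depth_image, hand_mask, threshold=DISTANCE_THRESHOLD_M):
    """First set when the hand is within the distance, second set beyond it."""
    if hand_depth(depth_image, hand_mask) <= threshold:
        return FIRST_SET
    return SECOND_SET
```

A median is used here only because it tolerates a few noisy depth pixels; the claims do not specify how the hand distance is summarized.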

9. A command processing system for recognizing verbal commands, comprising:
- a depth camera positioned in a vehicle and configured to capture at least one depth image by a depth camera, each of the depth image covering at least part of a user and comprising pixels representing distances from the depth camera to the at least part of the user;
  
  a gesture recognition module coupled to the depth camera, the gesture recognition module configured to recognize the pose or gesture of the user based on the captured depth image and generate gesture information based on the recognized pose or gesture, the gesture information indicating a direction pointed by the user outward of the vehicle towards a point-of-interest outside the vehicle;
  
  a command extraction module configured to:
  
  determine one or more devices among a plurality of devices that are likely to be targeted by the user for an operation by analyzing the gesture information and without performing speech recognition on an audio signal including an utterance by the user;
  
  select a plurality of verbal commands associated with the one or more devices determined as likely being targeted;
  
  receive the audio signal including the utterance by the user while the depth camera is capturing the at least one depth image; and
  
  determine a device command for operating the one or more devices likely being targeted by performing speech recognition on the audio signal using the selected plurality of verbal commands, the determined device command representing an action associated with the point-of-interest.
- Dependent claims (10, 11, 12, 13, 14, 15, 16, 17)
  - 10. The command processing system of claim 9, wherein the at least part of the user comprises a hand or a forearm of the user.
  - 11. The command processing system of claim 9, wherein the depth camera is installed in an overhead console in the vehicle overlooking the user.
  - 12. The command processing system of claim 11, wherein the depth camera comprises a stereovision camera feeding captured images for processing into the at least one depth image.
  - 13. The command processing system of claim 9, wherein the plurality of devices comprise at least a navigation system and an entertainment system in the vehicle.
  - 14. The command processing system of claim 9, wherein the gesture information indicates whether a hand or forearm of the user is located within a distance from the depth camera or beyond the distance from the depth camera, and wherein the command extraction module selects a first set of verbal commands responsive to the gesture information indicating that the hand or the forearm is located within the distance and selects a second set of verbal commands responsive to the gesture information indicating that the hand or the forearm is located beyond the distance.
  - 15. The command processing system of claim 14, wherein the first set of verbal commands is associated with performing navigation operations in the vehicle.
  - 16. The command processing system of claim 14, wherein the first set of verbal commands comprise a command for identifying or setting the point-of-interest for the navigation operations.
  - 17. The command processing system of claim 16, wherein the second set of verbal commands is associated with operating an entertainment system, a climate control system or a diagnostic system.

18. A non-transitory computer readable storage medium for recognizing verbal commands, the computer readable storage medium structured to store instructions, when executed, cause a processor to:
- capture at least one depth image by a depth camera positioned in a vehicle, each of the depth image covering at least part of a user and comprising pixels representing distances from the depth camera to the at least part of the user;
  
  recognize a pose or gesture of the user based on the captured depth image;
  
  generate gesture information based on the recognized pose or gesture, the gesture information indicating a direction pointed by the user outward of the vehicle towards a point-of-interest outside the vehicle;
  
  determine one or more devices among a plurality of devices that are likely to be targeted by the user for an operation by analyzing the gesture information and without performing speech recognition on an audio signal including an utterance by the user;
  
  select a plurality of verbal commands associated with the one or more devices determined as likely being targeted;
  
  receive the audio signal including the utterance by the user while the at least one depth image is being captured; and
  
  determine a device command for operating the one or more devices likely being targeted by performing speech recognition on the audio signal using the selected plurality of verbal commands, the determined device command representing an action associated with the point-of-interest.
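
The independent claims determine the likely target devices from the pointing direction alone, before any speech recognition runs. A minimal sketch of that step follows. Everything in it is an assumption made for illustration: the cabin coordinate frame (z pointing up), the elbow and fingertip positions supplied by a pose recognizer, the elevation heuristic for "pointing outward", and the device names.

```python
import math


def pointing_direction(elbow, fingertip):
    """Unit vector from elbow to fingertip, both 3D points in the cabin frame."""
    dx, dy, dz = (f - e for e, f in zip(elbow, fingertip))
    norm = math.sqrt(dx * dx + dy * dy + dz * dz)
    if norm == 0:
        raise ValueError("elbow and fingertip coincide")
    return (dx / norm, dy / norm, dz / norm)


def points_outward(direction, min_elevation_deg=-10.0):
    """Hypothetical heuristic for a gesture aimed out of the vehicle.

    The arm is treated as pointing outward when it is not angled down
    toward the dashboard or center console.
    """
    z = max(-1.0, min(1.0, direction[2]))  # guard asin against rounding
    return math.degrees(math.asin(z)) >= min_elevation_deg


def target_devices(direction):
    """Pick likely target devices from gesture information only (no audio)."""
    if points_outward(direction):
        return ["navigation"]
    return ["entertainment", "climate_control"]
```

The device list returned here would then drive the selection of verbal commands, so that the later speech recognition step only considers commands for those devices.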

Current Assignee: Edge Technologies, Inc; Honda Motor Co., Ltd. (Honda Motor Company)
Original Assignee: Honda Motor Co., Ltd. (Honda Motor Company)
Inventors: Dokor, Tarek El; Holmes, James; Cluster, Jordan; Yamamoto, Stuart; Vaghefinazari, Pedram
Primary Examiner(s): Neway, Samuel G
Application Number: US13/524,351
Publication Number: US 20130339027A1
Time in Patent Office: 1,138 Days
Field of Search: 704/231-257, 704/270-278
US Class Current: 1/1
CPC Class Codes:
B60R 16/0373   Voice control in general G10L
G06F 2203/0381   Multimodal input, i.e. inte...
G06F 3/017   Gesture based interaction, ...
G06F 40/00   Handling natural language d...
G09G 5/08   Cursor circuits
G10L 15/24   Speech recognition using no...
G10L 15/25   using position of the lips,...
