System and method for multimodal human-vehicle interaction and belief tracking

US 9,286,029 B2
Filed: 07/03/2013
Issued: 03/15/2016
Est. Priority Date: 06/06/2013
Status: Active Grant

Abstract

A method and system for multimodal human-vehicle interaction including receiving input from an occupant in a vehicle via more than one mode and performing multimodal recognition of the input. The method also includes augmenting at least one recognition hypothesis based on at least one visual point of interest and determining a belief state of the occupant's intent based on the recognition hypothesis. The method further includes selecting an action to take based on the determined belief state.

18 Citations

12 Claims

1. A method for multimodal human-vehicle interaction, comprising:
- receiving input from an occupant in a vehicle via more than one mode, wherein the input includes a speech input and a gesture input;
  
  performing multimodal recognition of the input to determine a reference to a point of interest based on the speech input and to extract a visual point of interest based on the gesture input and the reference to the point of interest in the speech input;
  
  augmenting at least one recognition hypothesis based on the visual point of interest;
  
  determining a belief state of the occupant's intent, wherein the belief state is determined based on joint probability distribution tables of probabilistic ontology trees and the probabilistic ontology trees are based on the recognition hypothesis; and
  
  selecting an action to take based on the determined belief state.
- Dependent claims (2, 3, 4, 5):
  - 2. The method of claim 1, wherein performing multimodal recognition includes speech recognition of the speech input and gesture recognition of the gesture input.
  - 3. The method of claim 2, wherein the speech recognition includes determining whether the point of interest is present in a current dialog.
  - 4. The method of claim 1, wherein extracting the visual point of interest is based on the speech input, the gesture input, and a location of the vehicle.
  - 5. The method of claim 4, wherein the gesture input includes at least an eye gaze of the occupant.
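
Claims 4 and 5 tie the visual point of interest to the gesture input (at least an eye gaze) and the location of the vehicle. As an illustration only, and not the patented implementation, the following minimal Python sketch picks the point of interest whose bearing from the vehicle best matches the gaze direction. The flat local coordinate frame, the `Poi` type, the function names, and the 15-degree tolerance are all assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Poi:
    """Hypothetical point of interest in a flat local frame (metres)."""
    name: str
    x: float  # metres east of a local origin
    y: float  # metres north of a local origin


def bearing_deg(from_x, from_y, to_x, to_y):
    """Compass-style bearing (0 = north, 90 = east) from one point to another."""
    return math.degrees(math.atan2(to_x - from_x, to_y - from_y)) % 360.0


def angular_gap(a, b):
    """Smallest absolute difference between two bearings, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def extract_visual_poi(vehicle_xy, heading_deg, gaze_offset_deg, pois, max_gap_deg=15.0):
    """Return the POI closest to the gaze direction, or None if none is within max_gap_deg."""
    # Gaze is assumed to be measured relative to the vehicle heading.
    gaze_bearing = (heading_deg + gaze_offset_deg) % 360.0
    best, best_gap = None, max_gap_deg
    for poi in pois:
        gap = angular_gap(gaze_bearing, bearing_deg(*vehicle_xy, poi.x, poi.y))
        if gap <= best_gap:
            best, best_gap = poi, gap
    return best


pois = [Poi("cafe", 50.0, 50.0), Poi("bank", -40.0, 60.0)]
target = extract_visual_poi((0.0, 0.0), heading_deg=0.0, gaze_offset_deg=40.0, pois=pois)
print(target.name if target else None)
```

A real system would presumably also account for distance, occlusion, and sensor noise; this sketch uses angle alone to keep the idea visible.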

6. A method for multimodal human-vehicle interaction, comprising:
- receiving an input from an occupant of a vehicle including a first input and a second input, wherein the first and second inputs represent different modalities;
  
  performing multimodal recognition of the first and second inputs to determine a reference of a point of interest based on the first input and to extract a visual point of interest based on the second input and the reference to the point of interest in the first input;
  
  modifying a recognition hypothesis of the first input with the second input;
  
  determining a belief state of the occupant's intent, wherein the belief state is determined based on joint probability distribution tables of probabilistic ontology trees and the probabilistic ontology trees are based on the recognition hypothesis; and
  
  selecting an action to take based on the determined belief state.
- Dependent claims (7, 8, 9):
  - 7. The method of claim 6, wherein extracting the visual point of interest is based on the first input, the second input, and a location of the vehicle.
  - 8. The method of claim 7, wherein the recognition hypothesis is modified based on the visual point of interest.
  - 9. The method of claim 7, including determining whether the reference to the visual point of interest is within a current dialog.
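
Claims 6 and 8 describe modifying a recognition hypothesis of the first input using the second input, based on the visual point of interest. One simple way to picture this, offered as a sketch under assumptions rather than as the method the patent discloses, is to rescore an n-best list from a speech recognizer so that hypotheses mentioning the visually identified point of interest gain weight. The `(text, score)` representation, the substring match, and the `boost` factor are hypothetical choices for illustration.

```python
def augment_hypotheses(nbest, visual_poi, boost=2.0):
    """Rescore (text, score) hypotheses, boosting those that mention the visual POI.

    Scores are assumed to be positive; the result is renormalised to sum to 1
    and sorted from most to least likely.
    """
    rescored = [
        (text, score * boost if visual_poi.lower() in text.lower() else score)
        for text, score in nbest
    ]
    total = sum(score for _, score in rescored)
    return sorted(
        ((text, score / total) for text, score in rescored),
        key=lambda pair: pair[1],
        reverse=True,
    )


nbest = [
    ("is that place any good", 0.5),
    ("is that cafe any good", 0.3),
    ("is that cave any good", 0.2),
]
augmented = augment_hypotheses(nbest, "cafe")
print(augmented[0][0])
```

In this toy input the hypothesis containing "cafe" moves from second place to first once the gesture evidence is applied.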

10. A system for multimodal human-vehicle interaction, comprising:
- a plurality of sensors for sensing interaction data from a vehicle occupant, wherein the interaction data includes a speech input and a gesture input;
  
  a multimodal recognition module for performing multimodal recognition of the interaction data;
  
  a point of interest identification module for determining a reference to a point of interest based on the speech input and to extract a visual point of interest based on the gesture input and the reference to the point of interest in the speech input, wherein the multimodal recognition module augments a recognition hypothesis based on the visual point of interest,
  
  a belief tracking module for determining a belief state of the occupant's intent, wherein the belief state is determined based on joint probability distribution tables of probabilistic ontology trees and the probabilistic ontology trees are based on the recognition hypothesis; and
  
  a dialog management and action module for selecting an action to take based on the determined belief state.
- Dependent claims (11, 12):
  - 11. The system of claim 10, including a point of interest history database wherein the point of interest identification module utilizes the database to determine whether the reference to the point of interest is within a current dialog.
  - 12. The system of claim 10, wherein extracting the visual point of interest from the gesture input is based on the speech input, the gesture input, and a location of the vehicle.
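
Claim 10 pairs a belief tracking module with a dialog management and action module that selects an action from the determined belief state. The claims specify joint probability distribution tables of probabilistic ontology trees, which this example does not model. As a deliberately simplified stand-in, the sketch below runs a plain Bayes update over a flat set of intents and then applies a confidence threshold. The intent names, likelihood values, and threshold are assumptions invented for illustration.

```python
def update_belief(prior, likelihood):
    """Bayes update: posterior(intent) is proportional to prior(intent) * likelihood(intent)."""
    unnorm = {intent: p * likelihood.get(intent, 0.0) for intent, p in prior.items()}
    total = sum(unnorm.values())
    if total == 0.0:
        # The evidence rules out every intent; fall back to the prior.
        return dict(prior)
    return {intent: p / total for intent, p in unnorm.items()}


def select_action(belief, confirm_threshold=0.8):
    """Execute the top intent when confident enough, otherwise ask the occupant to confirm."""
    intent, prob = max(belief.items(), key=lambda item: item[1])
    return ("execute", intent) if prob >= confirm_threshold else ("confirm", intent)


# Hypothetical intents and evidence scores derived from a recognition hypothesis.
prior = {"rate_poi": 1 / 3, "navigate_to_poi": 1 / 3, "call_poi": 1 / 3}
likelihood = {"rate_poi": 0.9, "navigate_to_poi": 0.08, "call_poi": 0.02}

belief = update_belief(prior, likelihood)
action = select_action(belief)
print(action)
```

Thresholding the top intent is one common dialog-management heuristic: below the threshold the system asks a clarifying question instead of acting.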

Current Assignee: Honda Motor Co., Ltd. (Honda Motor Company); Capio, Inc. (Twilio, Inc.)
Original Assignee: Honda Motor Co., Ltd. (Honda Motor Company); Ian Lane
Inventors: Lane, Ian; Gupta, Rakesh; Raux, Antoine
Primary Examiner(s): Olson, Jason
Assistant Examiner(s): Abebe, Sosina

Application Number: US13/934,396
Publication Number: US 20140361973A1
Time in Patent Office: 986 Days
Field of Search: 345/156, 715/863
US Class Current: 1/1
CPC Class Codes:
- G01C 21/3629   Guidance using speech or au...
- G01C 21/3664   Details of the user input i...
- G06F 1/1694   the I/O peripheral being a ...
- G06F 2203/0381   Multimodal input, i.e. inte...
- G06F 3/01   Input arrangements or combi...
- G06F 3/013   Eye tracking input arrangem...
- G06F 3/017   Gesture based interaction, ...
- G06F 3/167   Audio in a user interface, ...
