Method and apparatus for mentoring via an augmented reality assistant
First Claim
1. A computer-implemented method for utilizing augmented reality to assist a user in performing a real-world task comprising:
generating a scene understanding based on an automated analysis of a video input and an audio input, the video input comprising a view of the user of a real-world scene during performance of a task, the audio input comprising speech of the user during performance of the task, and the automated analysis further comprising identifying an object in the real-world scene, extracting one or more visual cues to situate the user in relation to the identified object, wherein the user is situated by tracking a head orientation of the user;
correlating the scene understanding with a knowledge database comprising at least data relating to models of respective procedures of different tasks to create a task understanding of the task in the scene understanding, wherein the task understanding comprises a set of goals relating to performance of the task in the scene understanding;
processing the task understanding along with the models of respective procedures of the different tasks from the knowledge database to determine a next step of the task;
generating a plurality of visual representations responsive to an ongoing interaction of the computer-implemented method with the user relating to the next step to achieve a goal;
presenting the plurality of visual representations on a see-through display as an augmented overlay to the user's view of the real-world scene wherein the plurality of visual representations are rendered based on predicted head pose based on the tracked head orientation;
guiding a user to perform the next step of the task during operation of the task via visual or audio output;
analyzing actions of the user during the performance of the next task in response to the augmented overlay using the task understanding along with the models of respective procedures of the different tasks; and
if the user has not completed all tasks, modifying or creating new visual representations to be generated and presented as an augmented overlay of a second next step of the task understanding.
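The claimed method is a closed loop: build a task understanding from the scene, pick the next unfinished step from a procedure model, guide the user, and repeat until every step is done. A minimal illustrative sketch of that loop follows; all class and function names (ProcedureModel, KnowledgeBase, next_step, and the toy correlation rule) are assumptions for illustration and are not the patented implementation.

```python
# Illustrative sketch of the claimed mentoring loop; names and the
# keyword-overlap correlation rule are assumptions, not the patent's method.
from dataclasses import dataclass, field

@dataclass
class ProcedureModel:
    task: str
    steps: list  # ordered step names

@dataclass
class TaskUnderstanding:
    task: str
    goals: list
    completed: set = field(default_factory=set)

class KnowledgeBase:
    def __init__(self, models):
        self.models = {m.task: m for m in models}

    def correlate(self, observed_objects):
        # Toy stand-in for "correlating the scene understanding with a
        # knowledge database": pick the procedure whose steps mention the
        # most objects recognized in the scene.
        best = max(self.models.values(),
                   key=lambda m: sum(o in " ".join(m.steps)
                                     for o in observed_objects))
        return TaskUnderstanding(best.task, goals=list(best.steps))

def next_step(tu, kb):
    # "determine a next step of the task": first goal not yet completed.
    for step in kb.models[tu.task].steps:
        if step not in tu.completed:
            return step
    return None

# Example: guide a user through a toy filter-change procedure.
kb = KnowledgeBase([ProcedureModel(
    "change filter", ["locate filter", "remove filter", "insert filter"])])
tu = kb.correlate(["filter", "wrench"])
while (step := next_step(tu, kb)) is not None:
    # A real system would render AR cues here and analyze the user's
    # actions before marking the step complete.
    tu.completed.add(step)
```

When all goals are completed, `next_step` returns `None` and the loop ends, mirroring the claim's "if the user has not completed all tasks" branch.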
Abstract
A method and apparatus for training and guiding users comprising generating a scene understanding based on video and audio input of a scene of a user performing a task in the scene, correlating the scene understanding with a knowledge base to produce a task understanding, comprising one or more goals, of a current activity of the user, reasoning, based on the task understanding and a user's current state, a next step for advancing the user towards completing one of the one or more goals of the task understanding, and overlaying the scene with an augmented reality view comprising one or more visual and audio representations of the next step to the user.
17 Claims
1. A computer-implemented method for utilizing augmented reality to assist a user in performing a real-world task comprising:
generating a scene understanding based on an automated analysis of a video input and an audio input, the video input comprising a view of the user of a real-world scene during performance of a task, the audio input comprising speech of the user during performance of the task, and the automated analysis further comprising identifying an object in the real-world scene, extracting one or more visual cues to situate the user in relation to the identified object, wherein the user is situated by tracking a head orientation of the user;
correlating the scene understanding with a knowledge database comprising at least data relating to models of respective procedures of different tasks to create a task understanding of the task in the scene understanding, wherein the task understanding comprises a set of goals relating to performance of the task in the scene understanding;
processing the task understanding along with the models of respective procedures of the different tasks from the knowledge database to determine a next step of the task;
generating a plurality of visual representations responsive to an ongoing interaction of the computer-implemented method with the user relating to the next step to achieve a goal;
presenting the plurality of visual representations on a see-through display as an augmented overlay to the user's view of the real-world scene wherein the plurality of visual representations are rendered based on predicted head pose based on the tracked head orientation;
guiding a user to perform the next step of the task during operation of the task via visual or audio output;
analyzing actions of the user during the performance of the next task in response to the augmented overlay using the task understanding along with the models of respective procedures of the different tasks; and
if the user has not completed all tasks, modifying or creating new visual representations to be generated and presented as an augmented overlay of a second next step of the task understanding. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
10. An apparatus for utilizing augmented reality in assisting users in completing a complex physical task, the apparatus comprising:
at least one processor; at least one input device; and at least one storage device storing processor-executable instructions which, when executed by the at least one processor, perform a method comprising:
generating a scene understanding based on an automated analysis of a video input and an audio input, the video input comprising a view of the user of a real-world scene during performance of a task, the audio input comprising speech of the user during performance of the task, and the automated analysis further comprising identifying an object in the real-world scene, extracting one or more visual cues to situate the user in relation to the identified object, wherein the user is situated by tracking a head orientation of the user;
correlating the scene understanding with a knowledge database comprising at least data relating to models of respective procedures of different tasks to create a task understanding of the task in the scene understanding, wherein the task understanding comprises a set of goals relating to performance of the task in the scene understanding;
processing the task understanding along with the models of respective procedures of different tasks from the knowledge database to determine a next step of the task;
generating a plurality of visual representations responsive to an ongoing interaction of the apparatus with the user relating to the next step to achieve a goal, wherein the plurality of visual representations are rendered based on predicted head pose based on the tracked head orientation;
presenting the plurality of visual representations on a see-through display as an augmented overlay to the user's view of the real-world scene;
analyzing actions of the user during the performance of the next task in response to the augmented overlay using the task understanding along with the models of respective procedures of the different tasks; and
if the user has not completed all tasks, modifying or creating new visual representations to be generated and presented as an augmented overlay of a second next step of the task understanding. - View Dependent Claims (11, 12, 13, 14, 15, 16)
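Claims 1, 10, and 17 all recite rendering the overlay "based on predicted head pose based on the tracked head orientation." The patent does not specify the predictor; a common latency-compensation scheme is to extrapolate the tracked orientation forward by the display latency under a constant-angular-velocity assumption. The sketch below illustrates that assumed scheme for a single yaw angle:

```python
# Illustrative latency compensation for see-through AR rendering.
# The constant-velocity predictor is an assumption; the patent claims a
# predicted head pose but does not disclose this particular model.
def predict_head_pose(yaw_prev, yaw_curr, dt, latency):
    """Extrapolate yaw (radians) to the expected display time.

    yaw_prev, yaw_curr: tracked yaw at the last two tracker samples
    dt:      time between those samples, seconds
    latency: time from the current sample until the frame is displayed
    """
    omega = (yaw_curr - yaw_prev) / dt  # angular velocity, rad/s
    return yaw_curr + omega * latency

# Head turned 0.02 rad between samples 10 ms apart; predict 20 ms ahead.
predicted = predict_head_pose(1.00, 1.02, 0.010, 0.020)
```

Rendering the overlay at the predicted pose, rather than the last measured one, reduces the apparent "swim" of virtual annotations relative to the real objects they label.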
17. A computer-implemented method for utilizing augmented reality to assist a user in performing a real-world task, the method comprising:
generating a scene understanding based on an automated analysis of a video input and an audio input, the video input comprising a view of the user of a real-world scene during performance of the task, the audio input comprising speech of the user during performance of the task, and the automated analysis further comprising identifying an object in the real-world scene, extracting one or more visual cues to situate the user in relation to the identified object, wherein the user is situated by tracking a head orientation of the user;
correlating the scene understanding with a knowledge database comprising at least data relating to models of respective procedures of different tasks to create a task understanding of the task in the scene understanding, wherein the task understanding comprises a set of goals relating to performance of the task in the scene understanding;
generating a plurality of visual representations responsive to an ongoing interaction of the computer-implemented method with the user relating to the next step to achieve a goal, the ongoing interaction comprising at least audible interactions with the user based on natural language utterances interpreted from the speech received from the user, each of the visual representations relating one or more of the natural language utterances to one or more objects recognized in the real-world scene;
computing a head pose of the user and a set of visual occlusions in the scene as viewed from a viewpoint of the user;
presenting the plurality of visual representations on a see-through display as an augmented overlay to the user's view of the one or more objects in the real-world scene, wherein the plurality of visual representations are rendered based on predicted head pose based on the tracked head orientation;
analyzing actions of the user during the performance of the next task in response to the augmented overlay using the task understanding along with the models of respective procedures of the different tasks; and
if the user has not completed all tasks, modifying or creating new visual representations to be generated and presented as an augmented overlay of a second next step of the task understanding.
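Claim 17 adds two elements beyond claim 1: relating the user's natural language utterances to objects recognized in the scene, and computing visual occlusions from the user's viewpoint. A minimal sketch of both follows; the word-overlap grounding rule and the depth-comparison occlusion test are illustrative assumptions, not the disclosed implementation.

```python
# Illustrative grounding of an utterance to recognized objects and a
# simple occlusion test; both rules are assumptions for illustration.
def ground_utterance(utterance, recognized_objects):
    """Return the recognized objects mentioned in the user's speech."""
    words = set(utterance.lower().split())
    return [obj for obj in recognized_objects if obj.lower() in words]

def occluded(annotation_depth, scene_depth, margin=0.01):
    # An annotation is hidden when real scene geometry sits closer
    # along the same view ray (depths in meters from the viewpoint).
    return scene_depth + margin < annotation_depth

# The user asks about a part; link the utterance to the detected object.
linked = ground_utterance("Which bolt do I remove next?",
                          ["bolt", "filter", "panel"])
```

A visual representation would then be anchored to each linked object and suppressed (or drawn differently) when `occluded` reports that the object is hidden from the user's predicted viewpoint.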
Specification