Method and apparatus for mentoring via an augmented reality assistant
First Claim
1. A computer-implemented method for utilizing augmented reality to assist a user in performing a real-world task comprising:
generating a scene understanding based on an automated analysis of a video input and an audio input, the video input comprising a view of the user of a real-world scene during performance of a task, the audio input comprising speech of the user during performance of the task, and the automated analysis further comprising identifying an object in the real-world scene, extracting one or more visual cues to situate the user in relation to the identified object, wherein the user is situated by tracking a head orientation of the user;
correlating the scene understanding with a knowledge database comprising at least data relating to models of respective procedures of different tasks to create a task understanding of the task in the scene understanding, wherein the task understanding comprises a set of goals relating to performance of the task in the scene understanding;
processing the task understanding along with the models of respective procedures of the different tasks from the knowledge database to determine a next step of the task;
generating a plurality of visual representations responsive to an ongoing interaction of the computer-implemented method with the user relating to the next step to achieve a goal;
presenting the plurality of visual representations on a see-through display as an augmented overlay to the user's view of the real-world scene wherein the plurality of visual representations are rendered based on predicted head pose based on the tracked head orientation;
guiding a user to perform the next step of the task during operation of the task via visual or audio output;
analyzing actions of the user during the performance of the next task in response to the augmented overlay using the task understanding along with the models of respective procedures of the different tasks; and
if the user has not completed all tasks, modifying or creating new visual representations to be generated and presented as an augmented overlay of a second next step of the task understanding.
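The claimed method is a closed loop: build a task understanding from the scene, pick the next unfinished step from a procedure model, guide the user, and repeat until every step is done. A minimal illustrative sketch of that loop follows; all class and function names (ProcedureModel, KnowledgeBase, next_step, and the toy correlation rule) are assumptions for illustration and are not the patented implementation.

```python
# Illustrative sketch of the claimed mentoring loop; names and the
# keyword-overlap correlation rule are assumptions, not the patent's method.
from dataclasses import dataclass, field

@dataclass
class ProcedureModel:
    task: str
    steps: list  # ordered step names

@dataclass
class TaskUnderstanding:
    task: str
    goals: list
    completed: set = field(default_factory=set)

class KnowledgeBase:
    def __init__(self, models):
        self.models = {m.task: m for m in models}

    def correlate(self, observed_objects):
        # Toy stand-in for "correlating the scene understanding with a
        # knowledge database": pick the procedure whose steps mention the
        # most objects recognized in the scene.
        best = max(self.models.values(),
                   key=lambda m: sum(o in " ".join(m.steps)
                                     for o in observed_objects))
        return TaskUnderstanding(best.task, goals=list(best.steps))

def next_step(tu, kb):
    # "determine a next step of the task": first goal not yet completed.
    for step in kb.models[tu.task].steps:
        if step not in tu.completed:
            return step
    return None

# Example: guide a user through a toy filter-change procedure.
kb = KnowledgeBase([ProcedureModel(
    "change filter", ["locate filter", "remove filter", "insert filter"])])
tu = kb.correlate(["filter", "wrench"])
while (step := next_step(tu, kb)) is not None:
    # A real system would render AR cues here and analyze the user's
    # actions before marking the step complete.
    tu.completed.add(step)
```

When all goals are completed, `next_step` returns `None` and the loop ends, mirroring the claim's "if the user has not completed all tasks" branch.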
Abstract
A method and apparatus for training and guiding users comprising generating a scene understanding based on video and audio input of a scene of a user performing a task in the scene, correlating the scene understanding with a knowledge base to produce a task understanding, comprising one or more goals, of a current activity of the user, reasoning, based on the task understanding and a user's current state, a next step for advancing the user towards completing one of the one or more goals of the task understanding, and overlaying the scene with an augmented reality view comprising one or more visual and audio representations of the next step to the user.
17 Claims
1. A computer-implemented method for utilizing augmented reality to assist a user in performing a real-world task comprising:
generating a scene understanding based on an automated analysis of a video input and an audio input, the video input comprising a view of the user of a real-world scene during performance of a task, the audio input comprising speech of the user during performance of the task, and the automated analysis further comprising identifying an object in the real-world scene, extracting one or more visual cues to situate the user in relation to the identified object, wherein the user is situated by tracking a head orientation of the user;
correlating the scene understanding with a knowledge database comprising at least data relating to models of respective procedures of different tasks to create a task understanding of the task in the scene understanding, wherein the task understanding comprises a set of goals relating to performance of the task in the scene understanding;
processing the task understanding along with the models of respective procedures of the different tasks from the knowledge database to determine a next step of the task;
generating a plurality of visual representations responsive to an ongoing interaction of the computer-implemented method with the user relating to the next step to achieve a goal;
presenting the plurality of visual representations on a see-through display as an augmented overlay to the user's view of the real-world scene wherein the plurality of visual representations are rendered based on predicted head pose based on the tracked head orientation;
guiding a user to perform the next step of the task during operation of the task via visual or audio output;
analyzing actions of the user during the performance of the next task in response to the augmented overlay using the task understanding along with the models of respective procedures of the different tasks; and
if the user has not completed all tasks, modifying or creating new visual representations to be generated and presented as an augmented overlay of a second next step of the task understanding. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
10. An apparatus for utilizing augmented reality in assisting users in completing a complex physical task, the apparatus comprising:
at least one processor; at least one input device; and at least one storage device storing processor-executable instructions which, when executed by the at least one processor, perform a method comprising:
generating a scene understanding based on an automated analysis of a video input and an audio input, the video input comprising a view of the user of a real-world scene during performance of a task, the audio input comprising speech of the user during performance of the task, and the automated analysis further comprising identifying an object in the real-world scene, extracting one or more visual cues to situate the user in relation to the identified object, wherein the user is situated by tracking a head orientation of the user;
correlating the scene understanding with a knowledge database comprising at least data relating to models of respective procedures of different tasks to create a task understanding of the task in the scene understanding, wherein the task understanding comprises a set of goals relating to performance of the task in the scene understanding;
processing the task understanding along with the models of respective procedures of different tasks from the knowledge database to determine a next step of the task;
generating a plurality of visual representations responsive to an ongoing interaction of the apparatus with the user relating to the next step to achieve a goal, wherein the plurality of visual representations are rendered based on predicted head pose based on the tracked head orientation;
presenting the plurality of visual representations on a see-through display as an augmented overlay to the user's view of the real-world scene;
analyzing actions of the user during the performance of the next task in response to the augmented overlay using the task understanding along with the models of respective procedures of the different tasks; and
if the user has not completed all tasks, modifying or creating new visual representations to be generated and presented as an augmented overlay of a second next step of the task understanding. - View Dependent Claims (11, 12, 13, 14, 15, 16)
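Claims 1, 10, and 17 all recite rendering the overlay "based on predicted head pose based on the tracked head orientation." The patent does not specify the predictor; a common latency-compensation scheme is to extrapolate the tracked orientation forward by the display latency under a constant-angular-velocity assumption. The sketch below illustrates that assumed scheme for a single yaw angle:

```python
# Illustrative latency compensation for see-through AR rendering.
# The constant-velocity predictor is an assumption; the patent claims a
# predicted head pose but does not disclose this particular model.
def predict_head_pose(yaw_prev, yaw_curr, dt, latency):
    """Extrapolate yaw (radians) to the expected display time.

    yaw_prev, yaw_curr: tracked yaw at the last two tracker samples
    dt:      time between those samples, seconds
    latency: time from the current sample until the frame is displayed
    """
    omega = (yaw_curr - yaw_prev) / dt  # angular velocity, rad/s
    return yaw_curr + omega * latency

# Head turned 0.02 rad between samples 10 ms apart; predict 20 ms ahead.
predicted = predict_head_pose(1.00, 1.02, 0.010, 0.020)
```

Rendering the overlay at the predicted pose, rather than the last measured one, reduces the apparent "swim" of virtual annotations relative to the real objects they label.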
17. A computer-implemented method for utilizing augmented reality to assist a user in performing a real-world task, the method comprising:
generating a scene understanding based on an automated analysis of a video input and an audio input, the video input comprising a view of the user of a real-world scene during performance of the task, the audio input comprising speech of the user during performance of the task, and the automated analysis further comprising identifying an object in the real-world scene, extracting one or more visual cues to situate the user in relation to the identified object, wherein the user is situated by tracking a head orientation of the user;
correlating the scene understanding with a knowledge database comprising at least data relating to models of respective procedures of different tasks to create a task understanding of the task in the scene understanding, wherein the task understanding comprises a set of goals relating to performance of the task in the scene understanding;
generating a plurality of visual representations responsive to an ongoing interaction of the computer-implemented method with the user relating to the next step to achieve a goal, the ongoing interaction comprising at least audible interactions with the user based on natural language utterances interpreted from the speech received from the user, each of the visual representations relating one or more of the natural language utterances to one or more objects recognized in the real-world scene;
computing a head pose of the user and a set of visual occlusions in the scene as viewed from a viewpoint of the user;
presenting the plurality of visual representations on a see-through display as an augmented overlay to the user's view of the one or more objects in the real-world scene, wherein the plurality of visual representations are rendered based on predicted head pose based on the tracked head orientation;
analyzing actions of the user during the performance of the next task in response to the augmented overlay using the task understanding along with the models of respective procedures of the different tasks; and
if the user has not completed all tasks, modifying or creating new visual representations to be generated and presented as an augmented overlay of a second next step of the task understanding.
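Claim 17 adds two elements beyond claim 1: relating the user's natural language utterances to objects recognized in the scene, and computing visual occlusions from the user's viewpoint. A minimal sketch of both follows; the word-overlap grounding rule and the depth-comparison occlusion test are illustrative assumptions, not the disclosed implementation.

```python
# Illustrative grounding of an utterance to recognized objects and a
# simple occlusion test; both rules are assumptions for illustration.
def ground_utterance(utterance, recognized_objects):
    """Return the recognized objects mentioned in the user's speech."""
    words = set(utterance.lower().split())
    return [obj for obj in recognized_objects if obj.lower() in words]

def occluded(annotation_depth, scene_depth, margin=0.01):
    # An annotation is hidden when real scene geometry sits closer
    # along the same view ray (depths in meters from the viewpoint).
    return scene_depth + margin < annotation_depth

# The user asks about a part; link the utterance to the detected object.
linked = ground_utterance("Which bolt do I remove next?",
                          ["bolt", "filter", "panel"])
```

A visual representation would then be anchored to each linked object and suppressed (or drawn differently) when `occluded` reports that the object is hidden from the user's predicted viewpoint.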
Specification