Real-time mobile device capture and generation of art-styled AR/VR content
First Claim
1. A method for generating a three-dimensional (3D) projection of an object in a virtual reality or augmented reality environment, the method including:
obtaining a sequence of images using a single lens camera, the sequence of images being captured along a camera translation, wherein each image in the sequence of images contains at least a portion of overlapping subject matter, the subject matter including the object;
segmenting the object from the sequence of images using a trained segmenting neural network to form a sequence of segmented object images, wherein the trained neural network is configured to aggregate a plurality of feature maps from different layers of the trained neural network in order to allow usage of both finer scale and coarser scale details to produce probability maps corresponding to the sequence of segmented object images, wherein the trained neural network is trained to label every pixel in each image in the sequence of images with a particular category label;
refining the sequence of segmented object images using fine-grained segmentation, wherein refining the sequence of segmented object images includes passing each probability map onto a temporal dense conditional random field (CRF) smoothing system to produce a binary mask for every segmented object image, wherein the binary masks are temporally consistent and sharply aligned at boundaries to each other;
applying an art-style transfer to the sequence of segmented object images using a trained transfer neural network;
computing on-the-fly interpolation parameters;
generating stereoscopic pairs from the sequence of segmented object images for displaying the object as a 3D projection in a virtual reality or augmented reality environment, the stereoscopic pairs being generated for one or more points along the camera translation; and
mapping segmented image indices to a rotation range for display in the virtual reality or augmented reality environment.
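The segmenting step in the claim aggregates feature maps from different network layers so that both coarse- and fine-scale details contribute to the per-pixel probability maps. The following NumPy sketch illustrates only that aggregation contract; the two-layer, two-category setup, the nearest-neighbour upsampling, and the random feature maps are illustrative assumptions, not the patent's actual network:

```python
import numpy as np

def upsample_nearest(fm, target_hw):
    """Nearest-neighbour upsample a (H, W, C) feature map to target_hw."""
    th, tw = target_hw
    h, w, _ = fm.shape
    rows = np.arange(th) * h // th
    cols = np.arange(tw) * w // tw
    return fm[rows][:, cols]

def aggregate_probability_map(feature_maps, target_hw):
    """Sum coarse- and fine-scale feature maps at a common resolution,
    then softmax over channels to get a per-pixel class probability map."""
    agg = np.zeros(target_hw + (feature_maps[0].shape[-1],))
    for fm in feature_maps:
        agg += upsample_nearest(fm, target_hw)
    e = np.exp(agg - agg.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Toy example: a fine 8x8 layer and a coarse 4x4 layer, 2 categories.
rng = np.random.default_rng(0)
fine = rng.random((8, 8, 2))
coarse = rng.random((4, 4, 2))
probs = aggregate_probability_map([fine, coarse], (8, 8))
labels = probs.argmax(axis=-1)  # every pixel receives a category label
```

The argmax at the end matches the claim's requirement that every pixel be labelled with a particular category; the probability maps themselves are what the refining step consumes.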
Abstract
Various embodiments describe systems and processes for generating AR/VR content. In one aspect, a method for generating a 3D projection of an object in a virtual reality or augmented reality environment comprises obtaining a sequence of images along a camera translation using a single lens camera. Each image contains at least a portion of overlapping subject matter, including the object. The object is segmented from the sequence of images using a trained segmenting neural network to form a sequence of segmented object images, which is refined using fine-grained segmentation and to which an art-style transfer is applied using a trained transfer neural network. On-the-fly interpolation parameters are computed, and stereoscopic pairs are generated from the refined sequence of segmented object images for one or more points along the camera translation, for displaying the object as a 3D projection in the virtual reality or augmented reality environment. Segmented image indices are mapped to a rotation range for display in the virtual reality or augmented reality environment.
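Two of the steps in the abstract, computing on-the-fly interpolation parameters and generating stereoscopic pairs along the camera translation, reduce to index arithmetic over the captured frames. The linear blend and the fixed index baseline below are simplifying assumptions used only to show the idea:

```python
def interpolation_params(t, num_frames):
    """Map a continuous viewpoint position t in [0, num_frames - 1] to the
    two nearest captured frames and a blend weight, computed on the fly."""
    t = max(0.0, min(float(t), num_frames - 1))
    lo = int(t)
    hi = min(lo + 1, num_frames - 1)
    return lo, hi, t - lo

def stereoscopic_pairs(num_frames, baseline=2):
    """Pair frame i with frame i + baseline along the camera translation;
    the spatial offset between the two views acts as the stereo baseline."""
    return [(i, i + baseline) for i in range(num_frames - baseline)]

print(interpolation_params(2.25, 10))   # (2, 3, 0.25)
print(stereoscopic_pairs(5))            # [(0, 2), (1, 3), (2, 4)]
```

Because the camera physically translated during capture, two frames taken a short distance apart already form a plausible left/right view pair; the interpolation parameters let intermediate viewpoints be synthesized between captured frames.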
101 Citations
16 Claims
1. A method for generating a three-dimensional (3D) projection of an object in a virtual reality or augmented reality environment, the method including:
obtaining a sequence of images using a single lens camera, the sequence of images being captured along a camera translation, wherein each image in the sequence of images contains at least a portion of overlapping subject matter, the subject matter including the object;
segmenting the object from the sequence of images using a trained segmenting neural network to form a sequence of segmented object images, wherein the trained neural network is configured to aggregate a plurality of feature maps from different layers of the trained neural network in order to allow usage of both finer scale and coarser scale details to produce probability maps corresponding to the sequence of segmented object images, wherein the trained neural network is trained to label every pixel in each image in the sequence of images with a particular category label;
refining the sequence of segmented object images using fine-grained segmentation, wherein refining the sequence of segmented object images includes passing each probability map onto a temporal dense conditional random field (CRF) smoothing system to produce a binary mask for every segmented object image, wherein the binary masks are temporally consistent and sharply aligned at boundaries to each other;
applying an art-style transfer to the sequence of segmented object images using a trained transfer neural network;
computing on-the-fly interpolation parameters;
generating stereoscopic pairs from the sequence of segmented object images for displaying the object as a 3D projection in a virtual reality or augmented reality environment, the stereoscopic pairs being generated for one or more points along the camera translation; and
mapping segmented image indices to a rotation range for display in the virtual reality or augmented reality environment.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
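The refining step passes each probability map through temporal dense CRF smoothing to obtain temporally consistent binary masks. A real dense CRF also uses pairwise colour and position terms; the windowed temporal average below is a deliberately simplified stand-in that illustrates only the probability-map-in, binary-mask-out contract:

```python
import numpy as np

def temporally_smoothed_masks(prob_maps, window=3, threshold=0.5):
    """Average each pixel's foreground probability over a temporal window,
    then threshold, producing one binary mask per segmented object image."""
    probs = np.stack(prob_maps)        # (T, H, W) foreground probabilities
    smoothed = np.empty_like(probs)
    half = window // 2
    for t in range(len(prob_maps)):
        lo, hi = max(0, t - half), min(len(prob_maps), t + half + 1)
        smoothed[t] = probs[lo:hi].mean(axis=0)
    return (smoothed > threshold).astype(np.uint8)

# A flickering frame (middle probability dips to 0.4) is stabilised by
# its temporal neighbours, so all three masks stay foreground.
frames = [np.full((2, 2), p) for p in (0.9, 0.4, 0.9)]
masks = temporally_smoothed_masks(frames)
```

Thresholding each frame independently would drop the object in the middle frame; smoothing across time is what makes the masks temporally consistent, which is the property the claim requires.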
9. A system for generating a three-dimensional (3D) projection of an object in a virtual reality or augmented reality environment, the system comprising:
a single lens camera for obtaining a sequence of images, the sequence of images being captured along a camera translation, wherein each image in the sequence of images contains at least a portion of overlapping subject matter, the subject matter including the object;
a display module;
a processor; and
memory storing one or more programs configured for execution by the processor, the one or more programs comprising instructions for:
segmenting the object from the sequence of images using a trained segmenting neural network to form a sequence of segmented object images, wherein the trained neural network is configured to aggregate a plurality of feature maps from different layers of the trained neural network in order to allow usage of both finer scale and coarser scale details to produce probability maps corresponding to the sequence of segmented object images, wherein the trained neural network is trained to label every pixel in each image in the sequence of images with a particular category label;
refining the sequence of segmented object images using fine-grained segmentation, wherein refining the sequence of segmented object images includes passing each probability map onto a temporal dense conditional random field (CRF) smoothing system to produce a binary mask for every segmented object image, wherein the binary masks are temporally consistent and sharply aligned at boundaries to each other;
applying an art-style transfer to the sequence of segmented object images using a trained transfer neural network;
computing on-the-fly interpolation parameters;
generating stereoscopic pairs from the sequence of segmented object images for displaying the object as a 3D projection in a virtual reality or augmented reality environment, the stereoscopic pairs being generated for one or more points along the camera translation; and
mapping segmented image indices to a rotation range for display in the virtual reality or augmented reality environment.
- View Dependent Claims (10, 11, 12, 13, 14, 15)
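The system claim's final step maps segmented image indices to a rotation range, so that head rotation in the viewer selects which frame to display. A linear mapping is the natural sketch; the ±60° range and rounding to the nearest frame are illustrative assumptions:

```python
def index_to_rotation(index, num_images, rotation_range=(-60.0, 60.0)):
    """Linearly map a segmented image index to a viewing angle in degrees."""
    lo, hi = rotation_range
    return lo + (hi - lo) * index / (num_images - 1)

def rotation_to_index(angle, num_images, rotation_range=(-60.0, 60.0)):
    """Inverse mapping: pick the frame to display for a given head rotation,
    clamping angles outside the rotation range to the end frames."""
    lo, hi = rotation_range
    frac = min(max((angle - lo) / (hi - lo), 0.0), 1.0)
    return round(frac * (num_images - 1))
```

At display time the inverse mapping is the one actually queried per rendered frame: the headset reports an angle, and the nearest captured (or interpolated) segmented image is shown.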
16. A non-transitory computer readable medium storing one or more programs configured for execution by a computer, the one or more programs comprising instructions for:
obtaining a sequence of images using a single lens camera, the sequence of images being captured along a camera translation, wherein each image in the sequence of images contains at least a portion of overlapping subject matter, the subject matter including the object;
segmenting the object from the sequence of images using a trained segmenting neural network to form a sequence of segmented object images, wherein the trained neural network is configured to aggregate a plurality of feature maps from different layers of the trained neural network in order to allow usage of both finer scale and coarser scale details to produce probability maps corresponding to the sequence of segmented object images, wherein the trained neural network is trained to label every pixel in each image in the sequence of images with a particular category label;
refining the sequence of segmented object images using fine-grained segmentation, wherein refining the sequence of segmented object images includes passing each probability map onto a temporal dense conditional random field (CRF) smoothing system to produce a binary mask for every segmented object image, wherein the binary masks are temporally consistent and sharply aligned at boundaries to each other;
applying an art-style transfer to the sequence of segmented object images using a trained transfer neural network;
computing on-the-fly interpolation parameters;
generating stereoscopic pairs from the sequence of segmented object images for displaying the object as a 3D projection in a virtual reality or augmented reality environment, the stereoscopic pairs being generated for one or more points along the camera translation; and
mapping segmented image indices to a rotation range for display in the virtual reality or augmented reality environment.
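The art-style transfer in the claims is applied to the segmented object images, i.e. only object pixels are stylised. Training a transfer network is out of scope here; the sketch below shows just the mask-guided compositing step, with a colour-inverting `style_fn` standing in for a trained transfer neural network (an assumption, not the patent's model):

```python
import numpy as np

def stylize_segmented(frame, mask, style_fn):
    """Run the style function on the frame, then keep styled pixels only
    where the binary mask marks the object; elsewhere keep the original."""
    styled = style_fn(frame)
    keep = mask.astype(bool)[..., None]   # broadcast mask over colour channels
    return np.where(keep, styled, frame)

# Hypothetical stand-in "style": invert the colours.
invert = lambda img: 255 - img

frame = np.full((2, 2, 3), 200, dtype=np.uint8)
mask = np.array([[1, 0], [0, 1]], dtype=np.uint8)
out = stylize_segmented(frame, mask, invert)
# masked pixels become 55 (inverted), unmasked pixels stay 200
```

Because the binary masks from the refining step are temporally consistent, applying the style per frame through them avoids the flicker that per-frame independent stylisation of the whole image would produce at the object boundary.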
Specification