Method and apparatus for 3-D auto tagging
First Claim
1. A method comprising:
- on a mobile device including a processor, a memory, a camera, a plurality of sensors, a microphone, a display and a touch screen sensor, receiving via an input interface on the mobile device a request to generate a multi-view interactive digital media representation (MVIDMR) of an object;
recording a first plurality of frames from the camera on the mobile device from a live video stream as the mobile device moves along a trajectory such that different views of the object are captured in the first plurality of frames;
generating the MVIDMR of the object including a second plurality of frames from the first plurality of frames wherein the different views of the object are included in each of the second plurality of frames;
using a machine learning algorithm on the second plurality of frames to generate heatmaps and part affinity fields associated with possible 2-D pixel locations of a plurality of landmarks on the object wherein the machine learning algorithm is trained to recognize the plurality of landmarks;
based upon the heatmaps and part affinity fields, determining a skeleton for the object wherein the plurality of landmarks form joints of the skeleton and wherein determining the skeleton includes determining the 2-D pixel locations of the joints;
rendering a first selectable tag into the second plurality of frames to form a third plurality of frames associated with a tagged MVIDMR wherein the first selectable tag is associated with a first landmark positioned at a first joint within the skeleton and wherein the first selectable tag is rendered into the second plurality of frames relative to first 2-D pixel locations determined for the first joint in the second plurality of frames;
receiving media content associated with the first selectable tag;
outputting a first frame from the third plurality of frames of the tagged MVIDMR that includes the first selectable tag;
receiving input from the touch screen sensor indicating the first selectable tag is selected in the first frame from the tagged MVIDMR; and
in response, outputting the media content associated with the first selectable tag to the display.
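The claim above describes picking 2-D joint locations from per-landmark heatmaps and linking them into a skeleton. A minimal sketch of that step is shown below; it is not from the patent, and the landmark names, skeleton edges, and threshold are illustrative assumptions (a real system would also use the part affinity fields to score candidate limb connections).

```python
import numpy as np

# Hypothetical landmark set and skeleton edges, for illustration only.
LANDMARKS = ["front_left", "front_right", "rear_left", "rear_right"]
EDGES = [(0, 1), (1, 3), (3, 2), (2, 0)]  # pairs of landmark indices

def joints_from_heatmaps(heatmaps, threshold=0.3):
    """Take the peak of each landmark heatmap as that joint's 2-D pixel location.

    heatmaps: array of shape (num_landmarks, H, W) with per-pixel confidences.
    Returns {landmark_index: (row, col)} for peaks above the threshold.
    """
    joints = {}
    for i, hm in enumerate(heatmaps):
        r, c = np.unravel_index(np.argmax(hm), hm.shape)
        if hm[r, c] >= threshold:
            joints[i] = (int(r), int(c))
    return joints

def skeleton_from_joints(joints):
    """Keep only skeleton edges whose two endpoint joints were both detected."""
    return [(a, b) for (a, b) in EDGES if a in joints and b in joints]

# Tiny synthetic example: one confident peak per landmark heatmap.
H, W = 8, 8
hm = np.zeros((len(LANDMARKS), H, W))
for i, (r, c) in enumerate([(1, 1), (1, 6), (6, 1), (6, 6)]):
    hm[i, r, c] = 0.9
joints = joints_from_heatmaps(hm)
edges = skeleton_from_joints(joints)
```

With all four peaks above the threshold, every skeleton edge survives, giving the 2-D joint positions that the claim's tag-rendering step would consume.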
Abstract
A multi-view interactive digital media representation (MVIDMR) of an object can be generated from live images of the object captured with a camera. Selectable tags can be placed at locations on the object in the MVIDMR. When a selectable tag is selected, media content can be output which shows details of the object at the location where the selectable tag is placed. A machine learning algorithm can be used to automatically recognize landmarks on the object in the frames of the MVIDMR, and a structure from motion calculation can be used to determine 3-D positions associated with the landmarks. A 3-D skeleton associated with the object can be assembled from the 3-D positions and projected into the frames associated with the MVIDMR. The 3-D skeleton can be used to determine the selectable tag locations in the frames of the MVIDMR of the object.
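The abstract's projection step, placing a 3-D skeleton joint into each 2-D frame so a tag can be rendered at it, can be sketched with a standard pinhole camera model. The intrinsics and pose below are illustrative values, not parameters taken from the patent.

```python
import numpy as np

def project_point(X, K, R, t):
    """Project a 3-D world point X into pixel coordinates using a
    pinhole model: x ~ K (R X + t)."""
    Xc = R @ X + t      # world -> camera coordinates
    u, v, w = K @ Xc    # camera -> homogeneous pixel coordinates
    return np.array([u / w, v / w])

# Illustrative intrinsics; camera at the origin looking down +Z.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)
t = np.zeros(3)

# A joint 2 m straight ahead projects onto the principal point.
pixel = project_point(np.array([0.0, 0.0, 2.0]), K, R, t)
```

Repeating this per frame, with that frame's estimated pose (R, t), yields the per-frame 2-D pixel locations at which the selectable tag is drawn.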
26 Claims
1. A method comprising: (recited above under "First Claim") - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21)
22. A method comprising:
on a mobile device including a processor, a memory, a camera, a plurality of sensors, a microphone, a display and a touch screen sensor, receiving via an input interface on the mobile device a request to generate a multi-view interactive digital media representation (MVIDMR) of a car;
recording a first plurality of frames from the camera on the mobile device from a live video stream as the mobile device moves along a trajectory such that different views of the car are captured in the first plurality of frames;
generating the MVIDMR of the car including a second plurality of frames from the first plurality of frames wherein the different views of the car are included in each of the second plurality of frames;
using a machine learning algorithm on the second plurality of frames to generate heatmaps and part affinity fields associated with possible 2-D pixel locations of a plurality of landmarks on the car wherein the machine learning algorithm is trained to recognize the plurality of landmarks;
based upon the heatmaps and part affinity fields, determining a 3-D skeleton for the car wherein the plurality of landmarks form joints of the 3-D skeleton and wherein determining the 3-D skeleton includes determining the 3-D positions of the joints;
rendering a first selectable tag into the second plurality of frames to form a third plurality of frames associated with a tagged MVIDMR wherein the first selectable tag is associated with a first landmark positioned at a first joint in the 3-D skeleton and wherein the first selectable tag is rendered into the second plurality of frames relative to first 2-D pixel locations determined for the first joint in the second plurality of frames from first 3-D positions associated with the first joint;
receiving media content associated with the first selectable tag;
outputting a first frame from the third plurality of frames of the tagged MVIDMR of the car that includes the first selectable tag;
receiving input from the touch screen sensor indicating the first selectable tag is selected in the first frame from the tagged MVIDMR; and
in response, outputting the media content associated with the first selectable tag to the display. - View Dependent Claims (23, 24, 25, 26)
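Claim 22 requires 3-D joint positions recovered from 2-D detections across frames, the structure-from-motion step named in the abstract. One standard way to obtain a joint's 3-D position from its pixel locations in two frames is linear (DLT) triangulation, sketched below; the projection matrices and point are illustrative assumptions, not values from the patent.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3-D point from its pixel
    locations x1, x2 in two frames with 3x4 projection matrices P1, P2."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The homogeneous point is the null vector of A (last right singular vector).
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

# Two normalized cameras: identity pose, and a 1 m sideways translation.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

# Synthetic joint 5 m ahead; project it into each frame, then recover it.
X_true = np.array([0.0, 0.0, 5.0])
x1 = X_true[:2] / X_true[2]
Xc2 = X_true + np.array([-1.0, 0.0, 0.0])
x2 = Xc2[:2] / Xc2[2]
X_est = triangulate(P1, P2, x1, x2)
```

Triangulating each detected landmark this way yields the 3-D joint positions from which the claim's first 2-D pixel locations are then re-derived per frame.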
Specification