Systems, methods and devices for augmenting video content
Abstract
Methods, systems, products and devices are implemented for editing video image frames. According to one such method, image content is embedded into video. A selection input is received for a candidate location in a video frame of the video. The candidate location is tracked in subsequent video frames of the video by approximating three-dimensional camera motion between two frames using a model that compensates for camera rotations, camera translations and zooming, and by optimizing the approximation using statistical modeling of three-dimensional camera motion between video frames. Image content is embedded in the candidate location in the subsequent video frames of the video based upon the tracking thereof.
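The claims model inter-frame camera motion as a transformation matrix representing a projective transformation (a homography). As an illustrative sketch only, and not the patented implementation (the claims prescribe no particular estimator, and the function names here are hypothetical), such a matrix can be recovered from four or more point correspondences between two frames using the Direct Linear Transform, then used to carry a candidate location into the next frame:

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate the 3x3 matrix of a projective transformation (homography)
    mapping points src[i] -> dst[i], via the Direct Linear Transform.
    Requires at least four correspondences in general position."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on H.
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The flattened H is the right singular vector of A with the
    # smallest singular value.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # remove the arbitrary projective scale

def track_point(H, point):
    """Map a candidate location into the next frame under H."""
    u, v, w = H @ np.array([point[0], point[1], 1.0])
    return u / w, v / w
```

In practice the correspondences would come from feature matching between consecutive frames; here they are assumed given.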
21 Claims
1. A method for generating video with embedded image content, said method comprising:

receiving a selection input for a candidate location in a video frame of the video;

tracking the candidate location in subsequent video frames of the video by

approximating three-dimensional camera motion between two frames using a model that compensates for camera rotations, camera translations and zooming,

statistically modeling three-dimensional camera motion between the video frames by estimating and using parameters of a transformation matrix that represents a projective transformation of images in the frame caused by movement of the camera, the projective transformation being based upon the composition of a pair of perspective projections of an image in the video frames, and

optimizing the approximation using the statistical modeling; and

embedding image content in the candidate location in the subsequent video frames of the video based upon the tracking thereof.

(Dependent claims 2-12)
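The "statistically modeling ... and optimizing the approximation" language in claim 1 is not tied to a specific estimator. One common statistical strategy for this kind of robust model fitting, shown here purely as an illustration and not as the claimed method, is RANSAC: repeatedly fit the transformation matrix to random minimal samples of correspondences and keep the hypothesis supported by the most inliers. All names below are hypothetical.

```python
import numpy as np

def fit_homography(src, dst):
    """Direct Linear Transform fit of a 3x3 projective transformation
    matrix (illustrative only; not code from the patent)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def ransac_homography(src, dst, iters=200, tol=2.0, seed=0):
    """Statistically model inter-frame motion: fit H to random four-point
    samples and keep the model supported by the most inlier matches."""
    rng = np.random.default_rng(seed)
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    ones = np.ones((len(src), 1))
    best_H, best_inliers = None, 0
    for _ in range(iters):
        idx = rng.choice(len(src), size=4, replace=False)
        H = fit_homography(src[idx], dst[idx])
        proj = np.hstack([src, ones]) @ H.T
        with np.errstate(divide="ignore", invalid="ignore"):
            proj = proj[:, :2] / proj[:, 2:3]
            err = np.linalg.norm(proj - dst, axis=1)
            # NaN/inf reprojection errors compare False, i.e. outliers.
            inliers = int(np.sum(err < tol))
        if inliers > best_inliers:
            best_H, best_inliers = H, inliers
    return best_H, best_inliers
```

The inlier-maximizing matrix can then serve as the optimized approximation of camera motion used for tracking.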
13. An apparatus comprising:

an electronic circuit configured and arranged to:

receive a selection input for a candidate location in a first video frame of the video;

track the candidate location in subsequent video frames of the video by approximating three-dimensional camera motion between two frames, statistically modeling the three-dimensional camera motion between the video frames by estimating and using parameters of a transformation matrix that represents a projective transformation of images in the first video frame caused by movement of the camera, the projective transformation being based upon the composition of a pair of perspective projections of an image in the video frames, and optimizing the approximation using the statistical modeling of three-dimensional camera motion between video frames; and

embed image content in the candidate location in the subsequent video frames of the video.

(Dependent claims 14-17)
18. A computer product comprising:

a non-transitory computer readable medium storing instructions that, when executed, perform the steps of:

receiving a selection input for a candidate location in a video frame of a video;

tracking the candidate location in subsequent video frames of the video by approximating three-dimensional camera motion between two frames using a model that compensates for camera rotations, camera translations and zooming, statistically modeling three-dimensional camera motion between the video frames by estimating and using parameters of a transformation matrix that represents a projective transformation of images in the frame caused by movement of the camera, the projective transformation being based upon the composition of a pair of perspective projections of an image in the video frames, and optimizing the approximation using the statistical modeling of three-dimensional camera motion between video frames; and

embedding image content in the candidate location in the subsequent video frames of the video based upon the tracking thereof.

(Dependent claims 19-20)
21. A method for generating video with embedded image content, the video including a plurality of temporally-arranged video frames captured by a camera, said method comprising:

receiving a selection input that identifies the position of a candidate location within a first one of the video frames;

tracking the position of the candidate location in video frames that are temporally subsequent to the first one of the video frames by generating approximation data that approximates three-dimensional motion of the camera between two of the video frames by compensating for rotation, translation and zooming of the camera, statistically modeling three-dimensional camera motion between the video frames by estimating and using parameters of a transformation matrix that represents a projective transformation of images in the frame caused by movement of the camera, the projective transformation being based upon the composition of a pair of perspective projections of an image in the video frames, modifying the approximation data based on the statistical modeling, and using the modified approximation data to determine the position of the candidate location in each of the subsequent video frames; and

embedding image content in the determined position of the candidate location in the subsequent video frames.
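The embedding step itself is also left open by the claims. A minimal sketch, assuming the tracked candidate location is expressed as a homography H mapping image-content (patch) coordinates into frame coordinates, is inverse warping with nearest-neighbour sampling and no blending; the function name is illustrative, not from the patent:

```python
import numpy as np

def embed_patch(frame, patch, H):
    """Embed image content into a frame: H maps patch coordinates into
    frame coordinates. Inverse warping with nearest-neighbour sampling."""
    Hinv = np.linalg.inv(H)
    out = frame.copy()
    ph, pw = patch.shape[:2]
    fh, fw = frame.shape[:2]
    for yf in range(fh):
        for xf in range(fw):
            # Pull each frame pixel back into patch coordinates.
            x, y, w = Hinv @ np.array([xf, yf, 1.0])
            xi, yi = int(round(x / w)), int(round(y / w))
            if 0 <= xi < pw and 0 <= yi < ph:
                out[yf, xf] = patch[yi, xi]
    return out
```

A production implementation would vectorize the loop and interpolate (e.g. bilinearly) rather than snap to the nearest pixel, but the inverse-mapping structure is the same.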
Specification