Simultaneous localization and mapping for video coding
First Claim
1. A method of decoding video data, the method comprising:
- generating a synthetic image based on a composite image and a scene structure map, wherein the composite image is constructed from one or more images that were previously decoded, wherein the scene structure map comprises a scene structure map of a current image of the video data or a scene structure map of an image of the video data that was previously decoded, wherein the scene structure map includes coordinate values for three-dimensional points, which indicate positions and relative depth of the points, within the current image or the image that was previously decoded, wherein generating the synthetic image comprises utilizing camera position and orientation information of the current image to render the synthetic image such that camera position and orientation for the synthetic image and the current image is the same, and wherein generating the synthetic image further comprises:
interconnecting points of the scene structure map to form a proxy geometry;
texture mapping the composite image to the proxy geometry to form an image-based model; and
rendering the image-based model to generate the synthetic image;
determining a residual image, wherein the residual image is indicative of a difference between the current image and the synthetic image, and wherein determining the residual image comprises determining the residual image based on one or more portions of the current image including a background static portion; and
reconstructing the current image based on the synthetic image and the residual image.
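The three rendering sub-steps and the final reconstruction recited above can be sketched as follows. This is a minimal NumPy illustration, not the patented implementation: the triangle-fan interconnection, the reduction of camera pose to a 2-D pixel shift, and every function name are assumptions made only for illustration.

```python
import numpy as np

def interconnect(points):
    """Interconnect scene-structure-map points into a proxy geometry.
    Toy scheme: a triangle fan around the first point (a real coder
    could use e.g. a Delaunay triangulation instead)."""
    n = len(points)
    return [(0, i, i + 1) for i in range(1, n - 1)]

def texture_map(composite, proxy):
    """Bind the composite image to the proxy geometry, yielding an
    image-based model (here simply a dict pairing texture and geometry)."""
    return {"texture": composite, "geometry": proxy}

def render(model, camera_shift):
    """Render the image-based model at the current image's camera pose.
    Toy renderer: the pose difference is reduced to a 2-D pixel shift,
    so rendering is just a translation of the texture."""
    dy, dx = camera_shift
    tex = model["texture"]
    return np.roll(np.roll(tex, dy, axis=0), dx, axis=1)

def reconstruct(synthetic, residual):
    """Decoder output: current image = synthetic prediction + residual."""
    return synthetic + residual

# Tiny worked example.
points = np.array([[0, 0, 2.0], [4, 0, 2.0], [4, 4, 3.0], [0, 4, 3.0]])
composite = np.arange(16.0).reshape(4, 4)
model = texture_map(composite, interconnect(points))
synthetic = render(model, camera_shift=(0, 1))
residual = np.ones((4, 4))          # as decoded from the bitstream
current = reconstruct(synthetic, residual)
```

The only property this sketch preserves from the claim is the data flow: points → proxy geometry → image-based model → synthetic image → reconstruction.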
Abstract
Video encoding and decoding techniques are described in which a predictive image is formed from texture mapping a composite image to a proxy geometry that provides an approximation of a three-dimensional structure of a current image or a previously encoded or decoded image. A residual between the predictive image and the current image is used to encode or decode the current image.
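In predictive-coding terms, encoder and decoder share the same synthetic (predictive) image, so only the residual needs to be transmitted. A minimal sketch of that round trip (lossless for simplicity; a real codec would quantize and entropy-code the residual):

```python
import numpy as np

def encode(current, synthetic):
    # Encoder side: residual = current image minus the predictive image.
    return current - synthetic

def decode(residual, synthetic):
    # Decoder side: reconstruct the current image from the same prediction.
    return synthetic + residual

synthetic = np.array([[10.0, 20.0], [30.0, 40.0]])  # shared prediction
current = np.array([[12.0, 19.0], [30.0, 45.0]])    # image being coded
residual = encode(current, synthetic)
assert np.array_equal(decode(residual, synthetic), current)
```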
23 Claims
1. A method of decoding video data, the method comprising:
- generating a synthetic image based on a composite image and a scene structure map, wherein the composite image is constructed from one or more images that were previously decoded, wherein the scene structure map comprises a scene structure map of a current image of the video data or a scene structure map of an image of the video data that was previously decoded, wherein the scene structure map includes coordinate values for three-dimensional points, which indicate positions and relative depth of the points, within the current image or the image that was previously decoded, wherein generating the synthetic image comprises utilizing camera position and orientation information of the current image to render the synthetic image such that camera position and orientation for the synthetic image and the current image is the same, and wherein generating the synthetic image further comprises:
interconnecting points of the scene structure map to form a proxy geometry;
texture mapping the composite image to the proxy geometry to form an image-based model; and
rendering the image-based model to generate the synthetic image;
determining a residual image, wherein the residual image is indicative of a difference between the current image and the synthetic image, and wherein determining the residual image comprises determining the residual image based on one or more portions of the current image including a background static portion; and
reconstructing the current image based on the synthetic image and the residual image.
- View Dependent Claims (2, 3, 4, 5)
6. A method of encoding video data, the method comprising:
- generating a synthetic image based on a composite image and a scene structure map, wherein the composite image is constructed from one or more images that were previously encoded, wherein the scene structure map comprises a scene structure map of a current image of the video data or a scene structure map of an image of the video data that was previously encoded, wherein the scene structure map includes coordinate values for three-dimensional points, which indicate positions and relative depth of the points, within the current image or the image that was previously encoded, wherein generating the synthetic image comprises utilizing camera position and orientation information of the current image to render the synthetic image such that camera position and orientation for the synthetic image and the current image is the same, and wherein generating the synthetic image further comprises:
interconnecting points of the scene structure map to form a proxy geometry;
texture mapping the composite image to the proxy geometry to form an image-based model; and
rendering the image-based model to generate the synthetic image;
determining a residual image based on the synthetic image and the current image, wherein the residual image is indicative of a difference between the current image and the synthetic image, and wherein determining the residual image comprises determining the residual image based on one or more portions of the current image including a background static portion; and
outputting information indicative of the residual image to encode the current image of the video data.
- View Dependent Claims (7, 8, 9, 10, 11, 12)
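Claim 6 determines the residual from "one or more portions of the current image including a background static portion". A hedged sketch of portion-wise residual computation, where a boolean mask (an assumed representation, not from the patent) selects the portions whose residual the encoder signals:

```python
import numpy as np

def residual_for_portions(current, synthetic, mask):
    """Compute the residual only over selected portions of the image;
    unselected pixels carry no residual (zero)."""
    residual = np.zeros_like(current)
    residual[mask] = current[mask] - synthetic[mask]
    return residual

current = np.array([[5.0, 7.0], [9.0, 11.0]])
synthetic = np.array([[5.0, 6.0], [8.0, 11.0]])
mask = np.array([[True, True], [False, True]])  # e.g. static background
res = residual_for_portions(current, synthetic, mask)
# res == [[0., 1.], [0., 0.]]
```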
13. A device for coding video data, the device comprising:
- a video memory configured to store one or more images that were previously coded and that are used to construct a composite image; and
a coder processor configured to:
generate a synthetic image based on the composite image and a scene structure map, wherein the scene structure map comprises a scene structure map of a current image of the video data or a scene structure map of an image of the video data that was previously coded, wherein the scene structure map includes coordinate values for three-dimensional points, which indicate positions and relative depths of the points, within the current image or the image that was previously coded, wherein to generate the synthetic image, the coder processor is configured to utilize camera position and orientation information of the current image to render the synthetic image such that camera position and orientation for the synthetic image and the current image is the same, and wherein to generate the synthetic image, the coder processor is further configured to:
interconnect points of the scene structure map to form a proxy geometry;
texture map the composite image to the proxy geometry to form an image-based model; and
render the image-based model to generate the synthetic image; and
code the current image based on a residual image of the current image, wherein the residual image is indicative of a difference between the current image and the synthetic image, and wherein the residual image is based on one or more portions of the current image including a background static portion.
- View Dependent Claims (14, 15, 16, 17, 18, 19, 20, 21, 22)
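Each independent claim requires rendering the synthetic image with the same camera position and orientation as the current image. A minimal sketch of that pose step: projecting the scene structure map's 3-D points through an assumed pinhole camera with rotation R and translation t (hypothetical parameters, not specified by the patent):

```python
import numpy as np

def project(points_3d, R, t, focal=1.0):
    """Project 3-D scene-structure points into the current camera's
    image plane using its orientation (R) and position (t)."""
    cam = points_3d @ R.T + t                 # world -> camera coordinates
    return focal * cam[:, :2] / cam[:, 2:3]   # perspective divide

R = np.eye(3)                  # identity orientation for the example
t = np.array([0.0, 0.0, 1.0])  # camera displaced along the optical axis
points = np.array([[0.0, 0.0, 1.0], [2.0, 2.0, 1.0]])
uv = project(points, R, t)
# uv == [[0., 0.], [1., 1.]]
```

Rasterizing the texture-mapped proxy geometry through such a projection would yield a synthetic image aligned with the current image, so that the residual captures only what the prediction misses.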
23. A non-transitory computer-readable storage medium having instructions stored thereon that when executed cause one or more processors for a device for coding video data to:
- generate a synthetic image based on a composite image and a scene structure map, wherein the composite image is constructed from one or more images that were previously coded, wherein the scene structure map comprises a scene structure map of a current image of the video data or a scene structure map of an image of the video data that was previously coded, wherein the scene structure map includes coordinate values for three-dimensional points, which indicate positions and relative depths of the points, within the current image or the image that was previously coded, wherein the instructions that cause the one or more processors to generate the synthetic image comprise instructions that cause the one or more processors to utilize camera position and orientation information of the current image to render the synthetic image such that camera position and orientation for the synthetic image and the current image is the same, and wherein the instructions that cause the one or more processors to generate the synthetic image comprise instructions that cause the one or more processors to:
interconnect points of the scene structure map to form a proxy geometry;
texture map the composite image to the proxy geometry to form an image-based model; and
render the image-based model to generate the synthetic image; and
code the current image based on a residual image of the current image, wherein the residual image is indicative of a difference between the current image and the synthetic image, and wherein the residual image is based on one or more portions of the current image including a background static portion.
Specification