Standards-compliant model-based video encoding and decoding

US 9,743,078 B2
Filed: 03/12/2013
Issued: 08/22/2017
Est. Priority Date: 07/30/2004
Status: Active Grant


3 Assignments


0 Petitions


Abstract

A model-based compression codec applies higher-level modeling to produce better predictions than can be found through conventional block-based motion estimation and compensation. Computer-vision-based feature and object detection algorithms identify regions of interest throughout the video datacube. The detected features and objects are modeled with a compact set of parameters, and similar feature/object instances are associated across frames. Associated features/objects are formed into tracks and related to specific blocks of video data to be encoded. The tracking information is used to produce model-based predictions for those blocks of data, enabling more efficient navigation of the prediction search space than is typically achievable through conventional motion estimation methods. A hybrid framework enables modeling of data at multiple fidelities and selects the appropriate level of modeling for each portion of video data. A compliant-stream version of the model-based compression codec uses the modeling information indirectly to improve compression while producing bitstreams that can be interpreted by standard decoders.
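The abstract describes detected features being modeled with a compact set of parameters and then associated across frames into tracks. The following is a minimal sketch of that association step. It assumes each instance carries a location and a descriptor vector, which are the parameters dependent claim 4 names. It also assumes greedy nearest-descriptor matching between consecutive frames. The patent text here does not specify a matching rule, and `FeatureInstance`, `Track`, `build_tracks`, and `max_distance` are hypothetical names.

```python
import math
from dataclasses import dataclass, field


@dataclass
class FeatureInstance:
    frame: int
    x: float            # location of the instance, in pels
    y: float
    descriptor: tuple   # feature descriptor vector (hypothetical representation)


@dataclass(eq=False)    # compare tracks by identity, not by contents
class Track:
    instances: list = field(default_factory=list)

    @property
    def last(self):
        return self.instances[-1]


def descriptor_distance(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def build_tracks(detections_by_frame, max_distance=0.5):
    """Greedily link each detection to the closest track ending in the previous frame."""
    tracks = []
    for frame in sorted(detections_by_frame):
        # Only tracks whose newest instance is in the preceding frame can be extended.
        open_tracks = [t for t in tracks if t.last.frame == frame - 1]
        for inst in detections_by_frame[frame]:
            best, best_d = None, max_distance
            for t in open_tracks:
                d = descriptor_distance(t.last.descriptor, inst.descriptor)
                if d <= best_d:
                    best, best_d = t, d
            if best is None:
                tracks.append(Track([inst]))   # no match: start a new track
            else:
                best.instances.append(inst)
                open_tracks.remove(best)       # at most one instance per track per frame
    return tracks
```

A production tracker would also handle occlusions and gaps longer than one frame. This sketch simply starts a new track whenever no match falls within `max_distance`.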


23 Claims

1. A method of encoding raw video data, comprising:
- receiving multiple frames of raw video data;
  
  encoding the multiple frames of the raw video data to make an H.264 macroblock encoding;
  
  identifying, in the H.264 macroblock encoding, a group of pels in close proximity to each other exhibiting encoding complexity, such that the group of pels of the H.264 macroblock encoding uses a disproportionate amount of bandwidth computationally relative to other regions in one or more of the multiple frames of raw video;
  
  responding to the identified group of pels by forming tracking information including:
  
  detecting, in the identified group of pels, at least one of a feature or an object in a region of interest of at least one frame of the raw video data, the region of interest of the detected at least one feature not being aligned with the underlying macroblock grid;
  
  modeling the detected at least one of the feature and the object using a set of parameters; and
  
  associating any instances of the detected and modeled at least one of the feature or the object across plural frames of the raw video data providing at least one feature or object track of the associated instances, each feature or object track providing tracking information of respective associated instances;
  
  relating the at least one feature or object track to at least one macroblock of the raw video data to be encoded;
  
  producing an indirect model-based prediction of the at least one macroblock of the raw video data using the tracking information of the at least one related feature or object track, by using offsets between (i) the at least one macroblock of the raw video data and (ii) respective instances from the at least one related feature or object track to generate indirect predictions for the at least one macroblock of the raw video data, such that the feature or object track information is used indirectly to predict macroblocks instead of directly to predict the at least one feature or object, the indirect model-based prediction having model-based motion vectors;
  
  comparing the compression efficiency of a standards-compliant encoding derived from the model-based motion vectors with the compression efficiency of the H.264 macroblock encoding of the group of pels in close proximity to each other exhibiting encoding complexity;
  
  caching the model-based motion vectors if it is determined that the standards-compliant encoding derived from the model-based motion vectors provides improved compression efficiency relative to the H.264 macroblock encoding of the group of pels in close proximity to each other exhibiting encoding complexity; and
  
  incorporating the model-based motion vectors into a standards-compliant bit stream such that the model-based prediction is stored as standards-compliant encoded video data.

2. The method of claim 1 wherein detecting at least one of a feature or an object in a region of interest uses a detection algorithm, which is of a class of nonparametric feature detection algorithms.

3. The method of claim 1, wherein the set of parameters includes information about the at least one of the feature or the object and is stored in memory.

4. The method of claim 3, wherein the respective parameter of the respective feature includes a feature descriptor vector and a location of the respective feature.

5. The method of claim 4, wherein the respective parameter is generated when the respective feature is detected.
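Claim 1 predicts a macroblock indirectly. Instead of predicting the feature itself, it uses the offset between the macroblock and each feature instance to locate the macroblock's content in other frames. The following is a minimal sketch that assumes a purely translational relationship, feature positions given in pels, and a track that contains an instance in the target frame. `Instance` and `indirect_motion_vectors` are hypothetical names. The quarter-sample scaling reflects how H.264 expresses luma motion vectors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    frame: int
    x: float  # feature position in pels (hypothetical representation)
    y: float


def indirect_motion_vectors(mb_x, mb_y, track, target_frame):
    """Derive one candidate motion vector per reference frame from a feature track.

    The macroblock is assumed to keep, in each reference frame, the same offset
    to the feature instance that it has in the target frame (pure translation).
    """
    by_frame = {inst.frame: inst for inst in track}
    current = by_frame[target_frame]
    # Offset from the feature instance to the macroblock in the target frame.
    off_x, off_y = mb_x - current.x, mb_y - current.y
    candidates = []
    for frame, ref in sorted(by_frame.items()):
        if frame == target_frame:
            continue
        # Predicted location of the macroblock's content in the reference frame.
        pred_x, pred_y = ref.x + off_x, ref.y + off_y
        # H.264 luma motion vectors are expressed in quarter-sample units.
        mv = (round(4 * (pred_x - mb_x)), round(4 * (pred_y - mb_y)))
        candidates.append((frame, mv))
    return candidates
```

Under pure translation the offset cancels, so each vector equals the feature's displacement between frames. A richer per-track motion model, for example one with scaling, would make the macroblock's position relative to the feature matter.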

6. A codec for encoding raw video data, comprising:
- an encoder encoding at least two frames of the raw video data to make an H.264 macroblock encoding;
  
  the encoder identifying, in the H.264 macroblock encoding, a group of pixels in close proximity to each other exhibiting encoding complexity, such that the group of pixels of the H.264 macroblock encoding uses a disproportionate amount of bandwidth computationally relative to other regions in one or more of the multiple frames of raw video; and
  
  the encoder responding to the group of pixels by forming tracking information by using:
  
  a feature-based detector identifying the group of pixels as instances of a feature in the at least two video frames from the raw video data, where each identified feature instance includes a plurality of pixels exhibiting encoding complexity relative to other pixels in one or more of the at least two video frames, and where feature instances are not aligned with the underlying macroblock grid;
  
  a modeler operatively coupled to the feature-based detector and configured to create feature-based models modeling correspondence of the feature instances in two or more video frames, with all such feature instances related to at least one specific macroblock of video data to be encoded;
  
  a cache configured to cache the feature-based models and prioritize use of the feature-based models if it is determined that a standards-compliant encoding of associated video data that is derived from the feature-based models provides improved compression efficiency relative to the H.264 macroblock encoding of the group of pixels; and
  
  a prediction generator producing an indirect model-based prediction of the at least one specific macroblock of video data from its related feature instances, by using offsets between (i) the at least one macroblock of video data and (ii) the respective feature instances to generate indirect predictions for the at least one macroblock of video data, such that feature track information is used indirectly to predict macroblocks instead of directly to predict the feature instances, the indirect model-based prediction having model-based motion vectors, and said indirect model-based prediction including incorporating the model-based motion vectors into a standards-compliant bit stream such that the model-based prediction is stored as standards-compliant encoded video data.

7. The codec of claim 6, wherein the data complexity is determined when an encoding of the pixels by a conventional video compression technique exceeds a predetermined threshold.

8. The codec of claim 6, wherein the data complexity is determined when a bandwidth amount allocated to encode the feature by conventional video compression technique exceeds a predetermined threshold.

9. The codec of claim 8, wherein the predetermined threshold is at least one of:
   - a preset value;
   - a preset value stored in a database;
   - a value set as the average bandwidth amount allocated for previously encoded features; and
   - a value set as the median bandwidth amount allocated for previously encoded features.

10. The codec of claim 6, wherein the first video encoding process includes a motion compensation prediction process.

11. The codec of claim 6, wherein the prioritization of use is determined by comparison of encoding costs for each potential solution within Competition Mode, a potential solution comprising a tracker, a primary prediction motion model, a primary prediction sampling scheme, a subtiling scheme for motion vector calculation and a reconstruction algorithm.

12. The codec of claim 11, wherein the prioritization of use of the feature-based modeling initiates a use of that data complexity level of the feature instance as the threshold value, such that if a future feature instance exhibits the same or more data complexity level as the threshold value then the encoder automatically determines to initiate and use feature-based compression on the future feature instance.

13. The codec of claim 6, wherein the feature detector utilizes one of an FPA tracker, an MBC tracker, and a SURF tracker.
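Dependent claims 7 to 9 flag a feature as complex when the bandwidth a conventional encoder spends on it exceeds a threshold. Claim 12 then reuses the complexity level of a feature instance for which feature-based modeling won as the trigger for future instances. The following is a minimal sketch under two assumptions: bandwidth is measured in bits per feature, and the most recent winning level replaces the previous trigger. The claims do not say how successive wins combine. `complexity_threshold`, `is_complex`, and `AdaptiveTrigger` are hypothetical names.

```python
import statistics


def complexity_threshold(policy, previous_bits=(), preset=None):
    """Return a threshold chosen among the options listed in claim 9.

    policy is "preset", "average", or "median"; previous_bits holds the bits
    spent on previously encoded features (hypothetical bookkeeping).
    """
    if policy == "preset":
        if preset is None:
            raise ValueError("the preset policy needs a preset value")
        return preset
    if not previous_bits:
        raise ValueError("no previously encoded features to summarize")
    if policy == "average":
        return statistics.mean(previous_bits)
    if policy == "median":
        return statistics.median(previous_bits)
    raise ValueError(f"unknown policy: {policy!r}")


def is_complex(conventional_bits, threshold):
    """Claims 7 and 8: complex when the conventional encoding exceeds the threshold."""
    return conventional_bits > threshold


class AdaptiveTrigger:
    """Claim 12: a winning instance's complexity becomes the future trigger."""

    def __init__(self):
        self.threshold = None

    def record_win(self, complexity):
        # Assumption: the latest win simply replaces any earlier threshold.
        self.threshold = complexity

    def use_feature_model(self, complexity):
        # Same or greater complexity than the threshold triggers feature-based compression.
        return self.threshold is not None and complexity >= self.threshold
```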

14. A codec for encoding raw video data, comprising:
- an encoder encoding at least two frames of the raw video data to make an H.264 macroblock encoding;
  
  the encoder identifying, in the H.264 macroblock encoding, a group of pixels in close proximity to each other exhibiting encoding complexity, such that the H.264 macroblock encoding of the group of pixels uses a disproportionate amount of bandwidth computationally relative to other regions in the at least two frames of raw video;
  
  the encoder responding to the group of pixels by using:
  
  a feature-based detector identifying the group of pixels as an instance of a feature in at least two video frames of raw video data, an identified feature instance including a plurality of pixels exhibiting compression complexity relative to other pixels in at least one of the at least two video frames, with such identified feature not being aligned with the underlying macroblock grid;
  
  a modeler operatively coupled to the feature-based detector, wherein the modeler creates a feature-based model modeling correspondence of the respective identified feature instance in the at least two video frames, with all such feature instances related to at least one specific macroblock of video data to be encoded;
  
  a cache caching the model-based motion vectors if it is determined that a standards-compliant use of a respective feature-based model provides an improved compression efficiency when compared with the H.264 macroblock encoding of the group of pixels, said standards-compliant use of the respective feature-based model including storing model-based prediction information in an encoding stream; and
  
  a prediction generator producing an indirect model-based prediction for the at least one specific macroblock of video data from its related feature instances, by using offsets between (i) the at least one macroblock of video data and (ii) the respective feature instances to generate indirect predictions for the at least one macroblock of video data, such that feature track information is used indirectly to predict macroblocks instead of directly to predict the respective feature instances, the model-based prediction using model-based motion vectors from the cache; and
  
  said indirect model-based prediction including incorporating the model-based motion vectors into a standards-compliant bit stream such that the model-based prediction is stored as standards-compliant encoded video data.

15. The codec of claim 14, wherein the improved compression efficiency of the identified feature instance is determined by comparing the compression efficiency of the identified feature relative to one of:
    - a standards-compliant encoding of the feature instance using a first video encoding process; and
    - a predetermined compression efficiency value stored in a database.
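Claims 14 and 15 cache model-based motion vectors only when the standards-compliant encoding derived from them is more efficient than the H.264 macroblock encoding. The following is a minimal sketch that assumes efficiency is compared as a bit count per macroblock. The baseline can come from an actual H.264 encode or, as claim 15 allows, from a predetermined value stored in a database. `ModelMotionVectorCache` and its methods are hypothetical.

```python
class ModelMotionVectorCache:
    """Keeps model-based motion vectors only where they beat the baseline encoding."""

    def __init__(self):
        self._entries = {}  # macroblock index -> (motion vectors, bits)

    def offer(self, mb_index, motion_vectors, model_bits, baseline_bits):
        """Cache the vectors if the model-derived encoding needs fewer bits.

        baseline_bits may be the size of the H.264 macroblock encoding or a
        predetermined value (claim 15); measuring both in bits is an assumption.
        """
        if model_bits < baseline_bits:
            self._entries[mb_index] = (tuple(motion_vectors), model_bits)
            return True
        return False

    def lookup(self, mb_index):
        """Return the cached vectors for a macroblock, or None if none were kept."""
        entry = self._entries.get(mb_index)
        return None if entry is None else entry[0]
```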

16. A method of encoding raw video data, comprising:
- encoding at least two frames of the raw video data to make an H.264 macroblock encoding;
  
  identifying, in the H.264 macroblock encoding, a group of pixels in close proximity to each other exhibiting encoding complexity, such that the group of pixels of the H.264 macroblock encoding uses a disproportionate amount of bandwidth computationally relative to other regions in one or more of the multiple frames of raw video; and
  
  identifying the group of pixels in the H.264 macroblock encoding as an instance of a feature in the at least two video frames from the raw video data, the feature instance not being aligned with the underlying macroblock grid;
  
  modeling a feature by vectorizing at least one of a feature pixel and a feature descriptor;
  
  identifying similar features not aligned with the underlying macroblock grid by:
  
  at least one of (a) minimizing mean-squared error (MSE) and (b) maximizing inner products between different feature pixel vectors or feature descriptors; and
  
  applying a standard motion estimation and compensation algorithm to account for translational motion of the feature, resulting in identified similar features;
  
  associating the identified similar features with at least one specific macroblock of video data to be encoded; and
  
  from the identified similar features, generating an indirect model-based prediction for the at least one specific macroblock of video data, by using offsets between (i) the at least one macroblock of video data and (ii) the respective similar features to generate indirect predictions for the at least one macroblock of video data, such that feature track information is used indirectly to predict macroblocks instead of directly to predict instances of the respective similar features, the indirect model-based prediction having model-based motion vectors, said indirect model-based prediction including:
  
  comparing the compression efficiency of a standards-compliant encoding derived from the model-based motion vectors with the compression efficiency of the H.264 macroblock encoding of the group of pixels in close proximity to each other exhibiting encoding complexity;
  
  caching the model-based motion vectors if it is determined that the standards-compliant encoding derived from the model-based motion vectors provides improved compression efficiency relative to the H.264 macroblock encoding of the group of pixels in close proximity to each other exhibiting encoding complexity; and
  
  incorporating the cached model-based motion vectors into a standards-compliant bit stream such that the feature modeling prediction and model-based motion vectors are stored as standards-compliant encoded video data.
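Claim 16 finds similar features by vectorizing feature pixels and then either minimizing mean-squared error or maximizing inner products. The following is a minimal sketch of both criteria over small pel patches, assuming patches of equal size given as lists of rows. The inner product is normalized here as an assumption, because a raw inner product favors bright patches. `vectorize`, `most_similar`, and the `criterion` values are hypothetical names. The claim's subsequent translational motion estimation step is not shown.

```python
import math


def vectorize(patch):
    """Flatten a 2-D patch of pels (a list of rows) into a 1-D vector."""
    return [float(v) for row in patch for v in row]


def mse(a, b):
    """Mean-squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def normalized_inner_product(a, b):
    """Inner product scaled by both vector norms (0.0 if either norm is zero)."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0


def most_similar(query, candidates, criterion="mse"):
    """Return the index of the candidate patch most similar to query."""
    q = vectorize(query)
    vecs = [vectorize(c) for c in candidates]
    if criterion == "mse":
        # (a) minimize mean-squared error
        return min(range(len(vecs)), key=lambda i: mse(q, vecs[i]))
    if criterion == "inner":
        # (b) maximize the (here normalized) inner product
        return max(range(len(vecs)), key=lambda i: normalized_inner_product(q, vecs[i]))
    raise ValueError(f"unknown criterion: {criterion!r}")
```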

17. A method of encoding raw video data, comprising:
- implementing a model-based prediction by configuring a codec to encode a target frame from raw video data;
  
  encoding a macroblock in the target frame using an H.264 macroblock encoding process, resulting in an H.264 macroblock encoding;
  
  analyzing the macroblock encoding such that the H.264 macroblock encoding is deemed to be at least one of efficient and inefficient according to a codec standard if, in the H.264 macroblock encoding, a group of pixels in close proximity to each other is identified as exhibiting encoding complexity, such that the group of pixels of the H.264 macroblock encoding uses a disproportionate amount of bandwidth computationally relative to other regions in one or more of the multiple frames of raw video;
  
  wherein if the H.264 macroblock encoding is deemed inefficient, analyzing candidate standards-compliant model-based encodings of the macroblock by generating several predictions for the macroblock based on multiple models, and applying the generated predictions, resulting in plural candidate standards-compliant model-based encodings of the macroblock including;
  
  detecting an instance of a feature in the target frame from the raw video data, the feature corresponding to the group of pixels exhibiting the encoding complexity identified in the H.264 macroblock encoding;
  
  the feature instance not being aligned with the underlying macroblock grid;
  
  modeling a feature by vectorizing at least one of a feature pixel and a feature descriptor;
  
  identifying similar features not aligned with the underlying macroblock grid by:
  
  at least one of (a) minimizing mean-squared error (MSE) and (b) maximizing inner products between different feature pixel vectors or feature descriptors; and
  
  applying a standard motion estimation and compensation algorithm to account for translational motion of the feature, resulting in identified similar features;
  
  associating the identified similar features with at least one specific macroblock of video data to be encoded; and
  
  from the identified similar features, generating an indirect model-based prediction for the at least one specific macroblock of video data, by using offsets between (i) the at least one macroblock of video data and (ii) the respective similar features to generate indirect predictions for the at least one macroblock of video data, the indirect model-based prediction having model-based motion vectors, such that feature track information is used indirectly to predict macroblocks instead of directly to predict instances of the identified similar features, said indirect model-based prediction including incorporating feature modeling prediction information and model-based motion vectors from the cache into a standards-compliant bit stream such that the feature modeling prediction and model-based motion vectors are stored as one of the standards-compliant encodings of the macroblock;
  
  evaluating the resulting candidate standards-compliant model-based encodings of the macroblock according to encoding size;
  
  ranking the candidate standards-compliant model-based encodings of the macroblock relative to the H.264 macroblock encoding of the group of pixels;
  
  comparing the compression efficiency of the candidate standards-compliant encodings with the compression efficiency of the H.264 macroblock encoding of the group of pixels; and
  
  encoding using the candidate standards-compliant encoding if it is determined that the candidate standards-compliant encoding provides improved compression efficiency relative to the H.264 macroblock encoding of the group of pixels.

18. The method of claim 17, wherein the conventional encoding of the macroblock is efficient if an encoding size is less than a predetermined threshold size.

19. The method of claim 17, wherein the conventional encoding of the macroblock is efficient if the target macroblock is a skip macroblock.

20. The method of claim 17, wherein the conventional encoding of the macroblock is inefficient if the encoding size is larger than a threshold.

21. The method of claim 17, wherein if the conventional encoding of the macroblock is deemed inefficient, Competition Mode encodings for the macroblock are generated to compare their relative compression efficiencies.

22. The method of claim 21, wherein the encoding algorithm for Competition Mode includes:
    - subtracting the prediction from the macroblock to generate a residual signal;
    - transforming the residual signal using an approximation of a 2-D block-based DCT; and
    - encoding transform coefficients using an entropy encoder.

23. The method of claim 17 wherein the encoder being analyzed by generating several predictions includes generating a composite prediction that sums a primary prediction and a weighted version of a secondary prediction.
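Claims 21 to 23 describe Competition Mode. Each candidate prediction is subtracted from the macroblock, the residual is transformed with an approximation of a 2-D DCT, and the coefficients are entropy coded so that candidates can be ranked by encoding size. The following is a minimal sketch on a single 4x4 block, using the H.264 4x4 forward core transform as the DCT approximation. It omits the post-scaling that H.264 folds into quantization. It also substitutes a count of nonzero quantized coefficients for real CAVLC or CABAC bit counts. `estimated_cost`, `composite_prediction`, `competition_mode`, and the `qstep` value are hypothetical.

```python
# H.264/AVC 4x4 forward core transform matrix: an integer approximation of the 2-D DCT.
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_core_transform(block):
    """Y = CF . X . CF^T (scaling omitted; H.264 folds it into quantization)."""
    return matmul(matmul(CF, block), transpose(CF))


def composite_prediction(primary, secondary, weight):
    """Claim 23: sum of a primary prediction and a weighted secondary prediction."""
    return [[p + weight * s for p, s in zip(prow, srow)]
            for prow, srow in zip(primary, secondary)]


def estimated_cost(block, prediction, qstep=16):
    """Hypothetical size proxy: residual -> transform -> count of nonzero quantized levels."""
    residual = [[b - p for b, p in zip(brow, prow)]
                for brow, prow in zip(block, prediction)]
    coeffs = forward_core_transform(residual)
    return sum(1 for row in coeffs for c in row if round(c / qstep) != 0)


def competition_mode(block, candidates, baseline_cost):
    """Return (name, cost) of the cheapest candidate, or None if none beats the baseline."""
    costs = {name: estimated_cost(block, pred) for name, pred in candidates.items()}
    best = min(costs, key=costs.get)
    return (best, costs[best]) if costs[best] < baseline_cost else None
```

Because the count of surviving coefficients only tracks encoding size loosely, an encoder would rank candidates with the actual entropy coder's bit counts.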


Current Assignee: Euclid Discoveries LLC
Original Assignee: Euclid Discoveries LLC
Inventors: DeForest, Darin; Pace, Charles P.; Lee, Nigel; Pizzorni, Renato
Primary Examiner(s): Vo, Tung
Assistant Examiner(s): Jiang, Zaihan
Application Number: US13/797,644
Publication Number: US 20130230099A1
Time in Patent Office: 1,624 Days
Field of Search: 37524008

CPC Class Codes:
- H04N 19/23: with coding of regions that...
- H04N 19/50: using predictive coding H04...
- H04N 19/51: Motion estimation or motion...
- H04N 19/543: using regions
- H04N 19/85: using pre-processing or pos...
