Standards-compliant model-based video encoding and decoding
First Claim
1. A method of encoding raw video data, comprising:
receiving multiple frames of raw video data;
encoding the multiple frames of the raw video data to make an H.264 macroblock encoding;
identifying, in the H.264 macroblock encoding, a group of pels in close proximity to each other exhibiting encoding complexity, such that the group of pels of the H.264 macroblock encoding uses a disproportionate amount of bandwidth computationally relative to other regions in one or more of the multiple frames of raw video;
responding to the identified group of pels by forming tracking information including:
detecting, in the identified group of pels, at least one of a feature or an object in a region of interest of at least one frame of the raw video data, the region of interest of the detected at least one feature not being aligned with the underlying macroblock grid;
modeling the detected at least one of the feature and the object using a set of parameters; and
associating any instances of the detected and modeled at least one of the feature or the object across plural frames of the raw video data providing at least one feature or object track of the associated instances, each feature or object track providing tracking information of respective associated instances;
relating the at least one feature or object track to at least one macroblock of the raw video data to be encoded;
producing an indirect model-based prediction of the at least one macroblock of the raw video data using the tracking information of the at least one related feature or object track, by using offsets between (i) the at least one macroblock of the raw video data and (ii) respective instances from the at least one related feature or object track to generate indirect predictions for the at least one macroblock of the raw video data, such that the feature or object track information is used indirectly to predict macroblocks instead of directly to predict the at least one feature or object, the indirect model-based prediction having model-based motion vectors;
comparing the compression efficiency of a standards-compliant encoding derived from the model-based motion vectors with the compression efficiency of the H.264 macroblock encoding of the group of pels in close proximity to each other exhibiting encoding complexity;
caching the model-based motion vectors if it is determined that the standards-compliant encoding derived from the model-based motion vectors provides improved compression efficiency relative to the H.264 macroblock encoding of the group of pels in close proximity to each other exhibiting encoding complexity; and
incorporating the model-based motion vectors into a standards-compliant bit stream such that the model-based prediction is stored as standards-compliant encoded video data.
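The offset-based "indirect" prediction recited in claim 1 can be illustrated with a minimal sketch. It assumes a purely translational feature track stored as a mapping from frame index to the feature's pixel position; the function name `model_based_motion_vector` and the data layout are hypothetical and are not taken from the patent.

```python
from typing import Dict, Tuple

Point = Tuple[int, int]  # (x, y) position in pixels


def model_based_motion_vector(
    mb_pos: Point,
    track: Dict[int, Point],
    cur_frame: int,
    ref_frame: int,
) -> Point:
    """Derive a motion vector for a macroblock from a related feature track.

    Assumption: the track stores one feature position per frame, and the
    macroblock moves rigidly with the feature (translation only).
    """
    feat_cur = track[cur_frame]
    feat_ref = track[ref_frame]
    # Offset between the macroblock and the feature instance in the current frame.
    off_x = mb_pos[0] - feat_cur[0]
    off_y = mb_pos[1] - feat_cur[1]
    # Apply the same offset to the feature instance in the reference frame
    # to locate the predicted region for the macroblock.
    pred_x = feat_ref[0] + off_x
    pred_y = feat_ref[1] + off_y
    # The motion vector points from the macroblock to its predicted region.
    return (pred_x - mb_pos[0], pred_y - mb_pos[1])


# Feature seen at (37, 21) in frame 0 and at (41, 24) in frame 1;
# the macroblock to encode sits on the grid at (32, 16) in frame 1.
mv = model_based_motion_vector((32, 16), {0: (37, 21), 1: (41, 24)}, 1, 0)
print(mv)  # (-4, -3)
```

The feature itself is off-grid, but the resulting vector applies to a grid-aligned macroblock, which is what lets it be written into an ordinary motion-vector field.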
Abstract
A model-based compression codec applies higher-level modeling to produce better predictions than can be found through conventional block-based motion estimation and compensation. Computer-vision-based feature and object detection algorithms identify regions of interest throughout the video datacube. The detected features and objects are modeled with a compact set of parameters, and similar feature/object instances are associated across frames. Associated features/objects are formed into tracks and related to specific blocks of video data to be encoded. The tracking information is used to produce model-based predictions for those blocks of data, enabling more efficient navigation of the prediction search space than is typically achievable through conventional motion estimation methods. A hybrid framework enables modeling of data at multiple fidelities and selects the appropriate level of modeling for each portion of video data. A compliant-stream version of the model-based compression codec uses the modeling information indirectly to improve compression while producing bitstreams that can be interpreted by standard decoders.
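The abstract's step of associating similar feature instances across frames into tracks can be sketched as a greedy nearest-neighbour linker. This is only an illustration: the abstract does not specify the association rule, and `build_tracks`, the centre-point representation, and the `max_dist` threshold are all assumptions.

```python
import math
from typing import List, Tuple

Detection = Tuple[float, float]  # hypothetical: feature centre (x, y)
Track = List[Tuple[int, Detection]]  # (frame index, detection) pairs


def build_tracks(frames: List[List[Detection]], max_dist: float = 8.0) -> List[Track]:
    """Greedily link each track to its nearest detection in the next frame."""
    tracks: List[Track] = []
    for t, detections in enumerate(frames):
        unmatched = list(detections)
        for track in tracks:
            last_t, last_pos = track[-1]
            # Only extend tracks that were alive in the previous frame.
            if last_t != t - 1 or not unmatched:
                continue
            nearest = min(unmatched, key=lambda d: math.dist(d, last_pos))
            if math.dist(nearest, last_pos) <= max_dist:
                track.append((t, nearest))
                unmatched.remove(nearest)
        # Detections that matched no existing track start new tracks.
        tracks.extend([[(t, d)] for d in unmatched])
    return tracks


frames = [[(10, 10), (50, 50)], [(12, 11), (49, 52)], [(14, 12)]]
for track in build_tracks(frames):
    print(track)
```

A real detector would likely compare appearance descriptors as well as position; position alone is used here to keep the sketch short.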
23 Claims
6. A codec for encoding raw video data, comprising:
an encoder encoding at least two frames of the raw video data to make an H.264 macroblock encoding;
the encoder identifying, in the H.264 macroblock encoding, a group of pixels in close proximity to each other exhibiting encoding complexity, such that the group of pixels of the H.264 macroblock encoding uses a disproportionate amount of bandwidth computationally relative to other regions in one or more of the multiple frames of raw video; and
the encoder responding to the group of pixels by forming tracking information by using:
a feature-based detector identifying the group of pixels as instances of a feature in the at least two video frames from the raw video data, where each identified feature instance includes a plurality of pixels exhibiting encoding complexity relative to other pixels in one or more of the at least two video frames, and where feature instances are not aligned with the underlying macroblock grid;
a modeler operatively coupled to the feature-based detector and configured to create feature-based models modeling correspondence of the feature instances in two or more video frames, with all such feature instances related to at least one specific macroblock of video data to be encoded;
a cache configured to cache the feature-based models and prioritize use of the feature-based models if it is determined that a standards-compliant encoding of associated video data that is derived from the feature-based models provides improved compression efficiency relative to the H.264 macroblock encoding of the group of pixels; and
a prediction generator producing an indirect model-based prediction of the at least one specific macroblock of video data from its related feature instances, by using offsets between (i) the at least one macroblock of video data and (ii) the respective feature instances to generate indirect predictions for the at least one macroblock of video data, such that feature track information is used indirectly to predict macroblocks instead of directly to predict the feature instances, the indirect model-based prediction having model-based motion vectors, and said indirect model-based prediction including incorporating the model-based motion vectors into a standards-compliant bit stream such that the model-based prediction is stored as standards-compliant encoded video data.
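The claim's test for a group of pixels that "uses a disproportionate amount of bandwidth" is not given a numeric definition. As a rough sketch, one could flag macroblocks whose encoded size exceeds a multiple of the frame mean; the function name `flag_complex_macroblocks` and the `ratio` threshold below are assumptions chosen for illustration.

```python
from statistics import mean
from typing import List, Tuple


def flag_complex_macroblocks(
    bits: List[List[int]], ratio: float = 2.0
) -> List[Tuple[int, int]]:
    """Return (row, col) of macroblocks costing more than ratio * frame mean.

    `bits[r][c]` is assumed to hold the bit count that a first-pass
    H.264 encoding spent on the macroblock at grid position (r, c).
    """
    frame_mean = mean(b for row in bits for b in row)
    return [
        (r, c)
        for r, row in enumerate(bits)
        for c, b in enumerate(row)
        if b > ratio * frame_mean
    ]


first_pass_bits = [
    [100, 120, 110],
    [90, 900, 130],
    [100, 110, 95],
]
print(flag_complex_macroblocks(first_pass_bits))  # [(1, 1)]
```

The flagged positions would then be handed to the feature-based detector, which is free to place the feature region off the macroblock grid.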
14. A codec for encoding raw video data, comprising:
an encoder encoding at least two frames of the raw video data to make an H.264 macroblock encoding;
the encoder identifying, in the H.264 macroblock encoding, a group of pixels in close proximity to each other exhibiting encoding complexity, such that the H.264 macroblock encoding of the group of pixels uses a disproportionate amount of bandwidth computationally relative to other regions in the at least two frames of raw video;
the encoder responding to the group of pixels by using:
a feature-based detector identifying the group of pixels as an instance of a feature in at least two video frames of raw video data, an identified feature instance including a plurality of pixels exhibiting compression complexity relative to other pixels in at least one of the at least two video frames, with such identified feature not being aligned with the underlying macroblock grid;
a modeler operatively coupled to the feature-based detector, wherein the modeler creates a feature-based model modeling correspondence of the respective identified feature instance in the at least two video frames, with all such feature instances related to at least one specific macroblock of video data to be encoded;
a cache caching the model-based motion vectors if it is determined that a standards-compliant use of a respective feature-based model provides an improved compression efficiency when compared with the H.264 macroblock encoding of the group of pixels, said standards-compliant use of the respective feature-based model including storing model-based prediction information in an encoding stream; and
a prediction generator producing an indirect model-based prediction for the at least one specific macroblock of video data from its related feature instances, by using offsets between (i) the at least one macroblock of video data and (ii) the respective feature instances to generate indirect predictions for the at least one macroblock of video data, such that feature track information is used indirectly to predict macroblocks instead of directly to predict the respective feature instances, the model-based prediction using model-based motion vectors from the cache; and
said indirect model-based prediction including incorporating the model-based motion vectors into a standards-compliant bit stream such that the model-based prediction is stored as standards-compliant encoded video data.
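The conditional caching recited in claim 14 can be sketched as a small class that stores a model-based motion vector only when it beats the baseline encoding. The sketch assumes "compression efficiency" is compared as bit counts at equal quality; the class name `ModelVectorCache` and its methods are hypothetical.

```python
from typing import Dict, Optional, Tuple

MotionVector = Tuple[int, int]
MacroblockIndex = Tuple[int, int]  # (row, col) on the macroblock grid


class ModelVectorCache:
    """Keeps model-based motion vectors that improved on the baseline."""

    def __init__(self) -> None:
        self._vectors: Dict[MacroblockIndex, MotionVector] = {}

    def offer(
        self,
        mb_index: MacroblockIndex,
        mv: MotionVector,
        model_bits: int,
        baseline_bits: int,
    ) -> bool:
        """Cache `mv` only if the model-based encoding is strictly smaller."""
        if model_bits < baseline_bits:
            self._vectors[mb_index] = mv
            return True
        return False

    def get(self, mb_index: MacroblockIndex) -> Optional[MotionVector]:
        """Return the cached vector for a macroblock, or None if absent."""
        return self._vectors.get(mb_index)


cache = ModelVectorCache()
cache.offer((1, 1), (-4, -3), model_bits=620, baseline_bits=900)  # kept
cache.offer((0, 0), (1, 0), model_bits=150, baseline_bits=120)    # rejected
print(cache.get((1, 1)), cache.get((0, 0)))  # (-4, -3) None
```

The prediction generator would then read vectors from the cache when writing the standards-compliant bit stream.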
16. A method of encoding raw video data, comprising:
encoding at least two frames of the raw video data to make an H.264 macroblock encoding;
identifying, in the H.264 macroblock encoding, a group of pixels in close proximity to each other exhibiting encoding complexity, such that the group of pixels of the H.264 macroblock encoding uses a disproportionate amount of bandwidth computationally relative to other regions in one or more of the multiple frames of raw video; and
identifying the group of pixels in the H.264 macroblock encoding as an instance of a feature in the at least two video frames from the raw video data, the feature instance not being aligned with the underlying macroblock grid;
modeling a feature by vectorizing at least one of a feature pixel and a feature descriptor;
identifying similar features not aligned with the underlying macroblock grid by:
at least one of (a) minimizing mean-squared error (MSE) and (b) maximizing inner products between different feature pixel vectors or feature descriptors; and
applying a standard motion estimation and compensation algorithm to account for translational motion of the feature, resulting in identified similar features;
associating the identified similar features with at least one specific macroblock of video data to be encoded; and
from the identified similar features, generating an indirect model-based prediction for the at least one specific macroblock of video data, by using offsets between (i) the at least one macroblock of video data and (ii) the respective similar features to generate indirect predictions for the at least one macroblock of video data, such that feature track information is used indirectly to predict macroblocks instead of directly to predict instances of the respective similar features, the indirect model-based prediction having model-based motion vectors, said indirect model-based prediction including:
comparing the compression efficiency of a standards-compliant encoding derived from the model-based motion vectors with the compression efficiency of the H.264 macroblock encoding of the group of pixels in close proximity to each other exhibiting encoding complexity;
caching the model-based motion vectors if it is determined that the standards-compliant encoding derived from the model-based motion vectors provides improved compression efficiency relative to the H.264 macroblock encoding of the group of pixels in close proximity to each other exhibiting encoding complexity; and
incorporating the cached model-based motion vectors into a standards-compliant bit stream such that the feature modeling prediction and model-based motion vectors are stored as standards-compliant encoded video data.
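Claim 16's similarity test (vectorizing feature pixels, then minimizing MSE or maximizing inner products) can be sketched in plain Python. The helper names are hypothetical, and normalizing the inner product by vector length is an assumption added here so that brighter patches do not win automatically; the claim itself only says "maximizing inner products".

```python
from typing import List, Sequence

Vector = List[float]


def vectorize(patch: Sequence[Sequence[float]]) -> Vector:
    """Flatten a 2-D patch of feature pixels into a 1-D vector (row-major)."""
    return [float(p) for row in patch for p in row]


def mse(a: Vector, b: Vector) -> float:
    """Mean-squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def normalized_inner_product(a: Vector, b: Vector) -> float:
    """Inner product divided by both vector norms (assumed normalization)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(target: Vector, candidates: List[Vector]) -> int:
    """Index of the candidate with the lowest MSE against the target."""
    return min(range(len(candidates)), key=lambda i: mse(target, candidates[i]))


target = vectorize([[10, 20], [30, 40]])
candidates = [vectorize([[40, 30], [20, 10]]), vectorize([[11, 19], [31, 39]])]
best = most_similar(target, candidates)
print(best, mse(target, candidates[best]))  # 1 1.0
```

Swapping the `min` over MSE for a `max` over `normalized_inner_product` gives the alternative criterion named in the claim.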
17. A method of encoding raw video data, comprising:
implementing a model-based prediction by configuring a codec to encode a target frame from raw video data;
encoding a macroblock in the target frame using an H.264 macroblock encoding process, resulting in an H.264 macroblock encoding;
analyzing the macroblock encoding such that the H.264 macroblock encoding is deemed to be at least one of efficient and inefficient according to a codec standard if, in the H.264 macroblock encoding, a group of pixels in close proximity to each other are identified as exhibiting encoding complexity, such that the group of pixels of the H.264 macroblock encoding uses a disproportionate amount of bandwidth computationally relative to other regions in one or more of the multiple frames of raw video;
wherein if the H.264 macroblock encoding is deemed inefficient, analyzing candidate standards-compliant model-based encodings of the macroblock by generating several predictions for the macroblock based on multiple models, and applying the generated predictions, resulting in plural candidate standards-compliant model-based encodings of the macroblock including:
detecting an instance of a feature in the target frame from the raw video data, the feature corresponding to the group of pixels exhibiting the encoding complexity identified in the H.264 macroblock encoding;
the feature instance not being aligned with the underlying macroblock grid;
modeling a feature by vectorizing at least one of a feature pixel and a feature descriptor;
identifying similar features not aligned with the underlying macroblock grid by:
at least one of (a) minimizing mean-squared error (MSE) and (b) maximizing inner products between different feature pixel vectors or feature descriptors; and
applying a standard motion estimation and compensation algorithm to account for translational motion of the feature, resulting in identified similar features;
associating the identified similar features with at least one specific macroblock of video data to be encoded; and
from the identified similar features, generating an indirect model-based prediction for the at least one specific macroblock of video data, by using offsets between (i) the at least one macroblock of video data and (ii) the respective similar features to generate indirect predictions for the at least one macroblock of video data, the indirect model-based prediction having model-based motion vectors, such that feature track information is used indirectly to predict macroblocks instead of directly to predict instances of the identified similar features, said indirect model-based prediction including incorporating feature modeling prediction information and model-based motion vectors from the cache into a standards-compliant bit stream such that the feature modeling prediction and model-based motion vectors are stored as one of the standards-compliant encodings of the macroblock;
evaluating the resulting candidate standards-compliant model-based encodings of the macroblock according to encoding size;
ranking the candidate standards-compliant model-based encodings of the macroblock relative to the H.264 macroblock encoding of the group of pixels;
comparing the compression efficiency of the candidate standards-compliant encodings with the compression efficiency of the H.264 macroblock encoding of the group of pixels; and
encoding using the candidate standards-compliant encoding if it is determined that the candidate standards-compliant encoding provides improved compression efficiency relative to the H.264 macroblock encoding of the group of pixels.
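The evaluate, rank, and select steps at the end of claim 17 can be sketched as follows. The sketch assumes each candidate encoding is summarized by its size in bits; the function name `choose_encoding`, the candidate labels, and the `"baseline"` return value are hypothetical.

```python
from typing import Dict


def choose_encoding(baseline_bits: int, candidates: Dict[str, int]) -> str:
    """Pick the smallest candidate if it beats the baseline H.264 encoding.

    `candidates` maps a hypothetical model label to the size in bits of the
    standards-compliant encoding that model produced for the macroblock.
    """
    # Rank candidates from smallest to largest encoding size.
    ranked = sorted(candidates.items(), key=lambda item: item[1])
    if ranked and ranked[0][1] < baseline_bits:
        return ranked[0][0]
    # No candidate improved on the conventional encoding; keep it.
    return "baseline"


sizes = {"track_model": 620, "descriptor_model": 710}
print(choose_encoding(900, sizes))  # track_model
print(choose_encoding(500, sizes))  # baseline
```

Because every candidate is itself a standards-compliant encoding, either outcome can be decoded by an unmodified H.264 decoder.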
Specification