Context Based Video Encoding and Decoding
3 Assignments
0 Petitions
Abstract
A model-based compression codec applies higher-level modeling to produce better predictions than can be found through conventional block-based motion estimation and compensation. Computer-vision-based feature and object detection algorithms identify regions of interest throughout the video datacube. The detected features and objects are modeled with a compact set of parameters, and similar feature/object instances are associated across frames. Associated features/objects are formed into tracks and related to specific blocks of video data to be encoded. The tracking information is used to produce model-based predictions for those blocks of data, enabling more efficient navigation of the prediction search space than is typically achievable through conventional motion estimation methods. A hybrid framework enables modeling of data at multiple fidelities and selects the appropriate level of modeling for each portion of video data.
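The core idea of the abstract, using tracking information to produce a block prediction directly instead of searching, can be illustrated with a short Python/NumPy sketch. Frames are assumed to be 2-D grayscale arrays; the function name and the `track` dictionary layout (frame index mapped to feature position) are hypothetical, not from the patent.

```python
import numpy as np

def predict_block_from_track(target_idx, block_pos, track, frames, block=16):
    """Produce a model-based prediction for one block of the target frame.

    track: {frame_index: (y, x)} positions of a tracked feature.
    The feature's displacement between the most recent tracked reference
    frame and the target frame gives a motion vector; the prediction is
    the reference patch at the block position shifted by that vector."""
    ref_idx = max(i for i in track if i != target_idx)  # most recent reference
    dy = track[target_idx][0] - track[ref_idx][0]
    dx = track[target_idx][1] - track[ref_idx][1]
    y, x = block_pos
    ref = frames[ref_idx]
    return ref[y - dy:y - dy + block, x - dx:x - dx + block]
```

Because the motion vector comes from the track rather than an exhaustive search, the prediction is found in constant time for each block the track overlaps.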
38 Citations
32 Claims
1. A method for processing video data, comprising:
detecting at least one of a feature and an object in a region of interest using a detection algorithm in at least one frame;
modeling the detected at least one of the feature and the object using a set of parameters;
associating any instances of the at least one of the feature and the object across frames;
forming at least one track of the associated instances;
relating the at least one track to at least one specific block of video data to be encoded; and
producing a model-based prediction for the at least one specific block of video data using the related track information, said producing including storing the model-based prediction as processed video data.
(Dependent claims: 2, 3, 4, 5, 6)
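The detect-associate-track steps of claim 1 can be sketched in Python/NumPy. The detector here is a deliberately toy one (block variance as a stand-in for "data complexity"), and all function names, the MSE threshold, and the track representation are illustrative assumptions, not the patent's method.

```python
import numpy as np

def detect(frame, block=8, top_k=2):
    """Toy detector: score grid-aligned blocks by pixel variance and
    keep the top_k highest-variance blocks as feature positions."""
    h, w = frame.shape
    cands = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            cands.append((frame[y:y+block, x:x+block].var(), (y, x)))
    cands.sort(key=lambda c: (-c[0], c[1]))
    return [pos for _, pos in cands[:top_k]]

def patch_vec(frame, pos, block=8):
    """Model a feature instance as its flattened pixel vector."""
    y, x = pos
    return frame[y:y+block, x:x+block].astype(float).ravel()

def form_tracks(frames, block=8, top_k=2, max_mse=50.0):
    """Associate feature instances across frames by minimum MSE and chain
    the associations into tracks: each track is a list of (frame_idx, pos)."""
    tracks = []
    for f_idx, frame in enumerate(frames):
        for pos in detect(frame, block, top_k):
            v = patch_vec(frame, pos, block)
            best, best_mse = None, max_mse
            for tr in tracks:
                lf, lp = tr[-1]
                if lf != f_idx - 1:
                    continue  # only extend tracks alive in the previous frame
                mse = np.mean((patch_vec(frames[lf], lp, block) - v) ** 2)
                if mse < best_mse:
                    best, best_mse = tr, mse
            if best is not None:
                best.append((f_idx, pos))
            else:
                tracks.append([(f_idx, pos)])  # start a new track
    return tracks
```

A production detector would use a real computer-vision feature (corners, SIFT-like descriptors); the association-by-minimum-error loop is the part that carries over.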
7. A method for processing video data, comprising:
detecting at least one of a feature and an object in a region of interest;
modeling the at least one of the feature and the object using a set of parameters;
associating any instances of the at least one of the feature and the object across frames;
forming at least one matrix of the associated instances;
relating the at least one matrix to at least one specific block of video data to be encoded; and
producing a model-based prediction for the at least one specific block of video data using the related matrix information, said producing including storing the model-based prediction as processed video data.
(Dependent claims: 8, 9, 10, 11)
12. A codec for processing video data, comprising:
a feature-based detector configured to identify instances of a feature in at least two video frames, where each identified feature instance includes a plurality of pixels exhibiting data complexity relative to other pixels in the at least two video frames;
a modeler operatively coupled to the feature-based detector and configured to create feature-based correspondence models modeling correspondence of the feature instances in two or more video frames; and
a cache configured to prioritize use of the feature-based correspondence models if it is determined that an encoding of the feature instances using the feature-based correspondence models provides improved compression efficiency relative to an encoding of the feature instances using a first video encoding process.
(Dependent claims: 13, 14, 15, 16, 17, 18, 19)
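The cache's prioritization rule in claim 12, prefer the correspondence model only when it actually compresses better, reduces to a cost comparison. The sketch below uses a crude residual-magnitude proxy for encoded size; the function names and the proxy itself are assumptions for illustration.

```python
import numpy as np

def residual_cost(block, prediction):
    """Crude stand-in for encoded size: sum of absolute residual values.
    A real codec would transform and entropy-code the residual."""
    return float(np.abs(block.astype(float) - prediction.astype(float)).sum())

def prioritize(block, model_prediction, conventional_prediction):
    """Mirror the cache's rule: use the feature-based correspondence model
    only if it predicts the block more cheaply than the conventional
    (first video encoding process) prediction."""
    if residual_cost(block, model_prediction) < residual_cost(block, conventional_prediction):
        return 'model'
    return 'conventional'
```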
20. A codec for processing video data, comprising:
a feature-based detector to identify an instance of a feature in at least two video frames, an identified feature instance including a plurality of pixels exhibiting data complexity relative to other pixels in at least one of the at least two video frames;
a modeler operatively coupled to the feature-based detector, wherein the modeler creates a feature-based correspondence model modeling correspondence of the respective identified feature instance in the at least two video frames; and
a memory, wherein for a plurality of the feature-based correspondence models, the memory prioritizes use of a respective feature-based correspondence model if an improved compression efficiency of the identified feature instance is determined.
(Dependent claims: 21)
22. A method for processing video data, comprising:
modeling a feature by vectorizing at least one of a feature pel and a feature descriptor;
identifying similar features by at least one of (a) minimizing mean-squared error (MSE) and (b) maximizing inner products between different feature pel vectors or feature descriptors; and
applying a standard motion estimation and compensation algorithm to account for translational motion of the feature, resulting in processed video data.
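The two similarity criteria of claim 22 are easy to sketch in Python/NumPy. Function names are hypothetical; note that for unit-normalized vectors the two criteria rank candidates identically, since ||a − b||² = 2 − 2⟨a, b⟩.

```python
import numpy as np

def vectorize(patch):
    """Flatten a feature pel region (or descriptor) into a vector."""
    return patch.astype(float).ravel()

def mse(a, b):
    return float(np.mean((a - b) ** 2))

def best_match_mse(query, candidates):
    """Identify the most similar feature by minimizing MSE."""
    return min(range(len(candidates)), key=lambda i: mse(query, candidates[i]))

def best_match_inner(query, candidates):
    """Identify the most similar feature by maximizing the inner product
    between unit-normalized vectors."""
    qn = query / np.linalg.norm(query)
    return max(range(len(candidates)),
               key=lambda i: float(qn @ (candidates[i] / np.linalg.norm(candidates[i]))))
```

After the best match is found, a standard block-based motion search around the matched position would account for any residual translational motion, as the claim's final step describes.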
23. A method for processing video data, comprising:
implementing a model-based prediction by configuring a codec to encode a target frame;
encoding a macroblock in the target frame using a conventional encoding process;
analyzing the macroblock encoding, wherein the conventional encoding of the macroblock is deemed to be at least one of efficient and inefficient, wherein if the conventional encoding is deemed inefficient, several predictions for the macroblock are generated based on multiple models, and wherein the evaluation of the several predictions of the macroblock is based on an encoding size; and
ranking the predictions of the macroblock against the conventionally encoded macroblock.
(Dependent claims: 24, 25, 26, 27, 28, 29)
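Claim 23's hybrid flow, encode conventionally first, and only fall back to model-based candidates when that encoding is inefficient, can be sketched as follows. The `encoding_size` proxy, the `efficient_thresh` parameter, and the candidate labels are assumptions for illustration.

```python
import numpy as np

def encoding_size(block, prediction):
    """Proxy for encoded size: count of non-trivial residual samples.
    A real encoder would transform and entropy-code the residual."""
    return int((np.abs(block.astype(float) - prediction.astype(float)) > 1.0).sum())

def encode_macroblock(block, conventional_pred, model_preds, efficient_thresh):
    """Hybrid flow of claim 23: keep the conventional encoding when it is
    efficient; otherwise generate model-based predictions, score every
    candidate by encoding size, and pick the cheapest."""
    conv_size = encoding_size(block, conventional_pred)
    if conv_size <= efficient_thresh:            # conventional deemed efficient
        return ('conventional', conv_size)
    candidates = [('conventional', conv_size)]   # still ranked with the models
    candidates += [(f'model_{i}', encoding_size(block, p))
                   for i, p in enumerate(model_preds)]
    return min(candidates, key=lambda c: c[1])   # rank and select
```

Ranking the conventional result alongside the model-based candidates guarantees the hybrid never does worse than the conventional process alone, under this cost proxy.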
30. A method for processing video data, comprising:
modeling data at multiple fidelities for a model-based compression, the multiple fidelities including at least one of a macroblock level, a macroblock-as-feature level, a feature level, and an object level,
wherein the macroblock level uses a block-based motion estimation and compensation (BBMEC) application to find predictions for each tile from a limited search space in previously decoded reference frames,
wherein the macroblock-as-feature level (i) uses a first BBMEC application identical to the macroblock level to find a first prediction for a target macroblock from a most-recent reference frame, (ii) uses a second BBMEC application to find a second prediction for the first prediction by searching in a second-most-recent frame, and (iii) creates a track for the target macroblock by applying BBMEC applications through progressively older frames,
wherein the feature level detects and tracks features independent of the macroblock grid and associates the features with overlapping macroblocks, such that feature tracks are used to navigate previously decoded reference frames to find better matches for the overlapping macroblocks, and where multiple features overlap a given target macroblock, the feature with the greatest overlap is selected to model that target macroblock, and
wherein, at the object level, where an object encompasses or overlaps multiple macroblocks, a single motion vector can be calculated for all of the macroblocks associated with the object, resulting in computation and encoding size savings.
(Dependent claims: 31, 32)
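The macroblock-as-feature level of claim 30, a first BBMEC search in the most recent reference, then chained searches through progressively older frames, each seeded at the previous match, can be sketched in Python/NumPy. The search window, function names, and cost metric are illustrative assumptions.

```python
import numpy as np

def bbmec(target_patch, ref_frame, center, radius=2):
    """Block-based motion estimation: exhaustive search over a small window
    of ref_frame around `center`, returning the best-matching position."""
    b = target_patch.shape[0]
    h, w = ref_frame.shape
    cy, cx = center
    best_pos, best_err = None, None
    for y in range(max(0, cy - radius), min(h - b, cy + radius) + 1):
        for x in range(max(0, cx - radius), min(w - b, cx + radius) + 1):
            err = np.abs(ref_frame[y:y+b, x:x+b].astype(float) - target_patch).sum()
            if best_err is None or err < best_err:
                best_pos, best_err = (y, x), err
    return best_pos

def track_macroblock(frames, target_idx, mb_pos, block=16):
    """Macroblock-as-feature tracking: find a prediction in the most recent
    reference frame, then chain BBMEC searches through progressively older
    frames, each seeded at the previous match, building a track."""
    y, x = mb_pos
    patch = frames[target_idx][y:y+block, x:x+block].astype(float)
    track, center = [(target_idx, mb_pos)], mb_pos
    for ref_idx in range(target_idx - 1, -1, -1):  # progressively older frames
        center = bbmec(patch, frames[ref_idx], center)
        track.append((ref_idx, center))
    return track
```

Seeding each search at the previous frame's match keeps every window small, so the chained searches remain cheap even as the track extends many frames back.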
Specification