STANDARDS-COMPLIANT MODEL-BASED VIDEO ENCODING AND DECODING
Abstract
A model-based compression codec applies higher-level modeling to produce better predictions than can be found through conventional block-based motion estimation and compensation. Computer-vision-based feature and object detection algorithms identify regions of interest throughout the video datacube. The detected features and objects are modeled with a compact set of parameters, and similar feature/object instances are associated across frames. Associated features/objects are formed into tracks and related to specific blocks of video data to be encoded. The tracking information is used to produce model-based predictions for those blocks of data, enabling more efficient navigation of the prediction search space than is typically achievable through conventional motion estimation methods. A hybrid framework enables modeling of data at multiple fidelities and selects the appropriate level of modeling for each portion of video data. A compliant-stream version of the model-based compression codec uses the modeling information indirectly to improve compression while producing bitstreams that can be interpreted by standard decoders.
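The association step summarized in the abstract, where similar feature instances are linked across frames and formed into tracks, can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each detected instance is modeled by its center coordinates and that instances are linked greedily to the nearest instance already on a track, subject to a distance threshold. The names `Detection`, `Track`, `associate`, and `max_dist` are hypothetical. The abstract does not prescribe this particular association rule.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Detection:
    frame: int
    x: float  # hypothetical model parameters: center of the detected feature
    y: float


@dataclass
class Track:
    instances: List[Detection] = field(default_factory=list)


def _distance(a: Detection, b: Detection) -> float:
    return math.hypot(a.x - b.x, a.y - b.y)


def associate(detections_per_frame: List[List[Detection]],
              max_dist: float = 8.0) -> List[Track]:
    """Greedily link each frame's detections to existing tracks by nearest center."""
    tracks: List[Track] = []
    for detections in detections_per_frame:
        unmatched = list(detections)
        for track in tracks:
            if not unmatched:
                break
            last = track.instances[-1]
            best = min(unmatched, key=lambda d: _distance(d, last))
            # Extend the track only if the closest detection is near enough.
            if _distance(best, last) <= max_dist:
                track.instances.append(best)
                unmatched.remove(best)
        # Detections that extended no existing track start new tracks.
        tracks.extend(Track([d]) for d in unmatched)
    return tracks
```

A production tracker would typically use a global assignment (for example, Hungarian matching) rather than this greedy loop, which can make order-dependent choices when tracks compete for the same detection.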
32 Claims
1. A method for processing video data, comprising:
receiving multiple frames of video data;
forming tracking information by:
detecting at least one of a feature and an object in a region of interest of the video data using a detection algorithm in at least one frame;
modeling the detected at least one of the feature and the object using a set of parameters; and
associating any instances of the detected and modeled at least one of the feature and the object across plural frames of the video data, resulting in at least one track of the associated instances, each track providing tracking information of respective associated instances;
relating the at least one track to at least one specific block of video data to be encoded; and
producing a model-based prediction for the at least one specific block of video data using the tracking information of the at least one related track, the model-based prediction having model-based motion vectors, and said producing including incorporating the model-based motion vectors into a standards-compliant bit stream such that the model-based prediction is stored as standards-compliant encoded video data.
Dependent claims: 2, 3, 4, 5, 6.
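The relating and prediction steps of claim 1 can be sketched as follows, under stated assumptions: each feature instance is parameterized by an axis-aligned bounding box, a block is related to the track whose instance overlaps it most (the selection rule recited in claim 30), and the model-based motion vector is the displacement of that instance between the current frame and a reference frame. `Instance`, `overlap_area`, `select_track`, and `model_based_mv` are hypothetical names. Vectors here are in whole pels; an H.264/AVC encoder would express luma motion vectors in quarter-sample units before writing them to a compliant bit stream.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Instance:
    frame: int
    x0: int  # bounding-box top-left corner (hypothetical parameterization)
    y0: int
    w: int
    h: int


Track = List[Instance]


def overlap_area(inst: Instance, bx: int, by: int, bsize: int = 16) -> int:
    """Pixel area shared by a feature instance and the block at (bx, by)."""
    dx = min(inst.x0 + inst.w, bx + bsize) - max(inst.x0, bx)
    dy = min(inst.y0 + inst.h, by + bsize) - max(inst.y0, by)
    return max(dx, 0) * max(dy, 0)


def select_track(tracks: List[Track], frame: int, bx: int, by: int,
                 bsize: int = 16) -> Optional[Track]:
    """Relate a block to the track whose instance in `frame` overlaps it most."""
    best, best_area = None, 0
    for track in tracks:
        for inst in track:
            if inst.frame == frame:
                area = overlap_area(inst, bx, by, bsize)
                if area > best_area:
                    best, best_area = track, area
    return best


def model_based_mv(track: Track, cur_frame: int, ref_frame: int) -> Tuple[int, int]:
    """Displacement of the tracked feature from cur_frame to ref_frame, in whole pels."""
    by_frame = {inst.frame: inst for inst in track}
    cur, ref = by_frame[cur_frame], by_frame[ref_frame]
    return (ref.x0 - cur.x0, ref.y0 - cur.y0)
```

Because the result is an ordinary translational vector, a standard decoder can apply it without knowing that a feature model produced it, which is the property the claim relies on.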
7-11. (Canceled)
12. A codec for processing video data, comprising:
a feature-based detector configured to identify instances of a feature in at least two video frames, where each identified feature instance includes a plurality of pixels exhibiting data complexity relative to other pixels in one or more of the at least two video frames;
a modeler operatively coupled to the feature-based detector and configured to create feature-based models modeling correspondence of the feature instances in two or more video frames; and
a cache configured to prioritize use of the feature-based models if it is determined that a standards-compliant encoding of associated video data that is derived from the feature-based models provides improved compression efficiency relative to a standards-compliant encoding of the associated video data that uses a first video encoding process.
Dependent claims: 13, 14, 15, 16, 17, 18, 19.
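One way to picture the cache of claim 12 is a table that credits each feature-based model with the bits it saves relative to the first (conventional) encoding process and ranks models by those savings. This is a hypothetical sketch: `ModelCache`, `record`, and `prioritized` are illustrative names, and measuring compression efficiency as a raw bit-count difference is an assumption, not a detail supplied by the claim.

```python
from typing import Dict, List


class ModelCache:
    """Hypothetical cache ranking feature-based models by measured bit savings."""

    def __init__(self) -> None:
        self._savings: Dict[str, int] = {}  # model id -> cumulative bits saved

    def record(self, model_id: str, model_bits: int, conventional_bits: int) -> None:
        """Credit a model only when its standards-compliant encoding is smaller."""
        saved = conventional_bits - model_bits
        if saved > 0:
            self._savings[model_id] = self._savings.get(model_id, 0) + saved

    def prioritized(self) -> List[str]:
        """Model ids that have improved compression, most bits saved first."""
        return sorted(self._savings, key=self._savings.__getitem__, reverse=True)
```

Models that never beat the conventional encoding are simply absent from the ranking, so the encoder falls back to the first video encoding process for the associated data.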
20. A codec for processing video data, comprising:
a feature-based detector to identify an instance of a feature in at least two video frames, an identified feature instance including a plurality of pixels exhibiting data complexity relative to other pixels in at least one of the at least two video frames;
a modeler operatively coupled to the feature-based detector, wherein the modeler creates a feature-based model modeling correspondence of the respective identified feature instance in the at least two video frames; and
a memory, wherein for a plurality of the feature-based models, the memory prioritizes standards-compliant use of a respective feature-based model if an improved compression efficiency of associated video data is determined, said standards-compliant use of the respective feature-based model including storing model-based prediction information in an encoding stream.
Dependent claims: 21.
22. A method for processing video data, comprising:
modeling a feature by vectorizing at least one of a feature pel and a feature descriptor;
identifying similar features by at least one of (a) minimizing mean-squared error (MSE) and (b) maximizing inner products between different feature pel vectors or feature descriptors;
applying a standard motion estimation and compensation algorithm to account for translational motion of the feature, resulting in identified similar features;
from the identified similar features, producing feature modeling prediction information and deriving motion vectors; and
storing the feature modeling prediction information in standards-compliant encoded video data, including encoding the motion vectors.
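The similarity search of claim 22 can be sketched with plain Python lists. Feature pels are vectorized in row-major order, and the best match is chosen either by minimum MSE or by maximum inner product. Normalizing the inner product (cosine similarity) is an assumption added here so that brighter patches are not favored merely for their magnitude; the function names are hypothetical, and the subsequent translational motion estimation step is not shown.

```python
import math
from typing import List, Sequence

Patch = Sequence[Sequence[float]]


def vectorize(patch: Patch) -> List[float]:
    """Flatten a 2-D block of feature pels into a 1-D vector (row-major)."""
    return [float(p) for row in patch for p in row]


def mse(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def normalized_inner(a: List[float], b: List[float]) -> float:
    # Cosine similarity; normalization is an assumption, the claim says "inner products".
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def most_similar(target: Patch, candidates: List[Patch], metric: str = "mse") -> int:
    """Index of the candidate most similar to target under the chosen metric."""
    t = vectorize(target)
    vectors = [vectorize(c) for c in candidates]
    if metric == "mse":
        return min(range(len(vectors)), key=lambda i: mse(t, vectors[i]))
    if metric == "inner":
        return max(range(len(vectors)), key=lambda i: normalized_inner(t, vectors[i]))
    raise ValueError(f"unknown metric: {metric}")
```

The two criteria can disagree: a candidate with the same shape as the target but uniformly brighter scores perfectly under the normalized inner product yet poorly under MSE.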
23. A method for processing video data, comprising:
implementing a model-based prediction by configuring a codec to encode a target frame;
encoding a macroblock in the target frame using a conventional encoding process, resulting in a macroblock encoding;
analyzing the macroblock encoding such that the macroblock encoding is deemed to be at least one of efficient and inefficient according to a codec standard;
wherein if the macroblock encoding is deemed inefficient, analyzing candidate standards-compliant model-based encodings of the macroblock by generating several predictions for the macroblock based on multiple models and applying the generated predictions, resulting in plural candidate standards-compliant model-based encodings of the macroblock;
evaluating the resulting candidate standards-compliant model-based encodings of the macroblock according to encoding size; and
ranking the candidate standards-compliant model-based encodings of the macroblock along with the conventionally encoded macroblock.
Dependent claims: 24, 25, 26, 27, 28, 29.
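The decision flow of claim 23 reduces to a small control structure. In this hedged sketch the conventional and model-based encoders are supplied as callables that return encoded sizes in bits, and "inefficient" is approximated by a hypothetical per-macroblock bit budget; the claim ties that judgment to a codec standard, which the sketch does not model. `rank_encodings`, `Encoder`, and `bit_budget` are illustrative names.

```python
from typing import Callable, Dict, List, Tuple

Encoder = Callable[[object], int]  # hypothetical: returns encoded size in bits


def rank_encodings(
    macroblock: object,
    conventional: Encoder,
    models: Dict[str, Encoder],
    bit_budget: int,
) -> List[Tuple[str, int]]:
    """Rank candidate encodings of one macroblock by encoded size (smallest first)."""
    ranking = [("conventional", conventional(macroblock))]
    # Hypothetical efficiency test: within budget means "efficient", so stop here.
    if ranking[0][1] <= bit_budget:
        return ranking
    # Otherwise produce one standards-compliant candidate per model.
    for name, encode in models.items():
        ranking.append((name, encode(macroblock)))
    # Rank model-based candidates together with the conventional encoding.
    ranking.sort(key=lambda item: item[1])
    return ranking
```

Skipping model-based candidates for efficiently coded macroblocks keeps the extra modeling cost confined to the blocks where conventional motion estimation struggles.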
30. A method for processing video data, comprising:
modeling data at multiple fidelities in a model-based compression, the multiple fidelities including at least one of a macroblock level, a macroblock-as-feature level, a feature level, and an object level,
wherein the macroblock level uses a block-based motion estimation and compensation (BBMEC) application to find predictions for each tile from a limited search space in previously decoded reference frames,
wherein the macroblock-as-feature level (i) uses a first BBMEC application identical to the macroblock level to find a first prediction for a target macroblock from a most-recent reference frame, (ii) uses a second BBMEC application to find a second prediction for the first prediction by searching in a second-most-recent frame, and (iii) creates a track for the target macroblock by applying BBMEC applications through progressively older frames,
wherein the feature level detects and tracks features independent of the macroblock grid and associates the features with overlapping macroblocks such that feature tracks are used to navigate previously decoded reference frames to find better matches for the overlapping macroblocks, and, where multiple features overlap a given target macroblock, the feature with greatest overlap is selected to model that target macroblock, the feature tracks identifying certain motion vectors, and
wherein, at the object level, where an object encompasses or overlaps multiple macroblocks, a single motion vector can be calculated for all of the macroblocks associated with the object, resulting in computation and encoding-size savings; and
storing one of model-based prediction information and motion vectors in a standards-compliant bit stream, resulting in standards-compliant encoded video data.
Dependent claims: 31, 32.
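The macroblock-as-feature level of claim 30 chains ordinary block matching backward in time. The sketch below uses a full-search sum-of-absolute-differences (SAD) matcher as a stand-in for a BBMEC application; the block size, search range, SAD cost, and the names `bbmec` and `macroblock_track` are illustrative assumptions. Each step searches the next-older frame for a match to the previous step's prediction, so the returned list of positions forms the track for the target macroblock.

```python
from typing import List, Tuple

Frame = List[List[int]]


def sad(cur: Frame, ref: Frame, bx: int, by: int, rx: int, ry: int, n: int) -> int:
    """Sum of absolute differences between n x n blocks at (bx, by) and (rx, ry)."""
    return sum(
        abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
        for j in range(n)
        for i in range(n)
    )


def bbmec(cur: Frame, ref: Frame, bx: int, by: int,
          n: int = 4, search: int = 2) -> Tuple[int, int]:
    """Full-search block matching within +/-search pels; returns the best (dx, dy)."""
    h, w = len(ref), len(ref[0])
    best = (sad(cur, ref, bx, by, bx, by, n), 0, 0)  # zero vector as initial candidate
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = bx + dx, by + dy
            if 0 <= rx <= w - n and 0 <= ry <= h - n:
                cost = sad(cur, ref, bx, by, rx, ry, n)
                if cost < best[0]:
                    best = (cost, dx, dy)
    return best[1], best[2]


def macroblock_track(frames: List[Frame], bx: int, by: int,
                     n: int = 4, search: int = 2) -> List[Tuple[int, int]]:
    """Track a block back through frames[1:], which are progressively older references."""
    track = [(bx, by)]
    cur = frames[0]  # target frame
    for ref in frames[1:]:
        dx, dy = bbmec(cur, ref, bx, by, n, search)
        bx, by = bx + dx, by + dy  # the prediction just found is matched in the next-older frame
        track.append((bx, by))
        cur = ref
    return track
```

Because each step reuses the same bounded search, the track can follow motion well beyond a single frame's search range once the per-frame displacements are accumulated.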
Specification