Model-based video encoding and decoding
Abstract
A model-based compression codec applies higher-level modeling to produce better predictions than can be found through conventional block-based motion estimation and compensation. Computer-vision-based feature and object detection algorithms identify regions of interest throughout the video datacube. The detected features and objects are modeled with a compact set of parameters, and similar feature/object instances are associated across frames. Associated features/objects are formed into tracks and related to specific blocks of video data to be encoded. The tracking information is used to produce model-based predictions for those blocks of data, enabling more efficient navigation of the prediction search space than is typically achievable through conventional motion estimation methods. A hybrid framework enables modeling of data at multiple fidelities and selects the appropriate level of modeling for each portion of video data.
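The conventional block-based motion estimation and compensation (BBMEC) search that the abstract contrasts with model-based prediction can be sketched in a few lines. This is an illustrative sketch, not the patent's implementation: the function names, the sum-of-absolute-differences cost, and the small search window are all assumptions.

```python
# Minimal sketch of block-based motion estimation and compensation
# (BBMEC): exhaustively search a limited window of a previously decoded
# reference frame for the region that best predicts a target block.
# Frames are plain 2-D lists of luma samples; `sad` and `bbmec_search`
# are illustrative names, not taken from the patent.

def sad(frame, x, y, block):
    """Sum of absolute differences between `block` and the same-size
    region of `frame` whose top-left corner is (x, y)."""
    return sum(
        abs(frame[y + r][x + c] - block[r][c])
        for r in range(len(block))
        for c in range(len(block[0]))
    )

def bbmec_search(ref_frame, block, bx, by, search_range=2):
    """Find the motion vector (dx, dy) within a limited search window of
    `ref_frame` that best predicts `block`, whose top-left corner in the
    target frame is (bx, by). Returns ((dx, dy), residual cost)."""
    h, w = len(ref_frame), len(ref_frame[0])
    bh, bw = len(block), len(block[0])
    best = (None, float("inf"))
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = bx + dx, by + dy
            if 0 <= x <= w - bw and 0 <= y <= h - bh:
                cost = sad(ref_frame, x, y, block)
                if cost < best[1]:
                    best = ((dx, dy), cost)
    return best
```

The limited search window is what the model-based levels below improve on: feature and object tracks steer the search to better reference regions than a fixed window around the collocated position.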
15 Claims
1. A method of encoding video data, comprising:
encoding raw video data using a multiple fidelities encoder executed via at least one computer processor by:

encoding the raw video data at multiple levels of fidelities for a model-based compression, such that encodings are provided at the multiple levels of fidelities:

(A) a macroblock encoding level, (B) a macroblock as feature encoding level, (C) a feature encoding level, and (D) an object encoding level,

wherein the (A) macroblock encoding level uses a block-based motion estimation and compensation (BBMEC) application to find predictions for each tile from a limited search space in previously decoded reference frames, the macroblock encoding level generating an H.264 macroblock encoding prediction for a target macroblock;

wherein the (B) macroblock as feature encoding level (i) uses a first BBMEC application identical to the macroblock encoding level to find a first prediction for the target macroblock from a most-recent reference frame, (ii) uses a second BBMEC application to find a second prediction for the first prediction by searching in a second-most-recent frame, (iii) creates a track for the target macroblock by applying BBMEC applications through progressively older frames, and (iv) generates a macroblock as feature encoding level prediction from among the resulting track instances for the target macroblock;

wherein the (C) feature encoding level detects and tracks features independent of the macroblock grid and associates a feature with an overlapping macroblock, such that a corresponding feature track of a feature overlapping the target macroblock is used to navigate previously decoded reference frames to find a better match (prediction) for the overlapping macroblock by using offsets between the target macroblock and respective feature track instances to generate an indirect prediction for the target macroblock, and, if multiple features overlap the target macroblock, the feature with the greatest overlap with the target macroblock is selected to model the target macroblock; and

wherein the (D) object encoding level detects and tracks objects (features that overlap at least portions of multiple macroblocks, including the target macroblock), such that object tracks are used to navigate previously decoded reference frames to find better matches (predictions) for all macroblocks overlapping the object by using offsets between the target macroblock and respective object track instances to generate an indirect prediction for the overlapping macroblocks, so that a single motion vector is calculated for all of the macroblocks associated with the object, resulting in computation and encoding size savings, the better matches having the most encoding size savings;

comparing compression efficiency of the (A) macroblock encoding level, (B) macroblock as feature encoding level, (C) feature encoding level, and (D) object encoding level; and

selecting, based on the comparison of compression efficiency, which one of the encodings has the fewest number of bits among the:
(A) macroblock encoding level, (B) macroblock as feature encoding level, (C) feature encoding level, and (D) object encoding level.

Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13.
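The comparing-and-selecting step at the end of claim 1 reduces to choosing, among the candidate encodings from the four fidelity levels, the one that costs the fewest bits. A minimal sketch, with an assumed `Candidate` record and made-up bit counts:

```python
# Hedged sketch of claim 1's selection step: each fidelity level
# produces a candidate encoding, and the encoder keeps whichever costs
# the fewest bits. The level names come from the claim; the `Candidate`
# structure and any bit counts are illustrative only.

from typing import NamedTuple

class Candidate(NamedTuple):
    level: str      # (A)-(D) encoding level that produced the prediction
    num_bits: int   # size of the resulting encoding in bits

def select_encoding(candidates):
    """Compare compression efficiency across the fidelity levels and
    return the candidate encoding with the fewest bits."""
    return min(candidates, key=lambda c: c.num_bits)
```

For example, given candidates of 120, 100, 90, and 70 bits from levels (A) through (D), the object-level encoding at 70 bits would be selected.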
14. A method for encoding video data, comprising:
encoding raw video data using a multiple fidelities encoder executed via at least one computer processor by:

encoding the raw video data at multiple levels of fidelities for a model-based compression, the multiple fidelities including at least the first three of a macroblock encoding level, a macroblock as feature encoding level, a feature encoding level, and an object encoding level,

wherein the macroblock encoding level uses a block-based motion estimation and compensation (BBMEC) application to find predictions for each tile from a limited search space in previously decoded reference frames, the macroblock encoding level generating an H.264 macroblock encoding prediction for a target macroblock,

wherein the macroblock as feature encoding level (i) uses a first BBMEC application identical to the macroblock encoding level to find a first prediction for the target macroblock from a most-recent reference frame, (ii) uses a second BBMEC application to find a second prediction for the first prediction by searching in a second-most-recent frame, (iii) creates a track for the target macroblock by applying BBMEC applications through progressively older frames, and (iv) generates a prediction from among the resulting track instances,

wherein the feature encoding level detects and tracks features independent of the macroblock grid and associates a feature with an overlapping macroblock, such that a corresponding feature track (the track belonging to the feature overlapping the macroblock) is used to navigate previously decoded reference frames to find a better match (prediction) for the overlapping macroblock by using offsets between the macroblock and respective feature track instances to generate an indirect prediction for the target macroblock, and, where multiple features overlap the target macroblock, the feature with the greatest overlap is selected to model the target macroblock, and

wherein the object encoding level detects and tracks objects (that overlap at least portions of multiple macroblocks) and associates an object with all overlapping macroblocks, such that object tracks are used to navigate previously decoded reference frames to find better matches (predictions) for all macroblocks overlapping the object by using offsets between the target macroblock and respective object track instances to generate an indirect prediction for the overlapping macroblocks, so that a single motion vector is calculated for all of the macroblocks associated with the object, resulting in computation and encoding size savings, the better matches having the most encoding size savings in terms of number of bits;

generating model-based predictions for the target macroblock from the multiple encoding levels of fidelities;

comparing compression efficiency of the multiple encoding levels of fidelities;

determining, based on the comparison of the compression efficiencies, the best prediction for the target macroblock from among the multiple encoding levels of fidelities, namely the model-based prediction having the smallest encoding size in terms of the number of bits; and

integrating the best prediction with the additional steps of transform, quantization, and entropy encoding to produce a best encoding for the target macroblock.
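The macroblock-as-feature track of steps (i) through (iii) can be sketched as a chain of searches through progressively older reference frames, each prediction seeding the search in the next-older frame. Here `find_prediction` is a hypothetical callback standing in for a BBMEC search, not an API defined by the patent:

```python
# Sketch of building a "macroblock as feature" track: the target
# macroblock is predicted in the most recent reference frame, that
# prediction is itself predicted in the second-most-recent frame, and
# so on through progressively older frames. `find_prediction` is an
# illustrative stand-in for a BBMEC search over one reference frame.

def build_track(target_block, reference_frames, find_prediction):
    """Chain BBMEC-style searches through `reference_frames`, ordered
    newest first. Each search's result seeds the search in the
    next-older frame. Returns the list of track instances, one per
    reference frame."""
    track = []
    query = target_block
    for frame in reference_frames:      # newest -> oldest
        query = find_prediction(frame, query)
        track.append(query)
    return track
```

Step (iv) then picks the prediction for the target macroblock from among the resulting track instances.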
15. A data processing system encoding video data, comprising:
a multiple fidelities encoder, stored on a non-transitory medium and executed by at least one computer processor, that encodes raw video data at multiple encoding levels of fidelities for a model-based compression, the multiple encoding levels of fidelities including a macroblock encoding level, a macroblock as feature encoding level, a feature encoding level, and an object encoding level, the multiple fidelities encoder modeling the data using the multiple fidelities for model-based compression, including:

the macroblock encoding level using a block-based motion estimation and compensation (BBMEC) application to find predictions for each tile from a limited search space in previously decoded reference frames, the macroblock encoding level generating an H.264 macroblock encoding prediction for a target macroblock,

the macroblock as feature encoding level (i) using a first BBMEC process substantially identical to the macroblock encoding level to find a first prediction for the target macroblock from a most-recent reference frame, (ii) using a second BBMEC application to find a second prediction for the first prediction by searching in a second-most-recent frame, (iii) creating a track for the target macroblock by applying BBMEC applications through progressively older frames, and (iv) generating a macroblock as feature encoding level prediction from among the resulting track instances,

the feature encoding level detecting and tracking features independent of the macroblock grid and associating a feature with an overlapping macroblock, such that a corresponding feature track (the track belonging to the feature overlapping the macroblock) is used to navigate previously decoded reference frames to find a better match (prediction) for the overlapping macroblock by using offsets between the macroblock and respective feature track instances to generate an indirect prediction for the macroblock, and, where multiple features overlap a given target macroblock, the feature with the greatest overlap being selected to model that target macroblock, and

the object encoding level detecting and tracking objects (features that overlap at least portions of multiple macroblocks) and associating an object with all overlapping macroblocks, such that object tracks are used to navigate previously decoded reference frames to find better matches (predictions) for all macroblocks overlapping the object by using offsets between the target macroblock and respective object track instances to generate an indirect prediction for the overlapping macroblocks, so that a single motion vector is calculated for all of the macroblocks associated with the object, resulting in computation and encoding size savings, the better matches having the most encoding size savings;

the modeler generating model-based predictions for the target macroblock from the multiple encoding levels of fidelities; and

an encoder, in communication with the modeler, to determine the best prediction for the target macroblock from among the multiple encoding levels of fidelities, the encoder integrating the best prediction with transform, quantization, and entropy encoding to produce a best encoding for the target macroblock.
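The object level's savings come from coding a single motion vector per object track instance and recovering each overlapping macroblock's prediction location from its stored offset relative to the object. A hedged sketch, with illustrative names and coordinates:

```python
# Sketch of the object encoding level: one shared motion estimate per
# object serves every macroblock the object overlaps. Each macroblock's
# prediction location in the previously decoded reference frame is the
# object's track position plus that macroblock's fixed offset, so only
# a single vector needs to be coded for the whole object (the claimed
# computation and encoding size saving). Names are illustrative.

def object_level_predictions(track_position, macroblock_offsets):
    """Derive a prediction location for every macroblock overlapping
    the object from the one shared track position.

    track_position     -- (x, y) of the object in a reference frame
    macroblock_offsets -- {macroblock id: (dx, dy) offset from object}
    """
    tx, ty = track_position
    return {
        mb: (tx + ox, ty + oy)
        for mb, (ox, oy) in macroblock_offsets.items()
    }
```

For instance, with the object located at (10, 20) in the reference frame, a macroblock offset of (16, 0) yields a prediction location of (26, 20) without any additional motion search or coded vector for that macroblock.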
Specification