Feature-Based Video Compression

US 20110182352A1
Filed: 10/06/2009
Published: 07/28/2011
Est. Priority Date: 03/31/2005
Status: Active Grant

First Claim

1. A computer method of processing video data comprising the computer implemented steps of:

receiving video data formed of a series of video frames; and

encoding portions of the video frames by:

detecting one or more instances of a candidate feature in one or more of the video frames;

said detection determining positional information for instances in the one or more previously decoded video frames, the positional information including a frame number, a position within that frame, and a spatial perimeter of the instance;

said candidate feature being a set of one or more detected instances;

predicting, by a motion compensated prediction process, a portion of a current video frame in the series using one or more previously decoded video frames;

said motion compensated prediction process being initialized with positional predictions, where the positional predictions provide the positional information from detected feature instances in previously decoded video frames;

using one or more of the candidate feature instances that are transformed by augmenting the motion compensated prediction process, defining one or more features along with the transformed instances to create a first feature-based model, the first feature-based model enabling prediction in the current frame of an appearance and a source position of a substantially matching feature instance, where the substantially matching feature instance is a key feature instance;

comparing the first feature-based model to a conventional video encoding model of the one or more defined features, and determining from the comparison which model enables greater encoding compression; and

using results of the comparing and determining step, applying feature-based encoding to portions of one or more of the video frames, and applying conventional video encoding to other portions of the one or more video frames.
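
The comparing and determining steps of claim 1 can be illustrated with a minimal Python sketch. It assumes that each region already has an estimated bit cost under both models. `RegionCost`, `choose_encoding` and `total_bits` are hypothetical names, and "greater encoding compression" is approximated as "fewer estimated bits". A real encoder would more likely use a rate-distortion cost, as claim 28 suggests.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RegionCost:
    """Estimated cost of one region under each model (units assumed to be bits)."""
    region_id: int
    feature_bits: float       # cost if coded with the feature-based model
    conventional_bits: float  # cost if coded with conventional block-based coding


def choose_encoding(costs: List[RegionCost]) -> Dict[int, str]:
    """Pick the cheaper model per region; ties go to conventional coding."""
    return {
        c.region_id: "feature" if c.feature_bits < c.conventional_bits else "conventional"
        for c in costs
    }


def total_bits(costs: List[RegionCost], decisions: Dict[int, str]) -> float:
    """Sum the estimated bits of the chosen model over all regions."""
    return sum(
        c.feature_bits if decisions[c.region_id] == "feature" else c.conventional_bits
        for c in costs
    )


costs = [RegionCost(0, 120.0, 300.0), RegionCost(1, 410.0, 260.0)]
decisions = choose_encoding(costs)
print(decisions)                     # {0: 'feature', 1: 'conventional'}
print(total_bits(costs, decisions))  # 380.0
```

Sending ties to conventional coding is an arbitrary choice in this sketch; it reflects the idea that a feature-based region usually carries extra model-signalling overhead.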

1 Assignment

0 Petitions

Abstract

Systems and methods of processing video data are provided. Video data having a series of video frames is received and processed. One or more instances of a candidate feature are detected in the video frames. Previously decoded video frames are processed to identify potential matches of the candidate feature. When a substantial number of portions of the previously decoded video frames include instances of the candidate feature, those instances are aggregated into a set. The candidate feature set is used to create a feature-based model, which includes a model of deformation variation and a model of appearance variation of instances of the candidate feature. The compression efficiency of the feature-based model is then compared with that of conventional video compression.
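
The detection and aggregation steps in the abstract can be sketched as exhaustive block matching. This is a minimal illustration under stated assumptions, not the patented procedure: frames are small 2-D lists of pel values, the match criterion is the sum of absolute differences (SAD), and `max_sad` and `min_fraction` are hypothetical thresholds standing in for "a substantial number". Each returned instance carries the positional information named in claim 1: a frame number, a position, and a (here rectangular) spatial perimeter.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]
Instance = Tuple[int, int, int, int, int]  # (frame_no, top, left, height, width)


def sad(a: Frame, b: Frame) -> int:
    """Sum of absolute differences between two equal-size blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def crop(frame: Frame, top: int, left: int, h: int, w: int) -> Frame:
    return [row[left:left + w] for row in frame[top:top + h]]


def best_match(frame: Frame, template: Frame) -> Tuple[int, int, int]:
    """Exhaustive search; returns (sad, top, left) of the lowest-SAD position."""
    h, w = len(template), len(template[0])
    best: Optional[Tuple[int, int, int]] = None
    for top in range(len(frame) - h + 1):
        for left in range(len(frame[0]) - w + 1):
            cost = sad(crop(frame, top, left, h, w), template)
            if best is None or cost < best[0]:
                best = (cost, top, left)
    if best is None:
        raise ValueError("template must fit inside the frame")
    return best


def aggregate_instances(template: Frame, decoded_frames: List[Frame],
                        max_sad: int, min_fraction: float = 0.5) -> List[Instance]:
    """Collect matches in previously decoded frames; keep the set only if enough frames match."""
    h, w = len(template), len(template[0])
    instances: List[Instance] = []
    for frame_no, frame in enumerate(decoded_frames):
        cost, top, left = best_match(frame, template)
        if cost <= max_sad:
            instances.append((frame_no, top, left, h, w))
    if decoded_frames and len(instances) / len(decoded_frames) >= min_fraction:
        return instances
    return []
```

A practical encoder would restrict the search to a window around a predicted position rather than scanning whole frames; the full scan only keeps the sketch short.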

35 Claims

1. A computer method of processing video data comprising the computer implemented steps of:
- receiving video data formed of a series of video frames; and
  
  encoding portions of the video frames by:
  
  detecting one or more instances of a candidate feature in one or more of the video frames;
  
  said detection determining positional information for instances in the one or more previously decoded video frames, the positional information including a frame number, a position within that frame, and a spatial perimeter of the instance;
  
  said candidate feature being a set of one or more detected instances;
  
  predicting, by a motion compensated prediction process, a portion of a current video frame in the series using one or more previously decoded video frames;
  
  said motion compensated prediction process being initialized with positional predictions, where the positional predictions provide the positional information from detected feature instances in previously decoded video frames;
  
  using one or more of the candidate feature instances that are transformed by augmenting the motion compensated prediction process, defining one or more features along with the transformed instances to create a first feature-based model, the first feature-based model enabling prediction in the current frame of an appearance and a source position of a substantially matching feature instance, where the substantially matching feature instance is a key feature instance;
  
  comparing the first feature-based model to a conventional video encoding model of the one or more defined features, and determining from the comparison which model enables greater encoding compression; and
  
  using results of the comparing and determining step, applying feature-based encoding to portions of one or more of the video frames, and applying conventional video encoding to other portions of the one or more video frames.
- - 2. A method as claimed in claim 1 wherein detecting one or more instances of a candidate feature in one or more of the video frames further includes:
    - detecting at least one instance of a candidate feature by identifying a spatially continuous group of pels having substantially close spatial proximity; and
      
      said identified pels defining a portion of one of the one or more video frames.
  - 3. A method as claimed in claim 2 wherein detecting one or more instances of a candidate feature in one or more of the video frames further includes:
    - using the motion compensated prediction process, selecting, from a plurality of candidate feature instances, one or more instances that are predicted to provide encoding efficiency; and
      
      determining a segmentation of the current instance of the candidate feature from other features and non-features in the current video frame based on the motion compensated prediction process' selection of predictions from unique previously decoded video frames.
  - 4. A method as claimed in claim 2 wherein said motion compensated prediction process being further initialized using feature instances belonging to one or more features, such features having instances in the current frame coincident with the video portion, where the video portion is in the current frame.
  - 5. A method as claimed in claim 2 wherein the group of pels further includes one or more:
    - macroblocks or portions of macroblocks.
  - 6. A method as claimed in claim 1 further including forming a second feature-based model by:
    - using the first feature-based model as a target of prediction for one or more motion compensated predictions from one or more feature instances, yielding a set of predictions of the first feature-based model; and
      
      upon being combined, the set of predictions becoming the second feature-based model.
  - 7. A method as claimed in claim 6 wherein the second feature-based model is used to model residual of first feature-based model including:
    - modeling structural variation and appearance variation of the second feature-based model relative to the residual;
      
      encoding the residual with the model yielding appearance and deformation parameters; and
      
      using the parameters to reduce the encoding size of the residual.
  - 8. A method as claimed in claim 1 wherein defining one or more features further includes defining one or more aggregate features based on one or more of the instances of the candidate feature by:
    - aggregating the instances of different candidate features into an aggregate candidate feature; and
      
      using the set of instances of the aggregate candidate feature to form a region substantially larger than the original instances of un-aggregated candidate features, where the larger region is formed through the identification of coherency among the instances of the candidate feature in the set.
  - 9. A method as claimed in claim 8 wherein the coherency is defined as appearance correspondences in the instances substantially approximated by a lower parameter motion model.
  - 10. A method as claimed in claim 7 where the second feature-based model provides an optional rectangular area extent of pels associated with that instance in the decoded frame relative to the spatial position.
  - 11. A method as claimed in claim 10 wherein the second feature-based model is derived by modeling prior normalized instances of the feature;
    - and where the prior normalized instances are any one of the following:
      
      the instance in the current frame, an instance that is from a previously decoded frame that is substantially recent, or an average of the instances from the previously decoded video frames.
  - 12. A method as claimed in claim 11 where the appearance model is represented by a PCA decomposition of the normalized second feature-based model instances.
  - 13. A method as claimed in claim 10 further comprising determining a deformation model of the spatial variation of correspondences in the feature instances of each set as compared to their second feature-based model instances;
    - for each feature instance in the set, using one or more of the following to approximate variation in the deformation instances for the deformation model;
      
      a motion compensated prediction process, mesh deformation, and a motion model with a substantially reduced parameterization;
      
      integrating the deformation instances into the deformation model; and
      
      where the variation in the deformation model is represented by a PCA decomposition.
  - 14. A method as claimed in claim 1 wherein the motion compensated prediction process operates on a selection of a substantially larger number of the previously decoded video frames than in conventional video data encoding;
    - and where the selection of previously decoded video frames does not rely on user supervision.
  - 15. A method as claimed in claim 1 wherein applying conventional video encoding in response to the comparing and determining step further includes augmenting the conventional video encoding by an instance prediction process that enables greater compression of portions of one or more of the video frames in memory when forming a prediction of portions of the current frame;
    - and where said instance prediction process further includes:
      
      using the feature-based model to determine one or more instances of the defined feature that are incident to a target macroblock being encoded to form the predicted portions of the current frame; and
      
      using the feature-based model, synthesizing pels to predict portions of the current frame.
  - 16. A method as claimed in claim 15 wherein applying conventional video encoding to portions of one or more of the video frames in response to the comparing and determining step further includes:
    - assigning a probability for the previously decoded video frames, where the probability is based on the combined predicted encoding performance improvement for the frame determined using positional predictions from the motion compensated prediction process;
      
      defining the probability as the combined encoding performance of motion compensated prediction process utilized during the analysis of the first feature-based model and a second feature-based model for the current frame;
      
      determining an indexing based on sorting the previously decoded video frames based on their probability, from best to worst; and
      
      truncating the indexed list based on computational and memory requirements.
  - 17. A method as claimed in claim 15 further including reusing the feature instance's predicted pels for predicting other feature instances in the current frame in response to determining that:
    - one or more instances of the defined feature overlaps more than one macroblock in the current frame;
      
      or one or more instances of the defined feature represents one macroblock when one or more instances of the defined feature substantially matches positional information for a macroblock in the current frame.
  - 18. A method as claimed in claim 10 further comprising the step of predicting the appearance parameters and deformation parameters for synthesis of the current instance of a feature-based model, and using the appearance model and deformation model along with temporally recent parameters to interpolate and extrapolate parameters from the feature-based model to predict pels in the current frame, including:
    - determining the values of the synthesis for the temporally recent feature instances are either linearly interpolated or linearly extrapolated based on which method has yielded the most accurate approximation for those instances;
      
      detecting the substantially diminished effectiveness of the linear interpolative and extrapolative methods, utilizing higher order quadratic methods;
      
      detecting the substantially diminished effectiveness of the quadratic methods and employing more advanced state-based methods including extended Kalman filters to predict the appearance and deformation parameters; and
      
      where the actual parameters for the model are optionally differentially encoded relative to the predicted parameters.
  - 19. A method as claimed in claim 18 wherein the parameters from the feature-based model enable a reduction in computing resources required to predict pels in the current frame, such that more computing resources are required when using conventional video compression to predict the pels in the current frame using one or more portions of the previously decoded video frames.
  - 20. A method as claimed in claim 1 wherein the feature-based encoding is embedded within conventional video encoding.
  - 21. A method as claimed in claim 1 wherein the one or more defined features are free of correspondence to distinct salient entities (object, sub-objects) in the one or more video frames.
  - 22. A method as claimed in claim 1 wherein the salient entities are determined through user supervised labeling of detected features as belonging to or not belonging to an object.
  - 23. A method as claimed in claim 1 wherein the defined features contain elements of two or more salient entities, background or other parts of the video frames.
  - 24. A method as claimed in claim 1 wherein a defined feature does not correspond to an object.
  - 25. A method as claimed in claim 11 wherein the step of applying feature-based encoding to portions of one or more of the video frames, and applying conventional video encoding to other portions of the one or more video frames:
    - applying compressed sensing to the residual of the second feature-based model prediction;
      
      where the application of compressed sensing utilizes the average appearance as a measurement and predicts the signal from it;
      
      where variance associated with the compressed sensing prediction is removed from the second feature-based model;
      
      where feature-based modeling focuses on a more compact encoding of the remaining residual; and
      
      applying conventional video encoding to remaining pels of the one or more video frames and to remaining video frames.
  - 26. A method as claimed in claim 25 further comprising the step of making the video data sparse to increase effectiveness of the step of applying compressed sensing.
  - 27. A method as claimed in claim 1 wherein the one or more of the instances are transformed using a linear transform.
  - 28. A method as claimed in claim 1 wherein the substantially matching feature is a best match determined using a rate-distortion metric.
  - 29. A method as claimed in claim 1 further includes decoding the encoded video data by:
    - determining on a macroblock level whether there is an encoded feature in the encoded video data;
      
      in response to determining that there is no encoded feature in the encoded video data, decoding using conventional video decoding;
      
      in response to determining that there is an encoded feature in the encoded video data, separating the encoded feature from the encoded video data in order to synthesize the encoded feature separately from the conventionally encoded portions of the video data;
      
      determining feature-based models and feature parameters associated with the encoded feature;
      
      using the determined feature-based models and feature parameters to synthesize the encoded feature instance; and
      
      combining conventionally encoded portions of the video data with the synthesized feature instances to reconstruct original video data.
  - 30. A method as claimed in claim 1 wherein the feature-based encoding includes applying object-based encoding for portions of the one or more video frames.
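
Claim 18's escalation from linear to quadratic prediction of model parameters can be sketched for a single scalar parameter track. The sketch assumes equally spaced frames, covers only extrapolation (not interpolation or the extended Kalman filter stage), and uses a hypothetical backtest rule: each order predicts the most recent known value from the earlier ones, and the order with the smaller error is used. The returned residual is what would be differentially encoded.

```python
from typing import List, Tuple


def extrapolate(history: List[float], order: int) -> float:
    """Predict the next sample from the last order+1 samples (equally spaced frames)."""
    if order == 1:  # straight line through the last two points
        return 2 * history[-1] - history[-2]
    if order == 2:  # parabola through the last three points
        return 3 * history[-1] - 3 * history[-2] + history[-3]
    raise ValueError("order must be 1 or 2")


def choose_order(history: List[float]) -> int:
    """Backtest both orders on the latest known sample; prefer linear on ties."""
    if len(history) < 4:  # too few samples to backtest the quadratic predictor
        return 1
    past, target = history[:-1], history[-1]
    err_linear = abs(extrapolate(past, 1) - target)
    err_quadratic = abs(extrapolate(past, 2) - target)
    return 2 if err_quadratic < err_linear else 1


def predict_and_residual(history: List[float], actual: float) -> Tuple[int, float, float]:
    """Return (order used, predicted value, residual to be differentially encoded)."""
    if len(history) < 2:
        raise ValueError("need at least two past samples")
    order = choose_order(history)
    predicted = extrapolate(history, order)
    return order, predicted, actual - predicted


print(predict_and_residual([1, 4, 9, 16], 25))  # (2, 25, 0)
print(predict_and_residual([0, 2, 4, 6], 9))    # (1, 8, 1)
```

In the first call the parameter follows a parabola, so the quadratic order wins and the residual is zero; in the second both orders fit the history equally well and the linear one is kept.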

31. A digital processing system for processing video data having one or more video frames comprising:
- one or more computer processors executing an encoder;
  
  the encoder using feature-based encoding to encode portions of the video frames by:
  
  detecting one or more instances of a candidate feature in one or more of the video frames;
  
  using a motion compensated prediction process, segmenting the one or more instances of the candidate feature from non-features in the one or more video frames, the motion compensated prediction process selecting previously decoded video frames having features corresponding to the one or more instances of the candidate feature;
  
  defining one or more feature instances using one or more of the instances of the candidate feature, where the one or more defined feature instances are predicted to provide relatively increased compactness in the feature-based encoding relative to conventional video encoding;
  
  determining positional information from the one or more previously decoded video frames, the positional information including a position and a spatial perimeter of the one or more defined feature instances in the one or more previously decoded video frames;
  
  forming a feature-based model using the one or more defined feature instances, the feature-based model including the positional information from the previously decoded video frames;
  
  normalizing the one or more defined feature instances using the feature-based model, said normalizing using the positional information from the one or more previously decoded video frames as a positional prediction, resulting normalization being prediction of the one or more defined feature instances in the current video frame;
  
  comparing the feature-based model to a conventional video encoding model for one or more of the defined features, and determining from the comparison which model enables greater encoding compression; and
  
  using results of the comparing and determining step, applying feature-based encoding to portions of one or more of the video frames, and applying conventional video encoding to other portions of the one or more video frames.
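
The positional-prediction and normalization steps of claim 31 can be sketched as follows. None of these assumptions come from the claim: the spatial perimeter is an axis-aligned rectangle, the instance moves with constant velocity between the two most recent decoded frames, and normalization means nearest-neighbour resampling of the predicted region to a fixed `size`. `Box`, `predict_box` and `normalize` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List

Frame = List[List[int]]


@dataclass(frozen=True)
class Box:
    """Position (top, left) and rectangular spatial perimeter (height, width)."""
    top: int
    left: int
    height: int
    width: int


def predict_box(prev2: Box, prev1: Box) -> Box:
    """Constant-velocity positional prediction from the two most recent decoded frames."""
    return Box(2 * prev1.top - prev2.top, 2 * prev1.left - prev2.left,
               prev1.height, prev1.width)


def normalize(frame: Frame, box: Box, size: int) -> Frame:
    """Resample the predicted region to size x size pels by nearest neighbour.

    Sample coordinates are clamped so a box that overhangs the frame edge
    never indexes outside the frame.
    """
    rows, cols = len(frame), len(frame[0])
    out = []
    for i in range(size):
        r = min(max(box.top + (i * box.height) // size, 0), rows - 1)
        out.append([
            frame[r][min(max(box.left + (j * box.width) // size, 0), cols - 1)]
            for j in range(size)
        ])
    return out


frame = [[10 * r + c for c in range(6)] for r in range(6)]
box = predict_box(Box(0, 0, 2, 2), Box(1, 1, 2, 2))
print(box)                       # Box(top=2, left=2, height=2, width=2)
print(normalize(frame, box, 2))  # [[22, 23], [32, 33]]
```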

32. A method of processing video data comprising:
- receiving video data having a series of video frames;
  
  detecting a candidate feature in one or more of the video frames;
  
  segmenting the candidate feature from non-features in the video frame by employing reference frame processing used in a motion compensated prediction process;
  
  processing the one or more portions of previously decoded video frames to identify potential matches of the candidate feature;
  
  determining that a substantial amount of the portions of previously decoded video frames include instances of the candidate feature;
  
  aggregating the instances of the candidate feature into a set of instances of the candidate feature;
  
  processing the candidate feature set to create a feature-based model, where the feature-based model includes a model of deformation variation and a model of appearance variation of the instances of the candidate feature, the appearance variation models being created by modeling pel variation of the instances of the candidate feature, the deformation variation models being created by modeling pel correspondence variation of the instances of the candidate feature;
  
  determining compression efficiency associated with using the feature-based model to model the candidate feature set;
  
  determining compression efficiency associated with using conventional video compression to model the candidate feature set;
  
  comparing the feature-based model compression efficiency with the conventional video modeling compression efficiency, and determining which one is of greater compression value;
  
  encoding the video data using the feature-based models and conventional video encoding based on which one is of greater compression value.
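
Claim 32 models appearance variation through the pel variation of the instances, and claim 12 states that the appearance model is represented by a PCA decomposition of normalized instances. A dependency-free sketch of that idea follows: each normalized instance is a flattened pel vector, principal components are found by power iteration with Gram-Schmidt orthogonalization (a swapped-in numerical method; any symmetric eigen-solver would serve), and each instance is then coded as `k` coefficients instead of its full pel vector. All function names are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]


def mean_vector(instances: List[Vector]) -> Vector:
    return [sum(col) / len(instances) for col in zip(*instances)]


def covariance(instances: List[Vector], mu: Vector) -> List[Vector]:
    """Population covariance of the flattened pel vectors."""
    n, d = len(instances), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for x in instances:
        c = [xi - mi for xi, mi in zip(x, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += c[i] * c[j] / n
    return cov


def top_components(cov: List[Vector], k: int, iters: int = 100, seed: int = 0) -> List[Vector]:
    """Up to k leading eigenvectors of a symmetric matrix by power iteration.

    Each new direction is kept orthogonal to those already found; the search
    stops early when no variance is left outside them.
    """
    rng = random.Random(seed)
    d = len(cov)
    comps: List[Vector] = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        has_variance = True
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            for c in comps:  # Gram-Schmidt against earlier components
                dot = sum(wi * ci for wi, ci in zip(w, c))
                w = [wi - dot * ci for wi, ci in zip(w, c)]
            norm = math.sqrt(sum(wi * wi for wi in w))
            if norm < 1e-12:
                has_variance = False
                break
            v = [wi / norm for wi in w]
        if not has_variance:
            break
        comps.append(v)
    return comps


def encode(x: Vector, mu: Vector, comps: List[Vector]) -> Vector:
    """Project a mean-removed instance onto the components (one coefficient each)."""
    return [sum((xi - mi) * ci for xi, mi, ci in zip(x, mu, c)) for c in comps]


def decode(coeffs: Vector, mu: Vector, comps: List[Vector]) -> Vector:
    """Rebuild an instance as the mean plus weighted components."""
    out = list(mu)
    for a, c in zip(coeffs, comps):
        out = [o + a * ci for o, ci in zip(out, c)]
    return out


instances = [[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 3.0, 4.0], [-1.0, 0.0, 3.0, 4.0]]
mu = mean_vector(instances)
comps = top_components(covariance(instances, mu), k=2)
coeffs = encode(instances[1], mu, comps)
print(len(comps), len(coeffs))  # 1 1
```

Here the three instances vary along a single direction, so only one component is found and each 4-pel instance is described by one coefficient. Counting coefficients against pels is only a crude stand-in for the compression-efficiency comparison of claim 32; a real encoder would measure coded bits.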

33. A digital processing system for processing video data having one or more video frames comprising:
- one or more computer processors executing an encoder;
  
  the encoder using feature-based encoding to encode portions of the video frames by:
  
  detecting a candidate feature in one or more of the video frames;
  
  segmenting the candidate feature from non-features in the video frame by employing reference frame processing used in a motion compensated prediction process;
  
  processing the one or more portions of previously decoded video frames to identify potential matches of the candidate feature;
  
  determining that a substantial amount of the portions of previously decoded video frames include instances of the candidate feature;
  
  aggregating the instances of the candidate feature into a set of instances of the candidate feature;
  
  processing the candidate feature set to create a feature-based model, where the feature-based model includes a model of deformation variation and a model of appearance variation of the instances of the candidate feature, the appearance variation models being created by modeling pel variation of the instances of the candidate feature, the structural variation models being created by modeling pel correspondence variation of the instances of the candidate feature;
  
  determining compression efficiency associated with using the feature-based model to model the candidate feature set;
  
  determining compression efficiency associated with using conventional video compression to model the candidate feature set;
  
  comparing the feature-based model compression efficiency with the conventional video modeling compression efficiency, and determining which one is of greater compression value;
  
  encoding the video data using the feature-based models and conventional video encoding based on which one is of greater compression value.

34. A method of processing video data comprising:
- decoding encoded video data by determining on a macroblock level whether there is an encoded feature in the encoded video data;
  
  in response to determining that there is no encoded feature in the encoded video data, decoding using conventional video decoding;
  
  in response to determining that there is an encoded feature in the encoded video data, separating the encoded feature from the encoded video data in order to synthesize the encoded feature instance separately from the conventionally encoded portions of the video data;
  
  determining feature-based models and feature parameters associated with the encoded feature;
  
  using the determined feature-based models and feature parameters to synthesize the encoded feature instance; and
  
  combining conventionally encoded portions of the video data with the synthesized feature instances to reconstruct original video data.
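
The macroblock-level decoding of claim 34 amounts to a dispatch on whether each macroblock carries an encoded feature. The sketch below is hypothetical throughout: `conventional_decode` is a placeholder that simply returns a stand-in payload, and a feature is synthesized from a linear appearance model (mean plus weighted basis vectors), which is only one possible form of the feature-based models and parameters the claim refers to.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FeatureModel:
    """Hypothetical linear appearance model: mean pels plus basis vectors."""
    mean: List[float]
    basis: List[List[float]]

    def synthesize(self, params: List[float]) -> List[float]:
        out = list(self.mean)
        for a, b in zip(params, self.basis):
            out = [o + a * bi for o, bi in zip(out, b)]
        return out


@dataclass
class Macroblock:
    index: int                        # raster position in the frame
    feature_id: Optional[int] = None  # None: no encoded feature in this macroblock
    params: List[float] = field(default_factory=list)   # feature parameters
    payload: List[float] = field(default_factory=list)  # stand-in for conventional data


def conventional_decode(mb: Macroblock) -> List[float]:
    """Placeholder for a conventional macroblock decoder."""
    return list(mb.payload)


def decode_frame(mbs: List[Macroblock], models: Dict[int, FeatureModel]) -> List[List[float]]:
    decoded: Dict[int, List[float]] = {}
    for mb in mbs:
        if mb.feature_id is None:
            decoded[mb.index] = conventional_decode(mb)
        else:  # separate feature path: synthesize from model and parameters
            decoded[mb.index] = models[mb.feature_id].synthesize(mb.params)
    # combine both paths back into raster order
    return [decoded[i] for i in sorted(decoded)]


models = {7: FeatureModel(mean=[10.0, 10.0], basis=[[1.0, -1.0]])}
mbs = [Macroblock(1, feature_id=7, params=[2.0]), Macroblock(0, payload=[1.0, 2.0])]
print(decode_frame(mbs, models))  # [[1.0, 2.0], [12.0, 8.0]]
```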

35. A data processing system for processing video data comprising:
- one or more computer processors executing a hybrid codec decoder capable of using video data decoding by:
  
  decoding an encoded video data by determining on a macroblock level whether there is an encoded feature in the encoded video data;
  
  in response to determining that there is no encoded feature in the encoded video data, decoding using conventional video decoding;
  
  in response to determining that there is an encoded feature in the encoded video data, separating the encoded feature from the encoded video data in order to synthesize the encoded feature instance separately from the conventionally encoded portions of the video data;
  
  determining feature-based models and feature parameters associated with the encoded feature;
  
  using the determined feature-based models and feature parameters to synthesize the encoded feature instance; and
  
  combining conventionally encoded portions of the video data with the synthesized features of the video data to reconstruct an original video data.

Current Assignee: Euclid Discoveries LLC
Original Assignee: Euclid Discoveries LLC
Inventors: Pace, Charles P.
Granted Patent: US 8,942,283 B2
US Class Current: 375/240.1
CPC Class Codes

G06T 7/215   Motion-based segmentation

G06T 9/001   Model-based coding, e.g. wi...

H04N 19/17   the unit being an image reg...

H04N 19/23   with coding of regions that...

H04N 19/54   using feature points or meshes

H04N 19/543   using regions
