Feature-based hybrid video codec comparing compression efficiency of encodings

US 8,942,283 B2
Filed: 10/06/2009
Issued: 01/27/2015
Est. Priority Date: 03/31/2005
Status: Active Grant

Abstract

Systems and methods of processing video data are provided. Video data having a series of video frames is received and processed. One or more instances of a candidate feature are detected in the video frames. The previously decoded video frames are processed to identify potential matches of the candidate feature. When a substantial amount of portions of previously decoded video frames include instances of the candidate feature, the instances of the candidate feature are aggregated into a set. The candidate feature set is used to create a feature-based model. The feature-based model includes a model of deformation variation and a model of appearance variation of instances of the candidate feature. The feature-based model compression efficiency is compared with the conventional video compression efficiency.

185 Citations

58 Claims

1. A computer implemented method of processing a series of video frames of video data, comprising the computer implemented steps of:
- encoding one or more portions of the video data using a first encoding process and one or more other portions of the video data using a feature-based encoding process by:
  
  processing a plurality of decoded video frames in the series to detect one or more instances of a candidate feature, said candidate feature being a region of pels exhibiting encoding complexity relative to neighboring pels;
  
  said detection including determining positional information for the instances of the candidate feature in the one or more decoded video frames, the positional information for a respective instance of the candidate feature including one or more of:
  
  a frame identifier for the respective instance of the candidate feature, a position within that respective frame, or a spatial perimeter of the respective instance of the candidate feature;
  
  determining, using a motion compensated prediction process, an instance of the candidate feature in a subject video frame in the series using the one or more decoded video frames, where said motion compensated prediction is initialized with the positional information from instances of the candidate feature in the decoded video frame;
  
  aggregating one or more of the candidate feature instances;
  
  transforming one or more of the candidate feature instances;
  
  forming a first feature-based model based on the aggregated, transformed candidate feature instances, the first feature-based model enabling prediction in the subject video frame of an appearance and a source position of a substantially matching feature instance, where the substantially matching feature instance is a key feature instance, the first feature-based model resulting in a feature-based encoding;
  
  comparing compression efficiency of the feature-based encoding to an encoding from the first video encoding process; and
  
  using results of the comparing step, applying feature-based encoding to portions of one or more of the video frames, and applying the first encoding process to other portions of the one or more video frames.
- Dependent Claims (2, 3, 4, 5)
- - 2. A computer implemented method as claimed in claim 1 wherein the substantially matching feature is a best match determined using a rate-distortion metric.
  - 3. A computer implemented method as in claim 1 wherein the first encoding process is an MPEG encoding process.
  - 4. A computer implemented method as in claim 1 wherein the decoded frames were originally encoded using at least the first encoding process.
  - 5. A computer implemented method as in claim 1 wherein the one or more instances of candidate features are detected by identifying a region of pels in the one or more video frames which has computational complexity as compared to one or more neighboring pels.
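
The comparing step of claim 1, together with the rate-distortion metric named in claim 2, can be illustrated with a minimal sketch. The claims do not fix a cost function, so the Lagrangian cost J = D + lambda * R used here is an assumption, and the `Candidate` type, the function names, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One trial encoding of a region (hypothetical container)."""
    name: str
    bits: int          # rate R, in bits
    distortion: float  # distortion D, e.g. sum of squared pel errors


def rd_cost(c: Candidate, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R (assumed metric)."""
    return c.distortion + lam * c.bits


def choose_encoding(feature: Candidate, conventional: Candidate, lam: float) -> str:
    """Pick the cheaper encoding; ties go to the conventional encoder."""
    if rd_cost(feature, lam) < rd_cost(conventional, lam):
        return feature.name
    return conventional.name


def assign_regions(trials: dict, lam: float = 1.0) -> dict:
    """Map each region id to the encoding selected for that region."""
    return {rid: choose_encoding(f, c, lam) for rid, (f, c) in trials.items()}


# Illustrative numbers only: (feature-based trial, first-process trial) per region.
trials = {
    "region_a": (Candidate("feature", 120, 40.0), Candidate("conventional", 300, 35.0)),
    "region_b": (Candidate("feature", 250, 90.0), Candidate("conventional", 200, 30.0)),
}
decisions = assign_regions(trials, lam=0.5)
```

With these sample values, `region_a` is assigned to the feature-based encoding and `region_b` to the first encoding process, which mirrors the final step of the claim: different portions of the frames end up with different encoders.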

6. A digital processing system for processing video data having one or more video frames comprising:
- one or more computer processors executing an encoder;
  
  the encoder configured to use feature-based encoding to encode portions of the video frames and a first encoding process to encode other portions of the video frames by:
  
  detecting one or more instances of a candidate feature in one or more of the video frames, said candidate feature being a region of pels exhibiting encoding complexity relative to neighboring pels;
  
  using a motion compensated prediction process, segmenting the one or more instances of the candidate feature from non-features in the one or more video frames, the motion compensated prediction process selecting decoded video frames having features corresponding to the one or more instances of the candidate feature;
  
  determining positional information from the one or more decoded video frames;
  
  forming a feature-based model based on aggregating the candidate feature instances, the feature-based model including the positional information from the decoded video frames;
  
  normalizing the one or more candidate feature instances using the feature-based model, said normalizing using the positional information from the one or more decoded video frames as a positional prediction, resulting normalization being prediction of the one or more candidate feature instances in a subject video frame in the series;
  
  comparing the feature-based model to a video encoding resulting from the first video encoding process for the one or more instances of the candidate feature, and determining from the comparison which enables greater encoding compression; and
  
  using results of the comparing, applying feature-based encoding to portions of one or more of the video frames, and applying the first video encoding process to other portions of the one or more video frames.
- Dependent Claims (7, 8, 9, 10)
- - 7. A digital processing system as in claim 6 wherein the positional information includes one or more of:
    - a position and a spatial perimeter of the one or more candidate feature instances in the one or more decoded video frames.
  - 8. A digital processing system as in claim 6 wherein the first encoding process is an MPEG encoding process.
  - 9. A digital processing system as in claim 6 wherein the decoded frames were originally encoded using at least the first encoding process.
  - 10. A digital processing system as in claim 6 wherein the one or more instances of candidate features are detected by identifying a region of pels in the one or more video frames which has computational complexity as compared to neighboring pels.
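
The detecting step of claim 6 describes a candidate feature as a region of pels exhibiting encoding complexity relative to neighboring pels. The sketch below is one possible reading, not the patented method: it assumes block variance as a stand-in for encoding complexity and flags a block when its variance exceeds a multiple of the mean variance of its adjacent blocks. The function names, the block size, and the `ratio` threshold are hypothetical.

```python
def block_variance(frame, top, left, size):
    """Variance of the pels in one square block (assumed complexity proxy)."""
    vals = [frame[y][x] for y in range(top, top + size) for x in range(left, left + size)]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def detect_candidates(frame, size=2, ratio=4.0):
    """Return (top, left, size) for blocks far more complex than their neighbors."""
    rows, cols = len(frame) // size, len(frame[0]) // size
    cost = [[block_variance(frame, r * size, c * size, size) for c in range(cols)]
            for r in range(rows)]
    found = []
    for r in range(rows):
        for c in range(cols):
            # Gather the proxy cost of up to eight adjacent blocks.
            neigh = [cost[rr][cc]
                     for rr in range(max(0, r - 1), min(rows, r + 2))
                     for cc in range(max(0, c - 1), min(cols, c + 2))
                     if (rr, cc) != (r, c)]
            if not neigh:
                continue  # a single-block frame has nothing to compare against
            baseline = sum(neigh) / len(neigh)
            if cost[r][c] > ratio * baseline:
                found.append((r * size, c * size, size))
    return found


# A flat 6x6 frame with one textured 2x2 block in the middle.
frame = [[10] * 6 for _ in range(6)]
frame[2][2], frame[2][3] = 0, 255
frame[3][2], frame[3][3] = 255, 0
candidates = detect_candidates(frame)
```

Each returned tuple carries a position and a square extent, which corresponds loosely to the position and spatial perimeter mentioned in claim 7.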

11. A method of processing video data comprising:
- receiving video data having a series of video frames;
  
  detecting a candidate feature in one or more of the video frames, said candidate feature being a region of pels exhibiting encoding complexity relative to neighboring pels;
  
  segmenting the candidate feature from non-features in the video frame by employing reference frame processing used in a motion compensated prediction process;
  
  processing at least one or more portions of decoded video frames to identify potential matches in the one or more previously decoded video frames;
  
  determining that one or more of the decoded video frames include instances of the candidate feature;
  
  forming a representative feature model based on aggregating one or more of the candidate features by:
  
  aggregating the instances of the candidate feature into a set of instances of the candidate feature; and
  
  processing the candidate feature set to create a feature-based model, where the feature-based model includes one or more of a model of deformation variation or a model of appearance variation of the instances of the candidate feature, the appearance variation models being created by modeling pel variation of the instances of the candidate feature, the deformation variation models being created by modeling pel correspondence variation of the instances of the candidate feature;
  
  determining compression efficiency associated with using the feature-based model to model the candidate feature set;
  
  determining compression efficiency associated with using a first video encoding process to model the candidate feature set;
  
  comparing the feature-based model compression efficiency with the first video modeling compression efficiency, and determining which one is of greater compression value;
  
  encoding the video data using the feature-based models and the first video encoding process based on the compression efficiency.
- Dependent Claims (12, 13, 14)
- - 12. A method as in claim 11 wherein the first encoding process is an MPEG encoding process.
  - 13. A method as in claim 11 wherein the decoded frames were originally encoded using at least the first encoding process.
  - 14. A method as in claim 11 wherein the one or more instances of candidate features are detected by identifying a region of pels in the one or more video frames which has computational complexity as compared to neighboring pels.
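
Claim 11 describes an appearance variation model created by modeling pel variation across the aggregated instances. As a hedged illustration, the sketch below builds a one-component linear model (mean appearance plus a single principal direction found by power iteration). The choice of a single component, the power-iteration solver, and all function names are assumptions made for the example; the claim itself does not prescribe them.

```python
import math


def mean_vector(instances):
    """Average appearance over all instances (each a flat list of pel values)."""
    n = len(instances)
    return [sum(col) / n for col in zip(*instances)]


def first_component(centered, iterations=50):
    """Dominant direction of pel variation, found by power iteration."""
    dim = len(centered[0])
    v = [1.0 / math.sqrt(dim)] * dim  # starting guess; must not be orthogonal to the answer
    for _ in range(iterations):
        w = [0.0] * dim
        for x in centered:
            proj = sum(a * b for a, b in zip(x, v))
            for i in range(dim):
                w[i] += proj * x[i]  # accumulates (X^T X) v without forming the matrix
        norm = math.sqrt(sum(a * a for a in w))
        if norm == 0.0:
            break  # no variation to model
        v = [a / norm for a in w]
    return v


def encode(instance, mean, component):
    """Project an instance onto the model, giving one appearance parameter."""
    return sum((a - m) * c for a, m, c in zip(instance, mean, component))


def decode(coef, mean, component):
    """Synthesize pels from the appearance parameter."""
    return [m + coef * c for m, c in zip(mean, component)]


# Five 4-pel instances that differ only along the direction (1, 2, 0, 0).
instances = [[10 + t, 20 + 2 * t, 30, 40] for t in (-2, -1, 0, 1, 2)]
mean = mean_vector(instances)
centered = [[a - m for a, m in zip(x, mean)] for x in instances]
component = first_component(centered)
coef = encode(instances[3], mean, component)
rebuilt = decode(coef, mean, component)
```

In this toy set every instance lies on one line through the mean, so a single parameter per instance is enough to rebuild its pels; real feature sets would generally need more components and would leave a residual.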

15. A digital processing system for processing video data having one or more video frames comprising:
- one or more computer processors executing an encoder;
  
  the encoder configured to use feature-based encoding to encode portions of the video frames by:
  
  detecting a candidate feature in one or more of the video frames, said candidate feature being a region of pels exhibiting encoding complexity relative to neighboring pels;
  
  segmenting the candidate feature from non-features in the video frame by employing reference frame processing used in a motion compensated prediction process;
  
  processing at least the one or more portions of decoded video frames to identify potential matches of the candidate feature;
  
  determining that an amount of the portions of the decoded video frames include instances of the candidate feature;
  
  forming a representative feature model based on aggregating one or more of the candidate features by:
  
  aggregating the instances of the candidate feature into a set of instances of the candidate feature; and
  
  processing the candidate feature set to create a feature-based model, where the feature-based model includes a model of deformation variation and a model of appearance variation of the instances of the candidate feature, the appearance variation models being created by modeling pel variation of the instances of the candidate feature, the structural variation models being created by modeling pel correspondence variation of the instances of the candidate feature;
  
  determining compression efficiency associated with using the feature-based model to model the candidate feature set;
  
  determining compression efficiency associated with using the first video encoding process to model the candidate feature set;
  
  comparing the feature-based model compression efficiency with the video modeling compression efficiency of the first video encoding process; and
  
  encoding the video data using the feature-based models and the first video encoding process based on the compression efficiency.
- Dependent Claims (16, 17, 18)
- - 16. A digital processing system as in claim 15 wherein the first encoding process is an MPEG encoding process.
  - 17. A digital processing system as in claim 15 wherein the decoded frames were originally encoded using at least the first encoding process.
  - 18. A digital processing system as in claim 15 wherein the one or more instances of candidate features are detected by identifying a region of pels in the one or more video frames which has computational complexity as compared to neighboring pels.

19. A data processing system including:
- an encoder configured to process a series of video frames of video data, such that portions of the video data are encoded using a first encoding process and portions are encoded using a feature-based compression process by:
  
  processing a plurality of decoded video frames in the series to detect one or more instances of a candidate feature, said candidate feature being a region of pels exhibiting encoding complexity relative to neighboring pels;
  
  determining positional information for the detected instances of the candidate feature in the one or more decoded video frames;
  
  using a motion compensated prediction process to facilitate determining an instance of the candidate feature in a subject video frame in the series by initializing the motion compensated prediction process with the positional information for at least one of the detected instances of the candidate feature in the one or more decoded video frames;
  
  transforming one or more of the determined or the detected candidate feature instances;
  
  aggregating the transformed one or more of the determined or detected feature instances to create a feature-based encoding; and
  
  comparing compression efficiency of the feature-based encoding with an encoding from the first encoding process to determine which encoding process to apply.
- Dependent Claims (20, 21)
- - 20. A digital processing system as in claim 19 wherein the one or more instances of candidate features are detected by identifying a region of pels in the one or more video frames which has computational complexity as compared to neighboring pels.
  - 21. A digital processing system as in claim 19 wherein the first decoding process is an MPEG decoding process.
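
Claim 19 initializes the motion compensated prediction process with the positional information of a feature instance already detected in a decoded frame. One simple way to picture this is a block-matching search in the subject frame that is centered on the known position. The sketch below assumes a sum-of-absolute-differences (SAD) criterion and an exhaustive search within a small radius; both are illustrative choices, and the function names are hypothetical.

```python
def sad(template, frame, top, left):
    """Sum of absolute differences between a template and a frame window."""
    return sum(abs(template[dy][dx] - frame[top + dy][left + dx])
               for dy in range(len(template))
               for dx in range(len(template[0])))


def locate_feature(decoded, subject, init_pos, size, radius=2):
    """Search the subject frame around init_pos for the feature seen in decoded.

    Returns (position, motion_vector, cost), all in (row, column) order.
    """
    py, px = init_pos
    template = [row[px:px + size] for row in decoded[py:py + size]]
    height, width = len(subject), len(subject[0])
    best = None
    for y in range(max(0, py - radius), min(height - size, py + radius) + 1):
        for x in range(max(0, px - radius), min(width - size, px + radius) + 1):
            cost = sad(template, subject, y, x)
            if best is None or cost < best[0]:
                best = (cost, y, x)
    cost, y, x = best
    return (y, x), (y - py, x - px), cost


def paste(frame, patch, top, left):
    """Copy a small patch into a frame in place (test helper)."""
    for dy, row in enumerate(patch):
        for dx, value in enumerate(row):
            frame[top + dy][left + dx] = value


patch = [[9, 7], [5, 3]]
decoded = [[0] * 6 for _ in range(6)]
subject = [[0] * 6 for _ in range(6)]
paste(decoded, patch, 1, 1)  # feature instance known from the decoded frame
paste(subject, patch, 2, 3)  # same feature, moved in the subject frame
position, motion, cost = locate_feature(decoded, subject, (1, 1), size=2)
```

Starting the search at the known position keeps the window small; a search with no prior position would have to scan the whole frame.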

22. A computer program product stored on a non-transitory computer useable medium configured to be executed by one or more processors, the computer program product being configured to cause the one or more processors to process video data by:
- encoding portions of the video data using a first encoding process and portions using a feature-based compression process by:
  
  processing a plurality of decoded video frames in the series to detect one or more instances of a candidate feature, said candidate feature being a region of pels exhibiting encoding complexity relative to neighboring pels;
  
  determining positional information for at least one of the detected instances of the candidate feature in the one or more decoded video frames;
  
  using a motion compensated prediction process to facilitate determining an instance of the candidate feature in a subject video frame in the series by initializing the motion compensated prediction process with the positional information for at least one of the detected instances of the candidate feature in the one or more decoded video frames;
  
  transforming one or more of the determined or the detected candidate feature instances;
  
  aggregating the transformed one or more of the determined or detected feature instances to create a feature-based encoding; and
  
  comparing compression efficiency of the feature-based encoding with an encoding from the first encoding process to determine which encoding process to apply.

23. A computer implemented method of processing a series of video frames of video data, comprising the computer implemented steps of:
- encoding portions of the video data using a first encoding process and portions using a feature-based compression process by:
  
  processing a plurality of decoded video frames in the series to detect one or more instances of a candidate feature, said candidate feature being a region of pels exhibiting encoding complexity relative to neighboring pels;
  
  determining positional information for at least one of the detected instances of the candidate feature in the one or more decoded video frames;
  
  using a motion compensated prediction process to facilitate determining an instance of the candidate feature in a subject video frame in the series by initializing the motion compensated prediction process with the positional information for at least one of the detected instances of the candidate feature in the one or more decoded video frames;
  
  transforming one or more of the determined or the detected candidate feature instances;
  
  aggregating one or more of the transformed candidate feature instances;
  
  using the aggregated and transformed feature instances to create a feature based model configured as a feature-based encoding; and
  
  comparing compression efficiency of the feature-based encoding with an encoding from the first encoding process to determine which encoding process to apply.
- Dependent Claims (24 through 58)
- - 24. A computer implemented method as claimed in claim 23 wherein the step of processing decoded video frames to detect one or more instances of a candidate feature further includes:
    - detecting at least one instance of a candidate feature by identifying a spatially continuous group of pels having substantially close spatial proximity; and
      
      said identified pels defining a portion of one of the one or more video frames.
  - 25. A computer implemented method as claimed in claim 24 wherein the step of processing the one or more decoded video frames to detect one or more instances of a candidate feature further includes:
    - using the motion compensated prediction process, selecting, from a plurality of candidate feature instances, one or more instances that are predicted to provide encoding efficiency; and
      
      determining a segmentation of a respective instance of the candidate feature from other features and non-features in the subject video frame based on the motion compensated prediction process' selection of one or more predicted instances of the candidate feature.
  - 26. A computer implemented method as claimed in claim 25 wherein the motion compensated prediction process operates on a selection of a substantially larger number of the decoded video frames than in the first video encoding process;
    - and where the selection of decoded frame does not rely on user supervision.
  - 27. A computer implemented method as claimed in claim 24 wherein the group of pels further includes one or more:
    - macroblocks or portions of macroblocks.
  - 28. A computer implemented method as claimed in claim 23 further including forming a second feature-based model by:
    - using the first feature-based model as a target of prediction for one or more motion compensated predictions from one or more candidate feature instances, yielding a set of predictions of the first feature-based model; and
      
      upon being combined, the set of predictions becoming the second feature-based model.
  - 29. A computer implemented method as claimed in claim 28 wherein the second feature-based model is used to model residual of the first feature-based model including one or more of:
    - modeling structural variation or appearance variation of the second-feature based model relative to the residual;
      
      encoding the residual with the second feature-based model yielding appearance and deformation parameters; and
      
      using the appearance and deformation parameters to reduce the encoding size of the residual.
  - 30. A computer implemented method as claimed in claim 29 where the second feature-based model provides an optional rectangular area extent of pels associated with that instance in the decoded frame relative to the spatial position.
  - 31. A computer implemented method as claimed in claim 30 further comprising determining the appearance parameters or deformation parameters for synthesis of the current instance of a feature-based model, and using the appearance model or deformation model along with temporally recent parameters to interpolate or extrapolate parameters from the feature-based model to predict pels in the subject video frame, including one or more of:
    - determining the values of the synthesis for temporally recent candidate feature instances are linearly interpolated or linearly extrapolated based on which method has yielded the most accurate approximation for those instances;
      
      in response to detecting diminished effectiveness of the linear interpolative or extrapolative methods, utilizing higher order quadratic methods to predict the appearance or deformation parameters;
      
      or in response to detecting diminished effectiveness of the quadratic methods, employing more advanced state-based methods including extended Kalman filters to predict the appearance or deformation parameters;
      
      where the actual parameters for the model are optionally differentially encoded relative to the predicted parameters.
  - 32. A computer implemented method as claimed in claim 31 wherein the parameters from the feature-based model enable a reduction in computing resources required to predict pels in the subject video frame, such that more computing resources are required when using the first video encoding process to predict the pels in the subject video frame using one or more portions of the decoded video frames.
  - 33. A computer implemented method as claimed in claim 28 wherein the second feature-based model is derived by modeling the decoded feature instances;
    - and where the decoded feature instances are created using a feature based encoding of one or more of:
      
      a candidate feature instance in a subject video frame in the series, a candidate feature instance that is from a decoded frame that is substantially recent in the series, or an average of candidate feature instances from the decoded frame video frames.
  - 34. A method as claimed in claim 33 where the appearance model is represented by a PCA decomposition of the second feature-based model, which has been normalized.
  - 35. A computer implemented method as claimed in claim 33 further comprising determining a deformation model of the spatial variation of correspondences in the candidate feature instances in the first feature-based model as compared to the candidate feature based instances in the second feature-based model;
    - and using one or more of the following to approximate variation in the deformation instances for the deformation model:
      
      a motion compensated prediction process, mesh deformation, and a motion model with a substantially reduced parameterization; and
      
      integrating the deformation instances into the deformation model.
  - 36. A computer implemented method as claimed in claim 33 wherein the step of applying feature-based encoding to portions of one or more of the video frames, and applying the first encoding process to other portions of the one or more video frames further includes one or more of:
    - applying compressed sensing to the residual of the second feature-based model prediction;
      
      where the application of compressed sensing utilizes the average appearance as a measurement and predicts portions of the video data from it;
      
      where variance associated with the compressed sensing prediction is removed from the second feature-based model;
      
      where feature-based modeling focuses on a more compact encoding of the remaining residual;
      
      or applying the first video encoding process to remaining pels of the one or more video frames and to remaining video frames.
  - 37. A computer implemented method as claimed in claim 36 further comprising the step of making the video data sparse to increase effectiveness of the step of applying compressed sensing.
  - 38. A computer implemented method as claimed in claim 23 wherein aggregating the candidate feature instances to form a set of candidate feature instances further includes aggregating the instances of the candidate feature into an aggregate candidate feature;
    - and using the set of instances of the aggregate candidate feature to form a region larger than the original instances of un-aggregated candidate features, where the larger region is formed through the identification of coherency among the instances of the candidate feature in the set.
  - 39. A computer implemented method as claimed in claim 38 wherein the coherency is based on appearance correspondences in the one or more instances of the candidate feature approximated by a lower parameter motion model.
  - 40. A computer implemented method as claimed in claim 23 wherein based on the results of the comparison, applying the first video encoding process to at least a portion of the subject video frame, and using the feature-based model to predict the pels or macroblocks of the candidate feature instance in the subject video frame.
  - 41. A computer implemented method as claimed in claim 40 wherein applying the first video encoding process to portions of one or more of the video frames in response to the comparing step further includes:
    - assigning a probability for one or more of the decoded video frames, where the probability is based on the combined predicted encoding performance improvement for the subject video frame determined using positional information from the motion compensated prediction process, where the probability provides the combined encoding performance of motion compensated prediction process utilized during the analysis of the first feature-based model and a second feature-based model for the subject video frame; and
      
      determining an indexing of the decoded video frames based on their probability.
  - 42. A computer implemented method as claimed in claim 40 further including reusing the feature instance's predicted pels for predicting other feature instances in the subject video frame in response to determining that:
    - one or more instances of the candidate feature overlaps more than one macroblock in the subject video frame;
      
      or one or more instances of the candidate feature represents one macroblock when one or more instances of the candidate feature substantially matches positional information for a macroblock in the subject video frame.
  - 43. A computer implemented method as claimed in claim 23 wherein the feature-based encoding process is embedded within the first encoding process.
  - 44. A computer implemented method as claimed in claim 23 wherein the one or more instances of candidate features are free of correspondence to distinct salient entities (object, sub-objects) in the one or more video frames.
  - 45. A computer implemented method as claimed in claim 44 wherein the salient entities are determined through user supervised labeling of detected features as belonging to or not belonging to an object.
  - 46. A computer implemented method as claimed in claim 23 wherein the one or more instances of the candidate feature contain elements of two or more salient entities, background or other parts of the video frames.
  - 47. A computer implemented method as claimed in claim 23 wherein the one or more instances of a candidate feature do not correspond to an object.
  - 48. A computer implemented method as claimed in claim 23 wherein the one or more of the instances are transformed using a linear transform.
  - 49. A computer implemented method as claimed in claim 23 further includes decoding the encoded video data by performing one or more of:
    - determining on a macroblock level whether there is an encoded feature in the encoded video data;
      
      in response to determining that there is no encoded feature in the encoded video data, decoding using the first video decoding process or an other decoding process;
      
      in response to determining that there is an encoded feature in the encoded video data, separating the encoded feature from the encoded video data in order to synthesize the encoded feature separately from the encoded portions of the video data resulting from the first video encoding process;
      
      determining feature-based models and feature parameters associated with the encoded feature;
      
      using the determined feature-based models and feature parameters to synthesize the encoded feature instance; and
      
      combining encoded portions of the video data resulting from the first video encoding process with the synthesized feature instances to reconstruct original video data.
  - 50. A computer implemented method as claimed in claim 23 wherein the feature-based encoding includes applying object-based encoding for portions of the one or more video frames.
  - 51. A computer implemented method as in claim 23 wherein the first encoding process is an MPEG encoding process.
  - 52. A computer implemented method as in claim 23 wherein the decoded frames were originally encoded using at least the first encoding process.
  - 53. A computer implemented method as in claim 23 wherein the one or more instances of candidate features are detected by identifying a region of pels in the one or more video frames which has computational complexity as compared to one or more neighboring pels.
  - 54. A computer implemented method as in claim 23 wherein the one or more instances of candidate features are detected by identifying a region of pels in the one or more video frames which requires a disproportionate amount of bandwidth for video processing as compared to neighboring pels in the one or more video frames.
  - 55. A computer implemented method as in claim 23 wherein said detection further includes determining respective positional information for the one or more detected candidate feature instances in the one or more decoded video frames, the positional information including one or more of:
    - a frame identifier, a position within that frame, or a spatial perimeter of the instance.
  - 56. A computer implemented method as in claim 23 wherein said detection further includes predicting that the one or more detected candidate features provide relatively increased compactness when encoded using the feature-based encoding relative to an encoding resulting from the first encoding process.
  - 57. A computer implemented method as in claim 23 wherein creating the feature based encoding further includes using the transformed detected candidate feature instances and the predicted candidate feature instances to create one or more feature-based models for the feature-based encoding.
  - 58. A computer implemented method as in claim 23 wherein processing a plurality of decoded video frames in the series to detect one or more candidate feature instances further includes identifying a spatially continuous region of pels that are in close spatial proximity to each other having coherency.
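
Claim 31 predicts appearance or deformation parameters from temporally recent values, falls back to higher-order methods when the linear ones lose effectiveness, and optionally encodes the actual parameters differentially against the prediction. The sketch below covers only the linear and quadratic stages for a single scalar parameter; the extended Kalman filter stage is left out. The backtest rule, the `tolerance` value, and the function names are assumptions made for illustration.

```python
def linear_extrapolate(history):
    """Continue the straight line through the last two values."""
    return 2 * history[-1] - history[-2]


def quadratic_extrapolate(history):
    """Continue the parabola through the last three values."""
    return 3 * history[-1] - 3 * history[-2] + history[-3]


def predict(history, tolerance=0.5):
    """Pick a predictor by backtesting the linear one on the newest known value.

    Needs at least three past values. Returns (method_name, predicted_value).
    """
    if len(history) < 3:
        raise ValueError("need at least three past parameter values")
    backtest_error = abs(linear_extrapolate(history[:-1]) - history[-1])
    if backtest_error <= tolerance:
        return "linear", linear_extrapolate(history)
    return "quadratic", quadratic_extrapolate(history)


def encode_differential(actual, predicted):
    """Residual that would be transmitted instead of the raw parameter."""
    return actual - predicted


def decode_differential(residual, predicted):
    """Recover the actual parameter from the prediction and the residual."""
    return predicted + residual


steady = [2, 4, 6, 8]    # linear trend: the linear predictor backtests perfectly
curved = [1, 4, 9, 16]   # quadratic trend: the linear predictor misses by 2
steady_choice = predict(steady)
curved_choice = predict(curved)
residual = encode_differential(25.5, curved_choice[1])
recovered = decode_differential(residual, curved_choice[1])
```

Because the decoder can run the same `predict` call on the parameters it has already reconstructed, only the small residual needs to be sent, which is the point of the differential encoding mentioned at the end of the claim.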

Current Assignee: Euclid Discoveries LLC
Original Assignee: Euclid Discoveries LLC
Inventors: Pace, Charles P.
Primary Examiner(s): Torrente, Richard
Assistant Examiner(s): SUH, JOSEPH JINWOO
Application Number: US13/121,904
Publication Number: US 20110182352A1
Time in Patent Office: 1,939 Days
Field of Search: 371/240.1, 371/240.08, 371/240.16, 371/240.01, 725/136, 707/5, 375/240.08, 375/240.16, 375/240.1, 375/240.01
US Class Current: 375/240.1

CPC Class Codes

G06T 7/215   Motion-based segmentation
G06T 9/001   Model-based coding, e.g. wi...
H04N 19/17   the unit being an image reg...
H04N 19/23   with coding of regions that...
H04N 19/54   using feature points or meshes
H04N 19/543   using regions
