Apparatus and method for processing video data
First Claim
1. A computer-implemented method for processing video signal data from a plurality of video frames, the method comprising:
- using the computer to perform the following steps:
detecting an object in two or more given video frames, each video frame being formed of pel data;
tracking the detected object through the two or more video frames;
segmenting pel data corresponding to the detected object from other pel data in the two or more video frames so as to generate a first intermediate form of the video signal data, the segmenting utilizing a spatial segmentation of the pel data;
generating correspondence models of elements of the detected object, each correspondence model relating an element of the detected object in one video frame to a corresponding element of the detected object in another video frame; and
using the correspondence models, normalizing the segmented pel data, said normalizing including modeling global motion of the detected object and resulting in re-sampled pel data corresponding to the detected object in the two or more video frames, the re-sampled pel data providing an object-based encoded form of the video signal data normalized as output;
the object-based encoded form being able to be decoded by:
(i) restoring spatial positions of the re-sampled pel data by utilizing the correspondence models, thereby generating restored pels corresponding to the detected object; and
(ii) recombining the restored pel data together with the other pel data in the first intermediate form of the video signal data to re-create an original video frame; and
wherein generating correspondence models includes estimating a multi-dimensional projective motion model.
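The multi-dimensional projective motion model recited in the last limitation can be made concrete. Below is a minimal sketch (all names are illustrative, not from the patent) of how a 3x3 homography maps pel coordinates, which is the core operation behind both normalizing the segmented pel data and restoring its spatial positions:

```python
# Sketch of a 2D projective (homography) motion model: maps object pel
# coordinates in one frame into a common, normalized coordinate system.
# Names here are illustrative, not taken from the patent.
import numpy as np

def apply_homography(H, points):
    """Map Nx2 pel coordinates through a 3x3 projective matrix H."""
    pts = np.hstack([points, np.ones((len(points), 1))])  # homogeneous coords
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]                 # perspective divide

# A pure-translation homography shifts every pel by (tx, ty) = (3, -2).
H_translate = np.array([[1.0, 0.0, 3.0],
                        [0.0, 1.0, -2.0],
                        [0.0, 0.0, 1.0]])
pts = np.array([[0.0, 0.0], [10.0, 5.0]])
print(apply_homography(H_translate, pts))  # [[3., -2.], [13., 3.]]
```

Decoding inverts the same model: applying `np.linalg.inv(H_translate)` to the normalized coordinates restores the original pel positions.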
Abstract
An apparatus and methods for processing video data are described. The invention provides a representation of video data that can be used to assess agreement between the data and a fitting model for a particular parameterization of the data. This allows the comparison of different parameterization techniques and the selection of the optimum one for continued video processing of the particular data. The representation can be utilized in intermediate form as part of a larger process or as a feedback mechanism for processing video data. When utilized in its intermediate form, the invention can be used in processes for storage, enhancement, refinement, feature extraction, compression, coding, and transmission of video data. The invention serves to extract salient information in a robust and efficient manner while addressing the problems typically associated with video data sources.
21 Claims
wherein each of the decomposing and the recomposing uses Principal Component Analysis; and wherein generating correspondence models includes analyzing the corresponding elements using a sampling population based on finite differences generated from a block-based motion estimation between two or more of the video frames.
7. The method of claim 6 wherein each of the two or more video frames has object pels and non-object pels, the method further comprising:
identifying corresponding elements in the non-object pels in two or more of the video frames;
analyzing the corresponding elements in the non-object pels and generating relationships between the corresponding elements in the non-object pels; and
forming second correspondence models by using the generated relationships between the corresponding elements in the non-object pels;
wherein the step of analyzing the corresponding elements in the non-object pels employs a time-based occlusion filter.
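Claim 7's time-based occlusion filter can be sketched as follows: for each non-object (background) pel, drop the frames in which the object mask covers it and filter the surviving samples over time. A minimal illustration (the median choice and all names are assumptions, not from the patent):

```python
# Sketch of a time-based occlusion filter for non-object pels: keep, per
# background pel, only the frames where the object mask does not cover it,
# then take the temporal median. Illustrative, not the patented procedure.
import numpy as np

def occlusion_filtered_background(frames, object_masks):
    """frames: (T, H, W) grayscale; object_masks: (T, H, W) bool, True = object."""
    frames = np.asarray(frames, dtype=float)
    masked = np.where(object_masks, np.nan, frames)  # drop occluded samples
    return np.nanmedian(masked, axis=0)              # per-pel temporal median

frames = np.array([[[10, 20]], [[10, 99]], [[10, 20]]])  # 3 frames of a 1x2 image
masks  = np.array([[[False, False]], [[False, True]], [[False, False]]])
print(occlusion_filtered_background(frames, masks))  # [[10. 20.]]
```

The occluded sample (value 99 in frame 1) is excluded, so the background estimate is unaffected by the object passing over that pel.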
8. The method of claim 1 further comprising:
factoring the correspondence models into global deformation models;
integrating relationships between the corresponding elements into a model of global motion;
decomposing the re-sampled pel data into an encoded representation;
truncating zero or more bytes of the encoded representation; and
recomposing the re-sampled pel data from the truncated encoded representation;
wherein each of the decomposing and the recomposing uses a conventional video compression/decompression process; and
wherein generating correspondence models includes analyzing the corresponding elements using a sampling population based on finite differences generated from a block-based motion estimation between two or more of the video frames.
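Several claims derive their sampling population from "finite differences generated from a block-based motion estimation." A minimal exhaustive block-matching sketch (block size, search range, and names are illustrative assumptions):

```python
# Sketch of block-based motion estimation: a block's displacement between
# two frames is the offset minimizing the sum of absolute differences (SAD)
# over a small search window. Illustrative, not the patented procedure.
import numpy as np

def block_motion(prev, curr, y, x, size=4, search=2):
    """Return the (dy, dx) minimizing SAD for the block at (y, x) in prev."""
    block = prev[y:y + size, x:x + size]
    best, best_d = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + size > curr.shape[0] or xx + size > curr.shape[1]:
                continue  # candidate block falls outside the frame
            sad = np.abs(curr[yy:yy + size, xx:xx + size] - block).sum()
            if sad < best:
                best, best_d = sad, (dy, dx)
    return best_d

prev = np.zeros((12, 12)); prev[4:8, 4:8] = 1.0   # bright block at (4, 4)
curr = np.zeros((12, 12)); curr[5:9, 6:10] = 1.0  # same block moved by (+1, +2)
print(block_motion(prev, curr, 4, 4))  # (1, 2)
```

The per-block displacements are exactly the finite differences from which the claims build their sampling population.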
9. The method of claim 1 wherein the step of normalizing factors the correspondence models into local deformation models by:
defining a two dimensional mesh overlying pel data corresponding to the detected object, the mesh being based on a regular grid of vertices and edges; and
creating a model of local motion from relationships between the corresponding elements, the relationships comprising vertex displacements based on finite differences generated from a block-based motion estimation between two or more of the video frames.
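The regular-grid mesh and residual local motion of claims 9 and 11 can be sketched as: lay a grid of vertices over the object, estimate each vertex's motion, and subtract the global model's prediction. All names and values here are illustrative assumptions:

```python
# Sketch of local deformation modeling: a regular grid of mesh vertices
# overlays the object, and local motion is the per-vertex displacement
# left over after global motion is removed. Illustrative names.
import numpy as np

def make_mesh(height, width, spacing):
    """Regular grid of (y, x) vertices over a height-by-width pel region."""
    ys = np.arange(0, height + 1, spacing)
    xs = np.arange(0, width + 1, spacing)
    return np.array([(y, x) for y in ys for x in xs], dtype=float)

def local_displacements(measured, global_pred):
    """Residual (local) motion: measured vertex motion minus the global model."""
    return measured - global_pred

verts = make_mesh(8, 8, 4)                          # 3x3 grid of vertices
measured = np.tile([1.0, 2.0], (len(verts), 1))     # e.g. block-matching output
global_pred = np.tile([1.0, 1.5], (len(verts), 1))  # global motion prediction
print(local_displacements(measured, global_pred)[0])  # residual per vertex
```

This matches claim 11's framing: the local model only carries the motion that the global model fails to approximate.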
10. The method of claim 9 wherein the vertices correspond to discrete image features, and the step of defining a two dimensional mesh further identifies significant image features corresponding to the detected object based on the image intensity gradient of the object in the video frames.
11. The method of claim 9 wherein the created local motion model is based on a residual motion not approximated by a global motion model.
12. A computer-implemented method of generating an encoded form of video signal data from a plurality of video frames, the method comprising:
- using the computer to perform the following steps:
detecting an object in two or more video frames of the plurality of video frames, each video frame being formed of pel data;
tracking the detected object through the two or more video frames, the detected object having one or more elements;
for an element of the detected object in one video frame, identifying a corresponding element of the detected object in the other video frames;
analyzing the corresponding elements to generate relationships between the corresponding elements;
forming correspondence models for the detected object by using the generated relationships between the corresponding elements;
normalizing pel data corresponding to the detected object in the two or more video frames by utilizing the formed correspondence models and a deformable mesh, said normalizing generating re-sampled pel data representing an object-based encoded form of the video signal data; and
rendering the object-based encoded form of the video signal data for subsequent use, the object-based encoded form enabling restoring of spatial positions of the re-sampled pel data by utilizing the correspondence models, and generating restored pel data of the detected object;
wherein the detecting and tracking comprise using any one or combination of a Viola/Jones face detection algorithm and Principal Component Analysis.
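Claim 12 names the Viola/Jones face detector. Its central data structure is the integral image, which lets any rectangular Haar-like feature be summed in four lookups. A minimal sketch of that building block only (not the full cascade; names are illustrative):

```python
# The Viola/Jones detector rests on the integral image: ii[y, x] holds the
# sum of all pels above and to the left, so any rectangle sums in O(1).
# Illustrative sketch, not the full boosted cascade.
import numpy as np

def integral_image(img):
    """Cumulative sum over both axes, padded so ii[y, x] = img[:y, :x].sum()."""
    ii = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return np.pad(ii, ((1, 0), (1, 0)))

def rect_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] via four integral-image lookups."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 3, 3))  # 5 + 6 + 9 + 10 = 30
```

A Haar-like feature is then just a signed combination of a few such rectangle sums, which is what makes the cascade fast enough for per-frame detection.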
wherein the step of forming correspondence models uses a robust sampling consensus for solving a two dimensional affine motion model, and the step of analyzing the corresponding elements uses a sampling population based on finite differences generated from a block-based motion estimation between two or more of the video frames.
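The "robust sampling consensus for solving a two dimensional affine motion model" recited above is commonly realized as RANSAC-style fitting: repeatedly fit the affine model to a random minimal sample of correspondences and keep the model with the most inliers. A minimal sketch (thresholds, iteration counts, and names are assumptions, not from the patent):

```python
# RANSAC-style robust fit of a 2D affine motion model to noisy point
# correspondences. Illustrative parameters, not the patented procedure.
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine A (2x3) such that dst ≈ [src | 1] @ A.T."""
    X = np.hstack([src, np.ones((len(src), 1))])
    A, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return A.T

def ransac_affine(src, dst, iters=100, thresh=1.0, seed=0):
    """Keep the affine model, fit to a random minimal sample, with most inliers."""
    rng = np.random.default_rng(seed)
    X = np.hstack([src, np.ones((len(src), 1))])
    best_A, best_count = None, -1
    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)  # minimal sample
        A = fit_affine(src[idx], dst[idx])
        resid = np.linalg.norm(X @ A.T - dst, axis=1)
        count = int((resid < thresh).sum())
        if count > best_count:
            best_A, best_count = A, count
    return best_A

src = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [2., 2.]])
dst = src + [3.0, -1.0]        # true motion: pure translation (3, -1)
dst[4] = [100.0, 100.0]        # one gross outlier correspondence
A = ransac_affine(src, dst)
print(np.round(A, 2))          # ≈ [[1, 0, 3], [0, 1, -1]]
```

The outlier correspondence never attracts a majority of inliers, so the recovered model stays the clean translation, which is the point of the robust consensus step.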
15. The method of claim 12 further comprising compressing the re-sampled pel data by:
decomposing the re-sampled pel data into an encoded representation;
truncating zero or more bytes of the encoded representation; and
recomposing the re-sampled pel data from the truncated encoded representation;
wherein each of the decomposing and the recomposing uses Principal Component Analysis.
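The decompose/truncate/recompose steps of claims 15 and 16, with Principal Component Analysis as the codec, can be sketched as follows (frame shapes, the SVD route to PCA, and all names are illustrative assumptions):

```python
# Sketch of PCA-based compression of re-sampled object pels: decompose the
# frames into a PCA basis, truncate trailing components, and recompose an
# approximation. Illustrative, not the patented procedure.
import numpy as np

def pca_compress(frames, k):
    """frames: (T, N) matrix of flattened object pels; keep k components."""
    mean = frames.mean(axis=0)
    U, S, Vt = np.linalg.svd(frames - mean, full_matrices=False)
    coeffs = U[:, :k] * S[:k]          # per-frame coefficients (the code)
    return mean, Vt[:k], coeffs        # truncation: only k basis vectors kept

def pca_recompose(mean, basis, coeffs):
    return coeffs @ basis + mean

rng = np.random.default_rng(1)
frames = np.outer(rng.normal(size=8), rng.normal(size=20))  # rank-1 test data
mean, basis, coeffs = pca_compress(frames, k=1)
approx = pca_recompose(mean, basis, coeffs)
print(np.allclose(approx, frames))  # True: one component captures rank-1 data
```

After normalization removes global motion, the object's appearance varies slowly, which is why a heavily truncated PCA basis can recompose it accurately.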
16. The method of claim 12 further comprising factoring the correspondence models into global deformation models by:
integrating the generated relationships between the corresponding elements into a model of global motion;
decomposing the re-sampled pel data into an encoded representation;
truncating zero or more bytes of the encoded representation; and
recomposing the re-sampled pel data from the truncated encoded representation;
wherein each of the decomposing and the recomposing uses Principal Component Analysis;
the step of forming correspondence models uses a robust sampling consensus for solving a two dimensional affine motion model; and
the step of analyzing the corresponding elements uses a sampling population based on finite differences generated from a block-based motion estimation between two or more of the video frames.
17. The method of claim 16 wherein each of the two or more video frames comprises object pel data and non-object pel data, the method further comprising:
identifying corresponding elements in the non-object pel data in two or more of the video frames;
analyzing the corresponding elements in the non-object pel data to generate relationships between the corresponding elements in the non-object pel data; and
generating second correspondence models by using the relationships between the corresponding elements in the non-object pel data;
wherein the analyzing of the corresponding elements in the non-object pel data includes a time-based occlusion filter.
18. The method of claim 12 further comprising:
factoring the correspondence models into global deformation models;
integrating the relationships between the corresponding elements into a model of global motion;
decomposing the re-sampled pel data into an encoded representation;
truncating zero or more bytes of the encoded representation; and
recomposing the re-sampled pel data from the truncated encoded representation;
wherein each of the decomposing and the recomposing uses a conventional video compression/decompression process;
wherein forming correspondence models uses a robust sampling consensus for solving a two dimensional affine motion model; and
wherein analyzing the corresponding elements uses a sampling population based on finite differences generated from a block-based motion estimation between two or more of the video frames.
19. The method of claim 12 further comprising factoring the correspondence models into local deformation models including:
defining a two dimensional mesh overlying pels corresponding to the detected object, the mesh being based on a regular grid of vertices and edges; and
generating a model of local motion from the relationships between the corresponding elements, the relationships comprising vertex displacements based on finite differences generated from a block-based motion estimation between two or more of the video frames.
20. The method of claim 19 wherein the vertices correspond to discrete image features, the method further comprising identifying significant image features corresponding to the detected object by using an analysis of an image gradient Harris response.
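Claim 20's "image gradient Harris response" scores each pel by the determinant and trace of the locally summed gradient structure tensor. A minimal sketch (the window radius and the constant k are assumed defaults, not values from the patent):

```python
# Sketch of the Harris corner response: build the gradient structure tensor
# summed over a local window, then score R = det(M) - k * trace(M)^2.
# Illustrative parameters, not the patented procedure.
import numpy as np

def box_sum(a, r=1):
    """Sum of a over a (2r+1)x(2r+1) window at each pel, via an integral image."""
    p = np.pad(a, r)
    ii = np.pad(np.cumsum(np.cumsum(p, axis=0), axis=1), ((1, 0), (1, 0)))
    h, w = a.shape
    win = 2 * r + 1
    return (ii[win:win + h, win:win + w] - ii[:h, win:win + w]
            - ii[win:win + h, :w] + ii[:h, :w])

def harris_response(img, k=0.04, r=1):
    iy, ix = np.gradient(img.astype(float))  # image intensity gradients
    sxx = box_sum(ix * ix, r)                # structure tensor entries,
    syy = box_sum(iy * iy, r)                # summed over the local window
    sxy = box_sum(ix * iy, r)
    return sxx * syy - sxy * sxy - k * (sxx + syy) ** 2

img = np.zeros((9, 9)); img[4:, 4:] = 1.0    # a single bright corner at (4, 4)
R = harris_response(img)
y, x = np.unravel_index(np.argmax(R), R.shape)
print(int(y), int(x))  # 4 4
```

Edges score negatively (one dominant gradient direction) while corners score positively, so thresholding R yields the "significant image features" used as mesh vertices.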
21. The method of claim 19 wherein the generated local motion model is based on a residual motion not approximated by a global motion model.
Specification