Apparatus and method for processing video data
First Claim
1. A computer-implemented method for processing video signal data from a plurality of video frames, the method comprising:
- using the computer to perform the following steps:
detecting an object in two or more given video frames, each video frame being formed of pel data;
tracking the detected object through the two or more video frames;
segmenting pel data corresponding to the detected object from other pel data in the two or more video frames so as to generate a first intermediate form of the video signal data, the segmenting utilizing a spatial segmentation of the pel data;
generating correspondence models of elements of the detected object, each correspondence model relating an element of the detected object in one video frame to a corresponding element of the detected object in another video frame; and
using the correspondence models, normalizing the segmented pel data, said normalizing including modeling global motion of the detected object and resulting in re-sampled pel data corresponding to the detected object in the two or more video frames, the re-sampled pel data providing an object-based encoded form of the video signal data normalized as output;
the object-based encoded form being able to be decoded by:
(i) restoring spatial positions of the re-sampled pel data by utilizing the correspondence models, thereby generating restored pels corresponding to the detected object; and
(ii) recombining the restored pel data together with the other pel data in the first intermediate form of the video signal data to re-create an original video frame; and
wherein generating correspondence models includes estimating a multi-dimensional projective motion model.
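The multi-dimensional projective motion model recited in the last limitation can be made concrete. Below is a minimal sketch (all names are illustrative, not from the patent) of how a 3x3 homography maps pel coordinates, which is the core operation behind both normalizing the segmented pel data and restoring its spatial positions:

```python
# Sketch of a 2D projective (homography) motion model: maps object pel
# coordinates in one frame into a common, normalized coordinate system.
# Names here are illustrative, not taken from the patent.
import numpy as np

def apply_homography(H, points):
    """Map Nx2 pel coordinates through a 3x3 projective matrix H."""
    pts = np.hstack([points, np.ones((len(points), 1))])  # homogeneous coords
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]                 # perspective divide

# A pure-translation homography shifts every pel by (tx, ty) = (3, -2).
H_translate = np.array([[1.0, 0.0, 3.0],
                        [0.0, 1.0, -2.0],
                        [0.0, 0.0, 1.0]])
pts = np.array([[0.0, 0.0], [10.0, 5.0]])
print(apply_homography(H_translate, pts))  # [[3., -2.], [13., 3.]]
```

Decoding inverts the same model: applying `np.linalg.inv(H_translate)` to the normalized coordinates restores the original pel positions.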
Abstract
An apparatus and methods for processing video data are described. The invention provides a representation of video data that can be used to assess agreement between the data and a fitting model for a particular parameterization of the data. This allows the comparison of different parameterization techniques and the selection of the optimum one for continued video processing of the particular data. The representation can be utilized in intermediate form as part of a larger process or as a feedback mechanism for processing video data. When utilized in its intermediate form, the invention can be used in processes for storage, enhancement, refinement, feature extraction, compression, coding, and transmission of video data. The invention serves to extract salient information in a robust and efficient manner while addressing the problems typically associated with video data sources.
21 Claims
wherein each of the decomposing and the recomposing uses Principal Component Analysis; and wherein generating correspondence models includes analyzing the corresponding elements using a sampling population based on finite differences generated from a block-based motion estimation between two or more of the video frames.
7. The method of claim 6 wherein each of the two or more video frames has object pels and non-object pels, the method further comprising:
identifying corresponding elements in the non-object pels in two or more of the video frames;
analyzing the corresponding elements in the non-object pels and generating relationships between the corresponding elements in the non-object pels; and
forming second correspondence models by using the generated relationships between the corresponding elements in the non-object pels;
wherein the step of analyzing the corresponding elements in the non-object pels employs a time-based occlusion filter.
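Claim 7's time-based occlusion filter can be sketched as follows: for each non-object (background) pel, drop the frames in which the object mask covers it and filter the surviving samples over time. A minimal illustration (the median choice and all names are assumptions, not from the patent):

```python
# Sketch of a time-based occlusion filter for non-object pels: keep, per
# background pel, only the frames where the object mask does not cover it,
# then take the temporal median. Illustrative, not the patented procedure.
import numpy as np

def occlusion_filtered_background(frames, object_masks):
    """frames: (T, H, W) grayscale; object_masks: (T, H, W) bool, True = object."""
    frames = np.asarray(frames, dtype=float)
    masked = np.where(object_masks, np.nan, frames)  # drop occluded samples
    return np.nanmedian(masked, axis=0)              # per-pel temporal median

frames = np.array([[[10, 20]], [[10, 99]], [[10, 20]]])  # 3 frames of a 1x2 image
masks  = np.array([[[False, False]], [[False, True]], [[False, False]]])
print(occlusion_filtered_background(frames, masks))  # [[10. 20.]]
```

The occluded sample (value 99 in frame 1) is excluded, so the background estimate is unaffected by the object passing over that pel.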
8. The method of claim 1 further comprising:
factoring the correspondence models into global deformation models;
integrating relationships between the corresponding elements into a model of global motion;
decomposing the re-sampled pel data into an encoded representation;
truncating zero or more bytes of the encoded representation; and
recomposing the re-sampled pel data from the truncated encoded representation;
wherein each of the decomposing and the recomposing uses a conventional video compression/decompression process; and
wherein generating correspondence models includes analyzing the corresponding elements using a sampling population based on finite differences generated from a block-based motion estimation between two or more of the video frames.
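Several claims derive their sampling population from "finite differences generated from a block-based motion estimation." A minimal exhaustive block-matching sketch (block size, search range, and names are illustrative assumptions):

```python
# Sketch of block-based motion estimation: a block's displacement between
# two frames is the offset minimizing the sum of absolute differences (SAD)
# over a small search window. Illustrative, not the patented procedure.
import numpy as np

def block_motion(prev, curr, y, x, size=4, search=2):
    """Return the (dy, dx) minimizing SAD for the block at (y, x) in prev."""
    block = prev[y:y + size, x:x + size]
    best, best_d = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + size > curr.shape[0] or xx + size > curr.shape[1]:
                continue  # candidate block falls outside the frame
            sad = np.abs(curr[yy:yy + size, xx:xx + size] - block).sum()
            if sad < best:
                best, best_d = sad, (dy, dx)
    return best_d

prev = np.zeros((12, 12)); prev[4:8, 4:8] = 1.0   # bright block at (4, 4)
curr = np.zeros((12, 12)); curr[5:9, 6:10] = 1.0  # same block moved by (+1, +2)
print(block_motion(prev, curr, 4, 4))  # (1, 2)
```

The per-block displacements are exactly the finite differences from which the claims build their sampling population.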
9. The method of claim 1 wherein the step of normalizing factors the correspondence models into local deformation models by:
defining a two dimensional mesh overlying pel data corresponding to the detected object, the mesh being based on a regular grid of vertices and edges; and
creating a model of local motion from relationships between the corresponding elements, the relationships comprising vertex displacements based on finite differences generated from a block-based motion estimation between two or more of the video frames.
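The regular-grid mesh and residual local motion of claims 9 and 11 can be sketched as: lay a grid of vertices over the object, estimate each vertex's motion, and subtract the global model's prediction. All names and values here are illustrative assumptions:

```python
# Sketch of local deformation modeling: a regular grid of mesh vertices
# overlays the object, and local motion is the per-vertex displacement
# left over after global motion is removed. Illustrative names.
import numpy as np

def make_mesh(height, width, spacing):
    """Regular grid of (y, x) vertices over a height-by-width pel region."""
    ys = np.arange(0, height + 1, spacing)
    xs = np.arange(0, width + 1, spacing)
    return np.array([(y, x) for y in ys for x in xs], dtype=float)

def local_displacements(measured, global_pred):
    """Residual (local) motion: measured vertex motion minus the global model."""
    return measured - global_pred

verts = make_mesh(8, 8, 4)                          # 3x3 grid of vertices
measured = np.tile([1.0, 2.0], (len(verts), 1))     # e.g. block-matching output
global_pred = np.tile([1.0, 1.5], (len(verts), 1))  # global motion prediction
print(local_displacements(measured, global_pred)[0])  # residual per vertex
```

This matches claim 11's framing: the local model only carries the motion that the global model fails to approximate.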
10. The method of claim 9 wherein the vertices correspond to discrete image features, and the step of defining a two dimensional mesh further identifies significant image features corresponding to the detected object based on the image intensity gradient of the object in the video frames.
11. The method of claim 9 wherein the created local motion model is based on a residual motion not approximated by a global motion model.
12. A computer-implemented method of generating an encoded form of video signal data from a plurality of video frames, the method comprising:
- using the computer to perform the following steps:
detecting an object in two or more video frames of the plurality of video frames, each video frame being formed of pel data;
tracking the detected object through the two or more video frames, the detected object having one or more elements;
for an element of the detected object in one video frame, identifying a corresponding element of the detected object in the other video frames;
analyzing the corresponding elements to generate relationships between the corresponding elements;
forming correspondence models for the detected object by using the generated relationships between the corresponding elements;
normalizing pel data corresponding to the detected object in the two or more video frames by utilizing the formed correspondence models and a deformable mesh, said normalizing generating re-sampled pel data representing an object-based encoded form of the video signal data; and
rendering the object-based encoded form of the video signal data for subsequent use, the object-based encoded form enabling restoring of spatial positions of the re-sampled pel data by utilizing the correspondence models, and generating restored pel data of the detected object;
wherein the detecting and tracking comprise using any one or combination of a Viola/Jones face detection algorithm and Principal Component Analysis.
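Claim 12 names the Viola/Jones face detector. Its central data structure is the integral image, which lets any rectangular Haar-like feature be summed in four lookups. A minimal sketch of that building block only (not the full cascade; names are illustrative):

```python
# The Viola/Jones detector rests on the integral image: ii[y, x] holds the
# sum of all pels above and to the left, so any rectangle sums in O(1).
# Illustrative sketch, not the full boosted cascade.
import numpy as np

def integral_image(img):
    """Cumulative sum over both axes, padded so ii[y, x] = img[:y, :x].sum()."""
    ii = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return np.pad(ii, ((1, 0), (1, 0)))

def rect_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] via four integral-image lookups."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 3, 3))  # 5 + 6 + 9 + 10 = 30
```

A Haar-like feature is then just a signed combination of a few such rectangle sums, which is what makes the cascade fast enough for per-frame detection.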
wherein the step of forming correspondence models uses a robust sampling consensus for solving a two dimensional affine motion model, and the step of analyzing the corresponding elements uses a sampling population based on finite differences generated from a block-based motion estimation between two or more of the video frames.
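The "robust sampling consensus for solving a two dimensional affine motion model" recited above is commonly realized as RANSAC-style fitting: repeatedly fit the affine model to a random minimal sample of correspondences and keep the model with the most inliers. A minimal sketch (thresholds, iteration counts, and names are assumptions, not from the patent):

```python
# RANSAC-style robust fit of a 2D affine motion model to noisy point
# correspondences. Illustrative parameters, not the patented procedure.
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine A (2x3) such that dst ≈ [src | 1] @ A.T."""
    X = np.hstack([src, np.ones((len(src), 1))])
    A, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return A.T

def ransac_affine(src, dst, iters=100, thresh=1.0, seed=0):
    """Keep the affine model, fit to a random minimal sample, with most inliers."""
    rng = np.random.default_rng(seed)
    X = np.hstack([src, np.ones((len(src), 1))])
    best_A, best_count = None, -1
    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)  # minimal sample
        A = fit_affine(src[idx], dst[idx])
        resid = np.linalg.norm(X @ A.T - dst, axis=1)
        count = int((resid < thresh).sum())
        if count > best_count:
            best_A, best_count = A, count
    return best_A

src = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [2., 2.]])
dst = src + [3.0, -1.0]        # true motion: pure translation (3, -1)
dst[4] = [100.0, 100.0]        # one gross outlier correspondence
A = ransac_affine(src, dst)
print(np.round(A, 2))          # ≈ [[1, 0, 3], [0, 1, -1]]
```

The outlier correspondence never attracts a majority of inliers, so the recovered model stays the clean translation, which is the point of the robust consensus step.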
15. The method of claim 12 further comprising compressing the re-sampled pel data by:
decomposing the re-sampled pel data into an encoded representation;
truncating zero or more bytes of the encoded representation; and
recomposing the re-sampled pel data from the truncated encoded representation;
wherein each of the decomposing and the recomposing uses Principal Component Analysis.
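The decompose/truncate/recompose steps of claims 15 and 16, with Principal Component Analysis as the codec, can be sketched as follows (frame shapes, the SVD route to PCA, and all names are illustrative assumptions):

```python
# Sketch of PCA-based compression of re-sampled object pels: decompose the
# frames into a PCA basis, truncate trailing components, and recompose an
# approximation. Illustrative, not the patented procedure.
import numpy as np

def pca_compress(frames, k):
    """frames: (T, N) matrix of flattened object pels; keep k components."""
    mean = frames.mean(axis=0)
    U, S, Vt = np.linalg.svd(frames - mean, full_matrices=False)
    coeffs = U[:, :k] * S[:k]          # per-frame coefficients (the code)
    return mean, Vt[:k], coeffs        # truncation: only k basis vectors kept

def pca_recompose(mean, basis, coeffs):
    return coeffs @ basis + mean

rng = np.random.default_rng(1)
frames = np.outer(rng.normal(size=8), rng.normal(size=20))  # rank-1 test data
mean, basis, coeffs = pca_compress(frames, k=1)
approx = pca_recompose(mean, basis, coeffs)
print(np.allclose(approx, frames))  # True: one component captures rank-1 data
```

After normalization removes global motion, the object's appearance varies slowly, which is why a heavily truncated PCA basis can recompose it accurately.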
16. The method of claim 12 further comprising factoring the correspondence models into global deformation models by:
integrating the generated relationships between the corresponding elements into a model of global motion;
decomposing the re-sampled pel data into an encoded representation;
truncating zero or more bytes of the encoded representation; and
recomposing the re-sampled pel data from the truncated encoded representation;
wherein each of the decomposing and the recomposing uses Principal Component Analysis;
the step of forming correspondence models uses a robust sampling consensus for solving a two dimensional affine motion model; and
the step of analyzing the corresponding elements uses a sampling population based on finite differences generated from a block-based motion estimation between two or more of the video frames.
17. The method of claim 16 wherein each of the two or more video frames comprises object pel data and non-object pel data, the method further comprising:
identifying corresponding elements in the non-object pel data in two or more of the video frames;
analyzing the corresponding elements in the non-object pel data to generate relationships between the corresponding elements in the non-object pel data; and
generating second correspondence models by using the relationships between the corresponding elements in the non-object pel data;
wherein the analyzing of the corresponding elements in the non-object pel data includes a time-based occlusion filter.
18. The method of claim 12 further comprising:
factoring the correspondence models into global deformation models;
integrating the relationships between the corresponding elements into a model of global motion;
decomposing the re-sampled pel data into an encoded representation;
truncating zero or more bytes of the encoded representation; and
recomposing the re-sampled pel data from the truncated encoded representation;
wherein each of the decomposing and the recomposing uses a conventional video compression/decompression process;
wherein forming correspondence models uses a robust sampling consensus for solving a two dimensional affine motion model; and
wherein analyzing the corresponding elements uses a sampling population based on finite differences generated from a block-based motion estimation between two or more of the video frames.
19. The method of claim 12 further comprising factoring the correspondence models into local deformation models including:
defining a two dimensional mesh overlying pels corresponding to the detected object, the mesh being based on a regular grid of vertices and edges; and
generating a model of local motion from the relationships between the corresponding elements, the relationships comprising vertex displacements based on finite differences generated from a block-based motion estimation between two or more of the video frames.
20. The method of claim 19 wherein the vertices correspond to discrete image features, the method further comprising identifying significant image features corresponding to the detected object by using an analysis of an image gradient Harris response.
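Claim 20's "image gradient Harris response" scores each pel by the determinant and trace of the locally summed gradient structure tensor. A minimal sketch (the window radius and the constant k are assumed defaults, not values from the patent):

```python
# Sketch of the Harris corner response: build the gradient structure tensor
# summed over a local window, then score R = det(M) - k * trace(M)^2.
# Illustrative parameters, not the patented procedure.
import numpy as np

def box_sum(a, r=1):
    """Sum of a over a (2r+1)x(2r+1) window at each pel, via an integral image."""
    p = np.pad(a, r)
    ii = np.pad(np.cumsum(np.cumsum(p, axis=0), axis=1), ((1, 0), (1, 0)))
    h, w = a.shape
    win = 2 * r + 1
    return (ii[win:win + h, win:win + w] - ii[:h, win:win + w]
            - ii[win:win + h, :w] + ii[:h, :w])

def harris_response(img, k=0.04, r=1):
    iy, ix = np.gradient(img.astype(float))  # image intensity gradients
    sxx = box_sum(ix * ix, r)                # structure tensor entries,
    syy = box_sum(iy * iy, r)                # summed over the local window
    sxy = box_sum(ix * iy, r)
    return sxx * syy - sxy * sxy - k * (sxx + syy) ** 2

img = np.zeros((9, 9)); img[4:, 4:] = 1.0    # a single bright corner at (4, 4)
R = harris_response(img)
y, x = np.unravel_index(np.argmax(R), R.shape)
print(int(y), int(x))  # 4 4
```

Edges score negatively (one dominant gradient direction) while corners score positively, so thresholding R yields the "significant image features" used as mesh vertices.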
21. The method of claim 19 wherein the generated local motion model is based on a residual motion not approximated by a global motion model.
Specification