SYSTEM AND METHOD FOR VIDEO CONTEXT-BASED COMPOSITION AND COMPRESSION FROM NORMALIZED SPATIAL RESOLUTION OBJECTS
First Claim
1. A system (300) for context-based video composition and compression from normalized spatial resolution objects, characterized by comprising:
an object detection module (310) that detects a first category of target objects (311) and extracts their coordinate data (312);
a spatial resolution adjustment module (320) that resamples each detected object (311) to match the resolution supplied as a parameter (202);
a frame composition module (330) that arranges the detected objects (311) of each input frame (201) in a grid to create a final frame (331); and
a video coding module that encodes the final video (341) using the spatial and temporal correlations of similar objects in similar positions in subsequent final frames (331);
wherein the final video (341) and its coordinate data (312) are transmitted to a vision-based analysis system (350), where they are stored and analyzed.
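The resolution adjustment (320) and frame composition (330) elements of the claim can be illustrated with a minimal sketch. It assumes NumPy, nearest-neighbour resampling, and row-major placement of tiles, none of which the claim specifies. The names `resize_nearest` and `compose_grid`, the tile size, and the column count are hypothetical. Object detection (310) and video coding are omitted; the bounding boxes are taken as given.

```python
import numpy as np

def resize_nearest(patch, out_h, out_w):
    """Resample a patch to (out_h, out_w) by nearest-neighbour sampling."""
    in_h, in_w = patch.shape[:2]
    rows = np.arange(out_h) * in_h // out_h   # source row for each output row
    cols = np.arange(out_w) * in_w // out_w   # source column for each output column
    return patch[rows[:, None], cols]

def compose_grid(frame, boxes, tile=32, cols=4):
    """Crop each (x, y, w, h) box, normalize it to tile x tile pixels,
    and pack the tiles row by row into a single composed frame."""
    n_rows = max(1, -(-len(boxes) // cols))   # ceiling division
    grid = np.zeros((n_rows * tile, cols * tile) + frame.shape[2:], dtype=frame.dtype)
    for i, (x, y, w, h) in enumerate(boxes):
        r, c = divmod(i, cols)
        patch = frame[y:y + h, x:x + w]
        grid[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile] = resize_nearest(patch, tile, tile)
    return grid

# Synthetic input frame with three "detected objects" of different sizes.
frame = np.zeros((120, 160, 3), dtype=np.uint8)
boxes = [(10, 10, 20, 20), (50, 40, 40, 40), (100, 60, 16, 16)]
for value, (x, y, w, h) in zip((50, 100, 150), boxes):
    frame[y:y + h, x:x + w] = value

grid = compose_grid(frame, boxes, tile=32, cols=2)
```

Because every object occupies the same tile position and size from one composed frame to the next, a standard video encoder applied to the sequence of grids has a better chance of finding the spatial and temporal redundancy that the claim relies on.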
Abstract
The present invention relates to a system and method for efficiently generating images and videos as an array of objects of interest (e.g., faces, hands, plates, etc.) at a desired resolution in order to perform vision tasks such as face recognition, facial expression analysis, and hand gesture detection, among others. The composition of such images and videos takes into account the similarity of objects in the same category to encode them more effectively, providing savings in transmission time and storage. Shorter transmission time makes such a system more efficient, while lower storage requirements reduce the cost of storing data.
13 Claims
Dependent claims of claim 1: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
Specification