SYSTEM AND METHOD FOR VIDEO CONTEXT-BASED COMPOSITION AND COMPRESSION FROM NORMALIZED SPATIAL RESOLUTION OBJECTS

US 20160275354A1
Filed: 03/20/2015
Published: 09/22/2016
Est. Priority Date: 03/17/2015
Status: Active Grant

Abstract

The present invention relates to a system and method for efficiently generating images and videos as an array of objects of interest (e.g., faces, hands, plates, etc.) in a desired resolution to perform vision tasks, such as face recognition, facial expression analysis, and hand gesture detection, among others. The composition of such images and videos takes into account the similarity of objects in the same category to encode them more effectively, providing savings in terms of transmission time and storage. Shorter transmission time benefits such a system in terms of efficiency, while reduced storage requirements lower the cost of storing the data.
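
The composition idea in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes grayscale crops stored as nested lists, swaps in nearest-neighbour resampling for whatever resampling the system actually uses, and the names `resize_nearest` and `compose_grid` are hypothetical.

```python
from typing import List

Tile = List[List[int]]  # grayscale pixels, row-major (assumption for this sketch)


def resize_nearest(tile: Tile, out_h: int, out_w: int) -> Tile:
    """Resample a tile to out_h x out_w with nearest-neighbour lookup."""
    in_h, in_w = len(tile), len(tile[0])
    return [
        [tile[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def compose_grid(tiles: List[Tile], cols: int, size: int, fill: int = 0) -> Tile:
    """Normalize each tile to size x size and place the tiles row by row
    into one frame; unused grid slots keep the fill value."""
    rows = -(-len(tiles) // cols)  # ceiling division
    frame = [[fill] * (cols * size) for _ in range(rows * size)]
    for index, tile in enumerate(tiles):
        top, left = (index // cols) * size, (index % cols) * size
        for r, row in enumerate(resize_nearest(tile, size, size)):
            frame[top + r][left:left + size] = row
    return frame


if __name__ == "__main__":
    face = [[1, 2], [3, 4]]              # 2x2 crop, upsampled to 4x4
    hand = [[9] * 8 for _ in range(8)]   # 8x8 crop, downsampled to 4x4
    for row in compose_grid([face, hand], cols=2, size=4):
        print(row)
```

Because every object lands in a tile of the same size and in a predictable slot, consecutive composed frames tend to look alike, which is the property a standard video codec can exploit.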

13 Claims

1. System (300) for video context-based composing and compression from normalized spatial resolution object characterized by comprising the steps of:
- an object detection module (310) that detects a first category of target objects (311) and extracts its coordinate data (312);
  
  a spatial resolution adjustment module (320) that adjusts the sampling of the detected object (311) to match the resolution informed as a parameter (202);
  
  a frame composition module (330) that organizes the detected objects (311) of each input frame (201) in a grid to create a final table (331); and
  
  a video coding module that encodes the final video (341) using spatial and temporal correlations of similar objects in similar position in the subsequent final frames (331);
  
  the final video (341) and its coordinate data (312) are transmitted to an analysis system based on vision (350), where it is stored and analyzed.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
  - 2. System (300) according to claim 1, characterized by receiving as input data (200) a set of digital video or image frames (201), with the highest possible resolution, and the parameters (202) informing the categories of target objects and a spatial resolution for each category.
  - 3. System (300) according to claim 2, characterized in that the set of digital video frames (201) is obtained by a camera (100) with the maximum possible resolution, preferably in RAW format, in which data from the sensor of the camera (100) are minimally processed;
    - and the parameters (202) are specified by the system user and represent the requirements and demands of the final computer vision task, comprising:
      
      (i) one or more types of target objects to be detected in the input frame, providing predefined names;
      
      or providing an image model of the target object;
      
      or providing specific coordinate fixed target objects;
      
      (ii) a spatial resolution in pixels for each category.
  - 4. System (300) according to claim 1, characterized by object detection module (310) for each input video frame (201), detecting the first category of target objects (311) and extracting its coordinate data (312).
  - 5. System (300) according to claim 4, characterized by the fact that the detection and extraction of target objects (311) is implemented by one or more of object recognizer based on convolutional network, various image descriptors, and delimitation of the object (311) according to specific coordinates.
  - 6. System (300) according to claim 1, characterized by the spatial resolution adjustment module (320) processing the objects (311) detected in the previous module (310) so that they are represented at the target spatial resolution informed by the user parameters (202);
    - if the current resolution of the object is less than the desired resolution, an up sampling process is performed;
      
      otherwise, a down sampling process is performed.
  - 7. System (300) according to claim 1, characterized by the frame composition module (330), for each input video frame (201), arranging image tiles with detected objects (already spatially adjusted by the previous module) in a grid that corresponds to the final table (331), considering the information on the maximum number of objects that can be detected in the video;
    - preferably, the grid should be as square as possible, for better compression.
  - 8. System (300) according to claim 1, characterized by the video coding module (340) initially joining all previously generated frames into a raw video sequence and then applying a standard video codec, in order to generate a final coded video sequence (341) ready to be stored and/or transmitted and/or analyzed by vision-based computing systems (350).
  - 9. System (300) according to claim 8, characterized by the fact that each image tile corresponding to each object (311) within each frame (331) can be encoded with a different quality, by applying different quantization parameters, resulting in a final compressed frame (331) that comprises tiles of different quality, optimizing the compression of the frame (331).
  - 10. System (300) according to claim 8, characterized by the fact that the video codec is preferably, but not limited to, H.264/AVC or HEVC.
  - 11. Method (400) for video context-based compositing and compression from the normalized spatial resolution object, implemented by the system (300) as defined in claim 1, characterized by comprising the steps of:
    - receiving (405) as input data (200) a set of digital video or image frames (201), with the highest possible resolution, and the parameters (202), informing the categories of target objects and spatial resolution for each category;
      
      for each category of object entered as parameter (202) and for each frame (201) of the input video;
      
      detecting and extracting (410) the desired objects (311), considering the categories informed as a parameter (202) in each frame (201) of the input video;
      
      adjusting (420) the spatial resolution of the extracted objects (311) according to the parameters (202);
      
      composing (430) a final frame (331) matching the extracted and adjusted objects (311) spatially grouped in a grid;
      
      generating (440) a final video (341) by processing all final frames (331) with an encoding algorithm that utilizes the visual similarities and location correlations in the frame;
      
      transmitting (450) the final videos (341) and the corresponding coordinate data (312) to a vision-based analysis system (350), where they are stored and analyzed.
  - 12. Method (400) according to claim 11, characterized in that the step of receiving (405) parameters (202) as input data (200) comprises receiving one of a class name, a model image or specific coordinates of fixed target objects.
  - 13. Method (400) according to claim 11, characterized in that the step of detecting and extracting the desired objects (410) is implemented/performed by the object detection module (310) of the system (300);
    - the step of adjusting the spatial resolution of the extracted objects (420) is implemented/performed by the spatial resolution adjustment module (320);
      
      the step of composing a final table (430) is implemented/carried out by the frame composition module (330);
      
      a final step of generating the video (440) is implemented/performed by the video encoding module (340).
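
Two of the rules stated in the claims, the upsample-or-downsample decision of claim 6 and the "as square as possible" grid of claim 7, can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: resolutions are compared by total pixel count, grid slots are filled row by row, and the names `sampling_step`, `grid_shape`, and `slot_origin` are hypothetical.

```python
import math
from typing import Tuple


def sampling_step(current: Tuple[int, int], target: Tuple[int, int]) -> str:
    """Claim 6 rule: upsample when the object is below the target
    resolution, otherwise downsample. Comparing by pixel count is an
    assumption of this sketch."""
    current_pixels = current[0] * current[1]
    target_pixels = target[0] * target[1]
    return "upsample" if current_pixels < target_pixels else "downsample"


def grid_shape(max_objects: int) -> Tuple[int, int]:
    """Return (rows, cols) of a near-square grid with at least
    max_objects slots."""
    if max_objects < 1:
        raise ValueError("max_objects must be at least 1")
    cols = math.isqrt(max_objects - 1) + 1  # equals ceil(sqrt(max_objects))
    rows = -(-max_objects // cols)          # ceiling division
    return rows, cols


def slot_origin(slot: int, cols: int, tile_size: int) -> Tuple[int, int]:
    """Top-left pixel (y, x) of a grid slot when slots are filled row by row."""
    return (slot // cols) * tile_size, (slot % cols) * tile_size


if __name__ == "__main__":
    print(sampling_step((32, 32), (64, 64)))
    print(grid_shape(10))
    print(slot_origin(4, cols=3, tile_size=64))
```

Sizing the grid from the maximum number of detectable objects keeps the composed frame dimensions constant across the video, so the frames can be joined into one raw sequence before a codec such as H.264/AVC or HEVC is applied.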

Current Assignee
Samsung Eletrônica da Amazônia Ltda (Samsung Electronics Co. Ltd.)
Original Assignee
Samsung Eletrônica da Amazônia Ltda (Samsung Electronics Co. Ltd.)
Inventors
ANDALÓ, FERNANDA A., PENATTI, OTÁVIO A.B., TESTONI, VANESSA, KOCH, FERNANDO

Granted Patent

US 9,699,476 B2
US Class Current

1/1
CPC Class Codes

G06V 10/454   Integrating the filters int...

G06V 40/161   Detection; Localisation; No...

H04N 19/124   Quantisation

H04N 19/154   Measured or subjectively es...

H04N 19/17   the unit being an image reg...

H04N 19/29   involving scalability at th...

H04N 19/59   involving spatial sub-sampl...
