Dual-mode mask person event automatic detection method based on video feature statistics

  • CN 105,678,213 B
  • Filed: 12/20/2015
  • Issued: 08/10/2021
  • Est. Priority Date: 12/20/2015
  • Status: Active Grant
First Claim

1. A method for automatically detecting dual-mode masked-person events based on video feature statistics, characterized by comprising the following steps:

  • (a) reading in a video image frame, scaling the frame to a set ratio A of its original width and height, and converting the color video frame into a single-channel grayscale image;
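Step (a) can be sketched in a few lines of numpy; nearest-neighbour scaling and BT.601 luma weights are illustrative assumptions here, since the claim specifies neither the interpolation method nor the color-conversion weights:

```python
import numpy as np

def preprocess_frame(frame_bgr, scale_a=0.5):
    """Scale a BGR frame by ratio `scale_a` (nearest-neighbour for brevity)
    and convert the result to a single-channel grayscale image."""
    h, w = frame_bgr.shape[:2]
    new_h, new_w = int(h * scale_a), int(w * scale_a)
    ys = (np.arange(new_h) / scale_a).astype(int)   # source rows to sample
    xs = (np.arange(new_w) / scale_a).astype(int)   # source columns to sample
    small = frame_bgr[ys][:, xs]
    # ITU-R BT.601 luma weights, applied in BGR channel order
    gray = 0.114 * small[..., 0] + 0.587 * small[..., 1] + 0.299 * small[..., 2]
    return gray.astype(np.uint8)
```

An OpenCV pipeline would typically use `cv2.resize` and `cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)` for the same two operations.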

    (b) performing motion-foreground detection on the video image read in step (a), frame by frame, using a frame-difference method with Gaussian background modeling, to obtain a moving-object foreground image;
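A single-Gaussian-per-pixel background model in the spirit of step (b) can be sketched as below, using numpy only. The learning rate `alpha` and the 2.5-sigma decision threshold are illustrative assumptions; the claim does not give concrete values:

```python
import numpy as np

class GaussianBackground:
    """Per-pixel Gaussian background model: a pixel is foreground when it
    deviates from the running mean by more than k standard deviations;
    background statistics are updated with learning rate alpha."""

    def __init__(self, first_gray, alpha=0.05, k=2.5):
        self.mean = first_gray.astype(np.float64)
        self.var = np.full(first_gray.shape, 15.0 ** 2)  # initial variance
        self.alpha, self.k = alpha, k

    def apply(self, gray):
        diff = gray.astype(np.float64) - self.mean
        fg = (diff ** 2) > (self.k ** 2) * self.var  # boolean foreground mask
        # update background statistics only where the scene looks static
        upd = ~fg
        self.mean[upd] += self.alpha * diff[upd]
        self.var[upd] += self.alpha * (diff[upd] ** 2 - self.var[upd])
        return (fg * 255).astype(np.uint8)
```

In an OpenCV implementation, `cv2.createBackgroundSubtractorMOG2()` would provide a full mixture-of-Gaussians version of the same idea.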

    (c) performing contour detection: further processing the image obtained in step (b), removing contours whose areas are smaller than a set threshold, and finding the maximum rectangular contour of the moving object by computing the positions of the point coordinates within each contour and their coincidence ratio with the motion foreground of the previous frame;

    which comprises the following steps:

    (c-1) binarizing the image obtained in step (b) to obtain a binary image;

    (c-2) applying a median-filtering operation to the binary image;

    (c-3) performing contour detection on the resulting image and storing the detected contours;

    (c-4) computing the area of each contour in turn: if a contour's area is smaller than a set percentage of the total image area, discarding it and moving on to the next contour; if the area meets the set requirement, obtaining the contour's bounding rectangle, comparing its top-left and bottom-right corners with those of the global maximum bounding rectangle, and updating the current maximum bounding rectangle, so that the maximum bounding rectangle of the current frame is obtained when the loop finishes;

    (c-5) comparing the width and height of the current frame's maximum bounding rectangle with those of the previous frame: if the rectangle's height is less than 0.7 times the previous frame's height, enlarging the rectangle to 1.5 times its original width and height; and if the enlarged rectangle exceeds the boundary of the current frame's original image, clipping it to that boundary;

    (c-6) recording the height of the rectangle finally obtained in step (c-5);
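Sub-steps (c-4) and (c-5) can be sketched as follows, with a simple BFS-based connected-component pass standing in for OpenCV's contour detection; the 1% area fraction is an illustrative placeholder for the claim's "set percentage":

```python
import numpy as np
from collections import deque

def largest_motion_rect(mask, min_area_frac=0.01):
    """Step (c-4): label 4-connected components in a binary mask, discard
    those below `min_area_frac` of the image area, and return the bounding
    rectangle (x0, y0, x1, y1) covering all remaining components."""
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    min_area = min_area_frac * h * w
    x0, y0, x1, y1 = w, h, -1, -1
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and not seen[sy, sx]:
                comp, q = [], deque([(sy, sx)])
                seen[sy, sx] = True
                while q:  # flood-fill one component
                    y, x = q.popleft()
                    comp.append((y, x))
                    for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                if len(comp) >= min_area:  # keep only large-enough components
                    ys = [p[0] for p in comp]; xs = [p[1] for p in comp]
                    x0, y0 = min(x0, min(xs)), min(y0, min(ys))
                    x1, y1 = max(x1, max(xs)), max(y1, max(ys))
    return None if x1 < 0 else (x0, y0, x1, y1)

def stabilise_rect(rect, prev_height, img_w, img_h):
    """Step (c-5): if the rectangle is shorter than 0.7x the previous
    frame's height, grow it to 1.5x its own size, clamped to the image."""
    x0, y0, x1, y1 = rect
    w, h = x1 - x0 + 1, y1 - y0 + 1
    if prev_height and h < 0.7 * prev_height:
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        w, h = 1.5 * w, 1.5 * h
        x0, y0 = max(0, int(cx - w / 2)), max(0, int(cy - h / 2))
        x1, y1 = min(img_w - 1, int(cx + w / 2)), min(img_h - 1, int(cy + h / 2))
    return x0, y0, x1, y1
```

An OpenCV version would replace the flood fill with `cv2.findContours`, `cv2.contourArea`, and `cv2.boundingRect`.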

    (d) processing the maximum rectangular contour of the moving object obtained in step (c) to obtain head-position information that preliminarily locates the head;

    obtaining the motion-foreground region of the original image and, after converting it to a grayscale image, scaling the image width to the set ratio A of the original image width by linear interpolation; then performing head detection using scale invariance, with a scaling factor of 1.1 per step and a head-region rectangle width of 20 percent of the total image width; judging a region to be a head after it satisfies the cascade classifier 3 times in succession; and recording the position information of the head region for the next step;
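The requirement that a candidate "satisfies the cascade classifier 3 times in succession" can be read as a temporal confirmation filter; a sketch follows, with the actual classifier injected as a callable. An OpenCV Haar cascade run with `scaleFactor=1.1` and a minimum width of 20% of the image would be the natural detector here, but any box detector fits:

```python
class HeadTracker:
    """Temporal filter for step (d): a detection is accepted as a head only
    after it has been confirmed in `needed` consecutive frames. `detector`
    is any callable taking a grayscale image and returning a list of
    (x, y, w, h) boxes."""

    def __init__(self, needed=3):
        self.needed = needed  # consecutive hits required (3 per the claim)
        self.streak = 0
        self.last = None

    def update(self, gray, detector):
        boxes = detector(gray)
        if boxes:
            self.streak += 1
            self.last = boxes[0]   # keep the first (largest-priority) box
        else:
            self.streak = 0        # a miss resets the confirmation streak
            self.last = None
        return self.last if self.streak >= self.needed else None
```

With OpenCV, the detector could be, for example, `lambda g: list(cascade.detectMultiScale(g, scaleFactor=1.1, minSize=(int(0.2 * g.shape[1]), int(0.2 * g.shape[1]))))`, where `cascade` is a `cv2.CascadeClassifier` loaded with a head/upper-body model.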

    (e) applying face detection to the head-region image obtained in step (d), and then estimating the position of the preliminarily located mouth;
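Step (e) leaves the mouth estimate unspecified; a common anthropometric shortcut is to take the lower third of the detected face box. The 1/3 and middle-half fractions below are illustrative assumptions, not values from the claim:

```python
def estimate_mouth_region(face_box):
    """Rough mouth localisation from a detected face box (x, y, w, h):
    assume the mouth lies in the lower third of the face, spanning the
    middle half of its width. Returns (x, y, w, h) of the mouth region."""
    x, y, w, h = face_box
    return (x + w // 4, y + 2 * h // 3, w // 2, h // 3)
```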

    (f) precisely locating the mouth position in the result image of step (e), computing the number of connected domains and the area proportion of the two largest connected domains of the image after gradient computation, and finally identifying a masked person by threshold setting;

    which specifically comprises the following steps:

    (f-1) further precisely positioning the mouth region;

    (f-2) applying Gaussian-blur noise reduction with a kernel size of 3 × 3, then converting to a grayscale image;

    (f-3) performing gradient detection with the Sobel operator, and using a linear transformation to convert the input array elements to the 8-bit unsigned integers of their absolute values;

    (f-4) first sharpening the image obtained in step (f-3), then binarizing it with an adaptive Otsu threshold;

    (f-5) performing contour detection on the image processed in step (f-4) and counting the contours, obtaining the image's contour-count feature;

    (f-6) retaining the two largest connected domains of the image processed in step (f-4), and computing their proportion of the image's pixel count to obtain the top-two connected-domain proportion feature;

    (f-7) determining whether the person is masked based on the features from steps (f-5) and (f-6);

    judging whether the image shows a masked person: if the number of contours is not greater than the contour-count threshold, preliminarily judging that the subject appearing in the frame is a normal person; then, if the size ratio of the two largest connected domains to the precise mouth-localization image obtained in step (f-1) is not greater than 0.13, judging that the subject is a masked person, and otherwise a normal person; and

    if the number of contours is greater than the contour-count threshold, preliminarily judging that the subject appearing in the frame is a masked person; then, if the size ratio of the two largest connected domains to the precise mouth-localization image obtained in step (f-1) is greater than 0.13, judging that the subject is a normal person, and otherwise a masked person.
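The decision logic of step (f-7) reduces to the following function. The 0.13 ratio threshold comes from the claim; the contour-count threshold is a parameter the claim leaves open, so the value 8 below is an arbitrary placeholder:

```python
def classify_mask(contour_count, top2_area_ratio,
                  contour_thresh=8, ratio_thresh=0.13):
    """Decision rule of step (f-7).
    Few contours  -> tentatively normal, then masked if the two largest
                     connected domains cover <= ratio_thresh of the mouth image.
    Many contours -> tentatively masked, then normal if those domains
                     cover more than ratio_thresh."""
    if contour_count <= contour_thresh:
        return "mask" if top2_area_ratio <= ratio_thresh else "normal"
    else:
        return "normal" if top2_area_ratio > ratio_thresh else "mask"
```

Note that both branches of the claim resolve the final label from the same 0.13 comparison; the contour count supplies only the preliminary judgment.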
