Hand-gesture-based region of interest localization
First Claim
1. A method for localizing a region of interest in an ego-centric video using a hand gesture, comprising:
acquiring, by a processor, an image containing the hand gesture from the ego-centric video;
detecting, by the processor, pixels that correspond to one or more hands in the image using a hand segmentation algorithm;
identifying, by the processor, a hand enclosure in the pixels that are detected within the image, wherein the identifying comprises:
generating, by the processor, a binary mask of the pixels that are detected within the image; and
applying, by the processor, an image processing to the binary mask to reduce a probability of false positives and false negatives occurring in the binary mask, wherein the image processing comprises:
detecting, by the processor, a plurality of inner contour holes that are located within a border around the pixels that correspond to the one or more hands in the image from a plurality of contour holes within a frame that include a plurality of outer contour holes that are located outside of the border in the binary mask;
calculating, by the processor, a respective size percentage of each one of the plurality of inner contour holes, wherein the respective size percentage is calculated based on a length of a diagonal of a respective inner contour hole divided by a length of a diagonal of the frame;
eliminating, by the processor, one or more of the plurality of inner contour holes that have the respective size percentage that are outside of a predefined range of size percentages; and
identifying, by the processor, a single inner contour hole from a remaining plurality of inner contour holes that is closest to a center of the frame as the hand enclosure;
localizing, by the processor, a region of interest based on the hand enclosure; and
performing, by the processor, an action based on an object in the region of interest, wherein the object comprises text and the performing the action comprises:
recognizing, by the processor, the text using an optical character recognition program; and
automatically populating, by the processor, one or more fields of a form using the text that is identified, wherein the text comprises alphanumeric text on a license plate and the form comprises a citation for a traffic violation.
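The hole-selection steps of the claim (size percentage, range elimination, closest to the frame center) can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Box` type, the function names, and the default range of 15% to 80% are hypothetical, and the "diagonal of a respective inner contour hole" is assumed here to be the diagonal of the hole's axis-aligned bounding box.

```python
import math
from typing import NamedTuple, Optional, Sequence


class Box(NamedTuple):
    """Axis-aligned bounding box of a contour hole (hypothetical representation)."""
    x: int
    y: int
    w: int
    h: int


def size_percentage(box: Box, frame_w: int, frame_h: int) -> float:
    """Diagonal of the hole's box divided by the frame diagonal, as a percentage."""
    return 100.0 * math.hypot(box.w, box.h) / math.hypot(frame_w, frame_h)


def select_hand_enclosure(
    holes: Sequence[Box],
    frame_w: int,
    frame_h: int,
    min_pct: float = 15.0,  # assumed lower bound; the claim only says "predefined range"
    max_pct: float = 80.0,  # assumed upper bound
) -> Optional[Box]:
    """Drop holes outside the size range, then pick the one closest to the frame center."""
    kept = [
        b for b in holes
        if min_pct <= size_percentage(b, frame_w, frame_h) <= max_pct
    ]
    if not kept:
        return None  # no candidate survives, so no hand enclosure is reported
    cx, cy = frame_w / 2, frame_h / 2
    # Distance is measured from the center of each hole's box to the frame center.
    return min(kept, key=lambda b: math.hypot(b.x + b.w / 2 - cx, b.y + b.h / 2 - cy))
```

For a 640x480 frame the frame diagonal is 800 pixels, so a hole whose box measures 240x180 (diagonal 300) has a size percentage of 37.5 and survives the assumed range, while a 6x8 speck (1.25) is eliminated as a likely false positive.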
Abstract
A method, non-transitory computer-readable medium, and apparatus for localizing a region of interest using a hand gesture are disclosed. For example, the method acquires an image containing the hand gesture from an ego-centric video, detects pixels that correspond to one or more hands in the image using a hand segmentation algorithm, identifies a hand enclosure in the pixels that are detected within the image, localizes a region of interest based on the hand enclosure, and performs an action based on an object in the region of interest.
8 Claims
7. A non-transitory computer-readable medium storing a plurality of instructions which, when executed by a processor, cause the processor to perform operations for localizing a region of interest using a hand gesture in an ego-centric video, the operations comprising:
acquiring an image containing the hand gesture from the ego-centric video;
detecting pixels that correspond to one or more hands in the image using a hand segmentation algorithm;
identifying a hand enclosure in the pixels that are detected within the image, wherein the identifying comprises:
generating a binary mask of the pixels that are detected within the image; and
applying an image processing to the binary mask to reduce a probability of false positives and false negatives occurring in the binary mask, wherein the image processing comprises:
detecting a plurality of inner contour holes that are located within a border around the pixels that correspond to the one or more hands in the image from a plurality of contour holes within a frame that include a plurality of outer contour holes that are located outside of the border in the binary mask;
calculating a respective size percentage of each one of the plurality of inner contour holes, wherein the respective size percentage is calculated based on a length of a diagonal of a respective inner contour hole divided by a length of a diagonal of the frame;
eliminating one or more of the plurality of inner contour holes that have the respective size percentage that are outside of a predefined range of size percentages; and
identifying a single inner contour hole from a remaining plurality of inner contour holes that is closest to a center of the frame as the hand enclosure;
localizing a region of interest based on the hand enclosure; and
performing an action based on the object in the region of interest, wherein the object comprises text and the performing the action comprises:
recognizing the text using an optical character recognition program; and
automatically populating one or more fields of a form using the text that is identified, wherein the text comprises alphanumeric text on a license plate and the form comprises a citation for a traffic violation.
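The step that separates inner contour holes from outer ones in the binary mask can be illustrated with a small flood-fill sketch. The interpretation here is an assumption rather than something the claim spells out: a "contour hole" is taken to be a 4-connected background region fully enclosed by detected pixels, and the "border" is taken to be a caller-supplied rectangle around the hand pixels. The function names `find_contour_holes` and `split_inner_outer` are hypothetical.

```python
from collections import deque
from typing import List, Sequence, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def find_contour_holes(mask: Sequence[Sequence[int]]) -> List[Rect]:
    """Return bounding boxes of background regions enclosed by foreground pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    holes: List[Rect] = []
    for r0 in range(rows):
        for c0 in range(cols):
            if mask[r0][c0] or seen[r0][c0]:
                continue
            # Flood-fill one 4-connected background component.
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            rmin = rmax = r0
            cmin = cmax = c0
            touches_edge = False
            while queue:
                r, c = queue.popleft()
                rmin, rmax = min(rmin, r), max(rmax, r)
                cmin, cmax = min(cmin, c), max(cmax, c)
                if r in (0, rows - 1) or c in (0, cols - 1):
                    touches_edge = True  # open background, not an enclosed hole
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not mask[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if not touches_edge:
                holes.append((cmin, rmin, cmax - cmin + 1, rmax - rmin + 1))
    return holes


def split_inner_outer(holes: Sequence[Rect], border: Rect) -> Tuple[List[Rect], List[Rect]]:
    """Classify holes by whether their box lies entirely within the hand border."""
    bx, by, bw, bh = border
    inner: List[Rect] = []
    outer: List[Rect] = []
    for x, y, w, h in holes:
        inside = bx <= x and by <= y and x + w <= bx + bw and y + h <= by + bh
        (inner if inside else outer).append((x, y, w, h))
    return inner, outer
```

In practice a library routine with contour hierarchy support would likely replace the hand-written flood fill; the pure-Python version is used only to keep the sketch self-contained.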
8. A method for localizing a region of interest using a hand gesture, comprising:
capturing, by a processor, an ego-centric video using a head-mounted video device;
detecting, by the processor, one or more hands in a frame of the ego-centric video using a binary mask generated from a hand segmentation algorithm;
identifying, by the processor, a hand enclosure from a plurality of inner contour holes in the binary mask, wherein the identifying comprises:
applying, by the processor, an image processing to the binary mask to reduce a probability of false positives and false negatives occurring in the binary mask, wherein the image processing comprises:
detecting, by the processor, the plurality of inner contour holes that are located within a border of pixels around the one or more hands in the image from a plurality of contour holes within the frame that include a plurality of outer contour holes that are located outside of the border in the binary mask;
calculating, by the processor, a respective size percentage of each one of the plurality of inner contour holes, wherein the respective size percentage is calculated based on a length of a diagonal of a respective inner contour hole divided by a length of a diagonal of the frame;
eliminating, by the processor, one or more of the plurality of inner contour holes that have the respective size percentage that are outside of a predefined range of size percentages; and
identifying, by the processor, a single inner contour hole from a remaining plurality of inner contour holes that is closest to a center of the frame as the hand enclosure;
determining, by the processor, a selecting a region of interest command is being initiated based on the hand enclosure that is identified;
localizing, by the processor, the region of interest based on the hand enclosure;
fitting, by the processor, a shape around the region of interest that is displayed as an overlay in a display of the head-mounted video device around the region of interest;
cropping, by the processor, an object in the region of interest; and
performing, by the processor, an automated action based on the object in the region of interest, wherein the object comprises text and the performing the automated action comprises:
recognizing, by the processor, the text using an optical character recognition program; and
automatically populating, by the processor, one or more fields of a form using the text that is identified, wherein the text comprises alphanumeric text on a license plate and the form comprises a citation for a traffic violation.
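The final steps of this claim (cropping the object, recognizing its text, and populating a form) might be wired together roughly as below. Everything named here is an assumption for illustration: the region of interest is taken to be a rectangle, the OCR program is passed in as a callable because the claim does not name one, and the `license_plate` field and `normalize_plate` helper are hypothetical.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

Image = List[List[int]]            # grayscale image as rows of pixel values
Rect = Tuple[int, int, int, int]   # (x, y, width, height)


def crop(image: Image, roi: Rect) -> Image:
    """Cut the rectangular region of interest out of the frame."""
    x, y, w, h = roi
    return [row[x:x + w] for row in image[y:y + h]]


def normalize_plate(raw: str) -> str:
    """Keep only alphanumeric characters and upper-case them."""
    return re.sub(r"[^A-Za-z0-9]", "", raw).upper()


def populate_citation(
    image: Image,
    roi: Rect,
    ocr: Callable[[Image], str],          # any OCR routine; supplied by the caller
    form: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the citation form with the plate field filled from OCR text."""
    filled = dict(form or {})
    filled["license_plate"] = normalize_plate(ocr(crop(image, roi)))
    return filled
```

Passing the OCR step as a parameter keeps the sketch independent of any particular recognition engine and makes the form-filling logic easy to exercise with a stub.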
Specification