Hand-gesture-based region of interest localization

US 9,778,750 B2
Filed: 09/30/2014
Issued: 10/03/2017
Est. Priority Date: 09/30/2014
Status: Active Grant

Abstract

A method, non-transitory computer readable medium, and apparatus for localizing a region of interest using a hand gesture are disclosed. For example, the method acquires an image containing the hand gesture from the ego-centric video, detects pixels that correspond to one or more hands in the image using a hand segmentation algorithm, identifies a hand enclosure in the pixels that are detected within the image, localizes a region of interest based on the hand enclosure and performs an action based on the object in the region of interest.

8 Claims

1. A method for localizing a region of interest in an ego-centric video using a hand gesture, comprising:
- acquiring, by a processor, an image containing the hand gesture from the ego-centric video;
  
  detecting, by the processor, pixels that correspond to one or more hands in the image using a hand segmentation algorithm;
  
  identifying, by the processor, a hand enclosure in the pixels that are detected within the image, wherein the identifying comprises;
  
  generating, by the processor, a binary mask of the pixels that are detected within the image; and
  
  applying, by the processor, an image processing to the binary mask to reduce a probability of false positives and false negatives occurring in the binary mask, wherein the image processing comprises;
  
  detecting, by the processor, a plurality of inner contour holes that are located within a border around the pixels that correspond to the one or more hands in the image from a plurality of contour holes within a frame that include a plurality of outer contour holes that are located outside of the border in the binary mask;
  
  calculating, by the processor, a respective size percentage of each one of the plurality of inner contour holes, wherein the respective size percentage is calculated based on a length of a diagonal of a respective inner contour hole divided by a length of a diagonal of the frame;
  
  eliminating, by the processor, one or more of the plurality of inner contour holes that have the respective size percentage that are outside of a predefined range of size percentages; and
  
  identifying, by the processor, a single inner contour hole from a remaining plurality of inner contour holes that is closest to a center of the frame as the hand enclosure;
  
  localizing, by the processor, a region of interest based on the hand enclosure; and
  
  performing, by the processor, an action based on an object in the region of interest, wherein the object comprises text and the performing the action comprises;
  
  recognizing, by the processor, the text using an optical character recognition program; and
  
  automatically populating, by the processor, one or more fields of a form using the text that is identified, wherein the text comprises alphanumeric text on a license plate and the form comprises a citation for a traffic violation.
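
As a worked illustration of the size-percentage step recited above, assume (the claim does not specify this) that the diagonal of an inner contour hole is measured on its bounding box of width w_i and height h_i, and that the frame is W by H pixels. The numeric values are hypothetical.

```latex
p_i = 100 \cdot \frac{\sqrt{w_i^{2} + h_i^{2}}}{\sqrt{W^{2} + H^{2}}},
\qquad \text{e.g.} \qquad
p_i = 100 \cdot \frac{\sqrt{120^{2} + 160^{2}}}{\sqrt{640^{2} + 480^{2}}}
    = 100 \cdot \frac{200}{800} = 25\%
```

With an illustrative predefined range of 5% to 50%, this hole would be kept as a candidate hand enclosure; a hole with a bounding-box diagonal of 10 pixels in the same frame (1.25%) would be eliminated.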

2. The method of claim 1, wherein the acquiring comprises receiving a prompt to initiate the acquiring of the image, wherein the prompt comprises at least one of: an audio command, a tap or a swipe gesture.

3. The method of claim 2, wherein the acquiring further comprises capturing a still image after the prompt is received.

4. The method of claim 2, wherein the acquiring further comprises selecting a frame from the ego-centric video after the prompt is received.

5. The method of claim 1, further comprising:
- displaying, by the processor, a shape around the region of interest in a display of a head-mounted video device; and
  
  receiving, by the processor, a confirmation input that the shape correctly surrounds the region of interest.

6. The method of claim 1, further comprising:
- translating, by the processor, the text in a first language to a second language.

7. A non-transitory computer-readable medium storing a plurality of instructions which, when executed by a processor, cause the processor to perform operations for localizing a region of interest using a hand gesture in an ego-centric video, the operations comprising:
- acquiring an image containing the hand gesture from the ego-centric video;
  
  detecting pixels that correspond to one or more hands in the image using a hand segmentation algorithm;
  
  identifying a hand enclosure in the pixels that are detected within the image, wherein the identifying comprises;
  
  generating a binary mask of the pixels that are detected within the image; and
  
  applying an image processing to the binary mask to reduce a probability of false positives and false negatives occurring in the binary mask, wherein the image processing comprises;
  
  detecting a plurality of inner contour holes that are located within a border around the pixels that correspond to the one or more hands in the image from a plurality of contour holes within a frame that include a plurality of outer contour holes that are located outside of the border in the binary mask;
  
  calculating a respective size percentage of each one of the plurality of inner contour holes, wherein the respective size percentage is calculated based on a length of a diagonal of a respective inner contour hole divided by a length of a diagonal of the frame;
  
  eliminating one or more of the plurality of inner contour holes that have the respective size percentage that are outside of a predefined range of size percentages; and
  
  identifying a single inner contour hole from a remaining plurality of inner contour holes that is closest to a center of the frame as the hand enclosure;
  
  localizing a region of interest based on the hand enclosure; and
  
  performing an action based on the object in the region of interest, wherein the object comprises text and the performing the action comprises;
  
  recognizing the text using an optical character recognition program; and
  
  automatically populating one or more fields of a form using the text that is identified, wherein the text comprises alphanumeric text on a license plate and the form comprises a citation for a traffic violation.
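
The final form-population step can be pictured with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the form is modeled as a plain dictionary, the field name `license_plate` is invented for illustration, and the normalization (dropping non-alphanumeric characters and upper-casing) is an assumed clean-up of raw OCR output.

```python
import re
from typing import Dict


def normalize_plate_text(ocr_text: str) -> str:
    """Keep only letters and digits from the OCR output and upper-case them.

    Assumption: plate text is alphanumeric, so spaces, dashes and line
    breaks produced by the OCR program can be discarded.
    """
    return re.sub(r"[^A-Za-z0-9]", "", ocr_text).upper()


def populate_citation(form: Dict[str, str], ocr_text: str) -> Dict[str, str]:
    """Return a copy of the form with the hypothetical plate field filled in."""
    filled = dict(form)  # leave the caller's form untouched
    filled["license_plate"] = normalize_plate_text(ocr_text)
    return filled


if __name__ == "__main__":
    blank = {"license_plate": "", "violation": ""}
    print(populate_citation(blank, " abc-1234\n"))
```

Returning a copy rather than mutating the input keeps the blank form reusable for the next citation; a real system would presumably validate the recognized text before filing it.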

8. A method for localizing a region of interest using a hand gesture, comprising:
- capturing, by a processor, an ego-centric video using a head-mounted video device;
  
  detecting, by the processor, one or more hands in a frame of the ego-centric video using a binary mask generated from a hand segmentation algorithm;
  
  identifying, by the processor, a hand enclosure from a plurality of inner contour holes in the binary mask, wherein the identifying comprises;
  
  applying, by the processor, an image processing to the binary mask to reduce a probability of false positives and false negatives occurring in the binary mask, wherein the image processing comprises;
  
  detecting, by the processor, the plurality of inner contour holes that are located within a border of pixels around the one or more hands in the image from a plurality of contour holes within the frame that include a plurality of outer contour holes that are located outside of the border in the binary mask;
  
  calculating, by the processor, a respective size percentage of each one of the plurality of inner contour holes, wherein the respective size percentage is calculated based on a length of a diagonal of a respective inner contour hole divided by a length of a diagonal of the frame;
  
  eliminating, by the processor, one or more of the plurality of inner contour holes that have the respective size percentage that are outside of a predefined range of size percentages; and
  
  identifying, by the processor, a single inner contour hole from a remaining plurality of inner contour holes that is closest to a center of the frame as the hand enclosure;
  
  determining, by the processor, a selecting a region of interest command is being initiated based on the hand enclosure that is identified;
  
  localizing, by the processor, the region of interest based on the hand enclosure;
  
  fitting, by the processor, a shape around the region of interest that is displayed as an overlay in a display of the head-mounted video device around the region of interest;
  
  cropping, by the processor, an object in the region of interest; and
  
  performing, by the processor, an automated action based the object in the region of interest, wherein the object comprises text and the performing the automated action comprises;
  
  recognizing, by the processor, the text using an optical character recognition program; and
  
  automatically populating, by the processor, one or more fields of a form using the text that is identified, wherein the text comprises alphanumeric text on a license plate and the form comprises a citation for a traffic violation.
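
The hole-selection steps recited in claims 1, 7 and 8 (size percentage, elimination outside a predefined range, choice of the hole closest to the frame center) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: each inner contour hole is represented by its bounding box, the hole diagonal is taken to be the bounding-box diagonal, distance to the center is measured from the box center, and the 5% to 50% range is an arbitrary example. The function and parameter names are hypothetical.

```python
import math
from typing import List, Optional, Tuple

# Assumed representation: (x, y, width, height) of a hole's bounding box.
Box = Tuple[int, int, int, int]


def size_percentage(box: Box, frame_w: int, frame_h: int) -> float:
    """Hole diagonal divided by frame diagonal, expressed as a percentage."""
    _, _, w, h = box
    return 100.0 * math.hypot(w, h) / math.hypot(frame_w, frame_h)


def select_hand_enclosure(
    inner_holes: List[Box],
    frame_w: int,
    frame_h: int,
    min_pct: float = 5.0,   # illustrative lower bound, not from the patent
    max_pct: float = 50.0,  # illustrative upper bound, not from the patent
) -> Optional[Box]:
    """Drop holes outside the size range, then pick the one nearest the center."""
    kept = [
        box for box in inner_holes
        if min_pct <= size_percentage(box, frame_w, frame_h) <= max_pct
    ]
    if not kept:
        return None  # no plausible hand enclosure in this frame

    cx, cy = frame_w / 2.0, frame_h / 2.0

    def distance_to_center(box: Box) -> float:
        x, y, w, h = box
        return math.hypot(x + w / 2.0 - cx, y + h / 2.0 - cy)

    return min(kept, key=distance_to_center)


if __name__ == "__main__":
    holes = [(10, 10, 8, 6), (260, 160, 120, 160), (400, 300, 120, 160)]
    print(select_hand_enclosure(holes, 640, 480))
```

In the usage example the first hole is eliminated for being too small relative to a 640 by 480 frame, and of the two remaining holes the one centered on the frame is returned. Separating inner holes from outer holes is assumed to have happened before this function is called.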

Current Assignee: Xerox Corporation (Xerox Holdings Corp.)
Original Assignee: Xerox Corporation (Xerox Holdings Corp.)
Inventors: Kumar, Jayant; Yang, Xiaodong; Li, Qun; Bernal, Edgar A.; Bala, Raja
Primary Examiner(s): FLORES, ROBERTO W
Application Number: US14/501,284
Publication Number: US 20160091975A1
Time in Patent Office: 1,099 Days
CPC Class Codes

G02B 2027/0138   comprising image capture sy...

G02B 2027/014   comprising information/imag...

G02B 2027/0178   Eyeglass type eyeglass deta...

G02B 2027/0187   slaved to motion of at leas...

G02B 27/0172   characterised by optical fe...

G06F 1/163   Wearable computers, e.g. on...

G06F 3/011   Arrangements for interactio...

G06F 3/017   Gesture based interaction, ...

G06F 3/0304   Detection arrangements usin...

G06V 10/235   based on user input or inte...

G06V 20/10   Terrestrial scenes scenes u...

G06V 40/113   Recognition of static hand ...

H04N 23/611   where the recognised object...