Two-dimensional method and system enabling three-dimensional user interaction with a device
Abstract
User interaction with a display is detected using at least two cameras whose intersecting FOVs define a three-dimensional hover zone within which user interactions can be imaged. Each camera substantially simultaneously acquires, from its own vantage point, two-dimensional images of the user within the hover zone. Separately and collectively, the image data are analyzed to identify relatively few landmarks definable on the user. A substantially unambiguous correspondence is established between the same landmark in each acquired image, and for those landmarks a three-dimensional reconstruction is made in a common coordinate system. This landmark identification and position information can be converted into a command causing the display to respond appropriately to a gesture made by the user. Advantageously, the size of the hover zone can far exceed the size of the display, making the invention usable with smart phones as well as large-size entertainment TVs.
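The reconstruction summarized above — the same landmark found in two synchronized images, then located in three dimensions in a common coordinate system — is classic stereo triangulation. A minimal sketch using linear (DLT) triangulation with two idealized pinhole cameras; the camera matrices and the landmark position are made-up illustrative values, not parameters from the patent:

```python
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation: recover the 3-D point whose projections
    through 3x4 camera matrices P1 and P2 are image coordinates uv1, uv2."""
    A = np.array([uv1[0] * P1[2] - P1[0],
                  uv1[1] * P1[2] - P1[1],
                  uv2[0] * P2[2] - P2[0],
                  uv2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                 # null vector of A: the homogeneous 3-D point
    return X[:3] / X[3]        # de-homogenize

# Two identical unit-focal-length cameras; the second is shifted 0.2 along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-0.2], [0.0], [0.0]])])

# A hypothetical fingertip landmark at (0.1, 0.05, 0.5) in camera-1
# coordinates projects to these image points in the two views:
landmark = np.array([0.1, 0.05, 0.5])
uv1 = landmark[:2] / landmark[2]
uv2 = (landmark[:2] + np.array([-0.2, 0.0])) / landmark[2]

recovered = triangulate(P1, P2, uv1, uv2)   # ~ (0.1, 0.05, 0.5)
```

With noiseless correspondences, as here, the SVD null vector recovers the point exactly; with real pixel noise the same code returns the least-squares solution, which is why a system of this kind needs only a handful of well-matched landmarks rather than dense depth.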
23 Claims
1. A method to enable at least one user object interaction, in a three-dimensional hover zone, with an image presented on a display functionally coupled to a device, said interaction creating a detectable event useable by said device, where at least a portion of said user object is representable by at least one landmark, the method including the following steps:
(a) disposing a first camera having a first FOV and disposing a second camera having a second FOV such that intersecting said first FOV and second FOV define said three-dimensional hover zone;
(b) obtaining a first two-dimensional image from said first camera of at least a portion of said user object within said three-dimensional hover zone, and obtaining a second two-dimensional image from said second camera of at least a portion of said user object in said three-dimensional hover zone; wherein said first two-dimensional image and said second two-dimensional image are obtained within a timing tolerance that is the longer of (i) said first and said second two-dimensional images are obtained within about ±1.5 ms of each other, and (ii) said first and said second two-dimensional images each have an exposure duration of X ms, and said first and said second images are obtained within a tolerance of about ±10%·X;
(c) analyzing said first two-dimensional image and said second two-dimensional image to identify at least one said landmark and fewer than one hundred potential landmarks definable on said user object;
(d) establishing correspondence between said landmark in said first two-dimensional image and said same landmark in said second two-dimensional image to determine position of said landmark in three dimensions; and
(e) using three-dimensional position information determined for said landmark at step (d) to create at least one instruction usable by said device, in response to a detected said user object interaction;
wherein said user object interaction includes at least one interaction selected from a group consisting of (i) said user object physically touches a surface of said display, and (ii) a gesture made by said user object in a region of said three-dimensional hover zone without physically touching said surface of said display.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
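The two-part timing tolerance in step (b) — the longer of about ±1.5 ms and about ±10% of the exposure duration X — reduces to a one-line comparison. A minimal sketch; the function and parameter names are illustrative, not from the patent:

```python
def within_timing_tolerance(t_first_ms: float, t_second_ms: float,
                            exposure_ms: float) -> bool:
    """True if two frame timestamps satisfy the claimed tolerance:
    the longer of (i) about +/-1.5 ms and (ii) about +/-10% of the
    exposure duration X ms."""
    tolerance_ms = max(1.5, 0.10 * exposure_ms)
    return abs(t_first_ms - t_second_ms) <= tolerance_ms
```

With a 10 ms exposure the fixed ±1.5 ms bound governs (10% of 10 ms is only 1.0 ms); with a 33 ms exposure the ±3.3 ms bound governs, reflecting that longer exposures overlap in time and can tolerate more skew between the two shutters.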
10. A system to enable at least one user object interaction, in a three-dimensional hover zone, with an image presented on a display functionally coupled to a device, said interaction creating a detectable event useable by said device, where at least a portion of said user object is representable by at least one landmark, the system including:
at least a first camera having a first FOV and a second camera having a second FOV, said first camera and said second camera disposed such that intersecting said first FOV and said second FOV define said three-dimensional hover zone;
means for synchronizing, operatively coupled to at least said first camera and to said second camera, to obtain a first two-dimensional image from said first camera of at least a portion of said user object in said three-dimensional hover zone, and to obtain a second two-dimensional image from said second camera of at least a portion of said user object in said three-dimensional hover zone;
wherein at least one of said first camera and said second camera has at least one characteristic selected from a group consisting of (i) a two-dimensional array of pixel sensors that senses color spectra, (ii) a two-dimensional array of pixel sensors that senses monochrome spectra, (iii) a two-dimensional array of pixel sensors that senses IR spectra, (iv) said first camera has a two-dimensional array of pixel sensors having equal pixel (x,y) resolution to said two-dimensional array of pixel sensors in said second camera, (v) a camera exposure duration that starts and stops within a timing tolerance of about ±1.5 ms, (vi) a camera exposure duration of X ms that starts and stops within a timing tolerance of about ±10%·X, (vii) said display includes a bezel and said first camera and said second camera are mounted behind said bezel, (viii) said first camera and said second camera are disposed such that said three-dimensional hover zone is adjacent a surface of said display, (ix) said first camera and said second camera are disposed such that said three-dimensional hover zone is adjacent said surface of said display and includes at least a region of said surface of said display, (x) said first camera and said second camera are selected and disposed such that a cross-sectional dimension of said three-dimensional hover zone taken parallel to a surface of said display is larger than a diagonal dimension of said display, and (xi) at least said first camera has been previously calibrated and calibration information for said first camera is known a priori;
means for analyzing said first two-dimensional image and said second two-dimensional image to identify at least one said landmark and fewer than about one hundred potential landmarks definable on said user object, said means for analyzing coupled to said first camera and to said second camera;
means for establishing correspondence between said landmark in said first two-dimensional image and said same landmark in said second two-dimensional image to determine position of said landmark in three dimensions, said means for establishing coupled to said means for analyzing; and
means for creating at least one instruction usable by said device in response to a detected said user object interaction using three-dimensional position information determined for said landmark, said means for creating at least one instruction coupled to said means for establishing correspondence;
wherein said user object interaction includes at least one interaction selected from a group consisting of (i) said user object physically touches said surface of said display, and (ii) a gesture made by said user object in a region of said three-dimensional hover zone without physically touching said surface of said display.
Dependent claims: 11, 12, 13, 14, 15, 16.
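The "means for analyzing" recited above deliberately reduces each full image to a small set of landmarks (fewer than about one hundred) rather than a dense depth map. A toy illustration of that reduction, assuming a pre-segmented binary silhouette of the user object as input; the silhouette and the two landmark choices are hypothetical, not the patent's actual analysis method:

```python
def extract_landmarks(mask):
    """Reduce a binary silhouette (list of rows of 0/1, row 0 at top) to a
    short landmark list: the silhouette centroid and the topmost foreground
    pixel as a crude fingertip candidate. The result is a handful of points,
    far below the ~100-landmark bound recited in the claim."""
    pixels = [(x, y) for y, row in enumerate(mask)
                     for x, v in enumerate(row) if v]
    if not pixels:
        return {}
    cx = sum(x for x, _ in pixels) / len(pixels)
    cy = sum(y for _, y in pixels) / len(pixels)
    fingertip = min(pixels, key=lambda p: p[1])   # smallest row = topmost
    return {"centroid": (cx, cy), "fingertip": fingertip}

silhouette = [
    [0, 0, 1, 0, 0],   # lone top pixel: fingertip candidate at (2, 0)
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
]
landmarks = extract_landmarks(silhouette)
```

The same few named landmarks are extracted independently from each camera's image; the correspondence step then matches them by name across the two views before triangulation.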
17. A hand-holdable electronic device enabling at least one user interaction in a three-dimensional hover zone with an image presented on a display, said interaction creating a detectable event useable by said electronic device, where at least a portion of said user is representable by at least one landmark, the electronic device including:
a housing;
a processor-controller unit including a processor coupled to memory storing at least one routine executable by said processor, said processor-controller unit disposed in said housing;
a display having a surface, coupled to said processor-controller unit, able to present user-viewable images responsive to commands from said processor-controller unit, said display integrally joined to said housing;
at least a first camera having a first FOV and a second camera having a second FOV, said first camera and said second camera disposed relative to said housing such that intersecting said first FOV and second FOV define a three-dimensional hover zone adjacent said surface of said display, said first camera and said second camera integrally attached to said housing such that said three-dimensional hover zone projects outwardly relative to said surface of said display, wherein a transverse dimension of a cross-section of said three-dimensional hover zone in a plane parallel to said surface of said display is at least equal in size to a diagonal dimension of said display;
wherein said first camera and said second camera each include a two-dimensional array of pixel sensors sensing at least one of (i) color spectra, (ii) monochrome spectra, and (iii) IR spectra;
said processor-controller unit further including:
means for synchronizing, operatively coupled to at least said first camera and to said second camera, to obtain a first two-dimensional image from said first camera of at least a portion of said user in said three-dimensional hover zone, and to obtain a second two-dimensional image from said second camera of at least a portion of said user in said three-dimensional hover zone;
means for analyzing said first two-dimensional image and said second two-dimensional image to identify at least one said landmark and fewer than about one hundred potential landmarks definable on said user, said means for analyzing coupled to said first camera and to said second camera;
wherein an identified said landmark includes at least one landmark selected from a group consisting of (i) approximate centroid of a user's body, (ii) approximate centroid of a user's head, (iii) approximate centroid of a user's hand, (iv) approximate location of a user's fingertip, (v) approximate location of a user's shoulder joint, (vi) approximate location of a user's knee joint, and (vii) approximate location of a user's foot;
means for establishing correspondence between said landmark in said first two-dimensional image and said same landmark in said second two-dimensional image to determine position of said landmark in three dimensions, said means for establishing coupled to said means for analyzing; and
means for creating at least one instruction usable by said electronic device in response to a detected said user interaction using three-dimensional position information determined for said landmark, said means for creating at least one instruction coupled to said means for establishing correspondence;
wherein said instruction causes at least one action selected from a group consisting of (i) said instruction causes said electronic device to alter at least a portion of an image presented on said display, (ii) said instruction causes said electronic device to issue an audible sound, and (iii) said instruction causes said electronic device to alter a characteristic of said electronic device; and
wherein said user interaction includes at least one interaction selected from a group consisting of (i) said user physically touches said surface of said display, and (ii) a gesture made by said user in a region of said three-dimensional hover zone without physically touching said surface of said display.
Dependent claims: 18, 19, 20, 21, 22, 23.
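The two claimed interaction types — a physical touch of the display surface and a touch-free hover gesture — can be distinguished from a triangulated landmark's distance to the display. A hedged sketch; the coordinate frame (z measured outward from the display surface, in mm), the threshold value, and all names are assumptions for illustration, not values from the patent:

```python
def classify_interaction(landmark_xyz, touch_threshold_mm=2.0):
    """Map a triangulated landmark position to one of the two claimed
    interaction types: a physical touch of the display surface (z near 0)
    or a hover gesture within the three-dimensional hover zone."""
    x, y, z = landmark_xyz
    if z <= touch_threshold_mm:
        return ("touch", (x, y))          # maps onto a display coordinate
    return ("hover-gesture", (x, y, z))   # handed to a gesture recognizer
```

A touch event would typically be delivered to the device like a conventional touchscreen tap at (x, y), while a hover-gesture event carries the full 3-D trajectory so the device can, per the claim, alter the displayed image, emit a sound, or change a device characteristic.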
Specification