Two-dimensional method and system enabling three-dimensional user interaction with a device
First Claim
1. A method to enable a device to detect and use at least one user-made user-object interaction that represents a device-detectable event, wherein at least a portion of said user-object is representable by at least one landmark, the method including the following steps:
(a) disposing a first camera, having a first FOV (field of view), and disposing a second camera, having a second FOV, such that an intersection of said first FOV and said second FOV defines a three-dimensional hover zone within which said user-made user-object interaction is detectable by said device;
(b) obtaining from said first camera a first two-dimensional image, comprising a first set of pixel data, of at least a portion of said user-object, and obtaining from said second camera a second two-dimensional image, comprising a second set of pixel data, where a number N represents the maximum total number of data points acquirable by said first camera and by said second camera, wherein each said two-dimensional image is acquired within said three-dimensional hover zone;
(c) analyzing said first set and said second set of pixel data obtained at step (b) to identify therein potential two-dimensional locations of landmark data points on said user-object, said analyzing reducing the data to a relatively small number of data points, typically less than 10% of N;
(d) determining for at least some said two-dimensional locations identified in step (c) three-dimensional locations of potential landmarks on said user-object, wherein said determining reduces the remaining data to a sparse set of three-dimensional data points, typically less than 1% of N;
(e) using the three-dimensional locations determined at step (d), and using dynamic information for the data remaining after step (d), to further reduce the number of three-dimensional locations of potential landmarks on said user-object by at least a factor of 10, to typically less than 0.1% of N;
wherein the reduced three-dimensional data following step (e) is outputtable to said device to affect at least one device parameter responsive to the detected user-object interaction.
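The staged data reduction recited in steps (b) through (e) can be sketched end to end. This is a minimal illustration only, not the patented implementation: the intensity-threshold "detector", the same-row correspondence rule, the toy disparity depth, and the motion gate are all assumptions standing in for the landmark detection, three-dimensional determination, and dynamic filtering the claim recites.

```python
import numpy as np

rng = np.random.default_rng(7)

# Two 120x160 pixel frames; N is the maximum total number of data
# points acquirable by both cameras (step (b)).
img_a = rng.random((120, 160))
img_b = rng.random((120, 160))
N = img_a.size + img_b.size

def detect_2d_candidates(image, threshold=0.96):
    """Step (c): reduce a full frame to candidate 2-D landmark
    locations. A toy intensity test stands in for a real detector."""
    ys, xs = np.nonzero(image > threshold)
    return np.stack([xs, ys], axis=1)

cand_a = detect_2d_candidates(img_a)
cand_b = detect_2d_candidates(img_b)
n_2d = len(cand_a) + len(cand_b)            # typically < 10% of N

def to_sparse_3d(cand_a, cand_b):
    """Step (d): cross-view correspondence yields sparse 3-D candidates.
    Toy rule: same image row, small positive column disparity."""
    rows_b = {}
    for xb, yb in cand_b:
        rows_b.setdefault(int(yb), []).append(int(xb))
    pts = []
    for xa, ya in cand_a:
        for xb in rows_b.get(int(ya), []):
            d = int(xa) - xb
            if 0 < d <= 3:                   # plausible disparity band
                pts.append((int(xa), int(ya), 1.0 / d))  # toy depth
                break
    return pts

pts_3d = to_sparse_3d(cand_a, cand_b)        # typically < 1% of N

def dynamic_gate(pts, prev, max_step=2.0):
    """Step (e): keep only 3-D candidates consistent with positions
    tracked in a previous frame (the 'dynamic information')."""
    return [p for p in pts
            if any(abs(p[0] - q[0]) + abs(p[1] - q[1]) <= max_step
                   for q in prev)]

prev_frame = pts_3d[::12]                    # stand-in tracked history
landmarks = dynamic_gate(pts_3d, prev_frame)  # typically < 0.1% of N
```

Each stage discards the bulk of its input, so only a handful of the original N acquirable data points survive to drive the device response.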
Abstract
User interaction with a display is detected substantially simultaneously using at least two cameras whose intersecting FOVs define a three-dimensional hover zone within which user interactions can be imaged. Separately and collectively, the image data is analyzed to identify a relatively few user landmarks. A substantially unambiguous correspondence is established between the same landmark on each acquired image, and a three-dimensional reconstruction is made in a common coordinate system. Preferably the cameras are modeled as pinhole cameras, enabling rectified epipolar geometric analysis that facilitates more rapid disambiguation among potential landmark points. Consequently processing overhead is substantially reduced, as are latency times. Landmark identification and position information is convertible into a command causing the display to respond appropriately to a user gesture. Advantageously, the size of the hover zone can far exceed the size of the display, making the invention usable with smart phones as well as large-size entertainment TVs.
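The abstract's rectified epipolar analysis can be made concrete: for rectified pinhole cameras the epipolar lines are image rows, so a landmark candidate in one view can only correspond to candidates on (nearly) the same row of the other view, and everything off that scanline is rejected without any 3-D computation. The row tolerance, disparity band, focal length, and baseline below are illustrative assumptions, not values from the patent.

```python
def rectified_matches(pts_left, pts_right, row_tol=1.0,
                      min_disp=1.0, max_disp=60.0):
    """Disambiguate landmark candidates across two rectified views.
    Returns (x_left, y_left, disparity, x_right) tuples."""
    matches = []
    for xl, yl in pts_left:
        best = None
        for xr, yr in pts_right:
            if abs(yl - yr) > row_tol:       # epipolar (scanline) gate
                continue
            d = xl - xr                       # disparity
            if not (min_disp <= d <= max_disp):
                continue
            # Arbitrary tie-break for this sketch: keep the smallest
            # disparity (most distant) plausible match.
            if best is None or d < best[2]:
                best = (xl, yl, d, xr)
        if best is not None:
            matches.append(best)
    return matches

def depth_from_disparity(d, focal_px=700.0, baseline_m=0.06):
    """Pinhole stereo relation: Z = f * B / disparity."""
    return focal_px * baseline_m / d

# Usage: three left-view candidates, three right-view candidates.
pts_left = [(120.0, 40.0), (200.0, 41.0), (50.0, 90.0)]
pts_right = [(100.0, 40.0), (180.0, 41.5), (300.0, 90.0)]
matches = rectified_matches(pts_left, pts_right)
```

The third left candidate finds no plausible partner (its only same-row candidate would imply a negative disparity), so it is dropped, illustrating how the epipolar constraint prunes false correspondences cheaply.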
23 Claims
1. A method to enable a device to detect and use at least one user-made user-object interaction that represents a device-detectable event, wherein at least a portion of said user-object is representable by at least one landmark, the method including the following steps:
(a) disposing a first camera, having a first FOV (field of view), and disposing a second camera, having a second FOV, such that an intersection of said first FOV and said second FOV defines a three-dimensional hover zone within which said user-made user-object interaction is detectable by said device;
(b) obtaining from said first camera a first two-dimensional image, comprising a first set of pixel data, of at least a portion of said user-object, and obtaining from said second camera a second two-dimensional image, comprising a second set of pixel data, where a number N represents the maximum total number of data points acquirable by said first camera and by said second camera, wherein each said two-dimensional image is acquired within said three-dimensional hover zone;
(c) analyzing said first set and said second set of pixel data obtained at step (b) to identify therein potential two-dimensional locations of landmark data points on said user-object, said analyzing reducing the data to a relatively small number of data points, typically less than 10% of N;
(d) determining for at least some said two-dimensional locations identified in step (c) three-dimensional locations of potential landmarks on said user-object, wherein said determining reduces the remaining data to a sparse set of three-dimensional data points, typically less than 1% of N;
(e) using the three-dimensional locations determined at step (d), and using dynamic information for the data remaining after step (d), to further reduce the number of three-dimensional locations of potential landmarks on said user-object by at least a factor of 10, to typically less than 0.1% of N;
wherein the reduced three-dimensional data following step (e) is outputtable to said device to affect at least one device parameter responsive to the detected user-object interaction.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
11. A system to detect and use at least one user-made user-object interaction that represents a device-detectable event, wherein at least a portion of said user-object is representable by at least one landmark, the system including:
a first camera, having a first FOV (field of view), and a second camera, having a second FOV, disposed such that an intersection of said first FOV and said second FOV defines a three-dimensional hover zone within which said user-made user-object interaction is detectable by a device;
means for obtaining a first two-dimensional image from said first camera of at least a portion of said user-object within said three-dimensional hover zone, and for obtaining a second two-dimensional image from said second camera of at least a portion of said user-object in said three-dimensional hover zone, wherein said first image comprises a first set of pixel data and said second image comprises a second set of pixel data, where a number N represents the maximum total number of data points acquired in said first and said second sets of pixel data;
means for analyzing said first and second sets of pixel data, obtained by said means for obtaining, to identify in said pixel data potential two-dimensional locations of landmark data points on said user-object, such that data reduction to typically less than about 10% of N occurs;
means for determining, for at least some said two-dimensional locations identified by said means for analyzing, a sparse set of three-dimensional locations of potential landmarks on said user-object, wherein the remaining data is typically less than about 1% of N;
means for further reducing, coupled to use the three-dimensional locations determined by said means for determining and coupled to use dynamic information, to reduce the data remaining for potential landmarks to typically less than about 0.1% of N;
wherein the three-dimensional data obtained from said means for further reducing is outputtable to said device to affect at least one device parameter responsive to the detected user-object interaction.
- View Dependent Claims (12, 13, 14, 15, 16, 17, 18, 19)
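The "means for determining … three-dimensional locations" corresponds, per the abstract, to reconstructing each matched landmark in a common coordinate system from the two pinhole views. A standard way to do this is linear (DLT) triangulation; the sketch below uses made-up intrinsics and a 0.1 m baseline, and is not the patent's disclosed implementation.

```python
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation: intersect the two pinhole rays
    defined by 3x4 projection matrices P1, P2 and pixel observations
    uv1, uv2; returns the 3-D point in the shared world frame."""
    A = np.array([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                     # null vector = homogeneous 3-D point
    return X[:3] / X[3]

# Two toy pinhole cameras 0.1 m apart along x, both looking down +z.
K = np.array([[700.0, 0.0, 320.0],
              [0.0, 700.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])

# A fingertip-like landmark 0.5 m inside the hover zone, projected
# into both views to simulate the matched 2-D landmark pair.
X_true = np.array([0.05, -0.02, 0.5])
Xh = np.append(X_true, 1.0)
p = P1 @ Xh; uv1 = p[:2] / p[2]
p = P2 @ Xh; uv2 = p[:2] / p[2]

X_hat = triangulate(P1, P2, uv1, uv2)
```

With noise-free observations the DLT recovers the landmark exactly (up to floating-point error); with real pixel data it gives the least-squares ray intersection that the later dynamic filtering then prunes.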
20. A hand-holdable electronic device enabling at least one user-object interaction in a three-dimensional hover zone with an image presented on a display, said interaction creating a detectable event useable by said electronic device, where at least a portion of said user-object is representable by at least one landmark, the electronic device including:
a housing;
a processor-controller unit including a processor coupled to memory storing at least one routine executable by said processor, said processor-controller unit disposed in said housing;
a display having a display surface, coupled to said processor-controller unit, able to present user-viewable images responsive to commands from said processor-controller unit, said display integrally joined to said housing;
at least a first camera having a first FOV and a second camera having a second FOV, said first and second cameras disposed such that an intersection of said first FOV and said second FOV defines said three-dimensional hover zone;
said processor-controller unit further including:
means for obtaining a first two-dimensional image from said first camera of at least a portion of said user-object within said three-dimensional hover zone, and for obtaining a second two-dimensional image from said second camera of at least a portion of said user-object in said three-dimensional hover zone, wherein said first image comprises a first set of pixel data and said second image comprises a second set of pixel data, where a number N represents the maximum total number of data points acquired in said first and said second sets of pixel data;
means for analyzing said first and second sets of pixel data, obtained by said means for obtaining, to identify in said pixel data potential two-dimensional locations of landmark data points on said user-object, such that data reduction to substantially less than typically about 10% of N occurs;
means for determining, for at least some said two-dimensional locations identified by said means for analyzing, three-dimensional locations of potential landmarks on said user-object, wherein the remaining data is substantially less than typically about 1% of N;
means for further reducing, coupled to use the three-dimensional locations determined by said means for determining and coupled to use dynamic information, to reduce the data remaining for potential landmarks to less than typically about 0.1% of N;
wherein the three-dimensional data obtained from said means for further reducing is coupled to said electronic device to affect at least one device parameter selected from a group consisting of (i) causing said electronic device to alter at least a portion of an image presented on said display, (ii) causing said electronic device to issue an audible sound, (iii) causing said electronic device to alter a characteristic of said electronic device, and (iv) causing a change in orientation of said first camera relative to said second camera;
and wherein said electronic device includes at least one device selected from a group consisting of (i) a smart phone, (ii) a tablet, (iii) a netbook, (iv) a laptop, (v) an e-book reader, (vi) a PC, (vii) a TV, and (viii) a set top box.
- View Dependent Claims (21, 22, 23)
Specification