Estimating vanishing points in images
A digital image is processed to provide an estimation of the position in the image plane of a vanishing point. The processing includes detecting pairs of similar image patches and identifying a concurrent set of straight virtual lines that substantially converge at a point on the image plane, each line passing through a pair of similar image patches within the image.
- 1. An automated image processor arranged to receive original digital image data and estimate the position of a vanishing point in an image plane of a respective original digital image, including by detecting pairs of similar image patches within an image and identifying a concurrent set of straight virtual lines that substantially converge at a point on the image plane, each line passing through a pair of similar image patches within the image.
- 20. An image processor comprising:
a. an input to receive digital data representing a digital image;
b. a data store to store a received digital image; and
c. a vanishing point estimator, which is adapted to estimate the position of a vanishing point in an image plane of a digital image, including by detecting pairs of similar image patches within an image and identifying a concurrent set of straight virtual lines that substantially converge at a point on the image plane, each line passing through a pair of similar image patches within the image.
- 21. An automated method of processing an image by estimating the position of a vanishing point in an image plane of a respective original digital image, including by detecting pairs of similar image patches within an image and identifying a concurrent set of straight virtual lines that substantially converge at a point on the image plane, each line passing through a pair of similar image patches within the image.
The present invention is in the field of automated image understanding and relates in particular to estimating the location of a vanishing point in a digitized image.
The field of digital image editing, manipulation, and enhancement is evolving to encompass three-dimensional (3D) scene structure understanding. Under a pinhole camera model, a set of parallel lines in a 3D scene is projected to a set of concurrent lines that meet at a single point, known as a vanishing point (VP). Each VP is associated with a unique 3D orientation and hence can provide valuable information on the 3D structure of the scene. VPs are used for a variety of vision tasks such as camera calibration, perspective rectification, scene reconstruction and more. For example, knowledge of the location of a VP is required in applications such as reliably planting objects in images uploaded via the internet, where 'reliably' typically means ensuring accurate scaling.

Techniques for determining VPs in images are well known and typically rely on finding straight lines in the images and projecting those lines to find locations, on the image plane, where the projected lines intersect. Intersections may be within the image area or outside of it, depending on the nature of the image. Known techniques can fail in a variety of cases where straight features are either not present or are too faint or blurred to be detected credibly. In addition, the accuracy and credibility of current VP estimation methods can deteriorate quickly in relatively low resolution images, due to feature blurring and line digitization artifacts. Finally, the computational complexity of existing techniques is relatively high and scales not only with image size but also with the density of straight line segments in the image. Hence it is difficult to design a general solution with good control of the trade-off between estimation accuracy and speed.
According to a first aspect, the present invention provides an automated method of estimating the position of a vanishing point in an image plane of a digital image, including by detecting pairs of similar image patches within an image and identifying a concurrent set of straight virtual lines that substantially converge at a point on the image plane, each line passing through a pair of similar image patches within the image.
According to a second aspect, the present invention provides an image processor comprising: an input to receive digital data representing a digital image; a data store to store a received digital image; and a vanishing point estimator, which is adapted to estimate the position of a vanishing point in an image plane of a digital image, including by detecting pairs of similar image patches within an image and identifying a concurrent set of straight virtual lines that substantially converge at a point on the image plane, each line passing through a pair of similar image patches within the image.
According to a third aspect, the present invention provides an automated method of processing an image by estimating the position of a vanishing point in an image plane of a respective original digital image, including by detecting pairs of similar image patches within an image and identifying a concurrent set of straight virtual lines that substantially converge at a point on the image plane, each line passing through a pair of similar image patches within the image.
Other aspects and embodiments of the invention will become apparent from the following description, claims and drawings.
Various features and advantages of the invention will become apparent from the following description of embodiments of the invention, given by way of example only, which is made with reference to the accompanying drawings, of which:
Various embodiments of the present invention will now be described in more detail with reference to the accompanying drawings. It will be appreciated that the invention is not limited in its application to the details of method and the arrangement of components as set forth in the following description or illustrated in the drawings. It will be apparent to a person skilled in the art that additional embodiments of the present invention not detailed in the description are possible and will fall within the scope of the present claims. Accordingly, the following description should not be interpreted as limiting in any way, and the scope of protection is defined solely by the claims appended hereto.
A principle employed for determining VPs in images according to embodiments of the present invention is illustrated in
The arrows connecting the ends of the matching profiles (respective image corners) point towards, and meet at, the VP. In other words, the VP position can be extracted from a 1D-affine matching between a pair of parallel 1D profiles. Equivalently, a set of straight virtual lines that connect pairs of matching image patches (white circles) is concurrent and converges at the VP. As used herein, a 'virtual line' is a line constructed or projected through matching (or similar) patches in an image. A virtual line may coincide with a true straight line in the image but, equally (as will be described in detail below), may not coincide with any discernible line, straight edge or linear feature in the image. In effect, the process of obtaining a single VP from a global 2D self-similarity can be viewed as equivalent to clustering a large collection of VP candidates, each obtained either from the meeting point of virtual lines connecting matching points as described above, or equivalently from a 1D-affine similarity between a pair of parallel 1D image profiles. Following this view, it is possible to generalize a self-similarity approach for detecting multiple VPs located anywhere in the image plane, even when the VPs are not within the image area. Embodiments of the present invention employ this principle, as will be described below.
With reference to the diagram in
All such cross-sections are similar up to scaling with respect to the VP. In particular the relation between two cross sections at x and xR respectively is:
From the general scaling similarity relation of (1), it is possible to estimate the location of the VP as follows. Equation (1) can be expressed as an affine similarity relation relative to the origin:
If the affine transformation parameters (s, τ) are found between a matching pair of cross sections at xR, x, the pencil's VP can be determined as:
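The equation images referenced above did not survive extraction into this text. The following LaTeX sketch is a hypothetical reconstruction, assuming vertical cross-sections of a pencil of lines converging at a VP (xv, yv); it is consistent with the surrounding derivation, but the exact published forms may differ:

```latex
% A pencil line through the VP (x_v, y_v) with slope m:
%   y(x) = y_v + m\,(x - x_v).
% Comparing its intersections with cross-sections at x and x_R gives the
% scaling similarity relation (1):
\[
  y(x) - y_v = s\,\bigl(y(x_R) - y_v\bigr),
  \qquad s = \frac{x - x_v}{x_R - x_v}.
\]
% Expressed as an affine similarity relative to the origin (2):
\[
  y(x) = s\,y(x_R) + \tau, \qquad \tau = y_v\,(1 - s).
\]
% Solving for the VP from a matched pair with parameters (s, \tau) (3):
\[
  x_v = \frac{x - s\,x_R}{1 - s}, \qquad y_v = \frac{\tau}{1 - s}.
\]
```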
In practice, each matching pair of cross-sections (x, xR) produces a VP candidate. As these candidates are based on information from an entire pencil (a collection of many real or virtual lines), they are considerably more credible than traditional VP candidates corresponding to meeting points of line pairs. Hence, there tend to be fewer "false alarms" (i.e. misplaced VP candidates). Given a collection of VP candidates (regardless of how they are obtained), the VPs can be estimated by candidate accumulation and clustering. Since in typical man-made scenes many VPs lie far away from the image boundaries, or at infinity, the candidate accumulation step cannot work in regular spatial coordinates: VPs at infinity or at a large distance from the image boundaries would not be handled properly. Suitable accumulation spaces are used in the literature, such as the Gaussian sphere representation; however, they typically depend on camera calibration parameters, such as the focal length, which are not always available, for example for images downloaded from the Internet. One feature of embodiments of the present invention is a new pencil-based accumulator space that is designed to deal with distant VPs and is, very conveniently, independent of camera calibration. First, from Equations (1) and (3) it can be established that the transformation parameters relating a fixed reference cross-section at xR to another cross-section at x are linear functions of x. Hence the rate of change of the affine matching parameters, a_s ≡ ∂s/∂x and a_τ ≡ ∂τ/∂x, is fixed. The parameter pair (a_s, a_τ) is denoted the pencil slope. The pencil slope has an invertible relation to the VP coordinates (xv, yv), and can be computed directly from the transformation parameters of a single cross-section match:
Since different matched cross-section pairs may have different reference positions xR, the pencil slopes can be transformed to a common reference position xo, for example the image centre. From equations (5) and (6) the following transformation rule is obtained:
In this parameterization, infinite vanishing points are mapped to
(a_s(xo), a_τ(xo)) = (0, tan θv),
where θv is the VP direction. The main limitation of the pencil slope parameterization is that VPs at xo are mapped to infinity. This limitation can be resolved by employing two different parameterizations: a first search for VPs located inside the image, in bounded image space (xv, yv); and a second search for VPs located outside the image in pencil slope space with xo taken at the image centre.
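To make the dual parameterization concrete, the sketch below converts between VP coordinates and a pencil-slope pair. The function names and the exact relations a_s = −1/(xv − xo) and a_τ = yv/(xv − xo) are assumptions chosen to match the limiting behaviour described above (infinite VPs map to (0, tan θv); VPs at xo are sent to infinity), not the patent's published formulas:

```python
def vp_to_pencil_slope(xv, yv, xo=0.0):
    """Map VP coordinates to an assumed pencil-slope pair, relative to reference xo."""
    d = xv - xo                  # VPs at the reference point xo map to infinity
    return -1.0 / d, yv / d      # (a_s, a_tau); distant VPs approach (0, tan(theta_v))

def pencil_slope_to_vp(a_s, a_tau, xo=0.0):
    """Inverse mapping: recover (xv, yv) from an assumed pencil-slope pair."""
    return xo - 1.0 / a_s, -a_tau / a_s
```

A bounded search in (xv, yv) then handles VPs inside the image, while a search in (a_s, a_τ) handles distant and infinite VPs, as the text describes.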
The derivation described thus far can be applied to cross-sections having arbitrary orientations. In practice, plural sets of cross-sections at different orientations are required in order to reliably establish all VPs for an image, as pencils cannot be detected if their lines are substantially parallel to the orientation of a cross-section set. For example, as applied in an embodiment of the present invention below, two perpendicular cross-section orientations can in most practical cases be used to detect VPs at all orientations.
While any cross section orientation can be used, most scenes containing man-made objects or edifices include a significant number of horizontal and vertical surfaces. In addition, images of such scenes tend to be photographed with the optical axis of the camera essentially parallel to the horizontal support surface (for example the ground or floor) or to a distant horizon. Under these conditions, substantially vertical and horizontal cross sections are generally well suited for detecting pencils of real or virtual lines. In addition, these orientations are most compatible with the pixel grid of a digitized image, so that accuracy reduction due to aliasing tends to be minimal.
An embodiment of the present invention can be implemented using an image processing system 300, as illustrated in the functional block diagram in
The image conditioner 340 comprises a number of processing components that are optional in the context of embodiments of the present invention. That is, other embodiments of the invention may not apply image conditioning, and VP estimation would then be carried out on raw image data. However, according to the present embodiment, image conditioning is applied in order to make VP estimation faster (in image processing efficiency terms) and in some cases more reliable. The kind of image conditioning that is applied (if any) can be varied according to the kind of image being processed, and the skilled person, on reading the present description, would be able to apply any image conditioning deemed appropriate.
The image conditioner 340 comprises a downscaler 342, for downscaling an image to a lower resolution, a feature map generator 344, for generating a feature map of the image, and a feature vector generator 346, for generating a feature vector of each pixel, where the feature vector for a pixel characterizes the image patch consisting of a neighbourhood of pixels around the respective pixel. The operation of the feature map generator 344 and the feature vector generator 346 will be described in detail below. The VP estimator 350 comprises a vertical image strip processor 352, for determining VPs from vertical image strips, and a horizontal image strip processor 362, for determining VPs from horizontal image strips. The presence of both vertical and horizontal image strip processors satisfies the preference for analyzing at least two orientations (though other orientations could be analyzed instead).
The term “image strip” is used herein synonymously with ‘cross section’, in the context of a relatively narrow (but not infinitely so), elongated region in an image. As will be described, pairs of parallel image strips are used, according to one implementation of the present invention, to establish the position of candidate VPs in an image. It will be appreciated that use of image strips as such is not essential: in the alternative all regions of the image could be analyzed (i.e. not just image strips). However, as will become apparent, using image strips provides a significantly more efficient process.
Each of the vertical and horizontal image strip processors comprises an image strip positioner, 354 & 364, for positioning image strips on the image; an image strip correlator, 356 & 366, for comparing image patches that are sub-regions of the image strips and establishing similar image patches; and a candidate VP locator, 358 & 368, for establishing the candidate VP location from all matching image patches. The cluster processor 370 uses the positions of the potential VP locations on the image plane to determine the positions of the VPs for the image.
All (or at least some of the) processing elements of
An embodiment of the present invention consisting of processing an image to determine its vanishing point or points will now be described with reference to the flow diagram in
First [step 400] the image data input 330 receives original image data 312 and stores it in the main memory 310.
An exemplary image comprising a scene of an unlit, empty room with daylight shining through a window and reflecting on the floor on the left hand side of the image, is shown in
In terms of image conditioning, next [step 405], the downscaler 342 reduces the resolution of the original image from (in this example) 1500×1000 pixels to 75×50, using well known procedures, such as bilinear interpolation, to produce downscaled image data 313, which is stored in main memory 310. According to the present embodiment, the aspect ratio of the original image is preserved in the down-scaled image. Reduction of the resolution of an image to be analyzed reduces processing overhead and increases processing speed. Although a significant reduction in resolution can lead to a small loss of VP estimation accuracy, results are still accurate enough for most purposes. It has been found by experiment that horizontal or vertical resolutions as low as 50 pixels, but preferably exceeding 70 pixels and more preferably exceeding 100 pixels, can provide sufficient results for some applications.
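As an illustration of this conditioning step, the following is a minimal bilinear downscaler for a single-channel image, preserving aspect ratio; it is a sketch of the well-known procedure, not the patent's implementation (the function name and interface are hypothetical):

```python
import numpy as np

def downscale(img, new_w):
    """Bilinear downscale of a 2-D (grayscale) array to width new_w,
    preserving the aspect ratio (e.g. 1500x1000 -> 75x50)."""
    h, w = img.shape
    new_h = max(1, round(h * new_w / w))
    ys = np.linspace(0, h - 1, new_h)          # sample positions in source rows
    xs = np.linspace(0, w - 1, new_w)          # ... and in source columns
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1); fy = ys - y0
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1); fx = xs - x0
    # interpolate horizontally on the two bracketing rows, then vertically
    top = img[np.ix_(y0, x0)] * (1 - fx) + img[np.ix_(y0, x1)] * fx
    bot = img[np.ix_(y1, x0)] * (1 - fx) + img[np.ix_(y1, x1)] * fx
    return top * (1 - fy)[:, None] + bot * fy[:, None]
```

For large reduction factors an anti-aliasing prefilter would normally precede the interpolation; it is omitted here for brevity.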
The feature map generator 344 then [step 410] generates a collection of feature maps 314, each being stored in main memory 310, by operating on the downscaled image data 313. In the flow diagram only three feature map generating steps are shown, but there may be more or fewer, or indeed only one. Each feature map 314 assigns a single value to each pixel, characterizing some properties of the colour or intensity distribution around the pixel. According to the present embodiment, the feature maps are preferably designed to be insensitive to slow illumination changes and noise. In practical terms, the process will work with only a single feature map being generated. However, multiple maps, generated in differing ways, can lead to more accurate VP estimation. Then [step 415], for each image point (c, r), the values of all feature maps in a small spatial context (patch) around that pixel are collected into a feature vector Vc,r, by the feature vector generator 346. The assembly of all the feature vectors—one for each image pixel—forms the “feature vector map” data 315, which is stored in main memory 310.
According to the present embodiment, an exemplary feature map 314 is generated using a Laplacian of Gaussian (LoG) filter on the luminance channel of the downscaled image data 313. The LoG filter is well known and typically used for edge detection in image processing applications. The output of an exemplary LoG filter operation on the image in
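A minimal sketch of such a LoG feature map follows, assuming a zero-mean discrete LoG kernel applied to the luminance channel; the kernel size and normalization here are illustrative implementation choices, not taken from the patent:

```python
import numpy as np

def log_kernel(sigma=1.0, radius=None):
    """Discrete Laplacian-of-Gaussian kernel, shifted to zero mean so that
    flat (constant-luminance) regions produce a zero response."""
    if radius is None:
        radius = int(3 * sigma + 0.5)
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    r2 = x * x + y * y
    k = (r2 - 2 * sigma ** 2) / sigma ** 4 * np.exp(-r2 / (2 * sigma ** 2))
    return k - k.mean()

def apply_log(lum, sigma=1.0):
    """Convolve a 2-D luminance array with the LoG kernel
    ('same'-size output, edge-replicated borders)."""
    k = log_kernel(sigma)
    r = k.shape[0] // 2
    padded = np.pad(lum, r, mode='edge')
    out = np.zeros(lum.shape, dtype=float)
    for dy in range(k.shape[0]):          # shift-and-accumulate convolution
        for dx in range(k.shape[1]):
            out += k[dy, dx] * padded[dy:dy + lum.shape[0], dx:dx + lum.shape[1]]
    return out
```

The zero-mean normalization makes the map insensitive to slow illumination changes, matching the design goal stated for the feature maps.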
After the image has been conditioned to produce a feature vector map 315, the next stage is to find at least one pair (and preferably more pairs) of image strips (i.e. cross sections) that have matching patches (sub-regions). Taking the vertical image strip processor 352 first, according to the present embodiment, image strips correspond to parallel, elongated, vertical columns of the feature vector map, that encapsulate information from a vertical strip of the conditioned image. For example regions a and b as illustrated in the image in
According to the flow diagram in
According to the present embodiment, an image strip correlation procedure first creates a similarity map, referred to herein as a Structural CORrelation Evidence (‘SCORE’) matrix, as illustrated in the diagram in
The SCORE value of a pair of feature vectors v1, v2 is given by:
where <v1,v2> is the dot product between the vectors v1,v2, |v| is the magnitude (modulus) of vector v, T is a characteristic activity threshold, and |0+ denotes clipping of negative values to 0.
The SCORE metric is designed such that for approximately similar feature vectors (approximately aligned, and with similar magnitudes) of considerable magnitude (larger than T), it behaves like the SSIM metric and attributes a high score (positive correlation evidence). However, unlike SSIM, if the vectors are anti-aligned (anti-correlation evidence), the score is kept at zero instead of being negative, in order to be robust against matches of mirrored patches. Additionally, if at least one of the vector magnitudes is much less than the threshold T, the SCORE becomes very small. Hence SCORE does not attain high values for accidental vector alignment (e.g. due to noise) when at least one of the vectors does not correspond to a perceptually meaningful feature. A large activity threshold T in equation (8) increases the robustness to noise, but it also decreases the SCORE grade for weak matching features. One tuning of T that adapts to both the image content and the type of feature vector used is the mean of the vector magnitudes in the feature vector map, multiplied by a factor close to 1 (e.g. 1.5).
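The body of equation (8) is not reproduced in this text. One SSIM-like form consistent with the properties listed above (high for strong aligned vectors, clipped to zero when anti-aligned, small when either magnitude is well below T) would be SCORE(v1, v2) = ⌊2⟨v1, v2⟩⌋0+ / (|v1|² + |v2|² + T²); the sketch below implements that assumed form, which may differ from the published equation:

```python
import numpy as np

def score(v1, v2, T):
    """Assumed SSIM-like SCORE: clipped correlation evidence over squared
    magnitudes, regularized by the activity threshold T."""
    num = max(2.0 * float(np.dot(v1, v2)), 0.0)          # |.|0+: anti-alignment -> 0
    den = float(np.dot(v1, v1) + np.dot(v2, v2)) + T * T  # small vectors score low
    return num / den

def score_matrix(c1, c2, T):
    """SCORE between every feature-vector pair of two columns (cf. equation (9))."""
    return np.array([[score(u, v, T) for v in c2] for u in c1])
```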
For any two arbitrary columns (i.e. image strips according to the present invention), denoted c1, c2, a SCORE matrix is generated by calculating a SCORE value between every two point combination, according to:
An exemplary SCORE matrix is illustrated in
The candidate VP locator 358 [step 430] finds the similarity transformation parameters (s, τ) from the SCORE matrix, as will be described below. While this process may be repeated for all possible pairs of image strips (which may comprise hundreds or even thousands of potential pairs), according to the preferred embodiment it has been found that only a relatively small subset (for example between 50 and 150) of all possible column pairs needs to be examined to find candidate VPs. In principle, any arrangement of image strips may be chosen and any pair of image strips may be correlated; many ways of positioning and correlating image strips will be apparent to the skilled person on reading the present description, trading off the number of pairs selected (more pairs leading to potentially higher accuracy) against processing overhead.
With regard to equation (9), if column c2 is a perfect affine transformation of the column c1, then c1(s·i+τ)=c2(i), where s, τ are the transformation parameters. In the corresponding SCOREc1,c2 matrix, that relation would appear as high intensity areas (that correspond to strong matching points) spread along a straight fitted line 800, determined by the transformation parameters, as illustrated in the diagram in
where CM = Σ_i i·SCORE(i, s·i + τ) is the centre of mass of the line, SCORE(i, s·i + τ) is the linear interpolation of the SCORE matrix at the coordinates (i, s·i + τ), and |i − CM| is a term that linearly increases the weight of points distant from the centre of mass. A hierarchical exhaustive search may be used to solve this maximization problem, but other optimization methods could be applied as well.
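The fitting step above can be sketched as an exhaustive search over (s, τ), scoring each candidate line with the |i − CM| weighting. Nearest-neighbour lookup replaces the linear interpolation, and the hierarchical refinement is omitted; all names here are illustrative:

```python
def line_score(S, s, tau):
    """Weighted evidence along the line j = s*i + tau in a SCORE matrix S
    (nearest-neighbour lookup in place of linear interpolation)."""
    n, m = len(S), len(S[0])
    pts = [(i, round(s * i + tau)) for i in range(n)]
    pts = [(i, j) for i, j in pts if 0 <= j < m]       # keep in-matrix samples
    total = sum(S[i][j] for i, j in pts)
    if total <= 0:
        return 0.0
    cm = sum(i * S[i][j] for i, j in pts) / total      # centre of mass of the line
    # |i - CM| weighting: distant, consistent evidence counts more
    return sum(abs(i - cm) * S[i][j] for i, j in pts)

def fit_affine_match(S, s_values, tau_values):
    """Exhaustive search for the (s, tau) maximizing the weighted line score."""
    return max(((s, t) for s in s_values for t in tau_values),
               key=lambda p: line_score(S, p[0], p[1]))
```

The recovered (s, τ) pair then yields a VP candidate via the relations given earlier.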
As shown in
As already indicated, this correlation process is repeated by the candidate VP locator 358 for all selected pairs of image strips, to produce a plurality of candidate VPs.
In parallel with steps 420, 425 and 430 [steps 435, 440 and 445], the horizontal image strip processor 362 repeats the process (using the respective image strip positioner 364, image strip correlator 366 and candidate VP locator 368) for finding candidate VP locations using horizontal image strips. In this instance, of course, the pixel context is a horizontal rectangle, rather than a vertical rectangle, around each pixel in the image strip.
Finally [step 450], when all candidate VP locations have been determined, the cluster processor 370 uses known accumulation and clustering techniques (such as a 'mean shift' procedure, though others may be used) to determine the estimated VP locations and stores the locations 316 on the image plane in the main memory 310.
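By way of illustration only, a flat-kernel mean-shift accumulator over 2-D candidate positions might look as follows; this toy version makes no claim to match the procedure actually used, and the function name and parameters are hypothetical:

```python
import numpy as np

def mean_shift_modes(points, bandwidth=1.0, iters=50):
    """Flat-kernel mean shift: move each point to the mean of the original
    candidates within `bandwidth` of it until convergence, then merge
    nearby endpoints into modes (the estimated VPs)."""
    pts = np.asarray(points, dtype=float)
    shifted = pts.copy()
    for _ in range(iters):
        for k, p in enumerate(shifted):
            near = pts[np.linalg.norm(pts - p, axis=1) <= bandwidth]
            shifted[k] = near.mean(axis=0)
    modes = []
    for p in shifted:          # deduplicate converged points into distinct modes
        if not any(np.linalg.norm(p - m) <= bandwidth for m in modes):
            modes.append(p)
    return modes
```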
In the present embodiment the accumulation and clustering step is performed in the pencil-slope representation by Equation (7), with two reference points (e.g. xo taken at the left and right boundaries of the image), in order to avoid the problem of mapping VPs near the reference point to infinity in pencil-slope space.
Any downstream image processor 380 can then operate on the original image data 312, using the estimated VP locations as required, for example for object placement.
Using the aforementioned process, two VPs 910 and 912 are found (each marked with an “X”) for the image in
An enhancement of the present embodiment is to, optionally, refine the estimation of VPs using a higher resolution version of the image (for example the original image, or a less downscaled version thereof). The refinement is done by choosing the most credible image strips that contributed to each found VP, and their best matching column pairs to re-enact the process in
Advantages of embodiments of the present invention include the ability to operate reliably on low resolution images, thereby significantly increasing processing speed. In addition, the image conditioning steps (if applied) enable the process to perform well with low quality images (for example, small, blurred or poorly lit images). Significantly also, embodiments of the invention enable VP determination in scenes that do not contain explicit or detectable straight lines; whereas prior art processes rely on straight line detection. For example, embodiments of the invention can determine VPs based on regularly-textured planes, as illustrated by the image in
It will be appreciated that embodiments of the present invention are not limited to the image strips being straight, as illustrated in
The above embodiments are to be understood as illustrative examples of the invention. Further embodiments of the invention are envisaged. For example, other known or even new kinds of feature vector generation and correlation could be used for VP estimation purposes. It is to be understood that any feature described in relation to any one embodiment may be used alone, or, if the context permits, in combination with other features described, and may also be used in combination with one or more features of any other of the embodiments, or any combination of any other of the embodiments. Furthermore, equivalents and modifications not described above may also be employed without departing from the scope of the invention, which is defined in the accompanying claims.