Method for object size and rotation estimation based on log-polar space

First Claim
1. A method for estimating scale and rotation changes of a visual object in an image sequence based on a log-polar space, comprising the steps of:
1) inputting an image under test and an image template, and log-polar transforming the image under test;
2) performing feature extraction on the transformed image to capture an image feature;
3) obtaining a response map of the scale and rotation of the image object relative to the template through phase correlation, using the features extracted in step 2);
4) calculating a relative displacement from the response values, and their coordinates, in the area surrounding the maximum of the response map; and
5) calculating the scale and rotation of the object in the image under test relative to the template from the relative displacement.
Abstract
The present invention discloses a method comprising the following steps: inputting an image under test and a template; log-polar transforming the image under test; performing feature extraction on the transformed image object to capture an image feature; obtaining a response map of the scale and rotation of the image object relative to the template through phase correlation, using the extracted features; calculating a relative displacement from the response values, and their coordinates, in the area surrounding the maximum of the response map; and calculating the scale and rotation of the object in the image under test relative to the template from the relative displacement. By using log-polar space as the image operating space to estimate changes in object size and rotation, the present invention provides a fast and robust option for higher-level applications.
8 Claims
Specification
This is a U.S. National Stage under 35 U.S.C. 371 of International Application PCT/CN2017/118845, filed Dec. 27, 2017, which claims priority under 35 U.S.C. 119(a-d) to CN 2017106857252, filed Aug. 11, 2017.
The present invention relates to computer vision and image processing, and more particularly to scale and rotation estimation of a visual object in an image sequence.
With the development of research in computer vision and image processing, applications have been adopted in fields such as AI (artificial intelligence), mechanical processing, transportation, entertainment, medical care, security and the military. Rapid and accurate estimation of scale and rotation is a foundation of computer vision and an important component of visual object tracking and object detection. As a component of object status estimation, scale and rotation estimation is a foundation and prerequisite of image matching and related tasks, so a capable estimation method supports a wide range of research. The present invention provides a rapid and effective method for estimating the scale and rotation of a visual object in an image sequence.
The main task of a scale and rotation estimation method is to estimate the relative changes in scale and rotation (two degrees of freedom in the image plane) of a visual object in the input images. These relative changes are usually modeled with a similarity transformation, which comprises translation, scale and rotation. Such a comprehensive representation of object status improves status estimation in computer vision applications such as pedestrian detection, object detection and object tracking: besides the position coordinates, scale and rotation estimation provides the object's size and rotation, giving a complete object-status output and improving matching accuracy. However, compared with pure translation estimation, the search space grows from 2 DOF (degrees of freedom) to 4 DOF.
In conventional computer vision algorithms, the mainstream approach to scale estimation is pyramid-like search: the image object is sampled at different sizes to form an image pyramid, and the levels are matched one by one to estimate the object's size. Similarly, the mainstream approach to rotation estimation is brute-force search, which samples the object at all candidate rotation angles and matches each sample. Both approaches involve a large number of image sampling operations and consume substantial computational resources, which limits the use of similarity transformations in real-time applications with strict efficiency requirements.
In summary, conventional scale and rotation estimation methods cannot support applications that must be both rapid and accurate.
The present invention provides an efficient and effective method for estimating the scale and rotation changes of a visual object in an image sequence. The scale and rotation are obtained by comparing the features of the visual object with a trained model in log-polar space. Owing to the properties of log-polar space, a single sample suffices to estimate the scale and rotation changes at the same time.
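The log-polar property invoked here can be made concrete with a short derivation. It assumes the object is scaled by a factor s and rotated by an angle α about the log-polar origin (the object center), and uses the standard definitions ρ = ln√(x² + y²) and θ = atan2(y, x); the natural logarithm is an assumption, since another base only rescales the ρ axis.

```latex
\begin{aligned}
(x', y') &= s\,\bigl(x\cos\alpha - y\sin\alpha,\; x\sin\alpha + y\cos\alpha\bigr),\\
\rho' &= \ln\sqrt{x'^2 + y'^2} = \ln\bigl(s\sqrt{x^2 + y^2}\bigr) = \rho + \ln s,\\
\theta' &= \operatorname{atan2}(y', x') = \theta + \alpha \pmod{2\pi},\\
I'(\rho', \theta') &= I(\rho' - \ln s,\; \theta' - \alpha).
\end{aligned}
```

A scale change thus becomes a shift of ln s along the ρ axis and a rotation becomes a cyclic shift of α along the θ axis, so one translation estimate on a single log-polar sample recovers both parameters.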
The present invention adopts the following steps:
1) inputting an image under test and an image template (the template having already been log-polar transformed and feature-extracted), and log-polar transforming the image under test;
wherein both the image under test and the image template contain a whole object with a known center, and the size and rotation angle of the object in the image template are known;
2) performing feature extraction on the transformed image object to capture an image feature;
3) obtaining a response map of the scale and rotation of the image object relative to the template through phase correlation, using the features extracted in step 2);
4) calculating a relative displacement from the response values, and their coordinates, in the area surrounding the maximum of the response map; and
5) calculating the size and rotation of the object in the image under test relative to the template from the relative displacement.
wherein the origin of the log-polar transform in step 1) is the geometric center of the object in the image. The image object is transformed from Cartesian to log-polar space, where relative scale and rotation changes correspond to shifts along the two coordinate axes. The estimation of size and rotation in the image domain is thereby converted into the estimation of vertical and horizontal offsets in log-polar coordinates.
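A minimal sketch of the log-polar resampling, assuming NumPy and nearest-neighbour sampling. The grid layout (rows = angle over [0, 2π), columns = log-radius over [0, ln r_max]), the function name `log_polar` and its parameters are hypothetical choices for illustration. The usage lines check the property stated above: rotating the image by 90° about the object center should cyclically shift the log-polar image by a quarter of its rows.

```python
import numpy as np


def log_polar(image, center, n_rho=64, n_theta=64, r_max=None):
    """Resample a 2-D image onto a log-polar grid (nearest-neighbour sketch).

    Rows index the angle theta in [0, 2*pi); columns index the log-radius
    rho in [0, ln(r_max)]. `center` is the (row, col) of the object centre.
    """
    h, w = image.shape
    cy, cx = center
    if r_max is None:
        # Largest radius that stays inside the image around the centre.
        r_max = min(cy, cx, h - 1 - cy, w - 1 - cx)
    rho = np.linspace(0.0, np.log(r_max), n_rho)                  # log-radius axis
    theta = np.linspace(0.0, 2 * np.pi, n_theta, endpoint=False)  # angle axis
    radius = np.exp(rho)[None, :]                # shape (1, n_rho)
    ys = cy + radius * np.sin(theta)[:, None]    # shape (n_theta, n_rho)
    xs = cx + radius * np.cos(theta)[:, None]
    # Nearest-neighbour lookup; clipping guards against rounding at the border.
    yi = np.clip(np.rint(ys).astype(int), 0, h - 1)
    xi = np.clip(np.rint(xs).astype(int), 0, w - 1)
    return image[yi, xi]


# Usage: a 90-degree rotation about the centre of a 65 x 65 image should
# appear as a cyclic shift of n_theta / 4 = 16 rows in log-polar space.
rng = np.random.default_rng(0)
img = rng.random((65, 65))
lp = log_polar(img, (32, 32))
lp_rot = log_polar(np.rot90(img), (32, 32))
match = np.mean(lp_rot == np.roll(lp, -16, axis=0))  # fraction of equal samples
print(lp.shape, match)
```

Bilinear interpolation would give smoother log-polar images; nearest-neighbour sampling is used only to keep the sketch short.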
In step 2), the features are image gradients.
Step 2) improves the robustness of the algorithm to illumination variation, object deformation, pose change, etc. Because feature extraction is performed after step 1), feature maps need not be transformed into log-polar space, which reduces the errors such a transformation would introduce and allows the changes in rotation and size of the object to be estimated accurately.
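A minimal gradient-feature sketch, assuming NumPy and a log-polar image whose rows are the angle axis. It uses plain gradient magnitude as a simpler stand-in for the HOG features of the embodiment; the function name `gradient_features` and the `eps` guard are hypothetical. Because the angle axis is periodic, its derivative wraps around, which keeps the features consistent with the cyclic shifts that rotation produces.

```python
import numpy as np


def gradient_features(lp_image, eps=1e-12):
    """Gradient-magnitude feature map for a log-polar image (sketch).

    The angle axis (rows) is periodic, so its derivative wraps around;
    the log-radius axis (columns) uses ordinary central differences.
    """
    img = np.asarray(lp_image, dtype=float)
    g_theta = (np.roll(img, -1, axis=0) - np.roll(img, 1, axis=0)) / 2.0
    g_rho = np.gradient(img, axis=1)
    mag = np.hypot(g_theta, g_rho)
    # Gradients ignore additive brightness offsets; zero-mean, unit-variance
    # normalisation additionally removes a global contrast gain.
    return (mag - mag.mean()) / (mag.std() + eps)


# Usage: features of a rotated (row-shifted) log-polar image are the
# row-shifted features of the original.
rng = np.random.default_rng(1)
lp = rng.random((64, 32))
f = gradient_features(lp)
print(np.allclose(gradient_features(np.roll(lp, 5, axis=0)), np.roll(f, 5, axis=0)))
```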
Step 3) further comprises the following sub-steps:
3.1) calculating the frequency-domain response map R with the following formula:
where R denotes the response map in frequency space; G_a denotes the Fourier transform of the image features; G_b denotes the Fourier transform of the template; G_a^* and G_b^* denote their complex conjugates; and ∘ is the Hadamard product;
$$G_a = \mathcal{F}\{g_a\}, \qquad G_b = \mathcal{F}\{g_b\}$$
where g_a denotes the image features obtained in step 2); $\mathcal{F}$ denotes the Fourier transform; and g_b is the template;
3.2) transforming the frequency-domain response map R back to the spatial domain to obtain the response map r with the following formula:
$$r = \mathcal{F}^{-1}\{R\}$$
where $\mathcal{F}^{-1}$ denotes the inverse Fourier transform.
Step 3) computes, through phase correlation, the change of the image object relative to the model trained by the algorithm, and produces the response map from which step 4) calculates the relative scale and rotation change of the visual object. In the Fourier domain, phase correlation requires only element-wise multiplication (the Hadamard product), which keeps the computation light and the estimation fast. By the convolution theorem, this element-wise product is equivalent to evaluating the correlation at every cyclic shift in the spatial domain, so scale and rotation changes can be estimated accurately without compromising speed.
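The formula of sub-step 3.1) does not survive in this text, so the sketch below assumes the standard normalized cross-power spectrum $R = (G_a \circ G_b^*) / |G_a \circ G_b^*|$, computed with NumPy's FFT routines. The function name `phase_correlation`, the `eps` guard and the final `fftshift` (which moves zero displacement to the map center, treated here as the origin $(x_0, y_0)$) are assumptions for illustration.

```python
import numpy as np


def phase_correlation(g_a, g_b, eps=1e-12):
    """Phase-correlation response map between feature map g_a and template g_b.

    Assumes the standard normalised cross-power spectrum
    R = (G_a o conj(G_b)) / |G_a o conj(G_b)|, where o is the Hadamard
    (element-wise) product.
    """
    G_a = np.fft.fft2(g_a)
    G_b = np.fft.fft2(g_b)
    cross = G_a * np.conj(G_b)          # element-wise (Hadamard) product
    R = cross / (np.abs(cross) + eps)   # keep only the phase
    r = np.real(np.fft.ifft2(R))        # back to the spatial domain
    # Move zero displacement from index (0, 0) to the centre of the map.
    return np.fft.fftshift(r)


# Usage: a cyclic shift of the template is recovered from the peak position.
rng = np.random.default_rng(0)
template = rng.random((64, 64))
shifted = np.roll(template, (5, -3), axis=(0, 1))
r = phase_correlation(shifted, template)
peak = np.unravel_index(np.argmax(r), r.shape)
print(peak[0] - 32, peak[1] - 32)  # prints: 5 -3
```

In log-polar space, the row offset of the peak corresponds to rotation and the column offset to scale (under the layout assumed for the log-polar sketch), so a single call covers both parameters.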
The template in step 3) is the reference for estimating the relative change; it is either a single image feature or an average of a series of image features.
When the template is the average of a series of image features, the series is combined by linear fusion, which comprises the following steps: estimating the size and rotation of the image object; resampling the image object so that it is aligned according to the estimated rotation; and performing the linear fusion with the following formula:
$$g_b^{\text{new}} = \lambda\, g_{\text{resample}} + (1 - \lambda)\, g_b^{\text{old}}$$
where $g_b^{\text{new}}$ is the template after linear fusion; $g_b^{\text{old}}$ is the template before linear fusion; $g_{\text{resample}}$ is the resampled feature added by the fusion; and λ is a learning parameter.
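The fusion rule is an exponential moving average. A minimal sketch, assuming NumPy and feature maps that have already been resampled and aligned; the function names `fuse_template` and `build_template` are hypothetical, and the value of λ is left to the caller because the text does not fix it.

```python
import numpy as np


def fuse_template(g_b_old, g_resample, lam):
    """Linear template fusion: g_b_new = lam * g_resample + (1 - lam) * g_b_old."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("learning parameter lam must lie in [0, 1]")
    return lam * np.asarray(g_resample, dtype=float) + (1.0 - lam) * np.asarray(g_b_old, dtype=float)


def build_template(aligned_samples, lam):
    """Fold a sequence of already aligned feature maps into one template."""
    template = np.asarray(aligned_samples[0], dtype=float)
    for sample in aligned_samples[1:]:
        template = fuse_template(template, sample, lam)
    return template


# Usage: starting from zeros and fusing two all-ones samples with lam = 0.25
# gives 0.25 + 0.75 * 0.25 = 0.4375 everywhere.
t = build_template([np.zeros((2, 2)), np.ones((2, 2)), np.ones((2, 2))], 0.25)
print(t)
```

Each older sample's weight decays by a factor (1 − λ) per update, so a larger λ adapts faster to appearance changes while a smaller λ gives a more stable template.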
Step 4) further comprises the following steps: using the response values in the area surrounding the maximum as weights for the coordinates in that area; computing the weighted coordinates and subtracting the coordinates of the origin of the response map; and thereby interpolating the coordinates to estimate the relative displacement with floating-point precision, using the following formulas:
where Δx and Δy are the relative displacements along the vertical and horizontal directions of the image, respectively; x_0 and y_0 are the vertical and horizontal coordinates of the origin of the frequency space, i.e. the point whose frequency is zero in the frequency domain; (i, j) are the coordinates of points in the response map; r(i, j) is the response value at (i, j); Φ is the area surrounding the maximum, which in the embodiment extends two pixels in each direction and thus contains 25 response values; and E is a small adjustment parameter that prevents the denominator from being zero.
Step 4) yields an accurate, sub-pixel relative displacement instead of only the discretized scale and rotation change; this continuous displacement value enables accurate estimation of the changes in scale and rotation.
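The formulas of step 4) are not reproduced in this text. Read from the description above, they amount to a weighted centroid, which this sketch assumes to be $\Delta x = \sum_{(i,j)\in\Phi} i\, r(i,j) \,/\, (\sum_{(i,j)\in\Phi} r(i,j) + E) - x_0$, and likewise for Δy with j and y_0. It uses NumPy; the function name `subpixel_displacement`, the clipping of negative responses to zero and the centered origin (h // 2, w // 2) are assumptions, not details confirmed by the text.

```python
import numpy as np


def subpixel_displacement(r, radius=2, eps=1e-12):
    """Weighted-centroid displacement around the peak of response map r.

    Assumes r was fftshift-ed so that zero displacement sits at
    (x0, y0) = (h // 2, w // 2). radius=2 gives the 5 x 5 = 25-value window.
    """
    h, w = r.shape
    i_max, j_max = np.unravel_index(np.argmax(r), r.shape)
    i_lo, i_hi = max(i_max - radius, 0), min(i_max + radius + 1, h)
    j_lo, j_hi = max(j_max - radius, 0), min(j_max + radius + 1, w)
    # Response values are the weights; negative responses are clipped to
    # zero here (an assumption, not stated in the text).
    weights = np.maximum(r[i_lo:i_hi, j_lo:j_hi], 0.0)
    ii, jj = np.mgrid[i_lo:i_hi, j_lo:j_hi]
    total = weights.sum() + eps        # eps plays the role of E
    x0, y0 = h // 2, w // 2
    dx = (weights * ii).sum() / total - x0
    dy = (weights * jj).sum() / total - y0
    return dx, dy


# Usage: a Gaussian peak centred at (33.3, 30.6) on a 64 x 64 map whose
# origin is (32, 32) should give roughly (1.3, -1.4).
ii, jj = np.mgrid[0:64, 0:64]
resp = np.exp(-((ii - 33.3) ** 2 + (jj - 30.6) ** 2) / (2 * 0.8 ** 2))
print(subpixel_displacement(resp))
```

Truncating the weights to a small window slightly biases the centroid toward the integer peak; for a narrow peak such as this one the bias is a few hundredths of a pixel.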
Step 5) calculates the relative change of the object by normalizing the displacements along the vertical and horizontal coordinates: an exponential mapping is applied to the scale displacement, and a standard normalization is applied to the rotation displacement.
Step 5) further comprises the following sub-steps: normalizing the relative displacement obtained in step 4), and calculating the relative scale and rotation changes of the object with the following formulas:
where h denotes the height of the image; θ denotes the change in rotation; and s denotes the change in scale.
Step 5) converts the relative displacement, measured in pixels of the log-polar image, into a multiplicative scale change and a rotation angle, providing an estimate that other applications can use directly.
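The normalization formulas of step 5) are also missing from this text. The sketch below assumes a hypothetical layout in which `n_theta` rows of the log-polar image span [0, 2π) and `n_rho` columns span [0, ln r_max] inclusive; under that assumption the rotation is a linear rescaling of the row shift (if the angle axis runs along the image height h, then θ = 2πΔ/h) and the scale is the exponential of the column shift. The function name and parameters are illustrative only, and the sign convention depends on axis orientation.

```python
import math


def displacement_to_scale_rotation(dx, dy, n_theta, n_rho, r_max):
    """Convert a log-polar displacement into a rotation angle and scale factor.

    Assumed layout (hypothetical): dx is the shift along the angle axis,
    whose n_theta rows span [0, 2*pi); dy is the shift along the log-radius
    axis, whose n_rho columns span [0, ln(r_max)] inclusive.
    """
    theta = 2.0 * math.pi * dx / n_theta       # linear normalisation (radians)
    rho_step = math.log(r_max) / (n_rho - 1)   # log-radius per column
    scale = math.exp(dy * rho_step)            # exponential mapping
    return theta, scale


# Usage: a 16-row shift on a 64-row angle axis is a quarter turn, and a
# 12.6-column shift with ln(32) / 63 per column is a factor of 2 in scale.
theta, s = displacement_to_scale_rotation(16, 12.6, n_theta=64, n_rho=64, r_max=32)
print(theta, s)  # theta = pi / 2, s is approximately 2
```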
The embodiment of the present invention adopts a system comprising the following three modules:
1) an input module for receiving the collected image data and video sequences;
2) a scale and rotation estimation module for analyzing the images and estimating the scale and rotation changes of the visual object relative to the trained model; and
3) an output module for outputting the estimated size and rotation changes of the object.
The present invention provides the following benefits.
The present invention estimates the scale and rotation of an image object rapidly and accurately. Size and rotation estimation is converted into the estimation of vertical and horizontal offsets in log-polar space, and, owing to the convolution theorem, a single sample suffices to obtain an accurate result. Feature extraction makes the algorithm robust to illumination variation and object deformation. Together, these improve the effectiveness and efficiency of the algorithm and meet the requirements of real-time image processing.
With reference to the drawings and a preferred embodiment, the present invention is described below clearly and completely. Any embodiment obtained by a person skilled in the art without inventive effort falls within the scope of protection of the present invention.
To better illustrate the objects and technical solution of the present invention, the embodiment is described in detail with reference to the drawings.
The embodiment of the present invention is as follows:
As illustrated in the
an input module for receiving the collected image data and video sequences,
wherein the inputs are sent to the scale and rotation estimation module for estimation;
a scale and rotation estimation module for log-polar transforming the received images; performing feature extraction on the images in the new coordinate system; performing phase correlation; estimating the scale and rotation of the visual object in the image relative to the model template by comparison with the model trained in the algorithm; and sending the scale and rotation estimate to the output module;
an output module for displaying the scale and rotation estimate and marking the object status estimated by the scale and rotation estimation module at the corresponding position in the original image.
As illustrated in the
1) Log-polar transforming the input image. The formula for transforming image coordinates into log-polar coordinates is as follows:
where ρ denotes the logarithmic-distance coordinate; θ denotes the angular coordinate; and x and y are coordinates in the Cartesian coordinate system. Before the transformation, the origin of the coordinates is aligned with the center of the visual object, i.e. $x = x^{\text{image}} - x_0$ and $y = y^{\text{image}} - y_0$. The
2) Performing feature extraction, in the log-polar coordinate system, on the image obtained in step 1). The embodiment extracts Histogram of Oriented Gradients (HOG) features; raw image pixels or deep-learning features can be used in other embodiments.
3) Obtaining the response map of the scale and rotation of the image object relative to the algorithm model through phase correlation, using the following formula:
where g_a is the image feature from step 2); $\mathcal{F}$ denotes the Fourier transform; g_b is the model trained in the algorithm; and ∘ is the Hadamard product.
The frequency-domain response map R is transformed back to the spatial domain as the response map r with the following formula:
$$r = \mathcal{F}^{-1}\{R\}$$
where $\mathcal{F}^{-1}$ denotes the inverse Fourier transform.
The model template is the reference for estimating the relative change required by the algorithm and can take various forms, such as a reference image feature or an average of a series of image features. In the present embodiment, the model template is the linear fusion of all images previously processed by the algorithm, obtained through the following steps: estimating the scale and rotation of the image object; resampling the image object so that it is aligned with the model template according to the estimated scale and rotation parameters; processing the resampled image object according to steps 1) and 2); and performing linear fusion with the current model using the following formula:
$$g_b^{\text{new}} = \lambda\, g_{\text{resample}} + (1 - \lambda)\, g_b^{\text{old}}$$
where $g_b^{\text{new}}$ is the template after linear fusion; $g_b^{\text{old}}$ is the template before linear fusion; $g_{\text{resample}}$ is the sample added by the fusion; and λ is a learning parameter.
4) Calculating the relative displacement from the response values in the area surrounding the maximum of the response map. Because the maximum always lies at a discrete index position, the coordinates must be interpolated to obtain a floating-point displacement, using the following formulas:
where Δx and Δy are the relative displacements; x_0 and y_0 are the coordinates of the origin of the image; (i, j) are coordinates in the response map; Φ is the area surrounding the maximum, which in the embodiment extends two pixels in each direction and contains 25 response values; and E is a small adjustment parameter that prevents the denominator from being zero.
5) Calculating the change in scale and rotation of the object from the relative displacement: the relative scale and rotation changes are obtained by normalizing the relative displacement from step 4) with the following formulas:
where h denotes the height of the image; θ denotes the change in rotation; and s denotes the change in scale.
The
The
The present invention was tested on OTB100 (Object Tracking Benchmark), a standard benchmark in the field. After being embedded in KCF (Kernelized Correlation Filter), the present invention was compared with fDSST (fast Discriminative Scale Space Tracking) and SAMF (Scale Adaptive Kernel Correlation Filter). The results were compared with the ground truth of the test set to plot the overlap-rate curve and the error-rate curve, which are the standard criteria for judging algorithm performance.
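Overlap-rate and error-rate curves of this kind are commonly built from per-frame intersection-over-union and center-location error. A minimal sketch, assuming NumPy and boxes given as (x, y, w, h); it is not the official OTB toolkit, and the function names and threshold grids are illustrative.

```python
import numpy as np


def overlap_rate(pred, gt):
    """Per-frame intersection-over-union of (x, y, w, h) boxes, shape (N, 4)."""
    px2, py2 = pred[:, 0] + pred[:, 2], pred[:, 1] + pred[:, 3]
    gx2, gy2 = gt[:, 0] + gt[:, 2], gt[:, 1] + gt[:, 3]
    iw = np.clip(np.minimum(px2, gx2) - np.maximum(pred[:, 0], gt[:, 0]), 0, None)
    ih = np.clip(np.minimum(py2, gy2) - np.maximum(pred[:, 1], gt[:, 1]), 0, None)
    inter = iw * ih
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / union


def center_error(pred, gt):
    """Per-frame Euclidean distance between box centres, in pixels."""
    c_pred = pred[:, :2] + pred[:, 2:] / 2.0
    c_gt = gt[:, :2] + gt[:, 2:] / 2.0
    return np.linalg.norm(c_pred - c_gt, axis=1)


def success_curve(overlaps, thresholds):
    """Fraction of frames whose overlap exceeds each threshold."""
    return np.array([np.mean(overlaps > t) for t in thresholds])


def precision_curve(errors, thresholds):
    """Fraction of frames whose centre error is within each threshold."""
    return np.array([np.mean(errors <= t) for t in thresholds])


# Usage with two hypothetical frames: a perfect box and a shifted box.
pred = np.array([[0, 0, 10, 10], [5, 5, 10, 10]], dtype=float)
gt = np.array([[0, 0, 10, 10], [0, 0, 10, 10]], dtype=float)
ov = overlap_rate(pred, gt)    # [1.0, 1/7]
err = center_error(pred, gt)   # [0.0, sqrt(50)]
print(success_curve(ov, np.linspace(0, 1, 11)))
print(precision_curve(err, np.arange(0, 51, 5)))
```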
OTB100 contains 100 video sequences with rich annotations. As illustrated in the
As illustrated in the
The
The test results show that the present invention outperforms conventional scale and rotation estimation algorithms.
The embodiment is merely an example illustrating the present invention and does not limit it; other variations are possible. Any alteration or modification of the embodiment made without inventive effort falls within the scope of protection of the present invention, which is the broadest scope covered by the claims.