TARGET RECOGNITION METHOD AND APPARATUS FOR A DEFORMED IMAGE

US 20200134366A1
Filed: 06/12/2018
Published: 04/30/2020
Est. Priority Date: 06/16/2017
Status: Active Grant

First Claim

1. An object recognition method for a deformed image, comprising:

inputting an image into a preset localization network to obtain a plurality of localization parameters for the image, wherein the preset localization network comprises a preset number of convolutional layers, and wherein the plurality of localization parameters are obtained by regressing image features in a feature map that is generated from a convolution operation on the image;

performing a spatial transformation on the image based on the plurality of localization parameters to obtain a corrected image; and

inputting the corrected image into a preset recognition network to obtain an object classification result for the image.

Abstract

An object recognition method and apparatus for a deformed image are provided. The method includes: inputting an image into a preset localization network to obtain a plurality of localization parameters for the image, wherein the preset localization network comprises a preset number of convolutional layers, and wherein the plurality of localization parameters are obtained by regressing image features in a feature map that is generated from a convolution operation on the image; performing a spatial transformation on the image based on the plurality of localization parameters to obtain a corrected image; and inputting the corrected image into a preset recognition network to obtain an object classification result for the image. In neural network based object recognition, the embodiment of the present application first applies a transformation to the deformed image, and then performs object recognition on the transformed image.
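The three stages named in the abstract can be read as a simple function composition. The following is a minimal sketch of that flow only, not the patented implementation: the function `recognize_deformed` and the three `toy_*` stages are hypothetical stand-ins (a real system would use trained networks), chosen so that the order of operations can be run and inspected.

```python
import numpy as np

def recognize_deformed(image, localization_net, transform, recognition_net):
    """Localize, correct, then classify, in the order the abstract gives."""
    params = localization_net(image)       # localization parameters
    corrected = transform(image, params)   # spatial transformation
    label = recognition_net(corrected)     # object classification result
    return label, corrected

# Hypothetical toy stages (not trained networks), only to make the flow runnable.
def toy_localization_net(image):
    # "Regress" one parameter: how far the brightest column sits from the centre.
    return np.array([np.argmax(image.sum(axis=0)) - image.shape[1] // 2])

def toy_transform(image, params):
    # Undo the horizontal shift so the bright column lands on the centre.
    return np.roll(image, -int(params[0]), axis=1)

def toy_recognition_net(corrected):
    # Class 1 if the centre column is lit, else class 0.
    return int(corrected[:, corrected.shape[1] // 2].sum() > 0)

image = np.zeros((5, 5))
image[:, 4] = 1.0  # a stroke pushed to the right edge
label, corrected = recognize_deformed(
    image, toy_localization_net, toy_transform, toy_recognition_net)
```

The point of the sketch is the ordering: the recognition stage only ever sees the corrected image, never the deformed input.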

20 Claims

1. An object recognition method for a deformed image, comprising:
- inputting an image into a preset localization network to obtain a plurality of localization parameters for the image, wherein the preset localization network comprises a preset number of convolutional layers, and wherein the plurality of localization parameters are obtained by regressing image features in a feature map that is generated from a convolution operation on the image;
  
  performing a spatial transformation on the image based on the plurality of localization parameters to obtain a corrected image; and
  
  inputting the corrected image into a preset recognition network to obtain an object classification result for the image.
- - 2. The method according to claim 1, wherein inputting the image into the preset localization network to obtain the plurality of localization parameters for the image, comprises:
    - performing a feature extraction on the image using the preset number of convolutional layers to generate the feature map containing the image features of the image; and
      
      regressing the image features in the feature map of the image using a fully connected layer in the preset localization network, to obtain the plurality of localization parameters for the image;
      
      wherein the localization parameters are coordinates of pixels in the image, wherein image features of said pixels match with image features of a preset number of reference points in the corrected image.
  - 3. The method according to claim 2, wherein performing the spatial transformation on the image based on the plurality of localization parameters to obtain the corrected image, comprises:
    - determining a spatial transformation relationship for the reference points from the image to the corrected image, based on the localization parameters corresponding to the preset number of reference points and coordinates of the preset number of reference points in the corrected image; and
      
      obtaining respective coordinates in the corrected image for all pixels of the image based on the spatial transformation relationship to obtain the corrected image.
  - 4. The method according to claim 3, wherein determining the spatial transformation relationship for the reference points from the image to the corrected image, based on the localization parameters corresponding to the preset number of reference points and the coordinates of the preset number of reference points in the corrected image, comprises:
    - obtaining transformation parameters required by a preset transformation algorithm for transforming the coordinates of the reference points in the image into the coordinates of the reference points in the corrected image, based on the localization parameters corresponding to the preset number of reference points and the coordinates of the preset number of reference points in the corrected image, wherein the preset transformation algorithm comprises one of an affine transformation algorithm, a perspective transformation algorithm or a thin plate spline transformation algorithm; and
      
      wherein obtaining the respective coordinates in the corrected image for all pixels of the image based on the spatial transformation relationship to obtain the corrected image, comprises:
      
      calculating, from coordinates of all the pixels of the image, the respective coordinates in the corrected image for all the pixels, by using the preset transformation algorithm with the transformation parameters to obtain the corrected image.
  - 5. The method according to claim 1, wherein inputting the corrected image into the preset recognition network to obtain the object classification result for the image, comprises:
    - performing feature extraction on the corrected image using convolutional layers in the preset recognition network to generate a feature map containing image features of the corrected image; and
      
      classifying the image features in the feature map of the corrected image using a fully connected layer in the preset recognition network to obtain the object classification result for the image.
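Claims 3 and 4 describe fitting transformation parameters from the reference-point correspondences and then mapping every pixel of the image into the corrected image. A minimal sketch of the affine option follows, assuming grayscale images stored as 2-D NumPy arrays and points given as (x, y) pairs. The helper names `fit_affine` and `warp_forward` are illustrative and do not come from the patent, and the nearest-pixel forward mapping is a simplification (practical implementations often sample with the inverse mapping and interpolation instead).

```python
import numpy as np

def fit_affine(src_pts, dst_pts):
    """Least-squares 2x3 affine matrix mapping src_pts (N, 2) onto dst_pts (N, 2)."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    homog = np.hstack([src, np.ones((len(src), 1))])    # rows of [x, y, 1]
    sol, *_ = np.linalg.lstsq(homog, dst, rcond=None)   # shape (3, 2)
    return sol.T                                        # shape (2, 3)

def warp_forward(image, affine, out_shape):
    """Move every source pixel to its transformed coordinate (nearest pixel)."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # (3, h*w)
    mapped = np.rint(affine @ coords).astype(int)                # rows: x', y'
    out = np.zeros(out_shape, dtype=image.dtype)
    inside = ((mapped[0] >= 0) & (mapped[0] < out_shape[1]) &
              (mapped[1] >= 0) & (mapped[1] < out_shape[0]))
    # Pixels that land outside the corrected image are dropped.
    out[mapped[1][inside], mapped[0][inside]] = image.ravel()[inside]
    return out

# Localization parameters: where the reference points sit in the input image.
src_pts = [(1, 0), (3, 0), (1, 2)]
# Preset coordinates of the same reference points in the corrected image.
dst_pts = [(0, 0), (2, 0), (0, 2)]
affine = fit_affine(src_pts, dst_pts)

image = np.zeros((3, 4))
image[1, 2] = 7.0
corrected = warp_forward(image, affine, out_shape=(3, 3))
```

In this example the fitted transformation is a pure shift of one pixel to the left, so the marked pixel at (x=2, y=1) lands at (x=1, y=1) in the corrected image.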

6-10. (canceled)

11. An electronic device, which comprises a processor and a memory,
- the memory is configured to store a computer program; and
  
  the processor is configured to execute the computer program stored in the memory to carry out operations comprising:
  
  inputting an image into a preset localization network to obtain a plurality of localization parameters for the image, wherein the preset localization network comprises a preset number of convolutional layers, and wherein the plurality of localization parameters are obtained by regressing image features in a feature map that is generated from a convolution operation on the image;
  
  performing a spatial transformation on the image based on the plurality of localization parameters to obtain a corrected image; and
  
  inputting the corrected image into a preset recognition network to obtain an object classification result for the image.
- - 12. The electronic device according to claim 11, wherein inputting the image into the preset localization network to obtain the plurality of localization parameters for the image, comprises:
    - performing a feature extraction on the image using the preset number of convolutional layers to generate the feature map containing the image features of the image; and
      
      regressing the image features in the feature map of the image using a fully connected layer in the preset localization network, to obtain the plurality of localization parameters for the image;
      
      wherein the localization parameters are coordinates of pixels in the image, wherein image features of said pixels match with image features of a preset number of reference points in the corrected image.
  - 13. The electronic device according to claim 12, wherein performing the spatial transformation on the image based on the plurality of localization parameters to obtain the corrected image, comprises:
    - determining a spatial transformation relationship for the reference points from the image to the corrected image, based on the localization parameters corresponding to the preset number of reference points and coordinates of the preset number of reference points in the corrected image; and
      
      obtaining respective coordinates in the corrected image for all pixels of the image based on the spatial transformation relationship to obtain the corrected image.
  - 14. The electronic device according to claim 13, wherein determining the spatial transformation relationship for the reference points from the image to the corrected image, based on the localization parameters corresponding to the preset number of reference points and the coordinates of the preset number of reference points in the corrected image, comprises:
    - obtaining transformation parameters required by a preset transformation algorithm for transforming the coordinates of the reference points in the image into the coordinates of the reference points in the corrected image, based on the localization parameters corresponding to the preset number of reference points and the coordinates of the preset number of reference points in the corrected image, wherein the preset transformation algorithm comprises one of an affine transformation algorithm, a perspective transformation algorithm or a thin plate spline transformation algorithm; and
      
      wherein obtaining the respective coordinates in the corrected image for all pixels of the image based on the spatial transformation relationship to obtain the corrected image, comprises:
      
      calculating, from coordinates of all the pixels of the image, the respective coordinates in the corrected image for all the pixels, by using the preset transformation algorithm with the transformation parameters to obtain the corrected image.
  - 15. The electronic device according to claim 11, wherein inputting the corrected image into the preset recognition network to obtain the object classification result for the image, comprises:
    - performing feature extraction on the corrected image using convolutional layers in the preset recognition network to generate a feature map containing image features of the corrected image; and
      
      classifying the image features in the feature map of the corrected image using a fully connected layer in the preset recognition network to obtain the object classification result for the image.
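Claims 2 and 12 describe the localization network as a preset number of convolutional layers that produce a feature map, followed by a fully connected layer that regresses the localization parameters (pixel coordinates for a preset number of reference points). The sketch below illustrates that shape of computation only. It assumes a single-channel image, one kernel per layer, and ReLU activations, none of which the claims specify, and its weights are untrained placeholders; `conv2d_valid` and `localization_net` are hypothetical names.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Plain 'valid' 2-D cross-correlation, the core of a convolutional layer."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def localization_net(image, kernels, fc_weight, fc_bias):
    """Stacked conv layers with ReLU, then one fully connected regression layer."""
    feat = image.astype(float)
    for kernel in kernels:                   # the "preset number" of conv layers
        feat = np.maximum(conv2d_valid(feat, kernel), 0.0)
    flat = feat.ravel()                      # feature map flattened to a vector
    coords = fc_weight @ flat + fc_bias      # two regressed values per point
    return coords.reshape(-1, 2)             # (K, 2) array of (x, y) coordinates

# Untrained placeholder weights: two 3x3 layers shrink a 6x6 image to a 2x2 map.
rng = np.random.default_rng(0)
kernels = [rng.standard_normal((3, 3)) for _ in range(2)]
fc_weight = rng.standard_normal((6, 4))      # 4 features -> 3 points x 2 coords
fc_bias = np.zeros(6)
points = localization_net(rng.random((6, 6)), kernels, fc_weight, fc_bias)
```

With trained weights, each row of `points` would be the location in the input image of one reference point; here the values are meaningless and only the shapes are of interest.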

16. A non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, causes the processor to carry out operations comprising:
- inputting an image into a preset localization network to obtain a plurality of localization parameters for the image, wherein the preset localization network comprises a preset number of convolutional layers, and wherein the plurality of localization parameters are obtained by regressing image features in a feature map that is generated from a convolution operation on the image;
  
  performing a spatial transformation on the image based on the plurality of localization parameters to obtain a corrected image; and
  
  inputting the corrected image into a preset recognition network to obtain an object classification result for the image.
- - 17. The storage medium according to claim 16, wherein inputting the image into the preset localization network to obtain the plurality of localization parameters for the image, comprises:
    - performing a feature extraction on the image using the preset number of convolutional layers to generate the feature map containing the image features of the image; and
      
      regressing the image features in the feature map of the image using a fully connected layer in the preset localization network, to obtain the plurality of localization parameters for the image;
      
      wherein the localization parameters are coordinates of pixels in the image, wherein image features of said pixels match with image features of a preset number of reference points in the corrected image.
  - 18. The storage medium according to claim 17, wherein performing the spatial transformation on the image based on the plurality of localization parameters to obtain the corrected image, comprises:
    - determining a spatial transformation relationship for the reference points from the image to the corrected image, based on the localization parameters corresponding to the preset number of reference points and coordinates of the preset number of reference points in the corrected image; and
      
      obtaining respective coordinates in the corrected image for all pixels of the image based on the spatial transformation relationship to obtain the corrected image.
  - 19. The storage medium according to claim 18, wherein determining the spatial transformation relationship for the reference points from the image to the corrected image, based on the localization parameters corresponding to the preset number of reference points and the coordinates of the preset number of reference points in the corrected image, comprises:
    - obtaining transformation parameters required by a preset transformation algorithm for transforming the coordinates of the reference points in the image into the coordinates of the reference points in the corrected image, based on the localization parameters corresponding to the preset number of reference points and the coordinates of the preset number of reference points in the corrected image, wherein the preset transformation algorithm comprises one of an affine transformation algorithm, a perspective transformation algorithm or a thin plate spline transformation algorithm; and
      
      wherein obtaining the respective coordinates in the corrected image for all pixels of the image based on the spatial transformation relationship to obtain the corrected image, comprises:
      
      calculating, from coordinates of all the pixels of the image, the respective coordinates in the corrected image for all the pixels, by using the preset transformation algorithm with the transformation parameters to obtain the corrected image.
  - 20. The storage medium according to claim 16, wherein inputting the corrected image into the preset recognition network to obtain the object classification result for the image, comprises:
    - performing feature extraction on the corrected image using convolutional layers in the preset recognition network to generate a feature map containing image features of the corrected image; and
      
      classifying the image features in the feature map of the corrected image using a fully connected layer in the preset recognition network to obtain the object classification result for the image.
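Claims 4, 14 and 19 also name a perspective transformation as one option for the preset transformation algorithm. As a hedged illustration, the sketch below solves the eight perspective parameters from exactly four reference-point correspondences using the standard linear formulation with the last matrix entry fixed to 1. The function names `fit_perspective` and `apply_perspective` are illustrative, and the claims do not say how the parameters are actually solved.

```python
import numpy as np

def fit_perspective(src_pts, dst_pts):
    """Solve the 3x3 perspective matrix (last entry fixed to 1) from 4 point pairs."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        # u = (h11*x + h12*y + h13) / (h31*x + h32*y + 1), rearranged to be linear
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        # v = (h21*x + h22*y + h23) / (h31*x + h32*y + 1), rearranged likewise
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_perspective(matrix, pts):
    """Map (x, y) points through the perspective matrix, dividing by the third row."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ matrix.T
    return homog[:, :2] / homog[:, 2:3]

# Reference points as located in a deformed (trapezoidal) input image ...
src_pts = [(1, 0), (3, 0), (4, 4), (0, 4)]
# ... and their preset positions in the corrected (square) image.
dst_pts = [(0, 0), (4, 0), (4, 4), (0, 4)]
matrix = fit_perspective(src_pts, dst_pts)
mapped = apply_perspective(matrix, src_pts)
```

Once the matrix is known, `apply_perspective` can be applied to the coordinates of every pixel of the image, which corresponds to the final calculating step in those claims.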

Current Assignee
Hangzhou Hikvision Digital Technology Company Limited
Original Assignee
Hangzhou Hikvision Digital Technology Company Limited
Inventors
XU, Yunlu, ZHENG, Gang, CHENG, Zhanzhan, NIU, Yi

Granted Patent

US 11,126,888 B2
CPC Class Codes

G06F 18/2136   based on sparsity criteria,...

G06F 18/24   Classification techniques

G06N 3/02   Neural networks

G06V 10/247   by affine transforms, e.g. ...

G06V 10/454   Integrating the filters int...

G06V 10/764   using classification, e.g. ...

G06V 10/82   using neural networks
