Learning method and learning device for object detector based on CNN using 1×1 convolution to be used for hardware optimization, and testing method and testing device using the same

US 10,395,140 B1
Filed: 01/23/2019
Issued: 08/27/2019
Est. Priority Date: 01/23/2019
Status: Active Grant

Abstract

A method for learning parameters of a CNN-based object detector is provided for hardware optimization that satisfies KPI. The method includes steps of: a learning device instructing a first transposing layer or a pooling layer to generate an integrated feature map by concatenating pixels per each proposal; and instructing a second transposing layer or a classifying layer to divide a volume-adjusted feature map, generated by using the integrated feature map, by pixel, and instructing the classifying layer to generate object class information. With this method, the size of a chip can be decreased because convolution operations and fully connected layer operations can be performed by the same processor. Accordingly, there are advantages such as no need to build additional lines in the semiconductor manufacturing process, power saving, and more space in a die to place other modules instead of an FC module.
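
The statement that convolution and fully connected operations can share one processor rests on a standard identity: a 1×1 convolution over a map of width N, height 1, and C channels computes, at each of the N pixels, the same weighted sum that a fully connected layer would compute on that pixel's C-vector. The following is a minimal pure-Python sketch of that identity. The function names, the bias terms, and the toy numbers are illustrative assumptions and do not come from the patent.

```python
def conv1x1(feature_map, filters, biases):
    """Apply a 1x1 convolution to a map stored as [pixel][channel].

    feature_map: N pixels (width N, height 1), each a list of C channel values.
    filters:     K filters, each a list of C weights (a 1x1xC kernel).
    Returns an N x K map: one K-vector per pixel.
    """
    return [
        [sum(w * x for w, x in zip(f, pixel)) + b for f, b in zip(filters, biases)]
        for pixel in feature_map
    ]


def fully_connected(vector, weights, biases):
    """Plain FC layer on one C-vector: weights is K rows of C values."""
    return [sum(w * x for w, x in zip(row, vector)) + b for row, b in zip(weights, biases)]


# Toy integrated feature map: N = 2 proposals, C = 3 channels each (assumed values).
integrated = [[1.0, 2.0, 3.0], [0.5, -1.0, 4.0]]
# K = 2 filters, reused unchanged as the FC weight matrix.
weights = [[0.1, 0.2, 0.3], [-1.0, 0.0, 1.0]]
biases = [0.0, 0.5]

conv_out = conv1x1(integrated, weights, biases)
fc_out = [fully_connected(p, weights, biases) for p in integrated]
assert conv_out == fc_out  # same arithmetic, proposal by proposal
```

Because each proposal occupies exactly one pixel of the integrated map, the convolution hardware visits every proposal once and no separate FC datapath is needed in this sketch.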

28 Claims

1. A method for learning parameters of an object detector based on a CNN, comprising steps of:
- (a) a learning device, if at least one training image is acquired, (i) instructing one or more convolutional layers to generate at least one initial feature map by applying one or more convolution operations to the training image, (ii) instructing an RPN to generate one or more proposals corresponding to each of one or more objects in the training image by using the initial feature map, and (iii) (iii-1) instructing a pooling layer to apply one or more pooling operations to each region, corresponding to each of the proposals, on the initial feature map, to thereby generate pooled feature maps per each of the proposals, and instructing a first transposing layer to concatenate each of pixels, per each of the proposals, in each of corresponding same locations on the pooled feature maps per each of the proposals, to thereby generate an integrated feature map, or (iii-2) instructing the pooling layer to apply the pooling operations to each region, corresponding to each of the proposals, on the initial feature map, to thereby generate the pooled feature maps per each of the proposals, and instructing the pooling layer to concatenate each of the pixels, per each of the proposals, in each of the corresponding same locations on the pooled feature maps per each of the proposals, to thereby generate the integrated feature map;
  
  (b) the learning device instructing a first 1×1 convolutional layer to apply a 1×1 convolution operation to the integrated feature map, to thereby generate a first adjusted feature map whose volume is adjusted, and instructing a second 1×1 convolutional layer to apply the 1×1 convolution operation to the first adjusted feature map, to thereby generate a second adjusted feature map whose volume is adjusted; and

  (c) the learning device (c1) (i) instructing a second transposing layer to divide the second adjusted feature map by each of the pixels, to thereby generate pixel-wise feature maps per each of the proposals, and instructing a classifying layer to generate object class information on each of the proposals by using the pixel-wise feature maps per each of the proposals, or (ii) instructing the classifying layer to divide the second adjusted feature map by each of the pixels, to thereby generate the pixel-wise feature maps per each of the proposals, and instructing the classifying layer to generate the object class information on each of the proposals by using the pixel-wise feature maps per each of the proposals, (c2) instructing a detecting layer to generate object detection information corresponding to the objects in the training image by referring to the object class information and the pixel-wise feature maps per each of the proposals, and (c3) instructing a detection loss layer to calculate one or more object detection losses by referring to the object detection information and its corresponding GT, to thereby learn at least part of parameters of the second 1×1 convolutional layer, the first 1×1 convolutional layer, and the convolutional layers by backpropagating the object detection losses.
- Dependent claims (2, 3, 4, 5, 6, 7)
  - 2. The method of claim 1, wherein, after the step of (a), the learning device instructs an RPN loss layer to calculate one or more RPN losses by referring to the proposals and their corresponding ground truths, to thereby learn at least part of parameters of the RPN by backpropagating the RPN losses.
  - 3. The method of claim 1, wherein, supposing that the number of the proposals is N, and that a width of the pooled feature maps per each of the proposals is M1 and a height thereof is M2, and that the number of channels of the pooled feature maps per each of the proposals is J, at the step of (a), the learning device (i) instructs the first transposing layer to convert the pooled feature maps per each of the proposals into the integrated feature map having a width of N, a height of 1, and a channel of M1·M2·J, or (ii) instructs the pooling layer to convert the pooled feature maps per each of the proposals into the integrated feature map having the width of N, the height of 1, and the channel of M1·M2·J.
  - 4. The method of claim 3, wherein, supposing that the number of filters in the first 1×1 convolutional layer is K, and that the number of filters in the second 1×1 convolutional layer is L, at the step of (b), the learning device instructs the first 1×1 convolutional layer to generate the first adjusted feature map having a volume of N·1·K resulting from a width of N, a height of 1, and a channel of K, and instructs the second 1×1 convolutional layer to generate the second adjusted feature map having a volume of N·1·L resulting from the width of N, the height of 1, and a channel of L.
  - 5. The method of claim 4, wherein, at the step of (c), the learning device (i) instructs the second transposing layer to convert the second adjusted feature map into the pixel-wise feature maps per each of the proposals having a volume of 1·1·L, resulting from a width of 1, a height of 1, and a channel of L, corresponding to each of N proposals, or (ii) instructs the classifying layer to convert the second adjusted feature map into the pixel-wise feature maps per each of the proposals having the volume of 1·1·L, resulting from the width of 1, the height of 1, and the channel of L, corresponding to each of the N proposals.
  - 6. The method of claim 1, wherein the classifying layer uses at least one softmax algorithm.
  - 7. The method of claim 1, wherein the detecting layer uses at least one non-maximum suppression algorithm.
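
The dimensions recited in claims 3 to 5 can be traced with a small sketch. It assumes a [row][column][channel] storage order and one particular flattening order for the concatenation; the claims fix only the resulting shapes, so the ordering, the function names, the constant filter weights, and the toy sizes are all illustrative assumptions.

```python
def integrate(pooled_maps):
    """First transposing layer (sketch): N maps of M2 x M1 x J -> N pixels of M1*M2*J channels."""
    return [
        [value for row in fmap for cell in row for value in cell]
        for fmap in pooled_maps
    ]


def conv1x1(feature_map, filters):
    """1x1 convolution without bias on a [pixel][channel] map."""
    return [[sum(w * x for w, x in zip(f, pixel)) for f in filters] for pixel in feature_map]


def split_pixels(feature_map):
    """Second transposing layer (sketch): N x 1 x L map -> N separate 1 x 1 x L maps."""
    return [[[pixel]] for pixel in feature_map]


N, M1, M2, J, K, L = 3, 2, 2, 4, 5, 6  # toy sizes, not taken from the patent

# N pooled feature maps, each M2 rows x M1 columns x J channels.
pooled = [
    [[[float(n + r + c + j) for j in range(J)] for c in range(M1)] for r in range(M2)]
    for n in range(N)
]
first_filters = [[0.01] * (M1 * M2 * J) for _ in range(K)]  # K filters
second_filters = [[0.1] * K for _ in range(L)]              # L filters

integrated = integrate(pooled)                             # width N, height 1, M1*M2*J channels
first_adjusted = conv1x1(integrated, first_filters)        # volume N x 1 x K
second_adjusted = conv1x1(first_adjusted, second_filters)  # volume N x 1 x L
pixel_wise = split_pixels(second_adjusted)                 # N maps of volume 1 x 1 x L

print(len(integrated), len(integrated[0]))                 # 3 16
print(len(first_adjusted), len(first_adjusted[0]))         # 3 5
print(len(pixel_wise), len(pixel_wise[0]), len(pixel_wise[0][0]), len(pixel_wise[0][0][0]))  # 3 1 1 6
```

Each proposal stays in its own pixel from the integrated map through to the pixel-wise maps, which is why the final split recovers exactly one L-channel vector per proposal.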

8. A method for testing an object detector based on a CNN, comprising steps of:
- (a) on condition that a learning device (1) (i) has instructed one or more convolutional layers to generate at least one initial feature map for training by applying one or more convolution operations to at least one training image, (ii) has instructed an RPN to generate one or more proposals for training corresponding to each of one or more objects for training in the training image by using the initial feature map for training, and (iii) (iii-1) has instructed a pooling layer to apply one or more pooling operations to each region, corresponding to each of the proposals for training, on the initial feature map for training, to thereby generate pooled feature maps for training per each of the proposals for training, and has instructed a first transposing layer to concatenate each of pixels, per each of the proposals for training, in each of corresponding same locations on the pooled feature maps for training per each of the proposals for training, to thereby generate an integrated feature map for training, or (iii-2) has instructed the pooling layer to apply the pooling operations to each region, corresponding to each of the proposals for training, on the initial feature map for training, to thereby generate the pooled feature maps for training per each of the proposals for training, and has instructed the pooling layer to concatenate each of the pixels, per each of the proposals for training, in each of the corresponding same locations on the pooled feature maps for training per each of the proposals for training, to thereby generate the integrated feature map for training, (2) has instructed a first 1×1 convolutional layer to apply a 1×1 convolution operation to the integrated feature map for training, to thereby generate a first adjusted feature map for training whose volume is adjusted, and has instructed a second 1×1 convolutional layer to apply the 1×1 convolution operation to the first adjusted feature map for training, to thereby generate a second adjusted feature map for training whose volume is adjusted, and (3) (3-1) (i) has instructed a second transposing layer to divide the second adjusted feature map for training by each of the pixels, to thereby generate pixel-wise feature maps for training per each of the proposals for training, and has instructed a classifying layer to generate object class information for training on each of the proposals for training by using the pixel-wise feature maps for training per each of the proposals for training, or (ii) has instructed the classifying layer to divide the second adjusted feature map for training by each of the pixels, to thereby generate the pixel-wise feature maps for training per each of the proposals for training, and has instructed the classifying layer to generate the object class information for training on each of the proposals for training by using the pixel-wise feature maps for training per each of the proposals for training, (3-2) has instructed a detecting layer to generate object detection information for training corresponding to the objects for training in the training image by referring to the object class information for training and the pixel-wise feature maps for training per each of the proposals for training, and (3-3) has instructed a detection loss layer to calculate one or more object detection losses by referring to the object detection information for training and its corresponding GT, to thereby learn at least part of parameters of the second 1×1 convolutional layer, the first 1×1 convolutional layer, and the convolutional layers by backpropagating the object detection losses;
  
  a testing device, if at least one test image is acquired, (i) instructing the convolutional layers to generate at least one initial feature map for testing by applying the convolution operations to the test image, (ii) instructing the RPN to generate one or more proposals for testing corresponding to each of one or more objects for testing in the test image by using the initial feature map for testing, and (iii) (iii-1) instructing the pooling layer to apply the pooling operations to each region, corresponding to each of the proposals for testing, on the initial feature map for testing, to thereby generate pooled feature maps for testing per each of the proposals for testing, and instructing a first transposing layer to concatenate each of pixels, per each of the proposals for testing, in each of corresponding same locations on the pooled feature maps for testing per each of the proposals for testing, to thereby generate an integrated feature map for testing, or (iii-2) instructing the pooling layer to apply the pooling operations to each region, corresponding to each of the proposals for testing, on the initial feature map for testing, to thereby generate the pooled feature maps for testing per each of the proposals for testing, and instructing the pooling layer to concatenate each of the pixels, per each of the proposals for testing, in each of the corresponding same locations on the pooled feature maps for testing per each of the proposals for testing, to thereby generate the integrated feature map for testing;
  
  (b) the testing device instructing the first 1×1 convolutional layer to apply the 1×1 convolution operation to the integrated feature map for testing, to thereby generate a first adjusted feature map for testing whose volume is adjusted, and instructing the second 1×1 convolutional layer to apply the 1×1 convolution operation to the first adjusted feature map for testing, to thereby generate a second adjusted feature map for testing whose volume is adjusted; and
  
  (c) the testing device (c1) (i) instructing the second transposing layer to divide the second adjusted feature map for testing by each of the pixels, to thereby generate pixel-wise feature maps for testing per each of the proposals for testing, and instructing the classifying layer to generate object class information for testing on each of the proposals for testing by using the pixel-wise feature maps for testing per each of the proposals for testing, or (ii) instructing the classifying layer to divide the second adjusted feature map for testing by each of the pixels, to thereby generate the pixel-wise feature maps for testing per each of the proposals for testing, and instructing the classifying layer to generate the object class information for testing on each of the proposals for testing by using the pixel-wise feature maps for testing per each of the proposals for testing, and (c2) instructing the detecting layer to generate object detection information for testing corresponding to the objects for testing in the test image by referring to the object class information for testing and the pixel-wise feature maps for testing per each of the proposals for testing.
- Dependent claims (9, 10, 11, 12, 13, 14)
  - 9. The method of claim 8, wherein, after the process of (1), the learning device has instructed an RPN loss layer to calculate one or more RPN losses by referring to the proposals for testing and their corresponding ground truths, to thereby learn at least part of parameters of the RPN by backpropagating the RPN losses.
  - 10. The method of claim 8, wherein, supposing that the number of the proposals for testing is N, and that a width of the pooled feature maps for testing per each of the proposals for testing is M1 and a height thereof is M2, and that the number of channels of the pooled feature maps for testing per each of the proposals for testing is J, at the step of (a), the testing device (i) instructs the first transposing layer to convert the pooled feature maps for testing per each of the proposals for testing into the integrated feature map for testing having a width of N, a height of 1, and a channel of M1·M2·J, or (ii) instructs the pooling layer to convert the pooled feature maps for testing per each of the proposals for testing into the integrated feature map for testing having the width of N, the height of 1, and the channel of M1·M2·J.
  - 11. The method of claim 10, wherein, supposing that the number of filters in the first 1×1 convolutional layer is K, and that the number of filters in the second 1×1 convolutional layer is L, at the step of (b), the testing device instructs the first 1×1 convolutional layer to generate the first adjusted feature map for testing having a volume of N·1·K resulting from a width of N, a height of 1, and a channel of K, and instructs the second 1×1 convolutional layer to generate the second adjusted feature map for testing having a volume of N·1·L resulting from the width of N, the height of 1, and a channel of L.
  - 12. The method of claim 11, wherein, at the step of (c), the testing device (i) instructs the second transposing layer to convert the second adjusted feature map for testing into the pixel-wise feature maps for testing per each of the proposals for testing having a volume of 1·1·L, resulting from a width of 1, a height of 1, and a channel of L, corresponding to each of N proposals for testing, or (ii) instructs the classifying layer to convert the second adjusted feature map for testing into the pixel-wise feature maps for testing per each of the proposals for testing having the volume of 1·1·L, resulting from the width of 1, the height of 1, and the channel of L, corresponding to each of the N proposals for testing.
  - 13. The method of claim 8, wherein the classifying layer uses at least one softmax algorithm.
  - 14. The method of claim 8, wherein the detecting layer uses at least one non-maximum suppression algorithm.
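
Claims 13 and 14 name a softmax algorithm for the classifying layer and a non-maximum suppression algorithm for the detecting layer without fixing their details. The sketch below shows one common form of each. The (x1, y1, x2, y2) box format, the greedy suppression strategy, the 0.5 overlap threshold, and the toy proposals are assumptions made for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one proposal's class scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, threshold=0.5):
    """Greedy NMS: visit boxes by descending score, keep a box only if it
    does not overlap any already kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= threshold for k in keep):
            keep.append(i)
    return keep


# Toy proposals: two heavily overlapping boxes and one separate box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
class_logits = [[2.0, 0.1], [1.0, 0.2], [3.0, 0.0]]  # per-proposal scores for 2 classes
scores = [softmax(row)[0] for row in class_logits]    # confidence for class 0
kept = nms(boxes, scores)
print(kept)  # [2, 0]
```

Here the second box is suppressed because it overlaps the higher-scoring first box with an IoU of 81/119, which exceeds the assumed threshold.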

15. A learning device for learning parameters of an object detector based on a CNN, comprising:
- at least one memory that stores instructions; and
  
  at least one processor configured to execute the instructions to:
  
  perform processes of (I) (i) instructing one or more convolutional layers to generate at least one initial feature map by applying one or more convolution operations to at least one training image, (ii) instructing an RPN to generate one or more proposals corresponding to each of one or more objects in the training image by using the initial feature map, and (iii) (iii-1) instructing a pooling layer to apply one or more pooling operations to each region, corresponding to each of the proposals, on the initial feature map, to thereby generate pooled feature maps per each of the proposals, and instructing a first transposing layer to concatenate each of pixels, per each of the proposals, in each of corresponding same locations on the pooled feature maps per each of the proposals, to thereby generate an integrated feature map, or (iii-2) instructing the pooling layer to apply the pooling operations to each region, corresponding to each of the proposals, on the initial feature map, to thereby generate the pooled feature maps per each of the proposals, and instructing the pooling layer to concatenate each of the pixels, per each of the proposals, in each of the corresponding same locations on the pooled feature maps per each of the proposals, to thereby generate the integrated feature map, (II) instructing a first 1×1 convolutional layer to apply a 1×1 convolution operation to the integrated feature map, to thereby generate a first adjusted feature map whose volume is adjusted, and instructing a second 1×1 convolutional layer to apply the 1×1 convolution operation to the first adjusted feature map, to thereby generate a second adjusted feature map whose volume is adjusted, and (III) (III-1) (i) instructing a second transposing layer to divide the second adjusted feature map by each of the pixels, to thereby generate pixel-wise feature maps per each of the proposals, and instructing a classifying layer to generate object class information on each of the proposals by using the pixel-wise feature maps per each of the proposals, or (ii) instructing the classifying layer to divide the second adjusted feature map by each of the pixels, to thereby generate the pixel-wise feature maps per each of the proposals, and instructing the classifying layer to generate the object class information on each of the proposals by using the pixel-wise feature maps per each of the proposals, (III-2) instructing a detecting layer to generate object detection information corresponding to the objects in the training image by referring to the object class information and the pixel-wise feature maps per each of the proposals, and (III-3) instructing a detection loss layer to calculate one or more object detection losses by referring to the object detection information and its corresponding GT, to thereby learn at least part of parameters of the second 1×1 convolutional layer, the first 1×1 convolutional layer, and the convolutional layers by backpropagating the object detection losses.
- Dependent claims (16, 17, 18, 19, 20, 21)
  - 16. The learning device of claim 15, wherein, after the process of (I), the processor instructs an RPN loss layer to calculate one or more RPN losses by referring to the proposals and their corresponding ground truths, to thereby learn at least part of parameters of the RPN by backpropagating the RPN losses.
  - 17. The learning device of claim 15, wherein, supposing that the number of the proposals is N, and that a width of the pooled feature maps per each of the proposals is M1 and a height thereof is M2, and that the number of channels of the pooled feature maps per each of the proposals is J, at the process of (I), the processor (i) instructs the first transposing layer to convert the pooled feature maps per each of the proposals into the integrated feature map having a width of N, a height of 1, and a channel of M1·M2·J, or (ii) instructs the pooling layer to convert the pooled feature maps per each of the proposals into the integrated feature map having the width of N, the height of 1, and the channel of M1·M2·J.
  - 18. The learning device of claim 17, wherein, supposing that the number of filters in the first 1×1 convolutional layer is K, and that the number of filters in the second 1×1 convolutional layer is L, at the process of (II), the processor instructs the first 1×1 convolutional layer to generate the first adjusted feature map having a volume of N·1·K resulting from a width of N, a height of 1, and a channel of K, and instructs the second 1×1 convolutional layer to generate the second adjusted feature map having a volume of N·1·L resulting from the width of N, the height of 1, and a channel of L.
  - 19. The learning device of claim 18, wherein, at the process of (III), the processor (i) instructs the second transposing layer to convert the second adjusted feature map into the pixel-wise feature maps per each of the proposals having a volume of 1·1·L, resulting from a width of 1, a height of 1, and a channel of L, corresponding to each of N proposals, or (ii) instructs the classifying layer to convert the second adjusted feature map into the pixel-wise feature maps per each of the proposals having the volume of 1·1·L, resulting from the width of 1, the height of 1, and the channel of L, corresponding to each of the N proposals.
  - 20. The learning device of claim 15, wherein the classifying layer uses at least one softmax algorithm.
  - 21. The learning device of claim 15, wherein the detecting layer uses at least one non-maximum suppression algorithm.
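
Process (III-3) of claim 15 learns the 1×1 convolution parameters by backpropagating a detection loss, but the claims do not name the loss or the optimizer. The sketch below therefore assumes a softmax cross-entropy loss on the class scores and a single plain gradient-descent step on one bias-free 1×1 convolutional layer. The function names, learning rate, labels, and toy data are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one proposal's class scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(pixels, filters):
    """1x1 convolution (no bias) followed by softmax, one pixel per proposal."""
    return [softmax([sum(w * x for w, x in zip(f, p)) for f in filters]) for p in pixels]


def loss(probs, labels):
    """Mean cross-entropy between predicted class probabilities and GT labels."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def sgd_step(pixels, labels, filters, lr=0.1):
    """One gradient-descent update of the filter weights (assumed optimizer)."""
    probs = forward(pixels, filters)
    n = len(pixels)
    updated = [row[:] for row in filters]
    for p, x, y in zip(probs, pixels, labels):
        for c in range(len(filters)):
            grad_logit = p[c] - (1.0 if c == y else 0.0)  # d(loss)/d(logit c)
            for j, xj in enumerate(x):
                updated[c][j] -= lr * grad_logit * xj / n
    return updated


pixels = [[1.0, 0.0], [0.0, 1.0]]   # N = 2 proposals, 2 channels each
labels = [0, 1]                      # hypothetical GT class per proposal
filters = [[0.0, 0.0], [0.0, 0.0]]   # L = 2 filters, zero-initialised

before = loss(forward(pixels, filters), labels)
filters = sgd_step(pixels, labels, filters)
after = loss(forward(pixels, filters), labels)
print(after < before)  # True
```

The gradient with respect to each filter weight is the per-class error times the input channel value, the same expression a fully connected layer would yield, so the backward pass can also reuse the convolution datapath in this sketch.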

22. A testing device for testing an object detector based on a CNN, comprising:
- at least one memory that stores instructions; and
  
  at least one processor, on condition that a learning device (1) (i) has instructed one or more convolutional layers to generate at least one initial feature map for training by applying one or more convolution operations to at least one training image, (ii) has instructed an RPN to generate one or more proposals for training corresponding to each of one or more objects for training in the training image by using the initial feature map for training, and (iii) (iii-1) has instructed a pooling layer to apply one or more pooling operations to each region, corresponding to each of the proposals for training, on the initial feature map for training, to thereby generate pooled feature maps for training per each of the proposals for training, and has instructed a first transposing layer to concatenate each of pixels, per each of the proposals for training, in each of corresponding same locations on the pooled feature maps for training per each of the proposals for training, to thereby generate an integrated feature map for training, or (iii-2) has instructed the pooling layer to apply the pooling operations to each region, corresponding to each of the proposals for training, on the initial feature map for training, to thereby generate the pooled feature maps for training per each of the proposals for training, and has instructed the pooling layer to concatenate each of the pixels, per each of the proposals for training, in each of the corresponding same locations on the pooled feature maps for training per each of the proposals for training, to thereby generate the integrated feature map for training, (2) has instructed a first 1×1 convolutional layer to apply a 1×1 convolution operation to the integrated feature map for training, to thereby generate a first adjusted feature map for training whose volume is adjusted, and has instructed a second 1×1 convolutional layer to apply the 1×1 convolution operation to the first adjusted feature map for training, to thereby generate a second adjusted feature map for training whose volume is adjusted, and (3) (3-1) (i) has instructed a second transposing layer to divide the second adjusted feature map for training by each of the pixels, to thereby generate pixel-wise feature maps for training per each of the proposals for training, and has instructed a classifying layer to generate object class information for training on each of the proposals for training by using the pixel-wise feature maps for training per each of the proposals for training, or (ii) has instructed the classifying layer to divide the second adjusted feature map for training by each of the pixels, to thereby generate the pixel-wise feature maps for training per each of the proposals for training, and has instructed the classifying layer to generate the object class information for training on each of the proposals for training by using the pixel-wise feature maps for training per each of the proposals for training, (3-2) has instructed a detecting layer to generate object detection information for training corresponding to the objects for training in the training image by referring to the object class information for training and the pixel-wise feature maps for training per each of the proposals for training, and (3-3) has instructed a detection loss layer to calculate one or more object detection losses by referring to the object detection information for training and its corresponding GT, to thereby learn at least part of parameters of the second 1×1 convolutional layer, the first 1×1 convolutional layer, and the convolutional layers by backpropagating the object detection losses;
  
  configured to execute the instructions to:
  
  perform processes of (I) (i) instructing the convolutional layers to generate at least one initial feature map for testing by applying the convolution operations to at least one test image, (ii) instructing the RPN to generate one or more proposals for testing corresponding to each of one or more objects for testing in the test image by using the initial feature map for testing, and (iii) (iii-1) instructing the pooling layer to apply the pooling operations to each region, corresponding to each of the proposals for testing, on the initial feature map for testing, to thereby generate pooled feature maps for testing per each of the proposals for testing, and instructing a first transposing layer to concatenate each of pixels, per each of the proposals for testing, in each of corresponding same locations on the pooled feature maps for testing per each of the proposals for testing, to thereby generate an integrated feature map for testing, or (iii-2) instructing the pooling layer to apply the pooling operations to each region, corresponding to each of the proposals for testing, on the initial feature map for testing, to thereby generate the pooled feature maps for testing per each of the proposals for testing, and instructing the pooling layer to concatenate each of the pixels, per each of the proposals for testing, in each of the corresponding same locations on the pooled feature maps for testing per each of the proposals for testing, to thereby generate the integrated feature map for testing, (II) instructing the first 1×1 convolutional layer to apply the 1×1 convolution operation to the integrated feature map for testing, to thereby generate a first adjusted feature map for testing whose volume is adjusted, and instructing the second 1×1 convolutional layer to apply the 1×1 convolution operation to the first adjusted feature map for testing, to thereby generate a second adjusted feature map for testing whose volume is adjusted, and (III) (III-1) (i) instructing the second transposing layer to divide the second adjusted feature map for testing by each of the pixels, to thereby generate pixel-wise feature maps for testing per each of the proposals for testing, and instructing the classifying layer to generate object class information for testing on each of the proposals for testing by using the pixel-wise feature maps for testing per each of the proposals for testing, or (ii) instructing the classifying layer to divide the second adjusted feature map for testing by each of the pixels, to thereby generate the pixel-wise feature maps for testing per each of the proposals for testing, and instructing the classifying layer to generate the object class information for testing on each of the proposals for testing by using the pixel-wise feature maps for testing per each of the proposals for testing, and (III-2) instructing the detecting layer to generate object detection information for testing corresponding to the objects for testing in the test image by referring to the object class information for testing and the pixel-wise feature maps for testing per each of the proposals for testing.
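As an illustration only, the data flow of processes (I)(iii), (II), and (III-1) recited above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names `integrate`, `conv1x1`, and `split_by_pixel` are hypothetical, the sizes and filter weights are arbitrary, and bias terms are omitted. The symbols N, M1, M2, J, K, and L follow the notation of the dependent claims.

```python
from typing import List

Vector = List[float]

def integrate(pooled: List[List[List[Vector]]]) -> List[Vector]:
    # pooled[n][y][x] is the J-channel pixel at row y, column x of proposal n.
    # Flattening each proposal yields one pixel with M1*M2*J channels, so the
    # result is a feature map of width N and height 1.
    return [[c for row in fmap for pixel in row for c in pixel] for fmap in pooled]

def conv1x1(fmap: List[Vector], filters: List[Vector]) -> List[Vector]:
    # A 1x1 convolution applies the same filters to every pixel independently.
    # Each filter is a weight vector over the input channels (bias omitted).
    return [[sum(w * v for w, v in zip(f, pixel)) for f in filters] for pixel in fmap]

def split_by_pixel(fmap: List[Vector]) -> List[List[List[Vector]]]:
    # One feature map per proposal: [[pixel]] has height 1 and width 1.
    return [[[pixel]] for pixel in fmap]

# Hypothetical sizes and weights, chosen only for illustration.
N, M1, M2, J, K, L = 3, 2, 2, 4, 5, 6
pooled = [[[[float(n + y + x + j) for j in range(J)] for x in range(M1)]
           for y in range(M2)] for n in range(N)]
first_filters = [[0.01 * (k + c) for c in range(M1 * M2 * J)] for k in range(K)]
second_filters = [[0.1 * (m - k) for k in range(K)] for m in range(L)]

integrated = integrate(pooled)                             # N x 1 x (M1*M2*J)
first_adjusted = conv1x1(integrated, first_filters)        # N x 1 x K
second_adjusted = conv1x1(first_adjusted, second_filters)  # N x 1 x L
pixel_wise = split_by_pixel(second_adjusted)               # N maps of 1 x 1 x L
```

Because every proposal becomes a single pixel of the integrated feature map, one 1×1 convolution pass processes all N proposals with the same filters.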
- View Dependent Claims (23, 24, 25, 26, 27, 28)
  - 23. The testing device of claim 22, wherein, after the process of (1), the learning device has instructed an RPN loss layer to calculate one or more RPN losses by referring to the proposals for testing and their corresponding ground truths, to thereby learn at least part of parameters of the RPN by backpropagating the RPN losses.
  - 24. The testing device of claim 22, wherein, supposing that the number of the proposals for testing is N, and that a width of the pooled feature maps for testing per each of the proposals for testing is M1 and a height thereof is M2, and that the number of channels of the pooled feature maps for testing per each of the proposals for testing is J, at the process of (I), the processor (i) instructs the first transposing layer to convert the pooled feature maps for testing per each of the proposals for testing into the integrated feature map for testing having a width of N, a height of 1, and a channel of M1·M2·J, or (ii) instructs the pooling layer to convert the pooled feature maps for testing per each of the proposals for testing into the integrated feature map for testing having the width of N, the height of 1, and the channel of M1·M2·J.
  - 25. The testing device of claim 24, wherein, supposing that the number of filters in the first 1×1 convolutional layer is K, and that the number of filters in the second 1×1 convolutional layer is L, at the process of (II), the processor instructs the first 1×1 convolutional layer to generate the first adjusted feature map for testing having a volume of N·1·K resulting from a width of N, a height of 1, and a channel of K, and instructs the second 1×1 convolutional layer to generate the second adjusted feature map for testing having a volume of N·1·L resulting from the width of N, the height of 1, and a channel of L.
  - 26. The testing device of claim 25, wherein, at the process of (III), the processor (i) instructs the second transposing layer to convert the second adjusted feature map for testing into the pixel-wise feature maps for testing per each of the proposals for testing having a volume of 1·1·L, resulting from a width of 1, a height of 1, and a channel of L, corresponding to each of N proposals for testing, or (ii) instructs the classifying layer to convert the second adjusted feature map for testing into the pixel-wise feature maps for testing per each of the proposals for testing having the volume of 1·1·L, resulting from the width of 1, the height of 1, and the channel of L, corresponding to each of the N proposals for testing.
  - 27. The testing device of claim 22, wherein the classifying layer uses at least one softmax algorithm.
  - 28. The testing device of claim 22, wherein the detecting layer uses at least one non-maximum suppression algorithm.
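
Claims 27 and 28 name a softmax algorithm for the classifying layer and a non-maximum suppression algorithm for the detecting layer without fixing their details. The following is a minimal sketch of one common form of each, offered as an assumption rather than as the patented method: the `(x1, y1, x2, y2)` box format, the greedy suppression strategy, the 0.5 IoU threshold, and all function names are hypothetical choices made for illustration.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)

def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def iou(a: Box, b: Box) -> float:
    # Intersection over union of two axis-aligned boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.5) -> List[int]:
    # Greedy NMS: visit boxes from highest to lowest score and keep a box only
    # if it does not overlap an already kept box by more than the threshold.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep

# Hypothetical class scores for one proposal and boxes for three proposals.
class_probs = softmax([1.0, 2.0, 3.0])
boxes = [(0.0, 0.0, 10.0, 10.0), (1.0, 1.0, 11.0, 11.0), (20.0, 20.0, 30.0, 30.0)]
kept = nms(boxes, [0.9, 0.8, 0.7])
```

In this example the second box overlaps the first with an IoU of 81/119 (about 0.68), so it is suppressed and `kept` holds the indices of the first and third boxes.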

Current Assignee: Stradvision, Inc.
Original Assignee: Stradvision, Inc.
Inventors: Kim, Kye-Hyeon; Kim, Yongjoong; Kim, Insu; Kim, Hak-Kyoung; Nam, Woonhyun; Boo, SukHoon; Sung, Myungchul; Yeo, Donghun; Ryu, Wooju; Jang, Taewoong; Jeong, Kyungjoong; Je, Hongmo; Cho, Hojin
Primary Examiner(s): Strege, John B
Application Number: US16/254,887
Time in Patent Office: 216 Days
CPC Class Codes

G06F 18/2148   characterised by the proces...

G06F 18/217   Validation; Performance eva...

G06F 18/241   relating to the classificat...

G06N 3/045   Combinations of networks

G06N 3/084   Backpropagation, e.g. using...

G06V 10/764   using classification, e.g. ...

G06V 10/82   using neural networks

G06V 20/10   Terrestrial scenes scenes u...

G06V 2201/06   Recognition of objects for ...
