Systems and methods for classifying activities captured within images

US 10,185,895 B1
Filed: 03/23/2017
Issued: 01/22/2019
Est. Priority Date: 03/23/2017
Status: Active Grant

Abstract

An image including a visual capture of a scene may be accessed. The image may be processed through a convolutional neural network. The convolutional neural network may generate a set of two-dimensional feature maps based on the image. The set of two-dimensional feature maps may be processed through a contextual long short-term memory unit. The contextual long short-term memory unit may generate a set of two-dimensional outputs based on the set of two-dimensional feature maps. A set of attention-masks for the image may be generated based on the set of two-dimensional outputs and the set of two-dimensional feature maps. The set of attention-masks may define dimensional portions of the image. The scene may be classified based on the two-dimensional outputs.
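
The final step of the abstract, classifying the scene from the two-dimensional outputs, can be illustrated with a small sketch. Claims 6 and 13 recite a fully connected layer for this step; everything else below (the softmax, the weight values, the labels `surfing` and `skiing`, and the function names) is an assumption made for illustration and is not taken from the patent.

```python
import math

def fully_connected(inputs, weights, biases):
    """One dense layer: a weighted sum of all inputs per class, plus a bias."""
    return [sum(w * x for w, x in zip(w_row, inputs)) + b
            for w_row, b in zip(weights, biases)]

def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(outputs_2d, weights, biases, labels):
    # Flatten the set of 2-D outputs into the dense layer's input vector.
    inputs = [v for grid in outputs_2d for row in grid for v in row]
    probs = softmax(fully_connected(inputs, weights, biases))
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs

# Toy values: two 2x2 outputs (8 inputs) and two hypothetical activity labels.
outputs_2d = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
weights = [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],   # responds to the two active cells
           [0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0]]   # responds to the inactive cells
biases = [0.0, 0.0]
label, probs = classify(outputs_2d, weights, biases, ["surfing", "skiing"])
```

In a trained network the weights would be learned rather than written by hand; the toy values only make the data flow easy to follow.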

172 Citations

16 Claims

1. A system for classifying activities captured within images, the system comprising:
- one or more physical processors configured by machine-readable instructions to:
  
  access an image, the image including a visual capture of a scene;
  
  process the image through a convolutional neural network, the convolutional neural network generating a set of two-dimensional feature maps based on the image;
  
  process the set of two-dimensional feature maps through a contextual long short-term memory unit, the contextual long short-term memory unit generating a set of two-dimensional outputs based on the set of two-dimensional feature maps, wherein the contextual long short-term memory unit includes a loss function characterized by a non-overlapping loss, an entropy loss, and a cross-entropy loss and the non-overlapping loss, the entropy loss, and the cross-entropy loss are combined into the loss function through a linear combination with a first hyper parameter for the non-overlapping loss, a second hyper parameter for the entropy loss, and a third hyper parameter for the cross-entropy loss;
  
  generate a set of attention-masks for the image based on the set of two-dimensional outputs and the set of two-dimensional feature maps, the set of attention-masks defining dimensional portions of the image; and
  
  classify the scene based on the set of two-dimensional outputs.
  - 2. The system of claim 1, wherein the convolutional neural network includes a plurality of convolution layers, and the set of two-dimensional feature maps is generated by a last convolution layer in the convolutional neural network.
  - 3. The system of claim 2, wherein the set of two-dimensional feature maps is obtained from the convolutional neural network before the set of two-dimensional feature maps is flattened.
  - 4. The system of claim 1, wherein the set of two-dimensional outputs is used to visualize the dimensional portions of the image.
  - 5. The system of claim 1, wherein the set of two-dimensional outputs is used to constrain the dimensional portions of the image.
  - 6. The system of claim 1, wherein the classification of the scene is performed by a fully connected layer that takes as input the set of two-dimensional outputs.
  - 7. The system of claim 1, wherein the loss function discourages the set of attention masks defining a same dimensional portion of the image across multiple time-steps.
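
Claim 1 recites a loss built as a linear combination of three terms, each with its own hyperparameter, and claim 7 adds that the loss discourages attention masks from covering the same portion of the image at several time-steps. The claims do not give formulas for the individual terms, so the sketch below assumes plausible ones: the overlap penalty as a sum of elementwise products between masks, and the entropy term as the mean Shannon entropy of the masks. The function and parameter names are hypothetical.

```python
import math

def cross_entropy(probs, label):
    """Cross-entropy of predicted class probabilities against a true label index."""
    return -math.log(probs[label])

def entropy(mask):
    """Shannon entropy of one attention mask (a 2-D grid whose entries sum to 1)."""
    return -sum(p * math.log(p) for row in mask for p in row if p > 0)

def overlap(masks):
    """Assumed non-overlapping penalty: sum of elementwise products
    between every pair of masks from different time-steps."""
    total = 0.0
    for i in range(len(masks)):
        for j in range(i + 1, len(masks)):
            total += sum(a * b
                         for row_a, row_b in zip(masks[i], masks[j])
                         for a, b in zip(row_a, row_b))
    return total

def combined_loss(masks, probs, label, lam_overlap, lam_entropy, lam_ce):
    """Linear combination of the three terms, one hyperparameter per term."""
    mean_entropy = sum(entropy(m) for m in masks) / len(masks)
    return (lam_overlap * overlap(masks)
            + lam_entropy * mean_entropy
            + lam_ce * cross_entropy(probs, label))

# Two time-steps attending to different corners of a 2x2 grid: no overlap.
disjoint = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
# Two time-steps attending to the same corner: the masks coincide.
repeated = [[[1.0, 0.0], [0.0, 0.0]], [[1.0, 0.0], [0.0, 0.0]]]
probs = [0.7, 0.2, 0.1]  # toy class probabilities; the true class is index 0

loss_disjoint = combined_loss(disjoint, probs, 0, 1.0, 0.1, 1.0)
loss_repeated = combined_loss(repeated, probs, 0, 1.0, 0.1, 1.0)
```

With these toy inputs the repeated masks incur a larger loss than the disjoint ones, which is the behaviour claim 7 describes. The hyperparameter values shown are arbitrary.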

8. A method for classifying activities captured within images, the method comprising:
- accessing an image, the image including a visual capture of a scene;
  
  processing the image through a convolutional neural network, the convolutional neural network generating a set of two-dimensional feature maps based on the image;
  
  processing the set of two-dimensional feature maps through a contextual long short-term memory unit, the contextual long short-term memory unit generating a set of two-dimensional outputs based on the set of two-dimensional feature maps, wherein the contextual long short-term memory unit includes a loss function characterized by a non-overlapping loss, an entropy loss, and a cross-entropy loss and the non-overlapping loss, the entropy loss, and the cross-entropy loss are combined into the loss function through a linear combination with a first hyper parameter for the non-overlapping loss, a second hyper parameter for the entropy loss, and a third hyper parameter for the cross-entropy loss;
  
  generating a set of attention-masks for the image based on the set of two-dimensional outputs and the set of two-dimensional feature maps, the set of attention-masks defining dimensional portions of the image; and
  
  classifying the scene based on the set of two-dimensional outputs.
  - 9. The method of claim 8, wherein the convolutional neural network includes a plurality of convolution layers, and the set of two-dimensional feature maps is generated by a last convolution layer in the convolutional neural network.
  - 10. The method of claim 9, wherein the set of two-dimensional feature maps is obtained from the convolutional neural network before the set of two-dimensional feature maps is flattened.
  - 11. The method of claim 8, wherein the set of two-dimensional outputs is used to visualize the dimensional portions of the image.
  - 12. The method of claim 8, wherein the set of two-dimensional outputs is used to constrain the dimensional portions of the image.
  - 13. The method of claim 8, wherein the classification of the scene is performed by a fully connected layer that takes as input the set of two-dimensional outputs.
  - 14. The method of claim 8, wherein the loss function discourages the set of attention masks defining a same dimensional portion of the image across multiple time-steps.
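
The method generates attention masks from the two-dimensional outputs and the two-dimensional feature maps, but the claims do not say how. One common approach, assumed here purely for illustration, is to normalise a two-dimensional output with a softmax over its spatial positions and use the result to weight each feature map. The helper names `spatial_softmax` and `attend` are hypothetical.

```python
import math

def spatial_softmax(scores):
    """Normalise a 2-D grid of scores into a mask whose entries sum to 1."""
    peak = max(max(row) for row in scores)  # subtract the max for numerical stability
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]

def attend(feature_maps, mask):
    """Pool each 2-D feature map to one number, weighting locations by the mask."""
    return [sum(f * m
                for f_row, m_row in zip(fmap, mask)
                for f, m in zip(f_row, m_row))
            for fmap in feature_maps]

# Hypothetical 2-D output for one time-step; the large score marks the top-right cell.
output = [[0.0, 4.0], [0.0, 0.0]]
mask = spatial_softmax(output)

# Two toy 2x2 feature maps (channels) sharing the mask's spatial layout.
feature_maps = [[[1.0, 2.0], [3.0, 4.0]],
                [[0.0, 1.0], [0.0, 0.0]]]
pooled = attend(feature_maps, mask)
```

Because the mask keeps the same grid layout as the feature maps, each entry can be read as the weight given to one portion of the image, which is what makes the masks usable for visualization (claim 11).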

15. A system for classifying activities captured within images, the system comprising:
- one or more physical processors configured by machine-readable instructions to:
  
  access an image, the image including a visual capture of a scene;
  
  process the image through a convolutional neural network, the convolutional neural network generating a set of two-dimensional feature maps based on the image;
  
  process the set of two-dimensional feature maps through a contextual long short-term memory unit, the contextual long short-term memory unit generating a set of two-dimensional outputs based on the set of two-dimensional feature maps, wherein:
  
  the contextual long short-term memory unit includes a loss function characterized by a non-overlapping loss, an entropy loss, and a cross-entropy loss; and
  
  the non-overlapping loss, the entropy loss, and the cross-entropy loss are combined into the loss function through a linear combination with a first hyper parameter for the non-overlapping loss, a second hyper parameter for the entropy loss, and a third hyper parameter for the cross-entropy loss;
  
  generate a set of attention-masks for the image based on the set of two-dimensional outputs and the set of two-dimensional feature maps, the set of attention-masks defining dimensional portions of the image, wherein the loss function discourages the set of attention masks defining a same dimensional portion of the image across multiple time-steps; and
  
  classify the scene based on the set of two-dimensional outputs.
  - 16. The system of claim 15, wherein the convolutional neural network includes a plurality of convolution layers, and the set of two-dimensional feature maps is generated by a last convolution layer in the convolutional neural network and is obtained from the convolutional neural network before the set of two-dimensional feature maps is flattened.
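
Claim 16 takes the feature maps from the last convolution layer before they are flattened. A short sketch may help show why that matters: once flattened, a value no longer carries its row and column, whereas a cell in an unflattened map can still be traced back to a rectangle of the image. The even tiling of the image by the grid, the sizes used, and the function names are assumptions for illustration.

```python
def flatten(feature_maps):
    """Collapse a stack of 2-D maps into one vector, discarding the grid layout."""
    return [v for fmap in feature_maps for row in fmap for v in row]

def cell_to_region(row, col, grid_h, grid_w, image_h, image_w):
    """Map a feature-map cell to the image rectangle it covers,
    assuming the grid tiles the image evenly (a simplification)."""
    cell_h, cell_w = image_h // grid_h, image_w // grid_w
    top, left = row * cell_h, col * cell_w
    return (top, left, top + cell_h, left + cell_w)

# Hypothetical sizes: 3 channels of 2x2 maps computed from a 224x224 image.
feature_maps = [[[c + 0.1 * r + 0.01 * k for k in range(2)]
                 for r in range(2)]
                for c in range(3)]

flat = flatten(feature_maps)  # 12 values with no row/column information left
region = cell_to_region(1, 0, grid_h=2, grid_w=2, image_h=224, image_w=224)
# region is (top, left, bottom, right) in pixels for the bottom-left cell
```

In a real convolutional network the receptive field of a cell is usually larger than its tile and tiles can overlap, so the even split above is only a rough guide to which part of the image a cell describes.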


Current Assignee: GoPro, Inc.
Original Assignee: GoPro, Inc.
Inventors: Tse, Daniel; Chik, Desmond; Wu, Guanhang
Primary Examiner(s): Safaipour, Bobbak
Application Number: US15/467,546
Time in Patent Office: 670 Days

CPC Class Codes

G06F 18/2414   Smoothing the distance, e.g...

G06N 3/044   Recurrent networks, e.g. Ho...

G06N 3/045   Combinations of networks

G06T 2207/20084   Artificial neural networks ...

G06T 5/20   using local operators

G06V 10/454   Integrating the filters int...

G06V 10/764   using classification, e.g. ...

G06V 10/82   using neural networks

G06V 20/41   Higher-level, semantic clus...
