Method and System for Detecting Actions in Videos

US 20170255832A1
Filed: 03/02/2016
Published: 09/07/2017
Est. Priority Date: 03/02/2016
Status: Active Grant

First Claim

1. A method for detecting actions of an object in a scene, comprising steps:

acquiring a video of the scene as a sequence of images, wherein each image includes pixels, wherein the video is partitioned into chunks;

tracking the object in the video, and for each object and each chunk of the video, further comprising:

determining trajectories of the pixels within a bounding box located over the object;

using the bounding box to produce cropped trajectories and cropped images for one or more images in the chunk; and

passing the cropped trajectories and cropped images to a recurrent neural network (RNN) that outputs a relative score for each action of interest, wherein the steps are performed in a processor.
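The claim requires only that the video be partitioned into chunks; it does not fix a chunk length or say whether chunks overlap. As a minimal sketch, assuming non-overlapping chunks of 2K + 1 frames with K = 3 chosen purely for illustration, the partitioning step might look like the following. The function name `partition_into_chunks` and the decision to drop a short trailing chunk are hypothetical and not taken from the patent.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def partition_into_chunks(frames: Sequence[Frame], chunk_len: int) -> List[List[Frame]]:
    """Split a frame sequence into consecutive, non-overlapping chunks.

    Assumption: a trailing group shorter than chunk_len is dropped.
    """
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    n_full = len(frames) // chunk_len
    return [list(frames[i * chunk_len:(i + 1) * chunk_len]) for i in range(n_full)]


# Illustrative values only: 20 frame indices, chunks of 2K + 1 = 7 frames.
K = 3
chunks = partition_into_chunks(list(range(20)), 2 * K + 1)

# With an odd chunk length, index K is the central image of each chunk.
central_frames = [chunk[K] for chunk in chunks]
```

An odd chunk length gives every chunk a well-defined central image, which is convenient when per-pixel motion is measured relative to the middle of the chunk.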

Abstract

A method and system detects actions of an object in a scene by first acquiring a video of the scene as a sequence of images, wherein each image includes pixels, wherein the video is partitioned into chunks. The object in the video is tracked. For each object and each chunk of the video, trajectories of the pixels within a bounding box located over the object are tracked, and cropped trajectories and cropped images for one or more images in the chunk are produced using the bounding box. Then, the cropped trajectories and cropped images are passed to a recurrent neural network (RNN) that outputs a relative score for each action of interest.
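The cropping step described above can be sketched as follows. This is an illustrative reading, not the patent's implementation: images are represented as nested lists of pixel values, trajectories as a per-pixel grid of (dx, dy) displacements, and the names `BoundingBox`, `crop`, and `crop_chunk` are hypothetical.

```python
from typing import List, NamedTuple, Sequence


class BoundingBox(NamedTuple):
    top: int
    left: int
    height: int
    width: int


def crop(grid: Sequence[Sequence], box: BoundingBox) -> List[list]:
    """Return the sub-grid of a 2D grid that lies inside the bounding box."""
    rows = grid[box.top:box.top + box.height]
    return [list(row[box.left:box.left + box.width]) for row in rows]


def crop_chunk(images, trajectories, box):
    """Apply the same box to every image of a chunk and to its trajectory grid."""
    return [crop(image, box) for image in images], crop(trajectories, box)


# Toy data: one 4x5 image whose pixel value encodes (row, column),
# and a 4x5 grid of made-up per-pixel displacements.
image = [[10 * r + c for c in range(5)] for r in range(4)]
trajectories = [[(float(c), float(r)) for c in range(5)] for r in range(4)]
box = BoundingBox(top=1, left=2, height=2, width=2)

cropped_images, cropped_trajectories = crop_chunk([image], trajectories, box)
```

Using one box for both inputs keeps the cropped images and cropped trajectories spatially aligned before they are passed to the network.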

22 Claims

1. A method for detecting actions of an object in a scene, comprising steps:
- acquiring a video of the scene as a sequence of images, wherein each image includes pixels, wherein the video is partitioned into chunks;
  
  tracking the object in the video, and for each object and each chunk of the video, further comprising:
  
  determining trajectories of the pixels within a bounding box located over the object;
  
  using the bounding box to produce cropped trajectories and cropped images for one or more images in the chunk; and
  
  passing the cropped trajectories and cropped images to a recurrent neural network (RNN) that outputs a relative score for each action of interest, wherein the steps are performed in a processor.
- Dependent claims (2 to 21):
  - 2. The method of claim 1, wherein the RNN includes convolutional neural network layers and one or more recurrent neural network layers.
  - 3. The method of claim 2, wherein the convolutional neural network layers operate on multiple streams, including the cropped trajectories and the cropped images as well as trajectories and images that have an entire spatial extent of the video.
  - 4. The method of claim 2, wherein the recurrent neural network layers include Long Short-Term Memory (LSTM) cells.
  - 5. The method of claim 3, wherein the recurrent neural network layers include bi-directional Long Short-Term Memory (LSTM) cells.
  - 6. The method of claim 1, wherein the trajectories are encoded as pixel trajectories.
  - 7. The method of claim 1, wherein the trajectories are encoded as stacked optical flow.
  - 8. The method of claim 1, wherein the tracking includes selecting a bounding box that maximizes a magnitude of the stacked optical flow inside the bounding box.
  - 9. The method of claim 8, wherein the tracking further comprises:
    - updating a location of the bounding box if a magnitude of the stacked optical flow inside the bounding box is greater than a threshold.
  - 10. The method of claim 1, wherein the trajectories for the pixels are determined from a central image in the chunk to each of K previous and K subsequent images.
  - 11. The method of claim 10, wherein K is 3.
  - 12. The method of claim 10, wherein a motion pattern for each pixel is determined using a 1×2K convolutional kernel.
  - 13. The method of claim 1, wherein the method is used for fine-grained action detection in the video.
  - 14. The method of claim 1, wherein the method includes training the RNN prior to the detecting.
  - 15. The method of claim 1, wherein the RNN has been previously trained.
  - 16. The method of claim 1, wherein the detecting comprises temporal action detection.
  - 17. The method of claim 1, wherein the detecting comprises spatio-temporal action detection.
  - 18. The method of claim 1, wherein the video is initially acquired in some form other than a sequence of images, and is converted to a sequence of images.
  - 19. The method of claim 1, in which the object is a person.
  - 20. The method of claim 1, in which the object is a robot.
  - 21. The method of claim 1, in which the object is an industrial robot.
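Claims 8 and 9 describe tracking by choosing the bounding box with the largest optical-flow magnitude and moving the box only when that magnitude exceeds a threshold. The sketch below is one possible reading under several assumptions that the claims do not state: the box has a fixed size, candidates are searched exhaustively, "magnitude" is the sum of per-pixel flow magnitudes, and the threshold is compared against the best candidate box. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Flow = List[List[Tuple[float, float]]]  # per-pixel (dx, dy)
Corner = Tuple[int, int]                # (top, left) of a fixed-size box


def flow_magnitude(flow: Flow, corner: Corner, box_h: int, box_w: int) -> float:
    """Sum of per-pixel flow magnitudes inside the box (an assumed measure)."""
    top, left = corner
    return sum(
        math.hypot(dx, dy)
        for row in flow[top:top + box_h]
        for dx, dy in row[left:left + box_w]
    )


def select_box(flow: Flow, box_h: int, box_w: int) -> Corner:
    """Exhaustively pick the box position with the largest flow magnitude."""
    rows, cols = len(flow), len(flow[0])
    candidates = [
        (top, left)
        for top in range(rows - box_h + 1)
        for left in range(cols - box_w + 1)
    ]
    return max(candidates, key=lambda c: flow_magnitude(flow, c, box_h, box_w))


def update_box(previous: Corner, flow: Flow, box_h: int, box_w: int,
               threshold: float) -> Corner:
    """Move to the best candidate only if its flow magnitude exceeds the threshold."""
    candidate = select_box(flow, box_h, box_w)
    if flow_magnitude(flow, candidate, box_h, box_w) > threshold:
        return candidate
    return previous


# Toy 4x4 flow field: only the bottom-right 2x2 block is moving.
flow = [[(0.0, 0.0)] * 4 for _ in range(4)]
for r in (2, 3):
    for c in (2, 3):
        flow[r][c] = (3.0, 4.0)  # magnitude 5.0 per pixel

moved = update_box((0, 0), flow, 2, 2, threshold=10.0)
held = update_box((0, 0), flow, 2, 2, threshold=50.0)
```

Under this reading, the threshold keeps the box in place when the object is nearly still, so small spurious flow does not drag the box away.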

22. A system for detecting actions of an object in a scene, comprising:
- means for acquiring a video of the scene as a sequence of images, wherein each image includes pixels, wherein the video is partitioned into chunks; and
  
  a processor configured to track the object in the video, and for each object and each chunk of the video, and wherein the processor is further configured to, for each object and each chunk of the video, determine trajectories of the pixels within a bounding box located over the object, use the bounding box to produce cropped trajectories and cropped images for one or more images in the chunk, and pass the cropped trajectories and cropped images to a recurrent neural network (RNN) that outputs a relative score for each action of interest.
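The trajectory step shared by both independent claims is detailed in claims 10 to 12: each pixel's motion is measured from the central image of a chunk to each of K previous and K subsequent images, and a 1×2K kernel turns those 2K values into a motion pattern. The sketch below illustrates that idea for a single tracked point and a single displacement component. The helper names, the example positions, and the kernel weights are all hypothetical; in a trained network the weights would be learned.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def pixel_trajectory(positions: Sequence[Point], k: int) -> List[Point]:
    """Displacements from the central image to each of the K previous and
    K subsequent images. `positions` holds 2K + 1 locations of one point,
    with index k belonging to the central image."""
    if len(positions) != 2 * k + 1:
        raise ValueError("expected 2K + 1 positions")
    cx, cy = positions[k]
    return [(x - cx, y - cy) for i, (x, y) in enumerate(positions) if i != k]


def motion_response(values: Sequence[float], kernel: Sequence[float]) -> float:
    """Apply a 1 x 2K kernel to the 2K displacement values of one pixel."""
    if len(values) != len(kernel):
        raise ValueError("kernel length must equal 2K")
    return sum(w * v for w, v in zip(kernel, values))


# Toy example with K = 3: a point moving one pixel to the right per frame.
K = 3
positions = [(float(i), 0.0) for i in range(2 * K + 1)]
trajectory = pixel_trajectory(positions, K)

# Horizontal components only, with an arbitrary illustrative kernel.
dx_values = [dx for dx, _ in trajectory]
kernel = [-1.0, -1.0, -1.0, 1.0, 1.0, 1.0]
response = motion_response(dx_values, kernel)
```

Because every displacement is stored relative to the central image, the 2K values for a pixel all describe the same scene point, which is what lets a single 1×2K kernel summarise its motion.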

Current Assignee: Mitsubishi Electric Research Laboratories, Inc. (Mitsubishi Electric Corporation)
Original Assignee: Mitsubishi Electric Research Laboratories, Inc. (Mitsubishi Electric Corporation)
Inventors: Michael J. Jones, Tim Marks, Oncel Tuzel, Bharat Singh
Granted Patent: US 10,242,266 B2

CPC Class Codes

G06N 3/044   Recurrent networks, e.g. Ho...

G06N 3/045   Combinations of networks

G06N 3/08   Learning methods

G06T 2207/10016   Video; Image sequence

G06T 2207/20081   Training; Learning

G06T 2207/30232   Surveillance

G06T 7/20   Analysis of motion motion e...

G06V 10/454   Integrating the filters int...

G06V 10/82   using neural networks

G06V 20/41   Higher-level, semantic clus...

G06V 20/52   Surveillance or monitoring ...

G06V 2201/06   Recognition of objects for ...

G06V 40/20   Movements or behaviour, e.g...
