Method and System for Detecting Actions in Videos
Abstract
A method and system detect actions of an object in a scene by first acquiring a video of the scene as a sequence of images, wherein each image includes pixels, wherein the video is partitioned into chunks. The object in the video is tracked. For each object and each chunk of the video, trajectories of the pixels within a bounding box located over the object are tracked, and cropped trajectories and cropped images for one or more images in the chunk are produced using the bounding box. Then, the cropped trajectories and cropped images are passed to a recurrent neural network (RNN) that outputs a relative score for each action of interest.
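The chunking and cropping steps described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. Several details are assumptions made only for illustration: chunks are taken to be consecutive and non-overlapping, an image is a list of pixel rows, a bounding box is `(x0, y0, x1, y1)` with exclusive upper bounds, and a "cropped trajectory" is a pixel trajectory that stays inside the box, re-expressed in box coordinates. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]            # assumed: grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]    # assumed: (x0, y0, x1, y1), x1 and y1 exclusive
Point = Tuple[int, int]            # (x, y) pixel position in one image


def partition_into_chunks(frames: Sequence, chunk_len: int) -> List[list]:
    """Split a frame sequence into consecutive, non-overlapping chunks (assumption)."""
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    return [list(frames[i:i + chunk_len]) for i in range(0, len(frames), chunk_len)]


def crop_image(image: Image, box: Box) -> Image:
    """Return the part of the image that lies inside the bounding box."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def crop_trajectories(trajectories: List[List[Point]], box: Box) -> List[List[Point]]:
    """Keep trajectories that stay inside the box and shift them to box coordinates."""
    x0, y0, x1, y1 = box
    kept = []
    for traj in trajectories:
        if all(x0 <= x < x1 and y0 <= y < y1 for x, y in traj):
            kept.append([(x - x0, y - y0) for x, y in traj])
    return kept
```

For each tracked object and each chunk, the outputs of `crop_image` and `crop_trajectories` would then form the input passed on to the recurrent network.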
22 Claims
1. A method for detecting actions of an object in a scene, comprising steps:
acquiring a video of the scene as a sequence of images, wherein each image includes pixels, wherein the video is partitioned into chunks;
tracking the object in the video, and for each object and each chunk of the video, further comprising:
determining trajectories of the pixels within a bounding box located over the object;
using the bounding box to produce cropped trajectories and cropped images for one or more images in the chunk; and
passing the cropped trajectories and cropped images to a recurrent neural network (RNN) that outputs a relative score for each action of interest, wherein the steps are performed in a processor.
22. A system for detecting actions of an object in a scene, comprising:
means for acquiring a video of the scene as a sequence of images, wherein each image includes pixels, wherein the video is partitioned into chunks; and
a processor configured to track the object in the video, and for each object and each chunk of the video, and wherein the processor is further configured to, for each object and each chunk of the video, determine trajectories of the pixels within a bounding box located over the object, use the bounding box to produce cropped trajectories and cropped images for one or more images in the chunk, and pass the cropped trajectories and cropped images to a recurrent neural network (RNN) that outputs a relative score for each action of interest.
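The final element of the claims, an RNN that outputs a relative score for each action of interest, can be sketched as follows. This is a minimal illustration under stated assumptions, not the network of the invention: it uses a plain single-layer recurrent cell with a tanh nonlinearity, it treats each time step's input as an already-computed feature vector derived from the cropped images and cropped trajectories, and it interprets "relative score" as a softmax over action classes. All function names and weight matrices are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(logits: Vector) -> Vector:
    """Turn raw outputs into scores that are positive and sum to one."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rnn_action_scores(features: List[Vector], w_in: Matrix,
                      w_rec: Matrix, w_out: Matrix) -> List[Vector]:
    """Run a simple recurrent cell over the feature sequence.

    Returns one list of per-action relative scores for each time step.
    """
    hidden = [0.0] * len(w_rec)  # hidden state starts at zero (assumption)
    scores = []
    for x in features:
        pre = [a + b for a, b in zip(matvec(w_in, x), matvec(w_rec, hidden))]
        hidden = [math.tanh(p) for p in pre]
        scores.append(softmax(matvec(w_out, hidden)))
    return scores
```

Because the hidden state is carried from one time step to the next, the score at each step can depend on earlier inputs as well as the current one, which is the property that distinguishes a recurrent network from a per-image classifier.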
Specification