Method and apparatus for annotating a video stream comprising a sequence of frames
3 Assignments
0 Petitions
Abstract
Systems and methods are disclosed herein for annotating video tracks obtained from video data streams. A video track is treated as a positive bag if it contains at least one region of interest containing a particular person, and as a negative bag if none of its regions of interest contains that person. Visual similarity models are trained using the positive bags.
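The bag labeling rule stated in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the `RegionOfInterest` and `VideoTrack` records and the idea of storing the identities visible in each crop as `person_ids` are hypothetical assumptions made only for this example.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass
class RegionOfInterest:
    """Hypothetical record for one cropped region of one frame."""
    frame_index: int
    person_ids: FrozenSet[str]  # identities judged visible in this crop (assumed)


@dataclass
class VideoTrack:
    """A video track treated as a multiple-instance 'bag' of regions."""
    regions: List[RegionOfInterest] = field(default_factory=list)


def bag_label(track: VideoTrack, person_id: str) -> int:
    """+1 (positive bag) if any region contains the person, else -1 (negative bag)."""
    if any(person_id in roi.person_ids for roi in track.regions):
        return 1
    return -1


def positive_bags(tracks: List[VideoTrack], person_id: str) -> List[VideoTrack]:
    """Keep only the positive bags, which the abstract says are used for training."""
    return [t for t in tracks if bag_label(t, person_id) == 1]
```

A single frame showing the person is enough to make the whole track a positive bag, even if the other regions in the track do not show that person; a track is negative only when no region shows the person.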
20 Citations
24 Claims
1. A method of training an image recognition tool for detecting images of a person, the method comprising:
scanning a first frame in a video stream comprising a sequence of frames for images of a person;
generating a representation of the region of interest of the first frame likely to contain the image of the person;
forming a video track comprising the representation of a region of interest of the first frame likely to contain an image of the person;
scanning each subsequent frame in the sequence of frames for images of the person in each subsequent frame, wherein the scanning of each frame begins at a location in each frame based on a location of the region of interest of a preceding frame;
for each subsequent frame in the sequence of frames:
  generating a representation of the region of interest of the subsequent frame likely to contain the image of the person;
  adding, to the video track, the representation of a region of interest of the subsequent frame likely to contain the image of the person;
assigning a positive label to the video track when the representation of the region of interest in at least one of the first frame and the subsequent frames contains the person and no other people, the positive label identifying the video track as corresponding to the person; and
designating each representation of the region of interest in the positively labeled video track as a positive instance and providing each representation of the region of interest in the positively labeled video track to the image recognition tool for training a multiple-instance learning algorithm of the image recognition tool.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.
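The steps of claim 1 (forming a track, seeding each scan with the preceding region of interest, labeling, and designating positive instances) can be sketched as below. This is a hedged illustration under stated assumptions: the `(x, y, width, height)` box format, the `Detection` record, and the `detect(frame, start_box)` detector signature are hypothetical, and ending the track when the detector returns `None` is a choice made for this example, not something the claim specifies.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical box format: (x, y, width, height) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class Detection:
    """Representation of one region of interest likely to contain the person."""
    box: Box
    num_people: int   # how many people appear inside the box
    is_target: bool   # whether the person being annotated is among them


# Hypothetical detector signature: (frame, start_box) -> Detection or None.
# start_box is the preceding frame's region of interest (None for the first frame).
Detector = Callable[[object, Optional[Box]], Optional[Detection]]


def build_track(frames: Sequence[object], detect: Detector) -> List[Detection]:
    """Form a video track, seeding each scan with the preceding region of interest."""
    track: List[Detection] = []
    previous: Optional[Box] = None
    for frame in frames:
        detection = detect(frame, previous)
        if detection is None:
            break  # assumption: the track ends once the person is lost
        track.append(detection)
        previous = detection.box
    return track


def is_positive(track: List[Detection]) -> bool:
    """Positive label: some region contains the person and no other people."""
    return any(d.is_target and d.num_people == 1 for d in track)


def positive_instances(track: List[Detection]) -> List[Box]:
    """Every region of a positively labeled track becomes a positive instance."""
    return [d.box for d in track] if is_positive(track) else []
```

Note that once the track is labeled positive, every region in it is designated a positive instance, including regions that may show other people; resolving that ambiguity is left to the multiple-instance learning algorithm.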
13. A system for training an image recognition tool for detecting images of a person, the system comprising:
a processor;
a memory containing computer-readable instructions for execution by said processor, said instructions comprising:
video analytics instructions for producing a video track, the video analytics instructions comprising:
human body detection instructions for scanning image data in a video stream comprising a sequence of frames for a person and generating representations of regions of interest of frames in the sequence of frames likely to contain the image of the person;
visual feature extraction instructions for adding, to the video track, representations of regions of interest of the sequence of frames likely to contain the person;
human body tracking instructions for determining a starting location for said scanning in frames of said sequence based on a location of a region of interest in a preceding frame;
labeling instructions for assigning a positive label to the video track when the representation of the region of interest in at least one of the first frame and the subsequent frames contains the person and no other people, the positive label identifying the video track as corresponding to the person;
training instructions for designating each representation of the region of interest in the positively labeled video track as a positive instance and providing each representation of the region of interest in the positively labeled video track to the image recognition tool for training a multiple-instance learning algorithm of the image recognition tool; and
a storage for storing the positively labeled video track and the trained image recognition tool.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21, 22, 23.
24. A non-transitory computer-readable storage medium having stored thereon computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform a method of training an image recognition tool for detecting images of a person, the method comprising:
scanning a first frame in a video stream comprising a sequence of frames for images of a person;
generating a representation of the region of interest of the first frame likely to contain the image of the person;
forming a video track comprising the representation of a region of interest of the first frame likely to contain an image of the person;
scanning each subsequent frame in the sequence of frames for images of the person, wherein the scanning of each frame begins at a spatial location in each frame based on a location of the region of interest of a preceding frame;
for each subsequent frame in the sequence of frames:
  generating a representation of the region of interest of the subsequent frame likely to contain the image of the person;
  adding, to the video track, the representation of a region of interest of the subsequent frame likely to contain the image of the person;
assigning a positive label to the video track when the representation of the region of interest in at least one of the first frame and the subsequent frames contains the person and no other people, the positive label identifying the video track as corresponding to the person; and
designating each representation of the region of interest in the positively labeled video track as a positive instance and providing each representation of the region of interest in the positively labeled video track to the image recognition tool for training a multiple-instance learning algorithm of the image recognition tool.
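Claim 24 states that scanning each frame begins at a spatial location derived from the preceding frame's region of interest. A minimal sketch of one way to do that, assuming a sliding-window detector with a fixed box size, is to enumerate candidate windows around the previous box and visit them nearest first. The `stride` and `radius` parameters and the `is_person(x, y)` classifier callback are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # hypothetical (x, y, width, height)


def scan_order(prev_box: Box, frame_w: int, frame_h: int,
               stride: int = 4, radius: int = 16) -> List[Tuple[int, int]]:
    """Candidate top-left corners near prev_box, nearest first, clipped to the frame."""
    px, py, bw, bh = prev_box
    steps = radius // stride
    offsets = [k * stride for k in range(-steps, steps + 1)]  # always includes 0
    candidates = []
    for dy in offsets:
        for dx in offsets:
            x, y = px + dx, py + dy
            if 0 <= x <= frame_w - bw and 0 <= y <= frame_h - bh:
                candidates.append((x, y))
    # Begin at the preceding region's own location, then widen by distance.
    candidates.sort(key=lambda p: (p[0] - px) ** 2 + (p[1] - py) ** 2)
    return candidates


def find_person(prev_box: Box, frame_w: int, frame_h: int,
                is_person: Callable[[int, int], bool], **kwargs) -> Optional[Box]:
    """Return the first window, in scan order, that the classifier accepts."""
    for x, y in scan_order(prev_box, frame_w, frame_h, **kwargs):
        if is_person(x, y):
            return (x, y, prev_box[2], prev_box[3])
    return None
```

Because a person usually moves only a few pixels between consecutive frames, starting at the previous location tends to find the new region of interest after testing only a handful of windows rather than scanning the whole frame.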
Specification