High-Confidence Labeling of Video Volumes in a Video Sharing Service
Abstract
A volume identification system identifies a set of unlabeled spatio-temporal volumes within each of a set of videos, each volume representing a distinct object or action. The volume identification system further determines, for each of the videos, a set of volume-level features characterizing the volume as a whole. In one embodiment, the features are based on a codebook and describe the temporal and spatial relationships of different codebook entries of the volume. The volume identification system uses the volume-level features, in conjunction with existing labels assigned to the videos as a whole, to label with high confidence some subset of the identified volumes, e.g., by employing consistency learning or training and application of weak volume classifiers.
The labeled volumes may be used for a number of applications, such as training strong volume classifiers, improving video search (including locating individual volumes), and creating composite videos based on identified volumes.
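The pipeline summarized in the abstract (volumes, codebook-based volume-level features, and labels kept only when confidence is high) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `Volume` record, the nearest-entry histogram used as the feature, the per-label `prototypes`, and the histogram-intersection score standing in for a weak volume classifier. The histogram also ignores the temporal and spatial relationships between codebook entries that the abstract mentions.

```python
from dataclasses import dataclass, field
from math import dist


@dataclass
class Volume:
    """Hypothetical record for one spatio-temporal volume of a video."""
    video_id: str
    start_time: float
    end_time: float
    # One (x, y, width, height) region per frame between start and end.
    regions: list = field(default_factory=list)
    # Assumed local visual descriptors sampled inside those regions.
    descriptors: list = field(default_factory=list)


def codebook_histogram(descriptors, codebook):
    """Normalized count of the nearest codebook entry for each descriptor."""
    counts = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)), key=lambda i: dist(d, codebook[i]))
        counts[nearest] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


def label_with_confidence(features, video_labels, prototypes, threshold):
    """Pick the best-matching video-level label, or None if confidence is low.

    The score is a histogram intersection in [0, 1]; it is an illustrative
    stand-in for a weak volume classifier, not the patent's method.
    """
    best_label, best_score = None, 0.0
    for label in video_labels:
        proto = prototypes.get(label)
        if proto is None:
            continue
        score = sum(min(a, b) for a, b in zip(features, proto))
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else None


# Toy data: a two-entry codebook and one volume with four descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0)]
volume = Volume("v1", 2.0, 5.5,
                descriptors=[(0.1, 0.0), (0.9, 1.0), (1.0, 0.8), (1.1, 1.0)])
features = codebook_histogram(volume.descriptors, codebook)
prototypes = {"dog": [0.2, 0.8], "cat": [0.9, 0.1]}
label = label_with_confidence(features, ["dog", "cat"], prototypes, 0.9)
```

Candidate labels are restricted to those already attached to the video as a whole, which mirrors the abstract's use of existing video-level labels. Raising `threshold` trades coverage for confidence, so fewer volumes receive a label.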
21 Claims
1. A computer-implemented method comprising:
- identifying, in a plurality of digital videos, a plurality of candidate volumes representing spatio-temporal segments of the digital videos, wherein each of the candidate volumes corresponds to a contiguous sequence of spatial portions of the video frames having a starting time and an ending time, and potentially represents a discrete object or action within the video frames;
- determining, for each of the identified candidate volumes, features characterizing the candidate volume, wherein the features are determined from visual properties of the spatial portions of the video frames contained in the candidate volumes; and
- assigning a verified label to each volume of a plurality of the identified candidate volumes using the determined features, the verified label indicating a particular object or action represented by the volume to which the label is assigned.
Dependent claims: 2-16.
17. A computer-implemented method comprising:
- transmitting from a client device a textual search query over a network to a video sharing service;
- obtaining at the client device, from the video sharing service, a video data result set comprising indicia of a plurality of volumes of a digital video, each volume representing a contiguous sequence of spatial portions of the video frames having a starting time and an ending time, representing a discrete object or action within the video frames, which discrete object or action has a label corresponding to the textual search query; and
- displaying the indicia separately in a user interface.
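The query flow recited in claim 17 can be sketched in Python as follows. This is only an illustration under stated assumptions: the `VolumeResult` fields, the `thumbnail_url` indicium, the in-memory `index`, and the exact-term matching rule are all hypothetical, and the network transport between client and service is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VolumeResult:
    """Hypothetical indicium for one labeled volume in a result set."""
    video_id: str
    label: str
    start_time: float
    end_time: float
    thumbnail_url: str  # assumed field; the claim only requires "indicia"


def search_volumes(query, index):
    """Return volumes whose label matches any term of the textual query."""
    terms = set(query.lower().split())
    return [v for v in index if v.label.lower() in terms]


def render_results(results):
    """Produce one display line per volume so each is shown separately."""
    return [
        f"{r.label}: {r.video_id} [{r.start_time:.1f}s-{r.end_time:.1f}s]"
        for r in results
    ]


# Two labeled volumes that belong to the same digital video.
index = [
    VolumeResult("v1", "dog", 2.0, 5.5, "thumb/v1-dog.jpg"),
    VolumeResult("v1", "ball", 3.0, 4.0, "thumb/v1-ball.jpg"),
]
lines = render_results(search_volumes("dog ball", index))
```

Each matching volume yields its own entry even when several volumes come from one video, which is the point of displaying the indicia separately.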
18. A computer-readable storage medium having executable computer program instructions embodied therein, actions of the computer program comprising:
- identifying, in a plurality of digital videos, a plurality of candidate volumes representing spatio-temporal segments of the digital videos, wherein each of the candidate volumes corresponds to a contiguous sequence of spatial portions of the video frames having a starting time and an ending time, and potentially represents a discrete object or action within the video frames;
- determining, for each of the identified candidate volumes, features characterizing the candidate volume, wherein the features are determined from visual properties of the spatial portions of the video frames contained in the candidate volumes; and
- assigning a verified label to each volume of a plurality of the identified candidate volumes using the determined features, the verified label indicating a particular object or action represented by the volume to which the label is assigned.
Dependent claims: 19.
20. A computer system comprising:
- a computer processor; and
- a computer-readable storage medium having executable computer program instructions embodied therein that when executed by the computer processor perform actions comprising:
  - identifying, in a plurality of digital videos, a plurality of candidate volumes representing spatio-temporal segments of the digital videos, wherein each of the candidate volumes corresponds to a contiguous sequence of spatial portions of the video frames having a starting time and an ending time, and potentially represents a discrete object or action within the video frames;
  - determining, for each of the identified candidate volumes, features characterizing the candidate volume, wherein the features are determined from visual properties of the spatial portions of the video frames contained in the candidate volumes; and
  - assigning a verified label to each volume of a plurality of the identified candidate volumes using the determined features, the verified label indicating a particular object or action represented by the volume to which the label is assigned.
21. A computer-implemented method comprising:
- accessing a plurality of digital videos having video-level metadata; and
- using the video-level metadata to:
  - automatically identify a temporally contiguous sequence of spatial portions of a first one of the videos, the temporally contiguous sequence of spatial portions representing both a temporal subset of the video and a spatial subset of the video, and
  - automatically label the temporally contiguous sequence of spatial portions with an identifier of an object or action represented by the temporally contiguous sequence of spatial portions.
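One way video-level metadata could drive volume labeling, as claim 21 recites, is a consistency check: a group of visually similar volumes inherits a label only when most of its members come from videos that carry that label. The Python sketch below is a simplified stand-in under stated assumptions and is not the patent's consistency learning procedure. The precomputed cluster assignments, the `min_agreement` parameter, and the function name are hypothetical.

```python
from collections import Counter, defaultdict


def propagate_labels(volume_cluster, volume_video, video_labels,
                     min_agreement=0.8):
    """Assign a video-level label to volumes in consistently labeled clusters.

    volume_cluster: volume id -> cluster id (assumed precomputed from features)
    volume_video:   volume id -> id of the video containing the volume
    video_labels:   video id -> set of labels attached to the whole video
    """
    members = defaultdict(list)
    for vol, cluster in volume_cluster.items():
        members[cluster].append(vol)

    assigned = {}
    for vols in members.values():
        # Count, per label, how many member volumes come from a video with it.
        counts = Counter(
            label
            for vol in vols
            for label in video_labels.get(volume_video[vol], ())
        )
        if not counts:
            continue
        label, hits = counts.most_common(1)[0]
        if hits / len(vols) >= min_agreement:
            for vol in vols:
                # Label only volumes whose own video carries the label.
                if label in video_labels.get(volume_video[vol], ()):
                    assigned[vol] = label
    return assigned


volume_cluster = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
volume_video = {"a": "v1", "b": "v2", "c": "v3", "d": "v1", "e": "v4"}
video_labels = {
    "v1": {"dog", "park"},
    "v2": {"dog"},
    "v3": {"dog", "beach"},
    "v4": {"cat"},
}
assigned = propagate_labels(volume_cluster, volume_video, video_labels)
```

In the toy data, every volume in cluster 0 comes from a video tagged "dog", so all three are labeled. Cluster 1 has no label shared by enough of its members, so its volumes stay unlabeled.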
Specification