High-Confidence Labeling of Video Volumes in a Video Sharing Service

US 20130114902A1
Filed: 08/31/2012
Published: 05/09/2013
Est. Priority Date: 11/04/2011
Status: Active Grant

Abstract

A volume identification system identifies a set of unlabeled spatio-temporal volumes within each of a set of videos, each volume representing a distinct object or action. The volume identification system further determines, for each of the videos, a set of volume-level features characterizing the volume as a whole. In one embodiment, the features are based on a codebook and describe the temporal and spatial relationships of different codebook entries of the volume. The volume identification system uses the volume-level features, in conjunction with existing labels assigned to the videos as a whole, to label with high confidence some subset of the identified volumes, e.g., by employing consistency learning or training and application of weak volume classifiers.
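The consistency-learning step mentioned above can be illustrated with a minimal sketch. It assumes the volumes have already been clustered by their volume-level features (the clustering itself is not shown) and that each volume inherits the labels of its parent video as preliminary labels. The function name `verify_labels`, the `min_consistency` threshold of 0.8, and the data layout are hypothetical choices for illustration, not details taken from the patent.

```python
from collections import Counter, defaultdict


def verify_labels(volumes, cluster_of, min_consistency=0.8):
    """Return {volume_id: label} for volumes in label-consistent clusters.

    volumes:    {volume_id: set of preliminary labels inherited from the video}
    cluster_of: {volume_id: cluster_id}, assumed to come from clustering the
                volume-level features (hypothetical input, not computed here).
    """
    # Group volume ids by cluster.
    members = defaultdict(list)
    for vid, cid in cluster_of.items():
        members[cid].append(vid)

    verified = {}
    for vids in members.values():
        # Count, per label, how many volumes in the cluster carry it.
        counts = Counter(label for v in vids for label in volumes[v])
        if not counts:
            continue
        label, hits = counts.most_common(1)[0]
        # Consistency: fraction of the cluster's volumes sharing the top label.
        if hits / len(vids) >= min_consistency:
            for v in vids:
                # Assumption: only volumes that already carry the label keep it.
                if label in volumes[v]:
                    verified[v] = label
    return verified


volumes = {
    "a": {"dog", "park"},
    "b": {"dog"},
    "c": {"dog", "beach"},
    "d": {"cat"},
    "e": {"car"},
}
cluster_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
print(verify_labels(volumes, cluster_of))
```

In this toy input, every volume in cluster 0 carries "dog", so those three volumes receive it as a verified label. Cluster 1 has no label shared by at least 80% of its members, so its volumes stay unlabeled.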

The labeled volumes may be used for a number of applications, such as training strong volume classifiers, improving video search (including locating individual volumes), and creating composite videos based on identified volumes.

21 Claims

1. A computer-implemented method comprising:
- identifying, in a plurality of digital videos, a plurality of candidate volumes representing spatio-temporal segments of the digital videos, wherein each of the candidate volumes corresponds to a contiguous sequence of spatial portions of the video frames having a starting time and an ending time, and potentially represents a discrete object or action within the video frames;
  
  determining, for each of the identified candidate volumes, features characterizing the candidate volume, wherein the features are determined from visual properties of the spatial portions of the video frames contained in the candidate volumes; and
  
  assigning a verified label to each volume of a plurality of the identified candidate volumes using the determined features, the verified label indicating a particular object or action represented by the volume to which the label is assigned.
- Dependent claims (2 to 16):
  - 2. The computer-implemented method of claim 1, wherein identifying the candidate volumes in the digital videos comprises:
    - stabilizing the digital videos using a video stabilization algorithm; and
      
      identifying, as a stable segment, a contiguous sequence of frames in one of the digital videos in which a degree of background motion is below a threshold, using a measure of background motion produced by the video stabilization algorithm.
  - 3. The computer-implemented method of claim 2, wherein identifying the candidate volumes in the digital videos further comprises extracting a candidate volume from the identified stable segment using hierarchical graph-based video segmentation.
  - 4. The computer-implemented method of claim 1, wherein determining features characterizing an identified candidate volume comprises:
    - dividing each of a plurality of the candidate volumes into fixed-length segments;
      
      for each of the determined fixed-length segments, determining a segment feature vector characterizing visual properties of the segment;
      
      forming a feature codebook by clustering the segment feature vectors; and
      
      determining features for the candidate volume using the feature codebook.
  - 5. The computer-implemented method of claim 1, wherein determining features characterizing an identified candidate volume comprises:
    - dividing the identified candidate volume into fixed-length segments;
      
      determining a segment feature vector for each of the segments;
      
      mapping each segment feature vector to a most similar codebook entry from a codebook of feature vectors; and
      
      forming a feature vector characterizing the candidate volume by normalizing the set of most similar codebook entries.
  - 6. The computer-implemented method of claim 1, wherein determining features characterizing an identified candidate volume comprises:
    - dividing the candidate volume into fixed-length segments;
      
      determining a segment feature vector for each of the segments;
      
      for at least one of the segment feature vectors:
      
      mapping each of a plurality of individual elements of the corresponding feature to a most similar codebook entry from a codebook of feature vectors; and
      
      forming, for the at least one of the segment feature vectors, a codebook entry histogram from the most similar codebook entries.
  - 7. The computer-implemented method of claim 1, wherein ones of the plurality of digital videos are associated with labels, and wherein assigning a verified label to a volume comprises:
    - associating, with each of the candidate volumes, preliminary labels associated with the digital video in which the candidate volume was identified;
      
      clustering the candidate volumes according to their determined features;
      
      determining, for each of the clusters, a degree of label consistency of the preliminary labels associated with the candidate volumes in the cluster; and
      
      assigning, as a verified label, to a candidate volume in a cluster with at least a given threshold degree of label consistency, a label with a high degree of occurrence in the cluster.
  - 8. The computer-implemented method of claim 1, wherein ones of the plurality of digital videos are associated with labels, and wherein assigning a verified label to a volume comprises:
    - associating, with each of the candidate volumes, preliminary labels associated with the digital video in which the candidate volume was identified;
      
      forming a union of all of the preliminary labels of all of the candidate volumes;
      
      for each label of a plurality of preliminary labels of the union:
      
      identifying the candidate volumes associated with the label;
      
      clustering the identified candidate volumes according to their determined features;
      
      determining, for each of the clusters, a degree of label consistency; and
      
      assigning the label as a verified label to a candidate volume responsive to the candidate volume belonging to a cluster with at least a given threshold degree of label consistency and being associated with the label.
  - 9. The computer-implemented method of claim 1, wherein ones of the plurality of digital videos are associated with labels, and wherein assigning a verified label to a volume comprises:
    - associating, with each of the candidate volumes, preliminary labels associated with the digital video in which the candidate volume was identified;
      
      forming a union of all of the preliminary labels of all of the candidate volumes;
      
      for each label of a plurality of the preliminary labels:
      
      identifying a training set of candidate volumes having the label;
      
      training a weak volume classifier using the determined features of the candidate volumes in the training set;
      
      obtaining candidate volume scores by applying the weak volume classifier to the candidate volumes;
      
      for each of the candidate volumes for which the corresponding score has at least some given threshold value, assigning the label to the candidate volume as a verified label.
  - 10. The computer-implemented method of claim 1, further comprising:
    - for a first one of the verified labels:
      
      forming a training set of candidate volumes to which the first one of the verified labels has been assigned; and
      
      training a classifier using the determined features of the candidate volumes in the training set, the classifier configured to receive a candidate volume and to output a score indicating a degree of likelihood that the candidate volume contains a representation of the verified label.
  - 11. The computer-implemented method of claim 10, further comprising:
    - identifying a volume in a digital video;
      
      obtaining a score by applying one of the trained classifiers to the volume; and
      
      responsive to the score having at least some given threshold value, adding the label corresponding to the classifier to video metadata of the video.
  - 12. The computer-implemented method of claim 1, further comprising:
    - receiving a textual search query;
      
      determining a match degree to which volumes of a digital video have a verified label matching the search query; and
      
      including the digital video in a search result set of videos matching the search query, based at least in part on the match degree.
  - 13. The computer-implemented method of claim 1, further comprising:
    - receiving a textual search query; and
      
      responsive to a volume of a digital video in a search result set of video data matching the textual search query having a verified label matching the search query, including the volume in a search result set of video data matching the search query.
  - 14. The computer-implemented method of claim 13, wherein the search result set includes a representation of the volume such that, when the representation is selected on a client device, playback of the video begins at a beginning frame of the volume.
  - 15. The computer-implemented method of claim 1, further comprising:
    - receiving, from a client device, an indication of a volume being displayed on the client device and a request for volumes similar to the displayed volume;
      
      identifying a verified label assigned to the indicated volume;
      
      identifying other volumes that also have the identified verified label; and
      
      providing, to the client device, a result set including the identified other volumes.
  - 16. The computer-implemented method of claim 15, further comprising providing to the client a visual representation of a digital video containing the volume, wherein the displayed volume is visually emphasized in the visual representation by at least one of an outline, a highlight, and obscuring portions of frames of the video not including the displayed volume.
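The codebook-based feature step of claims 4 and 5 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the codebook is taken as given (claim 4 forms it by clustering segment feature vectors, which is not shown), similarity is measured by Euclidean distance, and "normalizing" is read as dividing a histogram of nearest codebook entries by the number of segments. The names `nearest_entry` and `volume_feature` are hypothetical.

```python
import math


def nearest_entry(vector, codebook):
    """Index of the codebook entry closest to vector (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(vector, codebook[i]))


def volume_feature(segment_vectors, codebook):
    """Normalized histogram of nearest codebook entries over a volume's segments."""
    hist = [0.0] * len(codebook)
    for vec in segment_vectors:
        hist[nearest_entry(vec, codebook)] += 1.0
    total = sum(hist)
    # An empty volume yields an all-zero vector rather than a division error.
    return [h / total for h in hist] if total else hist


# Toy codebook with three entries and one volume split into four segments.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
segments = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (4.0, 5.0)]
print(volume_feature(segments, codebook))
```

The resulting vector has one component per codebook entry, so volumes of different durations map to features of the same length, which is what makes them comparable in later clustering or classification steps.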

17. A computer-implemented method comprising:
- transmitting from a client device a textual search query over a network to a video sharing service;
  
  obtaining at the client device, from the video sharing service, a video data result set comprising indicia of a plurality of volumes of a digital video, each volume representing a contiguous sequence of spatial portions of the video frames having a starting time and an ending time, representing a discrete object or action within the video frames, which discrete object or action has a label corresponding to the textual search query; and
  
  displaying the indicia separately in a user interface.

18. A computer-readable storage medium having executable computer program instructions embodied therein, actions of the computer program comprising:
- identifying, in a plurality of digital videos, a plurality of candidate volumes representing spatio-temporal segments of the digital videos, wherein each of the candidate volumes corresponds to a contiguous sequence of spatial portions of the video frames having a starting time and an ending time, and potentially represents a discrete object or action within the video frames;
  
  determining, for each of the identified candidate volumes, features characterizing the candidate volume, wherein the features are determined from visual properties of the spatial portions of the video frames contained in the candidate volumes; and
  
  assigning a verified label to each volume of a plurality of the identified candidate volumes using the determined features, the verified label indicating a particular object or action represented by the volume to which the label is assigned.
- Dependent claim (19):
  - 19. The computer-readable storage medium of claim 18, the actions further comprising:
    - receiving a textual search query; and
      
      responsive to a volume of a digital video in a search result set of video data matching the textual search query having a verified label matching the search query, including the volume in a search result set of video data matching the search query.
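The volume-level search of claims 13, 14, and 19 might look like the sketch below. It assumes a simple in-memory list of labeled volumes and exact matching between query terms and verified labels. The `Volume` record, the `search_volumes` and `playback_link` helpers, and the URL pattern are all hypothetical and do not describe any real service's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    video_id: str
    start_s: float       # starting time of the volume within the video
    end_s: float         # ending time of the volume within the video
    verified_label: str


def search_volumes(query, volumes):
    """Return volumes whose verified label equals one of the query terms."""
    terms = set(query.lower().split())
    hits = [v for v in volumes if v.verified_label.lower() in terms]
    return sorted(hits, key=lambda v: (v.video_id, v.start_s))


def playback_link(volume):
    """Hypothetical link that starts playback at the volume's first frame."""
    return f"/watch?v={volume.video_id}&t={int(volume.start_s)}"


index = [
    Volume("vid1", 12.0, 20.5, "dog"),
    Volume("vid1", 3.0, 9.0, "cat"),
    Volume("vid2", 40.0, 55.0, "Dog"),
]
for hit in search_volumes("funny dog", index):
    print(hit.verified_label, playback_link(hit))
```

Returning the volume rather than the whole video lets the result entry carry its own start time, which is the behavior claim 14 describes for playback.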

20. A computer system comprising:
- a computer processor; and
  
  a computer-readable storage medium having executable computer program instructions embodied therein that when executed by the computer processor perform actions comprising:
  
  identifying, in a plurality of digital videos, a plurality of candidate volumes representing spatio-temporal segments of the digital videos, wherein each of the candidate volumes corresponds to a contiguous sequence of spatial portions of the video frames having a starting time and an ending time, and potentially represents a discrete object or action within the video frames;
  
  determining, for each of the identified candidate volumes, features characterizing the candidate volume, wherein the features are determined from visual properties of the spatial portions of the video frames contained in the candidate volumes; and
  
  assigning a verified label to each volume of a plurality of the identified candidate volumes using the determined features, the verified label indicating a particular object or action represented by the volume to which the label is assigned.

21. A computer-implemented method comprising:
- accessing a plurality of digital videos having video-level metadata; and
  
  using the video-level metadata to:
  
  automatically identify a temporally contiguous sequence of spatial portions of a first one of the videos, the temporally contiguous sequence of spatial portions representing both a temporal subset of the video and a spatial subset of the video, and
  
  automatically label the temporally contiguous sequence of spatial portions with an identifier of an object or action represented by the temporally contiguous sequence of spatial portions.

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Rahul Sukthankar, Jay Yagnik

Granted Patent: US 8,983,192 B2
US Class Current: 382/190

CPC Class Codes

G06V 20/41   Higher-level, semantic clus...

G06V 20/46   Extracting features or char...

G06V 20/70   Labelling scene content, e....

H04N 21/23418   involving operations for an...

H04N 9/8205   involving the multiplexing ...
