Learning to geolocate videos
First Claim
1. A computer-implemented method for training video location classifiers, the method comprising:
storing a set of locations, each location uniquely corresponding to a geographic area having a unique geographic placement;
providing a user interface for uploading a video, the user interface comprising a user interface element for specifying locations from the stored set of locations;
receiving, from users via the user interface, a set of uploaded videos, each uploaded video labeled with a location from the stored set of locations, the location specified using the user interface;
selecting, for each of a plurality of the locations, a location training set comprising ones of the uploaded videos that are labeled with the location;
for each of a plurality of video location classifiers, each video location classifier associated with one of the locations:
for each uploaded video of the location training set for the associated location, deriving a set of features associated with the uploaded video, the set of features comprising:
audiovisual features extracted from content of the uploaded video;
upload location information derived from an internet protocol (IP) address from which the uploaded video was uploaded;
landmark scores indicating whether the uploaded video contains landmark features, the landmark scores being produced by applying trained landmark classifiers to the uploaded video;
category scores indicating whether the uploaded video represents predetermined categories, the category scores produced by category classifiers that are trained based at least in part on a set of videos considered to represent the categories; and
textual features derived from metadata of the uploaded video;
training the video location classifier based at least in part on the features derived from the uploaded videos in the location training set;
for an unlabeled video not labeled with a location from the stored set of locations, and for a first one of the trained video location classifiers:
deriving a set of features comprising audiovisual features extracted from content of the unlabeled video, upload location information derived from the IP address from which the video was uploaded, landmark scores indicating whether the unlabeled video contains landmark features, category scores indicating whether the unlabeled video represents predetermined categories, and textual features derived from metadata of the unlabeled video;
applying the first one of the trained video location classifiers to the set of features derived for the unlabeled video, thereby producing a location score indicating how strongly the unlabeled video represents the location associated with the first one of the trained video location classifiers;
predicting, based on the location score, that the unlabeled video represents the location associated with the first one of the trained video location classifiers; and
providing, to a user, a visual representation of a map, the map including a visual indication of the unlabeled video on a portion of the map corresponding to the location associated with the first one of the trained video location classifiers.
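The training and scoring steps of the claim can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the claim does not name a learning algorithm, so logistic regression trained by gradient descent is an assumption, as is the choice to use videos labeled with other locations as negative examples. The dictionary keys `location` and `features` and all function names are hypothetical, and each video's feature vector is assumed to have been derived already.

```python
import math
from collections import defaultdict


def select_training_sets(videos):
    """Group uploaded videos by the location label supplied at upload time."""
    by_location = defaultdict(list)
    for video in videos:
        by_location[video["location"]].append(video)
    return by_location


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_location_classifier(positives, negatives, epochs=200, lr=0.5):
    """Fit logistic-regression weights (bias stored last) by gradient descent.

    Assumption: the claim only requires training on features of the
    location training set; the learner and the negatives are illustrative.
    """
    examples = [(v["features"], 1.0) for v in positives]
    examples += [(v["features"], 0.0) for v in negatives]
    dim = len(examples[0][0])
    weights = [0.0] * (dim + 1)
    for _ in range(epochs):
        for features, target in examples:
            x = list(features) + [1.0]
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - target
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
    return weights


def location_score(weights, features):
    """Score in (0, 1): how strongly the video represents the location."""
    x = list(features) + [1.0]
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def train_all(videos):
    """Train one classifier per location that has labeled videos."""
    classifiers = {}
    for location, positives in select_training_sets(videos).items():
        negatives = [v for v in videos if v["location"] != location]
        classifiers[location] = train_location_classifier(positives, negatives)
    return classifiers
```

Calling `train_all` on the labeled uploads yields one weight vector per location; `location_score` can then be applied to the feature vector of an unlabeled video for any one of those classifiers.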
Abstract
A classifier training system trains classifiers for inferring the geographic locations of videos. A number of classifiers are provided, where each classifier corresponds to a particular location and is trained from a training set of videos that have been labeled as representing the location. In one embodiment, the training set is further restricted to those videos in which a landmark matching the location label is detected. The classifier training system extracts, from each of these videos, features that characterize the video, such as audiovisual features, text features, address features, landmark features, and category features. Based on these features, the classifier training system trains a location classifier for the corresponding location.
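As a rough illustration of the feature groups named above and of the landmark-restricted embodiment, the sketch below concatenates the five groups into one vector and filters a training set down to videos in which a landmark tied to the label is detected. The `VideoFeatures` type, the `landmarks_by_location` mapping, and the 0.5 detection threshold are all assumptions made for the example; the abstract does not specify how the features are encoded or how a landmark is matched to a location.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VideoFeatures:
    """Hypothetical container for the five feature groups of one video."""
    audiovisual: List[float]
    text: List[float]
    address: List[float]               # encoding of the IP-derived upload location
    landmark_scores: Dict[str, float]  # output of trained landmark classifiers
    category_scores: Dict[str, float]  # output of trained category classifiers


def to_vector(f, landmark_names, category_names):
    """Concatenate the feature groups in a fixed order; missing scores become 0."""
    return (
        list(f.audiovisual)
        + list(f.text)
        + list(f.address)
        + [f.landmark_scores.get(name, 0.0) for name in landmark_names]
        + [f.category_scores.get(name, 0.0) for name in category_names]
    )


def restrict_to_matching_landmarks(training_set, location,
                                   landmarks_by_location, threshold=0.5):
    """Keep only videos in which a landmark associated with `location` is detected."""
    wanted = landmarks_by_location.get(location, [])
    return [
        video for video in training_set
        if any(video.landmark_scores.get(name, 0.0) >= threshold for name in wanted)
    ]
```

Fixing the order of landmark and category names keeps every vector the same length, which most classifiers require.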
Each of the location classifiers can be applied to videos without associated location labels to predict whether, or how strongly, the video represents the corresponding location. The prediction can be used for a variety of purposes, such as automatic labeling of videos with locations, presentation of location-specific advertisements in association with videos, and display of video data on relevant portions of an electronic map.
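One way the prediction and map-display uses might look in code is sketched below. It assumes a location score has already been computed for each location; the 0.8 threshold, the marker dictionary layout, and the `coordinates` lookup are hypothetical choices for illustration and are not taken from the patent.

```python
def predict_locations(scores, threshold=0.8):
    """Return the locations whose score meets the threshold, strongest first."""
    hits = [(location, score) for location, score in scores.items()
            if score >= threshold]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return [location for location, _ in hits]


def map_markers(video_id, predicted, coordinates):
    """Build one map marker per predicted location that has known coordinates."""
    return [
        {
            "video_id": video_id,
            "location": location,
            "lat": coordinates[location][0],
            "lon": coordinates[location][1],
        }
        for location in predicted
        if location in coordinates
    ]
```

The same list of predicted locations could equally drive automatic labeling or the choice of location-specific advertisements; only the map case is shown here.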
20 Claims
1. A computer-implemented method for training video location classifiers, recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A computer-usable non-transitory medium having executable computer program instructions embodied therein for training video location classifiers, actions of the computer program instructions comprising:
storing a set of locations, each location uniquely corresponding to a geographic area having a unique geographic placement;
providing a user interface for uploading a video, the user interface comprising a user interface element for specifying locations from the stored set of locations;
receiving, from users via the user interface, a set of uploaded videos, each uploaded video labeled with a location from the stored set of locations, the location specified using the user interface;
selecting, for each of a plurality of the locations, a location training set comprising ones of the uploaded videos that are labeled with the location;
for each of a plurality of video location classifiers, each video location classifier associated with one of the locations:
for each uploaded video of the location training set for the associated location, deriving a set of features associated with the uploaded video, the set of features comprising:
audiovisual features extracted from content of the uploaded video;
upload location information derived from an internet protocol (IP) address from which the uploaded video was uploaded;
landmark scores indicating whether the uploaded video contains landmark features, the landmark scores being produced by applying trained landmark classifiers to the uploaded video;
category scores indicating whether the uploaded video represents predetermined categories, the category scores produced by category classifiers that are trained based at least in part on a set of videos considered to represent the categories; and
textual features derived from metadata of the uploaded video;
training the video location classifier based at least in part on the features derived from the uploaded videos in the location training set;
for an unlabeled video not labeled with a location from the stored set of locations, and for a first one of the trained video location classifiers:
deriving a set of features comprising audiovisual features extracted from content of the unlabeled video, upload location information derived from the IP address from which the video was uploaded, landmark scores indicating whether the unlabeled video contains landmark features, category scores indicating whether the unlabeled video represents predetermined categories, and textual features derived from metadata of the unlabeled video;
applying the first one of the trained video location classifiers to the set of features derived for the unlabeled video, thereby producing a location score indicating how strongly the unlabeled video represents the location associated with the first one of the trained video location classifiers;
predicting, based on the location score, that the unlabeled video represents the location associated with the first one of the trained video location classifiers; and
providing, to a user, a visual representation of a map, the map including a visual indication of the unlabeled video on a portion of the map corresponding to the location associated with the first one of the trained video location classifiers.
Dependent claims: 12, 14, 15, 16, 17, 18, 19, 20
13. A computer system for training video location classifiers, the system comprising:
a computer processor; and
a computer-readable storage medium storing data comprising:
a set of locations, each location uniquely corresponding to a geographic area having a unique geographic placement,
a set of uploaded videos labeled as representing one or more of the locations, and
a computer program executable by the computer processor and performing actions comprising:
providing a user interface for uploading a video, the user interface comprising a user interface element for specifying locations from the stored set of locations;
receiving, from users via the user interface, a set of uploaded videos, each uploaded video labeled with a location from the stored set of locations, the location specified using the user interface;
storing the set of uploaded videos;
selecting, for each of a plurality of the locations, a location training set comprising ones of the uploaded videos that are labeled with the location;
for each of a plurality of video location classifiers, each video location classifier associated with one of the locations:
for each uploaded video of the location training set for the associated location, deriving a set of features associated with the uploaded video, the set of features comprising:
audiovisual features extracted from content of the uploaded video;
upload location information derived from an internet protocol (IP) address from which the uploaded video was uploaded;
landmark scores indicating whether the uploaded video contains landmark features, the landmark scores being produced by applying trained landmark classifiers to the uploaded video;
category scores indicating whether the uploaded video represents predetermined categories, the category scores produced by category classifiers that are trained based at least in part on a set of videos considered to represent the categories; and
textual features derived from metadata of the uploaded video;
generating the video location classifier based at least in part on the features derived from the uploaded videos in the location training set;
for an unlabeled video not labeled with a location from the stored set of locations, and for a first one of the trained video location classifiers:
deriving a set of features comprising audiovisual features extracted from content of the unlabeled video, upload location information derived from the IP address from which the video was uploaded, landmark scores indicating whether the unlabeled video contains landmark features, category scores indicating whether the unlabeled video represents predetermined categories, and textual features derived from metadata of the unlabeled video;
applying the first one of the trained video location classifiers to the set of features derived for the unlabeled video, thereby producing a location score indicating how strongly the unlabeled video represents the location associated with the first one of the trained video location classifiers;
predicting, based on the location score, that the unlabeled video represents the location associated with the first one of the trained video location classifiers; and
providing, to a user, a visual representation of a map, the map including a visual indication of the unlabeled video on a portion of the map corresponding to the location associated with the first one of the trained video location classifiers.
Specification