Learning to geolocate videos

US 8,990,134 B1
Filed: 09/13/2010
Issued: 03/24/2015
Est. Priority Date: 09/13/2010
Status: Active Grant

Abstract

A classifier training system trains classifiers for inferring the geographic locations of videos. A number of classifiers are provided, where each classifier corresponds to a particular location and is trained from a training set of videos that have been labeled as representing the location. In one embodiment, the training set is further restricted to those videos in which a landmark matching the location label is detected. The classifier training system extracts, from each of these videos, features that characterize the video, such as audiovisual features, text features, address features, landmark features, and category features. Based on these features, the classifier training system trains a location classifier for the corresponding location.
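The per-location training described above can be illustrated with a minimal sketch. It assumes each video has already been reduced to a numeric feature vector, and it uses a small logistic-regression model as a stand-in learner; the abstract does not name a learning algorithm, and treating videos labeled with other locations as negative examples is likewise an assumption. The function names `train_location_classifier`, `train_all`, and `location_score` are hypothetical.

```python
import math


def train_location_classifier(positives, negatives, epochs=200, lr=0.5):
    """Fit a tiny logistic-regression model by stochastic gradient descent.

    positives / negatives are lists of equal-length feature vectors.
    Returns (weights, bias). The learner is an assumption, not the
    patent's stated method.
    """
    dim = len(positives[0])
    weights, bias = [0.0] * dim, 0.0
    examples = [(x, 1.0) for x in positives] + [(x, 0.0) for x in negatives]
    for _ in range(epochs):
        for x, y in examples:
            z = bias + sum(w * v for w, v in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y  # gradient of the log loss with respect to z
            weights = [w - lr * err * v for w, v in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def location_score(model, features):
    """Return a score in (0, 1) for how strongly the features match."""
    weights, bias = model
    z = bias + sum(w * v for w, v in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train_all(labeled_videos):
    """Train one classifier per location.

    labeled_videos is a list of (location, feature_vector) pairs.
    Assumption: videos labeled with other locations serve as negatives.
    """
    models = {}
    for loc in sorted({loc for loc, _ in labeled_videos}):
        pos = [f for l, f in labeled_videos if l == loc]
        neg = [f for l, f in labeled_videos if l != loc]
        models[loc] = train_location_classifier(pos, neg)
    return models


# Toy data: two-dimensional feature vectors, purely illustrative.
labeled = [
    ("Paris", [1.0, 0.0]), ("Paris", [0.9, 0.1]),
    ("Tokyo", [0.0, 1.0]), ("Tokyo", [0.1, 0.9]),
]
models = train_all(labeled)
```

Calling `location_score(models["Paris"], [0.95, 0.05])` on this toy data yields a score above 0.5, while the Tokyo model scores the same vector below 0.5.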

Each of the location classifiers can be applied to videos without associated location labels to predict whether, or how strongly, the video represents the corresponding location. The prediction can be used for a variety of purposes, such as automatic labeling of videos with locations, presentation of location-specific advertisements in association with videos, and display of video data on relevant portions of an electronic map.
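As a rough illustration of the automatic-labeling use, the sketch below turns per-location scores into predicted locations and writes them into a video's metadata. The 0.5 threshold, the dictionary layout of a video record, and the function names are assumptions made for the example; the source says only that the prediction is based on the location score.

```python
def predict_locations(scores, threshold=0.5):
    """Return the locations whose score meets the (assumed) threshold."""
    return sorted(loc for loc, score in scores.items() if score >= threshold)


def add_location_labels(video, scores, threshold=0.5):
    """Append predicted locations to the video's metadata, without duplicates.

    video is a plain dict; the "metadata"/"locations" layout is hypothetical.
    """
    predicted = predict_locations(scores, threshold)
    labels = video.setdefault("metadata", {}).setdefault("locations", [])
    for loc in predicted:
        if loc not in labels:
            labels.append(loc)
    return predicted


video = {"id": "v1"}
predicted = add_location_labels(video, {"Paris": 0.9, "Tokyo": 0.2})
```

A system that prompts the user with a recommendation instead of labeling automatically could call `predict_locations` alone and leave the metadata untouched.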

20 Claims

1. A computer-implemented method for training video location classifiers, the method comprising:
- storing a set of locations, each location uniquely corresponding to a geographic area having a unique geographic placement;
- providing a user interface for uploading a video, the user interface comprising a user interface element for specifying locations from the stored set of locations;
- receiving, from users via the user interface, a set of uploaded videos, each uploaded video labeled with a location from the stored set of locations, the location specified using the user interface;
- selecting, for each of a plurality of the locations, a location training set comprising ones of the uploaded videos that are labeled with the location;
- for each of a plurality of video location classifiers, each video location classifier associated with one of the locations:
  - for each uploaded video of the location training set for the associated location, deriving a set of features associated with the uploaded video, the set of features comprising:
    - audiovisual features extracted from content of the uploaded video;
    - upload location information derived from an internet protocol (IP) address from which the uploaded video was uploaded;
    - landmark scores indicating whether the uploaded video contains landmark features, the landmark scores being produced by applying trained landmark classifiers to the uploaded video;
    - category scores indicating whether the uploaded video represents predetermined categories, the category scores produced by category classifiers that are trained based at least in part on a set of videos considered to represent the categories; and
    - textual features derived from metadata of the uploaded video;
  - training the video location classifier based at least in part on the features derived from the uploaded videos in the location training set;
- for an unlabeled video not labeled with a location from the stored set of locations, and for a first one of the trained video location classifiers:
  - deriving a set of features comprising audiovisual features extracted from content of the unlabeled video, upload location information derived from the IP address from which the video was uploaded, landmark scores indicating whether the unlabeled video contains landmark features, category scores indicating whether the unlabeled video represents predetermined categories, and textual features derived from metadata of the unlabeled video;
  - applying the first one of the trained video location classifiers to the set of features derived for the unlabeled video, thereby producing a location score indicating how strongly the unlabeled video represents the location associated with the first one of the trained video location classifiers;
  - predicting based on the location score, that the unlabeled video represents the location associated with the first one of the trained video location classifiers; and
  - providing, to a user, a visual representation of a map, the map including a visual indication of the unlabeled video on a portion of the map corresponding to the location associated with the first one of the trained video location classifiers.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10):
  - 2. The computer-implemented method of claim 1, wherein the user interface element comprises an electronic map, such that clicking on a portion of the electronic map specifies, as the location of the uploaded video, coordinates corresponding to the portion.
  - 3. The computer-implemented method of claim 1, wherein the user interface element comprises a text area for specifying the location of the uploaded video via a text string describing the location.
  - 4. The computer-implemented method of claim 1, further comprising identifying a set of unique locations specified via the user interface element for specifying a location and storing the identified set of unique locations as the stored set of locations.
  - 5. The computer-implemented method of claim 1, wherein the stored set of locations comprises a manually specified, hierarchically arranged set of locations.
  - 6. The computer-implemented method of claim 1, further comprising:
    - responsive to predicting that the unlabeled video represents the location, prompting a user with a recommendation to label the unlabeled video with the location.
  - 7. The computer-implemented method of claim 1, further comprising:
    - responsive to predicting that the unlabeled video represents the location, adding a label corresponding to the location to metadata of the unlabeled video.
  - 8. The computer-implemented method of claim 1, further comprising:
    - receiving a query from a user for videos, the query comprising text associated with the location;
      
      responsive to determining that the unlabeled video represents the location associated with the first one of the video location classifiers, adding the video to a query result set; and
      
      providing the query result set to the user.
  - 9. The computer-implemented method of claim 1, further comprising:
    - receiving from a user a request to view the unlabeled video; and
    - responsive to determining that the unlabeled video represents the location associated with the first one of the video location classifiers:
      - selecting an advertisement associated with the location;
      - providing the unlabeled video and the advertisement to the user.
  - 10. The computer-implemented method of claim 1, further comprising limiting the location training set by:
    - detecting a landmark in a video;
      
      identifying a location corresponding to the detected landmark; and
      
      excluding the video from the location training set responsive to the location corresponding to the detected landmark differing from the location which the video is labeled as representing.
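The training-set restriction in claim 10 can be sketched as a simple filter. The landmark detector and the landmark-to-location table are passed in as parameters because the source does not specify them; both, along with the record layout and the `require_match` flag, are assumptions made for this example. With `require_match=True` the filter approximates the stricter embodiment mentioned in the abstract, which keeps only videos in which a landmark matching the label is detected.

```python
def filter_training_set(videos, detect_landmark, landmark_to_location,
                        require_match=False):
    """Drop videos whose detected landmark contradicts their location label.

    videos: list of dicts with a "label" key (hypothetical layout).
    detect_landmark: callable returning a landmark name or None.
    landmark_to_location: dict mapping landmark names to locations.
    require_match: if True, also drop videos with no recognized landmark.
    """
    kept = []
    for video in videos:
        landmark = detect_landmark(video)
        implied = (landmark_to_location.get(landmark)
                   if landmark is not None else None)
        if implied is None:
            # No usable landmark evidence; keep unless the strict mode is on.
            if not require_match:
                kept.append(video)
            continue
        if implied == video["label"]:
            kept.append(video)
    return kept


videos = [
    {"id": "a", "label": "Paris", "landmark": "Eiffel Tower"},
    {"id": "b", "label": "Paris", "landmark": "Tokyo Tower"},
    {"id": "c", "label": "Paris", "landmark": None},
]
mapping = {"Eiffel Tower": "Paris", "Tokyo Tower": "Tokyo"}
detect = lambda v: v["landmark"]  # stand-in for a trained landmark classifier

lenient = [v["id"] for v in filter_training_set(videos, detect, mapping)]
strict = [v["id"] for v in filter_training_set(videos, detect, mapping,
                                               require_match=True)]
```

On this toy data the lenient filter keeps videos `a` and `c`, and the strict filter keeps only `a`.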

11. A computer-usable non-transitory medium having executable computer program instructions embodied therein for training video location classifiers, actions of the computer program instructions comprising:
- storing a set of locations, each location uniquely corresponding to a geographic area having a unique geographic placement;
- providing a user interface for uploading a video, the user interface comprising a user interface element for specifying locations from the stored set of locations;
- receiving, from users via the user interface, a set of uploaded videos, each uploaded video labeled with a location from the stored set of locations, the location specified using the user interface;
- selecting, for each of a plurality of the locations, a location training set comprising ones of the uploaded videos that are labeled with the location;
- for each of a plurality of video location classifiers, each video location classifier associated with one of the locations:
  - for each uploaded video of the location training set for the associated location, deriving a set of features associated with the uploaded video, the set of features comprising:
    - audiovisual features extracted from content of the uploaded video;
    - upload location information derived from an internet protocol (IP) address from which the uploaded video was uploaded;
    - landmark scores indicating whether the uploaded video contains landmark features, the landmark scores being produced by applying trained landmark classifiers to the uploaded video;
    - category scores indicating whether the uploaded video represents predetermined categories, the category scores produced by category classifiers that are trained based at least in part on a set of videos considered to represent the categories; and
    - textual features derived from metadata of the uploaded video;
  - training the video location classifier based at least in part on the features derived from the uploaded videos in the location training set;
- for an unlabeled video not labeled with a location from the stored set of locations, and for a first one of the trained video location classifiers:
  - deriving a set of features comprising audiovisual features extracted from content of the unlabeled video, upload location information derived from the IP address from which the video was uploaded, landmark scores indicating whether the unlabeled video contains landmark features, category scores indicating whether the unlabeled video represents predetermined categories, and textual features derived from metadata of the unlabeled video;
  - applying the first one of the trained video location classifiers to the set of features derived for the unlabeled video, thereby producing a location score indicating how strongly the unlabeled video represents the location associated with the first one of the trained video location classifiers;
  - predicting based on the location score, that the unlabeled video represents the location associated with the first one of the trained video location classifiers; and
  - providing, to a user, a visual representation of a map, the map including a visual indication of the unlabeled video on a portion of the map corresponding to the location associated with the first one of the trained video location classifiers.
- Dependent claims (12, 14, 15, 16, 17, 18, 19, 20):
  - 12. The computer-usable non-transitory medium of claim 11, wherein the user interface element comprises an electronic map, such that clicking on a portion of the electronic map specifies, as the location of the uploaded video, coordinates corresponding to the portion.
  - 14. The computer-usable non-transitory medium of claim 11, wherein the user interface element comprises a text area for specifying the location of the uploaded video via a text string describing the location.
  - 15. The computer-usable non-transitory medium of claim 11, the actions further comprising identifying a set of unique locations specified via the user interface element for specifying a location and storing the identified set of unique locations as the stored set of locations.
  - 16. The computer-usable non-transitory medium of claim 11, wherein the stored set of locations comprises a manually specified, hierarchically arranged set of locations.
  - 17. The computer-usable non-transitory medium of claim 11, the actions further comprising:
    - responsive to predicting that the unlabeled video represents the location, prompting a user with a recommendation to label the unlabeled video with the location.
  - 18. The computer-usable non-transitory medium of claim 11, the actions further comprising:
    - responsive to predicting that the unlabeled video represents the location, adding a label corresponding to the location to metadata of the unlabeled video.
  - 19. The computer-usable non-transitory medium of claim 11, the actions further comprising:
    - receiving a query from a user for videos, the query comprising text associated with the location;
      
      responsive to determining that the unlabeled video represents the location associated with the first one of the video location classifiers, adding the video to a query result set; and
      
      providing the query result set to the user.
  - 20. The computer-usable non-transitory medium of claim 11, the actions further comprising:
    - receiving from a user a request to view the unlabeled video; and
    - responsive to determining that the unlabeled video represents the location associated with the first one of the video location classifiers:
      - selecting an advertisement associated with the location;
      - providing the unlabeled video and the advertisement to the user.
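The query and advertisement uses recited in the dependent claims can be sketched as two small lookups over previously predicted locations. Everything concrete here is an assumption for illustration: the substring match between query text and location names, the shape of `predicted_locations`, the advertisement table, and the function names.

```python
def build_query_results(query, video_ids, predicted_locations):
    """Return ids of videos whose predicted location appears in the query text.

    predicted_locations maps a video id to a set of location names.
    The case-insensitive substring match is an assumed matching rule.
    """
    text = query.lower()
    return [
        vid for vid in video_ids
        if any(loc.lower() in text for loc in predicted_locations.get(vid, ()))
    ]


def select_advertisement(video_id, predicted_locations, ads_by_location):
    """Pick an advertisement tied to one of the video's predicted locations."""
    for loc in sorted(predicted_locations.get(video_id, ())):
        if loc in ads_by_location:
            return ads_by_location[loc]
    return None  # no location-specific advertisement available


predicted = {"v1": {"Paris"}, "v2": {"Tokyo"}}
ads = {"Tokyo": "ad-tokyo-rail"}  # hypothetical advertisement identifiers

results = build_query_results("paris street food", ["v1", "v2", "v3"], predicted)
ad_for_v2 = select_advertisement("v2", predicted, ads)
```

Here `results` contains only `v1`, and the request for `v2` is paired with the Tokyo advertisement; a video with no predicted location gets `None`.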

13. A computer system for training video location classifiers, the system comprising:
- a computer processor; and
- a computer-readable storage medium storing data comprising:
  - a set of locations, each location uniquely corresponding to a geographic area having a unique geographic placement,
  - a set of uploaded videos labeled as representing one or more of the locations, and
  - a computer program executable by the computer processor and performing actions comprising:
    - providing a user interface for uploading a video, the user interface comprising a user interface element for specifying locations from the stored set of locations;
    - receiving, from users via the user interface, a set of uploaded videos, each uploaded video labeled with a location from the stored set of locations, the location specified using the user interface;
    - storing the set of uploaded videos;
    - selecting, for each of a plurality of the locations, a location training set comprising ones of the uploaded videos that are labeled with the location;
    - for each of a plurality of video location classifiers, each video location classifier associated with one of the locations:
      - for each uploaded video of the location training set for the associated location, deriving a set of features associated with the uploaded video, the set of features comprising:
        - audiovisual features extracted from content of the uploaded video;
        - upload location information derived from an internet protocol (IP) address from which the uploaded video was uploaded;
        - landmark scores indicating whether the uploaded video contains landmark features, the landmark scores being produced by applying trained landmark classifiers to the uploaded video;
        - category scores indicating whether the uploaded video represents predetermined categories, the category scores produced by category classifiers that are trained based at least in part on a set of videos considered to represent the categories; and
        - textual features derived from metadata of the uploaded video;
      - generating the video location classifier based at least in part on the features derived from the uploaded videos in the location training set;
    - for an unlabeled video not labeled with a location from the stored set of locations, and for a first one of the trained video location classifiers:
      - deriving a set of features comprising audiovisual features extracted from content of the unlabeled video, upload location information derived from the IP address from which the video was uploaded, landmark scores indicating whether the unlabeled video contains landmark features, category scores indicating whether the unlabeled video represents predetermined categories, and textual features derived from metadata of the unlabeled video;
      - applying the first one of the trained video location classifiers to the set of features derived for the unlabeled video, thereby producing a location score indicating how strongly the unlabeled video represents the location associated with the first one of the trained video location classifiers;
      - predicting based on the location score, that the unlabeled video represents the location associated with the first one of the trained video location classifiers; and
      - providing, to a user, a visual representation of a map, the map including a visual indication of the unlabeled video on a portion of the map corresponding to the location associated with the first one of the trained video location classifiers.
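The five feature groups recited in the independent claims can be pictured as one concatenated vector per video. The sketch below is only an illustration: the field encodings (for example, representing IP-derived upload location as a latitude/longitude pair), the bag-of-words text features, the tiny `VOCABULARY`, and all names are assumptions, since the claims do not fix a representation.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical fixed vocabulary for the bag-of-words textual features.
VOCABULARY = ["paris", "tokyo", "tower", "beach"]


def textual_features(metadata_text: str) -> List[float]:
    """Count occurrences of each vocabulary term in the video's metadata text."""
    words = metadata_text.lower().split()
    return [float(words.count(term)) for term in VOCABULARY]


@dataclass
class VideoFeatures:
    audiovisual: List[float]       # extracted from the video content
    upload_location: List[float]   # assumed: coarse lat/lon derived from the IP
    landmark_scores: List[float]   # one score per trained landmark classifier
    category_scores: List[float]   # one score per trained category classifier
    textual: List[float]           # derived from the video's metadata

    def to_vector(self) -> List[float]:
        """Concatenate the five groups in a fixed order."""
        return (self.audiovisual + self.upload_location
                + self.landmark_scores + self.category_scores + self.textual)


features = VideoFeatures(
    audiovisual=[0.2, 0.7],
    upload_location=[48.85, 2.35],
    landmark_scores=[0.9],
    category_scores=[0.1, 0.8],
    textual=textual_features("Eiffel Tower at night in Paris"),
)
vector = features.to_vector()
```

Keeping the group order fixed matters because the same layout has to be used both when training a location classifier and when scoring an unlabeled video.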

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Sbaiz, Luciano; Aradhye, Hrishikesh; Toderici, George; Snoek, Jasper
Primary Examiner(s): Chaki, Kakali
Assistant Examiner(s): Zidanic, Michael
Application Number: US12/881,078
Time in Patent Office: 1,653 Days
Field of Search: None
US Class Current: 706/45
CPC Class Codes:
G06F 16/783 using metadata automaticall...
G06V 20/38 Outdoor scenes
