CLUSTERING VIDEOS BY LOCATION
Abstract
Described is a technology in which video shots are clustered based upon the location at which the shots were captured. A global energy function is optimized. Its first term favors clusters that are reasonably dense and well connected, so that each cluster matches the shots likely captured at a single location, e.g., as measured by similarity scores between pairs of shots. Its second term is a temporal prior that encourages successive shots to be placed in the same cluster. The shots may be represented as nodes of a minimum spanning tree whose edge weights are based on the similarity scores between the shots represented by the connected nodes. Agglomerative clustering is performed by tentatively merging pairs of available clusters and keeping the merge with the lowest cost. Clusters are iteratively merged until one or more stopping criteria are met (e.g., only a single cluster remains).
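The abstract does not give the energy function in closed form. The following Python sketch shows one way the described greedy procedure could look, under assumptions that are ours rather than the patent's: shots are indexed in temporal order, pairwise similarity scores lie in [0, 1] and become minimum-spanning-tree edge weights as 1 - similarity, the density/connectivity term of each cluster is the total weight of its minimum spanning tree plus a hypothetical per-cluster penalty `beta`, and the temporal prior charges a hypothetical weight `lam` for every pair of successive shots assigned to different clusters. The names `mst_cost`, `energy`, and `cluster_shots` are illustrative only.

```python
from itertools import combinations


def mst_cost(members, dist):
    """Total edge weight of a minimum spanning tree over `members` (Prim's algorithm)."""
    members = list(members)
    if len(members) < 2:
        return 0.0
    # best[v] = weight of the cheapest edge joining v to the tree grown so far
    best = {v: dist[members[0]][v] for v in members[1:]}
    total = 0.0
    while best:
        v = min(best, key=best.get)  # attach the closest remaining shot
        total += best.pop(v)
        for u in best:
            best[u] = min(best[u], dist[v][u])
    return total


def energy(clusters, dist, beta, lam):
    """Hypothetical global energy: per-cluster MST cost plus beta, plus a temporal
    prior charging lam for each pair of successive shots placed in different clusters."""
    label = {s: k for k, c in enumerate(clusters) for s in c}
    splits = sum(1 for i in range(len(label) - 1) if label[i] != label[i + 1])
    return sum(mst_cost(c, dist) + beta for c in clusters) + lam * splits


def cluster_shots(sim, beta=0.5, lam=0.1):
    """Greedy agglomerative clustering; returns the lowest-energy partition seen."""
    n = len(sim)
    dist = [[1.0 - sim[i][j] for j in range(n)] for i in range(n)]  # MST edge weights
    clusters = [frozenset([i]) for i in range(n)]
    best_e, best_clusters = energy(clusters, dist, beta, lam), clusters
    while len(clusters) > 1:  # stopping criterion: a single cluster remains
        candidates = []
        for i, j in combinations(range(len(clusters)), 2):
            # Tentatively merge clusters i and j and score the resulting partition.
            trial = [c for k, c in enumerate(clusters) if k not in (i, j)]
            trial.append(clusters[i] | clusters[j])
            candidates.append((energy(trial, dist, beta, lam), trial))
        e, clusters = min(candidates, key=lambda t: t[0])  # keep the cheapest merge
        if e < best_e:
            best_e, best_clusters = e, clusters
    return sorted(sorted(c) for c in best_clusters), best_e
```

Because merging continues until a single cluster remains, the sketch records the lowest-energy partition seen along the way and returns it. For example, with six shots whose locations run A, A, B, B, A, A (similarity 0.9 within a location, 0.1 across), `cluster_shots` groups shots 0, 1, 4 and 5 together and keeps shots 2 and 3 in a separate cluster: the temporal prior alone would favor one cluster, but the high-cost edge between the two locations outweighs it.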
20 Claims
- 1. In a computing environment, a method comprising, processing input video comprised of a plurality of shots, including determining similarity between shots indicative of whether the shots were captured in a same location, and using the similarity as part of a global energy function to cluster shots together by location.
- 10. In a computing environment, a system comprising, a clustering mechanism that clusters shots representative of video frames into clusters of shots having similar locations, including by optimizing a global energy function using agglomerative clustering based upon similarity scores between pairs of shots.
- 18. One or more computer-readable media having computer-executable instructions, which when executed perform steps, comprising: separating video into sets of frames based upon shot boundary detection; selecting at least one keyframe from each set of frames; computing a similarity score based on similarity between the keyframe or keyframes of each set; computing temporal data based upon whether a keyframe is temporally consistent with another keyframe; and using the similarity score and the temporal data to cluster shots, as represented by their keyframes, together. (Dependent claims: 19, 20.)
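Claim 18 lists the processing steps without fixing how each is carried out. A minimal Python sketch, assuming each frame is already summarized as a normalized color histogram (a list of floats), might detect hard cuts by thresholding the histogram-intersection distance between consecutive frames, take the middle frame of each shot as its keyframe, score keyframe similarity by histogram intersection, and treat two shots as temporally consistent when they are adjacent in time. All of these choices, the `threshold` value, and the function names are hypothetical; they only illustrate the shape of the inputs (a similarity matrix and temporal data) that the clustering step consumes.

```python
def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def detect_shots(frames, threshold=0.5):
    """Split frame indices into shots wherever consecutive frames differ sharply."""
    shots, start = [], 0
    for t in range(1, len(frames)):
        # Hypothetical cut detector: a large histogram change marks a shot boundary.
        if 1.0 - histogram_intersection(frames[t - 1], frames[t]) > threshold:
            shots.append(list(range(start, t)))
            start = t
    if frames:
        shots.append(list(range(start, len(frames))))
    return shots


def select_keyframe(shot):
    """Assumed keyframe rule: the middle frame of the shot."""
    return shot[len(shot) // 2]


def shot_features(frames, threshold=0.5):
    """Return shots, keyframes, a pairwise keyframe similarity matrix, and temporal data."""
    shots = detect_shots(frames, threshold)
    keys = [select_keyframe(s) for s in shots]
    n = len(shots)
    sim = [[histogram_intersection(frames[keys[i]], frames[keys[j]])
            for j in range(n)] for i in range(n)]
    # Assumed notion of temporal consistency: the two shots are adjacent in time.
    temporal = [[abs(i - j) == 1 for j in range(n)] for i in range(n)]
    return shots, keys, sim, temporal
```

For frames that switch from one dominant color to another and back, the sketch yields three shots, and the first and third shots receive a high similarity score even though they are not temporally adjacent, which is the situation that clustering by location is meant to capture.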
Specification