CLUSTERING VIDEOS BY LOCATION

US 20100254614A1
Filed: 04/01/2009
Published: 10/07/2010
Est. Priority Date: 04/01/2009
Status: Active Grant


Abstract

Described is a technology in which video shots are clustered based upon the location at which the shots were captured. A global energy function is optimized, including a first term that computes clusters so as to be reasonably dense and well connected, to match the possible shots that are captured at a location, e.g., based on similarity scores between pairs of shots. A second term is a temporal prior that encourages subsequent shots to be placed in the same cluster. The shots may be represented as nodes of a minimum spanning tree having edges with weights that are based on the similarity score between the shots represented by their respective nodes. Agglomerative clustering is performed by selecting pairs of available clusters, merging the pairs and keeping the pair with the lowest cost. Clusters are iteratively merged until a stopping criterion or criteria is met (e.g., only a single cluster remains).
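The two-term energy described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: distances are taken to be `1 - similarity`, the first term is taken to be the total edge weight of a minimum spanning tree over each cluster, the temporal prior is a simple count of consecutive shots that land in different clusters, and the names `mst_weight`, `global_energy`, and `lam` are hypothetical.

```python
def mst_weight(dist, members):
    """Total edge weight of a minimum spanning tree over `members` (Prim's algorithm)."""
    if len(members) < 2:
        return 0.0
    # best[v] = cheapest known edge connecting v to the growing tree
    best = {v: dist[members[0]][v] for v in members[1:]}
    total = 0.0
    while best:
        v = min(best, key=best.get)
        total += best.pop(v)
        for u in best:
            best[u] = min(best[u], dist[v][u])
    return total


def global_energy(dist, labels, lam=1.0):
    """Assumed form: sum of per-cluster MST weights plus lam times a temporal penalty."""
    clusters = {}
    for shot, label in enumerate(labels):
        clusters.setdefault(label, []).append(shot)
    data_term = sum(mst_weight(dist, members) for members in clusters.values())
    # Temporal prior: count consecutive shots that fall in different clusters.
    temporal_term = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return data_term + lam * temporal_term


# Four shots: 0 and 1 look alike, 2 and 3 look alike (distance = 1 - similarity).
dist = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.2],
    [0.9, 0.9, 0.2, 0.0],
]
print(global_energy(dist, [0, 0, 1, 1], lam=0.5))  # two compact, contiguous clusters
print(global_energy(dist, [0, 1, 0, 1], lam=0.5))  # interleaved clusters cost more
```

Under these assumptions, the labeling that groups visually similar, temporally adjacent shots has the lower energy, which is the behavior the abstract attributes to the two terms.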

20 Claims

1. In a computing environment, a method comprising, processing input video comprised of a plurality of shots, including determining similarity between shots indicative of whether the shots were captured in a same location, and using the similarity as part of a global energy function to cluster shots together by location.
  - 2. The method of claim 1 wherein using the similarity as part of the global energy function comprises processing minimum spanning trees that represent the cost of clustering shots together.
  - 3. The method of claim 1 wherein the global energy function comprises a temporal prior term, and further comprising, applying the temporal prior term to penalize neighboring shots in a temporal sequence that are in different clusters.
  - 4. The method of claim 1 further comprising, separating the input video into a plurality of sets of frames, and selecting at least one keyframe from each set of frames as the shot or shots representative of that set.
  - 5. The method of claim 4 wherein the keyframe of the set comprises a frame that is centered or substantially centered in time within that set of frames.
  - 6. The method of claim 4 wherein selecting at least one keyframe comprises sampling a plurality of keyframes from the set of frames, and further comprising, initially clustering together the plurality of keyframes sampled from the set.
  - 7. The method of claim 1 wherein determining the similarity between the shots comprises determining a texton histogram for each of the shots.
  - 8. The method of claim 1 wherein determining the similarity between the shots comprises computing a vector representative of each of the shot, in which the vector emphasizes background information in the shot over foreground information in the shot.
  - 9. The method of claim 1 wherein using the similarity comprises selecting pairs of clusters, merging each pair into a merged candidate cluster, keeping the merged candidate cluster with a lowest cost, and iterating to further merge clusters until a stopping criterion or criteria is met.
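The keyframe selection recited in claims 4 to 6 can be sketched as follows. This is only an illustration under stated assumptions: shot boundaries are assumed to be given as the index of the first frame of each new shot, and the function names (`split_into_shots`, `center_keyframe`, `sample_keyframes`, `initial_clusters`) and the even-spacing rule are hypothetical, not taken from the patent.

```python
def split_into_shots(num_frames, boundaries):
    """Return (start, end) frame ranges, end exclusive.

    `boundaries` is a sorted list holding the index of the first frame of each new shot.
    """
    starts = [0] + list(boundaries)
    ends = list(boundaries) + [num_frames]
    return [(s, e) for s, e in zip(starts, ends) if e > s]


def center_keyframe(start, end):
    """Index of the frame at (or just before) the temporal middle of the shot."""
    return (start + end - 1) // 2


def sample_keyframes(start, end, k):
    """Pick up to k roughly evenly spaced frame indices inside [start, end)."""
    n = end - start
    return sorted({start + (2 * i + 1) * n // (2 * k) for i in range(k)})


def initial_clusters(num_frames, boundaries, k=1):
    """Keyframes sampled from the same set of frames start out in one cluster."""
    return [sample_keyframes(s, e, k) for s, e in split_into_shots(num_frames, boundaries)]


shots = split_into_shots(30, [10, 13])
print(shots)
print([center_keyframe(s, e) for s, e in shots])
print(initial_clusters(30, [10, 13], k=2))
```

Starting each set's keyframes in a shared cluster means later merging only has to decide how whole sets relate to one another.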

10. In a computing environment, a system comprising, a clustering mechanism that clusters shots representative of video frames into clusters of shots having similar locations, including by optimizing a global energy function using agglomerative clustering based upon similarity scores between pairs of shots.
  - 11. The system of claim 10 wherein the clustering mechanism further optimizes the global energy function based upon temporal consistency between shots.
  - 12. The system of claim 11 wherein the global energy function is based upon a sum of similarity score data and temporal consistency data, in which a weighting factor is used to control how much the similarity score data and temporal consistency data contribute to the sum relative to one another.
  - 13. The system of claim 10 wherein the clustering mechanism arranges the shots as nodes of a minimum spanning tree having edges with weights that are based at least in part on the similarity score between the shots represented by their respective nodes.
  - 14. The system of claim 10 wherein the clustering mechanism computes the similarity scores between two frames by determining a texton histogram for each frame.
  - 15. The system of claim 14 wherein the texton histograms are computed to emphasize background information of the frame relative to foreground information of the frame.
  - 16. The system of claim 10 wherein the clustering mechanism samples a plurality of shots from a set of frames that is separated by a shot boundary from another set of frames, and wherein the clustering mechanism clusters together the plurality of shots that is in that set before clustering them with a shot or shots of any other set of frames.
  - 17. The system of claim 10 wherein the clustering mechanism performs the agglomerative clustering by selecting pairs of clusters, merging each pair into a merged candidate cluster, keeping the merged candidate cluster with a lowest cost, and iterating to further merge clusters until a stopping criterion or criteria is met.
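The greedy merging loop of claim 17, combined with the weighted sum of claim 12, might look roughly like the sketch below. Several pieces are assumptions made to keep the example short: the per-cluster cost is the largest pairwise distance inside the cluster (a stand-in, not the minimum-spanning-tree cost of claim 13), distances are `1 - similarity`, the temporal term counts consecutive shots in different clusters, and `energy`, `agglomerate`, `lam`, and `min_clusters` are hypothetical names.

```python
from itertools import combinations


def energy(clusters, dist, lam):
    """Assumed weighted sum: similarity-based cost plus lam times temporal cost."""
    # Stand-in data term: diameter (largest pairwise distance) of each cluster.
    data = sum(max(dist[a][b] for a in c for b in c) for c in clusters)
    owner = {shot: i for i, c in enumerate(clusters) for shot in c}
    temporal = sum(1 for s in range(len(owner) - 1) if owner[s] != owner[s + 1])
    return data + lam * temporal


def agglomerate(dist, lam=0.5, min_clusters=1):
    """Merge greedily; return the clustering after every step (coarsest last)."""
    clusters = [frozenset([i]) for i in range(len(dist))]
    history = [list(clusters)]
    while len(clusters) > min_clusters:
        best = None
        # Try every pair, keep only the merged candidate with the lowest energy.
        for a, b in combinations(clusters, 2):
            candidate = [c for c in clusters if c not in (a, b)] + [a | b]
            cost = energy(candidate, dist, lam)
            if best is None or cost < best[0]:
                best = (cost, candidate)
        clusters = best[1]
        history.append(list(clusters))
    return history


dist = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.2],
    [0.9, 0.9, 0.2, 0.0],
]
for step in agglomerate(dist):
    print(sorted(sorted(c) for c in step))
```

Here the stopping criterion is a minimum cluster count; with the default of 1 the loop runs until a single cluster remains, and the returned history lets a caller pick any intermediate level.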

18. One or more computer-readable media having computer-executable instructions, which when executed perform steps, comprising:
  - separating video into sets of frames based upon shot boundary detection;
  - selecting at least one keyframe from each set of frames;
  - computing a similarity score based on similarity between the keyframe or keyframes of each set;
  - computing temporal data based upon whether a keyframe is temporally consistent with another keyframe; and
  - using the similarity score and the temporal data to cluster shots, as represented by their keyframes, together.
  - 19. The one or more computer-readable media of claim 18 wherein the similarity score and the temporal data for a pair of keyframes correspond to a cost, and wherein using the similarity score and the temporal data to cluster shots comprises, selecting pairs of clusters, merging each pair into a merged candidate cluster, keeping the merged candidate cluster with a lowest cost, and iterating to further merge clusters until a stopping criterion or criteria is met.
  - 20. The one or more computer-readable media of claim 18 wherein computing the similarity score includes emphasizing background information of the keyframe relative to foreground information of the keyframe.
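One way to picture a similarity score that emphasizes background over foreground, as in claim 20, is a weighted histogram comparison. The sketch below is an assumption-heavy illustration: it assumes each keyframe has already been quantized into integer labels per pixel (these could be texton indices, but computing textons is out of scope here), it assumes that foreground tends to sit near the frame center so pixels near the border get more weight, and it uses histogram intersection as the score. The weighting rule and all function names are hypothetical.

```python
def background_weight(r, c, rows, cols):
    """Hypothetical weight: 0 at the frame center, rising to 1 at the border."""
    dy = abs(2 * r - (rows - 1)) / max(rows - 1, 1)
    dx = abs(2 * c - (cols - 1)) / max(cols - 1, 1)
    return max(dx, dy)


def weighted_histogram(labels, num_bins):
    """Normalized histogram of per-pixel labels, weighted toward the border."""
    rows, cols = len(labels), len(labels[0])
    hist = [0.0] * num_bins
    for r in range(rows):
        for c in range(cols):
            hist[labels[r][c]] += background_weight(r, c, rows, cols)
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def similarity(h1, h2):
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Same background (label 0) with different central objects, versus a different background.
frame_a = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
frame_b = [[0, 0, 0], [0, 2, 0], [0, 0, 0]]
frame_c = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
ha, hb, hc = (weighted_histogram(f, 3) for f in (frame_a, frame_b, frame_c))
print(similarity(ha, hb))  # backgrounds match, central object is ignored
print(similarity(ha, hc))  # backgrounds differ
```

Because the center pixel gets zero weight in this toy weighting, two keyframes with the same surroundings score as similar even when the object in the middle changes.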

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Schroff, Gerhard Florian; Zitnick, Charles Lawrence III; Baker, Simon J.

Granted Patent: US 8,184,913 B2
US Class Current: 382/218
CPC Class Codes:
- G06F 16/70   of video data
- G06F 16/7867   using information manually ...
- G06F 16/787   using geographical or spati...
- G06F 18/231   Hierarchical techniques, i....
- G06V 10/7625   Hierarchical techniques, i....
- G06V 20/41   Higher-level, semantic clus...
- Y02D 10/00   Energy efficient computing,...
