Combined-media scene tracking for audio-video summarization
Abstract
Techniques are presented for analyzing audio-video segments, usually from multiple sources. A combined similarity measure is determined from text similarities and video similarities. The text and video similarities measure similarity between audio-video scenes for text and video, respectively. The combined similarity measure is then used to determine similar scenes in the audio-video segments. When the audio-video segments are from multiple audio-video sources, the similar scenes are common scenes in the audio-video segments. Similarities may be converted to or measured by distance. Distance matrices may be determined by using the similarity matrices. The text and video distance matrices are normalized before the combined similarity matrix is determined. Clustering is performed using distance values determined from the combined similarity matrix. Resulting clusters are examined and a cluster is considered to represent a common scene between two or more different audio-video segments when scenes in the cluster are similar.
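As an illustration only, the normalize-and-combine step described in the abstract might be sketched in Python as follows. The abstract does not state how similarities are converted to distances, how the distance matrices are normalized, or how the two media are weighted, so the `1 - similarity` conversion, the min-max scaling, and the `text_weight` parameter are all assumptions, and every function name is hypothetical.

```python
import numpy as np


def to_distance(similarity):
    """Convert similarities in [0, 1] to distances (assumption: d = 1 - s)."""
    return 1.0 - np.asarray(similarity, dtype=float)


def normalize(distance):
    """Min-max scale a distance matrix to [0, 1] (assumed normalization)."""
    d = np.asarray(distance, dtype=float)
    span = d.max() - d.min()
    if span == 0:
        # All entries are equal, so there is nothing to scale.
        return np.zeros_like(d)
    return (d - d.min()) / span


def combine(text_sim, video_sim, text_weight=0.5):
    """Return a combined scene-by-scene similarity matrix.

    text_weight is a hypothetical mixing parameter; the source does not
    say how the two normalized matrices are weighted.
    """
    text_dist = normalize(to_distance(text_sim))
    video_dist = normalize(to_distance(video_sim))
    combined_dist = text_weight * text_dist + (1.0 - text_weight) * video_dist
    # Convert back so that larger values again mean "more similar".
    return 1.0 - combined_dist
```

Given two square matrices whose entry (i, j) compares scene i with scene j, `combine(text_sim, video_sim)` returns a matrix of the same shape. Normalizing each medium first keeps one medium from dominating the other merely because its raw values span a wider range.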
31 Claims
1. A method, comprising:
determining two or more combined similarity measures by using two or more text similarities of two or more audio-video scenes and two or more video similarities of the two or more audio-video scenes; and
determining whether there are similar audio-video scenes in the two or more audio-video scenes by using the two or more combined similarity measures.
Dependent claims: 2 through 21.
22. A method for determining common audio-video segments, each audio-video segment comprised of a plurality of audio-video scenes, comprising:
combining a first metric and a second metric to create a third metric, wherein the first metric is based on comparisons of text segments from at least two audio-video scenes, and wherein the second metric is based on comparisons of video segments from the at least two audio-video scenes; and
selecting common audio-video segments by using the at least two audio-video scenes and the third metric.
23. A method for comparing and, if desired, summarizing audio-video segments, comprising:
determining a plurality of text similarities by comparing text portions from the audio-video scenes, each text portion corresponding to an audio-video scene;
determining a plurality of video similarities by comparing video portions from the audio-video scene, each video portion corresponding to an audio-video scene;
normalizing the text similarities;
normalizing the video similarities;
determining a plurality of combined similarities by using the normalized text and video similarities;
clustering the audio-video scenes into a plurality of clusters by using the combined similarities; and
determining which clusters, if any, have audio-video scenes from multiple audio-video segments.
Dependent claims: 24 through 29.
30. An article of manufacture comprising:
a computer-readable medium having computer-readable code means embodied thereon, said computer-readable program code means comprising:
a step to determine two or more combined similarity measures by using two or more text similarities of two or more audio-video scenes and two or more video similarities of the two or more audio-video scenes; and
a step to determine whether there are similar audio-video scenes in the two or more audio-video scenes by using the two or more combined similarity measures.
31. An apparatus comprising:
a memory that stores computer-readable code; and
a processor operatively coupled to the memory, said processor configured to implement the computer-readable code, said computer-readable code configured to:
determine two or more combined similarity measures by using two or more text similarities of two or more audio-video scenes and two or more video similarities of the two or more audio-video scenes; and
determine whether there are similar audio-video scenes in the two or more audio-video scenes by using the two or more combined similarity measures.
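The last two steps recited in claim 23 (clustering the scenes by combined similarity, then determining which clusters contain scenes from multiple audio-video segments) could be sketched roughly as below. The claims do not name a clustering algorithm, so simple single-linkage grouping over a similarity threshold is used here purely as a stand-in. Thresholding the similarity at `threshold` is equivalent to thresholding a distance of `1 - similarity`, which is itself an assumed conversion. The function names, the `threshold` parameter, and the `segment_of` mapping are hypothetical.

```python
from collections import defaultdict


def cluster_scenes(combined_sim, threshold):
    """Group scene indices whose combined similarity is >= threshold.

    Single-linkage grouping via union-find; a stand-in for whatever
    clustering method an actual implementation would use.
    """
    n = len(combined_sim)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if combined_sim[i][j] >= threshold:
                parent[find(i)] = find(j)

    clusters = defaultdict(list)
    for i in range(n):
        clusters[find(i)].append(i)
    return list(clusters.values())


def common_clusters(clusters, segment_of):
    """Keep only clusters whose scenes come from more than one segment.

    segment_of[i] is the (hypothetical) identifier of the audio-video
    segment that scene i was taken from.
    """
    return [c for c in clusters if len({segment_of[i] for i in c}) > 1]
```

For example, with four scenes where scenes 0 and 1 come from one segment and scenes 2 and 3 from another, `common_clusters(cluster_scenes(sim, 0.5), ["A", "A", "B", "B"])` keeps only those clusters that mix scenes from segments "A" and "B".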
Specification