System for interactive organization and browsing of video
Abstract
A system for interactively organizing and browsing video automatically processes video, creating a video table of contents (VTOC), while providing easy-to-use interfaces for verification, correction, and augmentation of the automatically extracted video structure. Shot detection, shot grouping, and VTOC generation are performed automatically, without making restrictive assumptions about the structure or content of the video. A nonstationary time series model of difference metrics is used for shot boundary detection. Color and edge similarities are used for shot grouping. Observations about the structure of a wide class of videos are used for generating the table of contents. The use of automatic processing in conjunction with input from the user provides a meaningful video organization.
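The abstract names a nonstationary time series model of difference metrics for shot boundary detection but does not give the model. As a hedged stand-in (not the patent's model), the sketch below flags a cut when a frame-difference value exceeds the mean plus k standard deviations of a sliding window of preceding values, so the threshold adapts as local statistics drift. It assumes `diffs[t]` is a precomputed difference between frame t-1 and frame t; the function name, `window`, `k`, and `min_std` are illustrative.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def detect_shot_boundaries(diffs: Sequence[float], window: int = 5,
                           k: float = 3.0, min_std: float = 1e-6) -> List[int]:
    """Return frame indices t that start a new shot.

    diffs[t] is assumed to be a difference metric between frame t-1 and frame t.
    A value is a cut if it is an outlier relative to the preceding window.
    """
    boundaries: List[int] = []
    for t in range(window, len(diffs)):
        history = diffs[t - window:t]
        mu = mean(history)
        # Floor the spread so a perfectly flat history does not flag tiny noise.
        sigma = max(pstdev(history), min_std)
        if diffs[t] > mu + k * sigma:
            boundaries.append(t)
    return boundaries
```

A single spike inflates the window statistics for the next few frames, which suppresses back-to-back detections; that is a side effect of this simple adaptive threshold, not a property claimed by the patent.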
187 Citations
21 Claims
1. A system for interactively organizing and browsing raw video to facilitate browsing of video archives, comprising:
automatic video organizing means for automatically organizing a raw video into a hierarchical structure that depicts the video's organized contents, said automatic video organizing means including shot grouping means for automatically grouping shots, which represent a continuous action in time and space, into groups of visually similar shots, each group of shots capturing a given structure in the raw video; and
user interface means for allowing a user to view and manually edit the hierarchical structure, said user interface means including tree view interface means for allowing the user to view the groups of visually similar shots, create new groups of visually similar shots, and modify the groups of visually similar shots, said tree view interface means including update means for determining whether any image portions of a shot merged with another shot by the user are similar, wherein when said update means finds similar image portions, said update means generates partial match templates which block dissimilar image portions of remaining shots and automatically merges the remaining shots that have similar image portions, wherein when said update means does not find any similar image portions, said update means determines whether audio streams of the two merged shots are similar, wherein when said update means finds similar audio streams, said update means searches other shots in the groups for similar audio streams and automatically merges other shots with similar audio streams together, wherein when said update means does not find any similar audio streams in the two merged shots, the update means repeats the user's action on all siblings of the merged shot.
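The update means of claim 1 is a three-step cascade triggered when the user merges two shots. The sketch below models it under stated assumptions: each shot carries a hypothetical list of per-portion image feature vectors (for example, one per grid cell) and a hypothetical audio feature vector, "similar" means Euclidean distance below a threshold, and the function returns which rule fired plus the shot ids to merge instead of mutating a tree. `Shot`, `propagate_merge`, and both thresholds are illustrative names, not the patent's.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Shot:
    shot_id: int
    regions: List[List[float]]  # hypothetical: one feature vector per image portion
    audio: List[float]          # hypothetical: summary feature vector of the audio stream


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def propagate_merge(a: Shot, b: Shot, remaining: Sequence[Shot],
                    siblings_of_a: Sequence[Shot],
                    img_thr: float = 0.1,
                    audio_thr: float = 0.1) -> Tuple[str, List[int]]:
    """Decide which other shots to merge after the user merges shot a with shot b."""
    # Step 1: the partial match template keeps only the image portions on which
    # a and b agree; dissimilar portions are blocked (ignored) for other shots.
    template = [_dist(ra, rb) < img_thr for ra, rb in zip(a.regions, b.regions)]
    if any(template):
        matches = [s.shot_id for s in remaining
                   if all(_dist(s.regions[i], a.regions[i]) < img_thr
                          for i, keep in enumerate(template) if keep)]
        return "image", matches
    # Step 2: no shared image portions, so compare the two audio streams.
    if _dist(a.audio, b.audio) < audio_thr:
        matches = [s.shot_id for s in remaining if _dist(s.audio, a.audio) < audio_thr]
        return "audio", matches
    # Step 3: neither cue applies, so repeat the user's action on a's siblings.
    return "siblings", [s.shot_id for s in siblings_of_a]
```

Returning a decision rather than editing the tree directly keeps the cascade testable and lets the tree view apply or preview the merges.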
7. A method used in automatically organizing video for automatically grouping shots into groups of visually similar shots, each group of shots capturing structure in a raw video, the shots generated by detecting abrupt scene changes in raw frames of the video which represent a continuous action in time and space, the method comprising the steps of:
providing a predetermined list of color names;
describing image colors in each of the shots using the predetermined list of color names;
clustering the shots into visually similar groups based on the image colors described in each of the shots; and
using image edge information from each of the shots to identify and remove incorrectly clustered shots from the groups.
providing a single one of the groups containing a first shot;
getting a color feature vector of a new shot, the color feature vector based on the predetermined list of color names;
finding a nearest match between the vector and group means of existing groups; and
determining if the nearest match is less than a predetermined color threshold.
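The color steps above describe each shot's colors using a predetermined list of color names and derive a color feature vector from that list. The list itself is not given here, so the sketch below uses a small hypothetical palette, assigns each pixel to its nearest named color in RGB space, and returns the normalized count per name. It assumes a shot is represented by a flat list of RGB pixels (for example, from a keyframe); the palette, names, and nearest-color rule are all assumptions.

```python
from typing import Dict, List, Sequence, Tuple

RGB = Tuple[int, int, int]

# Hypothetical palette: stands in for the patent's predetermined list of color names.
COLOR_NAMES: Dict[str, RGB] = {
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "red": (255, 0, 0),
    "green": (0, 128, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
    "gray": (128, 128, 128),
}


def nearest_color_name(pixel: RGB) -> str:
    """Name of the palette color closest to the pixel (squared RGB distance)."""
    return min(COLOR_NAMES,
               key=lambda name: sum((p - c) ** 2 for p, c in zip(pixel, COLOR_NAMES[name])))


def color_feature_vector(pixels: Sequence[RGB]) -> List[float]:
    """Fraction of the shot's pixels described by each color name, in palette order."""
    counts = {name: 0 for name in COLOR_NAMES}
    for px in pixels:
        counts[nearest_color_name(px)] += 1
    total = max(len(pixels), 1)  # avoid dividing by zero for an empty shot
    return [counts[name] / total for name in COLOR_NAMES]
```

Because every shot is described over the same fixed list of names, the resulting vectors have equal length and can be compared directly by the clustering step.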
12. The method according to claim 11, wherein the step of clustering further includes the steps of:
adding the new shot to a group producing the nearest match if the nearest match is less than the color threshold; and
updating means of the group producing the nearest match.
13. The method according to claim 11, wherein the step of clustering further includes the steps of:
creating a new group with the color feature vector as its mean if the nearest match is not less than the predetermined color threshold; and
adding the shot to the new group.
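Claims 11 to 13 together describe a sequential clustering of shots by color: start with one group holding the first shot, match each new shot's color feature vector against the existing group means, and either join the nearest group (updating its mean) or open a new group. The sketch below assumes Euclidean distance as the "nearest match" measure and a running mean update; `cluster_shots` and its parameters are illustrative.

```python
import math
from typing import List, Sequence


def cluster_shots(vectors: Sequence[Sequence[float]],
                  color_threshold: float) -> List[List[int]]:
    """Sequentially cluster shot color feature vectors against running group means."""
    if not vectors:
        return []
    # Start with a single group containing the first shot.
    groups: List[List[int]] = [[0]]
    means: List[List[float]] = [list(vectors[0])]
    for idx in range(1, len(vectors)):
        v = vectors[idx]
        # Nearest match between the new vector and the existing group means.
        dists = [math.dist(v, m) for m in means]
        best = min(range(len(means)), key=dists.__getitem__)
        if dists[best] < color_threshold:
            # Add the shot and update that group's mean incrementally.
            groups[best].append(idx)
            n = len(groups[best])
            means[best] = [m + (x - m) / n for m, x in zip(means[best], v)]
        else:
            # Otherwise open a new group whose mean is this vector.
            groups.append([idx])
            means.append(list(v))
    return groups
```

This single-pass scheme depends on shot order, which is one reason a second, edge-based pass to remove misclustered shots is useful.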
14. The method according to claim 7, wherein the step of using image edge information includes the steps of:
computing edge feature vectors E for each shot of groups having more than one shot; and
computing mean edge feature vector M for each group having more than one shot.
15. The method according to claim 14, wherein the step of using image edge information further includes the steps of:
finding a shot of a group which gives a maximum absolute value for (M−E); and
determining if the maximum absolute value for (M−E) is greater than a predetermined edge threshold.
16. The method according to claim 15, wherein the step of using image edge information further includes the steps of:
deleting the shot of the group if the maximum absolute value for (M−E) is greater than the predetermined threshold and placing the shot in a new group; and
recomputing the mean edge feature vector of the group with the removed shot.
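Claims 14 to 16 refine the color groups with edge information: for each group with more than one shot, compute the mean edge vector M, find the shot whose edge vector E gives the largest |M−E|, and move it to a new group if that value exceeds an edge threshold, then recompute M. The sketch below assumes |M−E| is the Euclidean norm of the difference and that the check repeats until no shot exceeds the threshold; the claims do not state either choice. How E is extracted from images is out of scope, so edge vectors are taken as input, and `refine_with_edges` is an illustrative name.

```python
import math
from typing import List, Sequence


def _mean(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def refine_with_edges(groups: List[List[int]],
                      edges: Sequence[Sequence[float]],
                      edge_threshold: float) -> List[List[int]]:
    """Split color-based groups using edge feature vectors E and group means M."""
    refined: List[List[int]] = []
    for group in groups:
        members = list(group)
        outliers: List[int] = []
        # Only groups with more than one shot are checked.
        while len(members) > 1:
            M = _mean([edges[s] for s in members])
            # Shot whose edge vector lies farthest from the group mean.
            worst = max(members, key=lambda s: math.dist(M, edges[s]))
            if math.dist(M, edges[worst]) <= edge_threshold:
                break
            # Move it to a new group; M is recomputed on the next pass.
            members.remove(worst)
            outliers.append(worst)
        refined.append(members)
        refined.extend([s] for s in outliers)
    return refined
```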
17. The method according to claim 16, further comprising the step of writing a merge-list which specifies a group number for each of the shots.
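Claim 17's merge-list specifies a group number for each shot. Its file format is not given, so the sketch below assumes a hypothetical plain-text layout with one "shot_index group_number" line per shot, with group numbers taken from the order of the groups list. `build_merge_list` and `write_merge_list` are illustrative names.

```python
import io
from typing import List, TextIO


def build_merge_list(groups: List[List[int]], num_shots: int) -> List[int]:
    """Map each shot index to the number of the group that contains it."""
    merge_list = [-1] * num_shots
    for group_number, group in enumerate(groups):
        for shot in group:
            merge_list[shot] = group_number
    if -1 in merge_list:
        raise ValueError("every shot must be assigned to a group")
    return merge_list


def write_merge_list(merge_list: List[int], out: TextIO) -> None:
    """Write one 'shot_index group_number' line per shot (hypothetical format)."""
    for shot, group_number in enumerate(merge_list):
        out.write(f"{shot} {group_number}\n")


if __name__ == "__main__":
    buf = io.StringIO()
    write_merge_list(build_merge_list([[0, 1], [2], [3]], 4), buf)
    print(buf.getvalue(), end="")
```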
18. A method for interactively organizing and browsing video, the method comprising the steps of:
automatically organizing a raw video into a hierarchical structure that depicts the video's organized contents, said step of automatically organizing including the steps of:
automatically detecting abrupt scene changes in raw frames of the video, automatically organizing the raw frames into a list of shots, each of the shots representing a continuous action in time and space, and automatically grouping shots, which represent a continuous action in time and space, into groups of visually similar shots, each group of shots capturing a given structure in the raw video; and
viewing and manually editing the hierarchical structure to make the hierarchical structure substantially useful and meaningful to the user, said step of viewing including the steps of:
viewing the shots, manually editing the shots, viewing the groups of visually similar shots, and manually editing the groups of shots; and
wherein the step of manually editing the groups of shots includes the step of determining whether any image portions of a first shot merged with a second shot by the user are similar, wherein when similar image portions are found, partial match templates are generated which block dissimilar image portions of remaining shots, and the remaining shots having image portions which are similar to the similar image portions of the merged first and second shots are automatically merged, wherein when similar image portions in the merged first and second shots are not found, determining whether audio streams of the merged first and second shots are similar, wherein when similar audio streams are found in the merged first and second shots, searching audio streams of the remaining shots to determine if they are similar to the audio streams of the merged first and second shots and automatically merging the remaining shots, having audio streams that are similar to the audio streams of the merged first and second shots, with the merged first and second shots, wherein when similar audio streams are not found in the merged first and second shots, the action taken by the user on the first shot is automatically repeated on all siblings of the first shot.
viewing the hierarchical structure; and
modifying the hierarchical structure to make the hierarchical structure substantially useful and meaningful to the user.
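Claim 18 includes organizing the raw frames into a list of shots once abrupt scene changes are detected. The sketch below assumes the detector reports the frame indices that start a new shot and converts them into inclusive (start, end) frame ranges; duplicate or out-of-range boundaries are ignored. `boundaries_to_shots` is an illustrative name.

```python
from typing import List, Sequence, Tuple


def boundaries_to_shots(boundaries: Sequence[int],
                        num_frames: int) -> List[Tuple[int, int]]:
    """Turn shot-start frame indices into inclusive (start, end) frame ranges."""
    if num_frames <= 0:
        return []
    # Frame 0 always starts the first shot; keep only valid, unique interior cuts.
    starts = [0] + sorted(b for b in set(boundaries) if 0 < b < num_frames)
    # Each shot ends one frame before the next shot starts; the last ends at the final frame.
    ends = [s - 1 for s in starts[1:]] + [num_frames - 1]
    return list(zip(starts, ends))
```

For example, cuts at frames 6 and 12 in a 14-frame clip give the shots (0, 5), (6, 11), and (12, 13).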
21. The method according to claim 18, wherein the step of viewing includes the step of playing the video along with the video's audio from any point in the video.
Specification