Automated video interpretation system
Abstract
Digital image signal interpretation is the process of understanding the content of an image through the identification of significant objects or regions in the image and the analysis of their spatial arrangement. Traditionally, the task of image interpretation has required human analysis. This is expensive and time consuming; consequently, considerable research has been directed towards constructing automated image interpretation systems. A method of interpreting a digital video signal having contextual data is disclosed. The method comprises the steps of, firstly, segmenting the digital video signal into one or more video segments, each segment having a corresponding portion of the contextual data, and, secondly, analysing each video segment to provide a graph at one or more temporal instances in the respective video segment, dependent upon the corresponding portion of the contextual data.
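The abstract's first step, temporal segmentation, can be sketched as a toy illustration. This is not the patented algorithm: it simply cuts a list of grayscale frames wherever the mean frame-to-frame pixel difference exceeds a threshold, and every name and the threshold value here are invented for illustration.

```python
def mean_abs_diff(a, b):
    """Mean absolute pixel difference between two equally sized frames."""
    total = sum(abs(x - y) for row_a, row_b in zip(a, b)
                for x, y in zip(row_a, row_b))
    return total / (len(a) * len(a[0]))

def segment_video(frames, threshold=50.0):
    """Group consecutive frames into segments, cutting at large changes."""
    segments = [[frames[0]]]
    for prev, cur in zip(frames, frames[1:]):
        if mean_abs_diff(prev, cur) > threshold:
            segments.append([cur])      # scene cut: start a new segment
        else:
            segments[-1].append(cur)    # same shot: extend current segment
    return segments

dark = [[0, 0], [0, 0]]
bright = [[255, 255], [255, 255]]
print([len(s) for s in segment_video([dark, dark, bright, bright])])  # [2, 2]
```

A production system would use a more robust shot-boundary detector, but the output shape is the same: a list of segments, each a run of consecutive frames.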
Claims (44)
1. A method of interpreting a digital video signal, wherein said digital video signal has contextual data, said method comprising the steps of:
segmenting said digital video signal into one or more video segments, each segment having one or more video frames, and each said video frame having a corresponding portion of said contextual data;
determining a plurality of regions for a video frame of a respective video segment;
processing said regions for each video segment to provide a region adjacency graph at one or more temporal instances in the respective video segment, said region adjacency graph representing adjacencies between regions for a corresponding frame of said respective video segment; and
analyzing said region adjacency graphs to produce a corresponding labeled region adjacency graph comprising at least one semantic label, said analysis being dependent upon a corresponding portion of said contextual data, wherein said labeled region adjacency graph represents an interpretation of said digital video signal. (Dependent claims: 2-21)
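The region adjacency graph of claim 1 can be illustrated with a minimal construction, assuming a frame whose pixels already carry region labels (the patent does not prescribe this representation): two regions are adjacent when any of their pixels are 4-connected neighbours.

```python
def region_adjacency_graph(label_frame):
    """Return {region: set of adjacent regions} for a 2-D label array."""
    h, w = len(label_frame), len(label_frame[0])
    adj = {}
    for y in range(h):
        for x in range(w):
            r = label_frame[y][x]
            adj.setdefault(r, set())
            for dy, dx in ((0, 1), (1, 0)):   # right and down neighbours
                ny, nx = y + dy, x + dx
                if ny < h and nx < w:
                    s = label_frame[ny][nx]
                    if s != r:
                        adj[r].add(s)
                        adj.setdefault(s, set()).add(r)
    return adj

frame = [[1, 1, 2],
         [1, 3, 2],
         [3, 3, 2]]
print(sorted(region_adjacency_graph(frame)[1]))  # [2, 3]
```

Each node of the graph is a region; to match the claim, each node would also carry the region's features, and the graph would be built at one or more temporal instances per segment.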
audio data;
electromagnetic data;
focal point data;
exposure data;
aperture data;
operator gaze location data;
environmental data;
time lapse or sequential image data;
motion data; and
textual tokens.
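The list above enumerates the kinds of contextual data a frame may carry. One hypothetical way to bundle them per frame is a simple record; every field name here is illustrative, not from the patent.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ContextualData:
    audio: Optional[bytes] = None        # audio data
    gps: Optional[tuple] = None          # electromagnetic data (e.g. a GPS fix)
    focal_point: Optional[tuple] = None  # focal point data
    exposure: Optional[float] = None     # exposure data
    aperture: Optional[float] = None     # aperture data
    gaze: Optional[tuple] = None         # operator gaze location data
    environment: Optional[str] = None    # environmental data
    motion: Optional[tuple] = None       # motion data
    keywords: List[str] = field(default_factory=list)  # textual tokens

ctx = ContextualData(aperture=2.8, keywords=["beach", "sunset"])
print(ctx.keywords)  # ['beach', 'sunset']
```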
6. The method according to claim 5, wherein said textual tokens include textual annotation, phrases and keywords associated with at least a portion of a respective video segment.
7. The method according to claim 1, said analyzing step comprising the further sub-step of:
biasing a statistical or probabilistic interpretation model dependent upon said corresponding portion of contextual data.
8. The method according to claim 7, wherein said biasing step includes the step of selecting a predetermined application domain from a plurality of application domains dependent upon said corresponding portion of contextual data, said application domain comprising a set of semantic labels appropriate for use with said application domain.
9. The method according to claim 7, wherein said biasing step includes the step of adjusting at least one prior probability of a respective one of a plurality of semantic labels in at least one application domain.
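Claims 7 through 9 describe biasing the interpretation model by selecting an application domain from the contextual data and adjusting label priors. A toy sketch follows; the domains, labels, and weights are all invented for illustration.

```python
DOMAIN_PRIORS = {
    "beach": {"sky": 0.3, "water": 0.4, "sand": 0.3},
    "office": {"wall": 0.5, "desk": 0.3, "person": 0.2},
}

def select_domain(keywords):
    """Pick the application domain named in the contextual keywords."""
    for domain in DOMAIN_PRIORS:
        if domain in keywords:
            return domain
    return "beach"  # illustrative default domain

def biased_priors(keywords, boost_label=None, factor=2.0):
    """Return domain priors, optionally boosting one label, renormalised."""
    priors = dict(DOMAIN_PRIORS[select_domain(keywords)])
    if boost_label in priors:
        priors[boost_label] *= factor
    total = sum(priors.values())
    return {k: v / total for k, v in priors.items()}

p = biased_priors(["office", "meeting"], boost_label="person")
print(round(p["person"], 2))  # 0.33
```

The later analysis steps (claims 10 through 13) would then consume these adjusted priors rather than the unbiased ones.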
10. The method according to claim 7, wherein each said analyzing step is dependent upon said biased statistical or probabilistic interpretation model.
11. The method according to claim 10, wherein each said analyzing step is dependent upon said at least one application domain.
12. The method according to claim 7, wherein each said analyzing step includes the step of analyzing regions of said region adjacency graph using said biased statistical or probabilistic interpretation model to provide said labeled region adjacency graph.
13. The method according to claim 12, wherein said region analysis step is dependent upon adjusted prior probabilities of a plurality of labels to provide said labeled region adjacency graph.
14. The method according to claim 7, wherein the statistical or probabilistic interpretation model is a Markov Random Field.
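Claim 14 names a Markov Random Field as the interpretation model. A toy labelling of a region adjacency graph by iterated conditional modes (ICM, one common MRF inference scheme; the patent does not specify the inference method): each region repeatedly takes the label maximising its prior times its per-region evidence times a pairwise compatibility with its neighbours' current labels. All numbers here are invented.

```python
PRIOR = {"sky": 0.5, "water": 0.5}
COMPAT = {("sky", "sky"): 0.9, ("water", "water"): 0.9,
          ("sky", "water"): 0.4, ("water", "sky"): 0.4}

def icm(adjacency, unary, labels=("sky", "water"), sweeps=5):
    """adjacency: {node: set of neighbours}; unary: {node: {label: score}}."""
    state = {n: max(unary[n], key=unary[n].get) for n in adjacency}
    for _ in range(sweeps):
        for n in adjacency:
            def score(lab, n=n):
                s = PRIOR[lab] * unary[n][lab]
                for m in adjacency[n]:
                    s *= COMPAT[(lab, state[m])]
                return s
            state[n] = max(labels, key=score)
    return state

adj = {0: {1}, 1: {0, 2}, 2: {1}}
unary = {0: {"sky": 0.9, "water": 0.1},
         1: {"sky": 0.55, "water": 0.45},
         2: {"sky": 0.2, "water": 0.8}}
print(icm(adj, unary))  # {0: 'sky', 1: 'sky', 2: 'water'}
```

Biasing the model, in the sense of claims 7 through 9, amounts to altering `PRIOR` (or the compatibilities) according to the contextual data before inference.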
15. The method according to claim 1, further comprising the step of encoding said region adjacency graph to form a bitstream representation of said region adjacency graph.
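Claim 15 encodes the region adjacency graph as a bitstream. A minimal, hypothetical serialisation (the patent does not define the format): an edge count followed by (region, region) pairs, all as big-endian unsigned 16-bit integers.

```python
import struct

def encode_rag(edges):
    """Pack a list of (u, v) adjacency edges into bytes."""
    out = struct.pack(">H", len(edges))
    for u, v in edges:
        out += struct.pack(">HH", u, v)
    return out

def decode_rag(data):
    """Inverse of encode_rag: recover the edge list from bytes."""
    (n,) = struct.unpack_from(">H", data, 0)
    return [struct.unpack_from(">HH", data, 2 + 4 * i) for i in range(n)]

edges = [(1, 2), (2, 3)]
print(decode_rag(encode_rag(edges)) == edges)  # True
```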
16. The method according to claim 15, further comprising the step of:
representing motion between two successive video frames using a predetermined motion model comprising encoded parameters.
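Claim 16 represents inter-frame motion with a predetermined motion model whose parameters are encoded. A sketch using the simplest such model, a pure translation (dx, dy) estimated from matched points; the patent does not specify which motion model is used.

```python
def apply_translation(points, dx, dy):
    """Predict next-frame positions of region points under (dx, dy)."""
    return [(x + dx, y + dy) for x, y in points]

def estimate_translation(points_a, points_b):
    """Mean translation between matched point lists (least squares
    solution for a translational model)."""
    n = len(points_a)
    dx = sum(bx - ax for (ax, _), (bx, _) in zip(points_a, points_b)) / n
    dy = sum(by - ay for (_, ay), (_, by) in zip(points_a, points_b)) / n
    return dx, dy

a = [(0, 0), (2, 1)]
b = [(3, 4), (5, 5)]
print(estimate_translation(a, b))  # (3.0, 4.0)
```

The two estimated parameters are what would be encoded per region, in the manner of claim 15, rather than the pixel data itself.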
17. The method according to claim 16, further comprising the step of combining each encoded video segment and respective encoded region adjacency graph to provide a motion encoded digital video signal.
18. The method according to claim 1, further comprising the step of providing metadata associated with each video segment, wherein said metadata includes said region adjacency graphs of each video segment.
19. The method according to claim 1, wherein said analysis step is carried out on a video frame and a temporal region of interest of the respective contextual data.
20. The method according to claim 1, wherein said digital video signal is generated using a digital video recording device.
21. The method according to claim 20, wherein one or more portions of said contextual data are generated by one or more sensors of said digital video recording device.
22. An apparatus for interpreting a digital video signal, wherein said digital video signal has contextual data, said apparatus comprising:
means for segmenting said digital video signal into one or more video segments, each segment having one or more video frames, and each said video frame having a corresponding portion of said contextual data;
means for determining a plurality of regions for a video frame of a respective video segment;
means for processing said regions for each video segment to provide a region adjacency graph at one or more temporal instances in the respective video segment, said region adjacency graph representing adjacencies between regions for a corresponding frame of said respective video segment; and
means for analyzing said region adjacency graphs to produce a corresponding labeled region adjacency graph comprising at least one semantic label, said analysis being dependent upon a corresponding portion of said contextual data, wherein said labeled region adjacency graph represents an interpretation of said digital video signal. (Dependent claims: 23-31)
32. A computer program stored in a computer readable medium, said computer program being configured for interpreting a digital video signal, wherein said digital video signal has contextual data, said computer program comprising:
code for segmenting said digital video signal into one or more video segments, each segment having one or more video frames, and each said video frame having a corresponding portion of said contextual data;
code for determining a plurality of regions for at least one video frame of a respective video segment;
code for processing said regions for each video segment to provide a region adjacency graph at one or more temporal instances in the respective video segment, said region adjacency graph representing adjacencies between regions for a corresponding frame of said respective video segment; and
code for analyzing said region adjacency graphs to produce a corresponding labeled region adjacency graph comprising at least one semantic label, said analysis being dependent upon a corresponding portion of said contextual data, wherein said labeled region adjacency graph represents an interpretation of said digital video signal. (Dependent claims: 33-41)
42. A method of interpreting a digital video signal, wherein said digital video signal has contextual data, said method comprising the steps of:
segmenting said digital video signal into one or more video segments, each segment having one or more video frames and a corresponding portion of said contextual data;
determining a plurality of regions for at least one video frame of a respective video segment;
processing said regions for each video segment to provide a region adjacency graph at one or more temporal instances in the respective video segment dependent upon said corresponding portion of said contextual data, said region adjacency graph representing adjacencies between regions for a corresponding frame of said respective video segment; and
analyzing said region adjacency graphs to interpret said digital video signal.
43. An apparatus for interpreting a digital video signal, wherein said digital video signal has contextual data, said apparatus comprising:
means for segmenting said digital video signal into one or more video segments, each segment having one or more video frames and a corresponding portion of said contextual data;
means for determining a plurality of regions for at least one video frame of a respective video segment;
means for processing said regions for each video segment to provide a region adjacency graph at one or more temporal instances in the respective video segment dependent upon said corresponding portion of said contextual data, said region adjacency graph representing adjacencies between regions for a corresponding frame of said respective video segment; and
means for analyzing said region adjacency graphs to interpret said digital video signal.
44. A computer program stored in a computer readable medium, said computer program being configured for interpreting a digital video signal, wherein said digital video signal has contextual data, said computer program comprising:
code for segmenting said digital video signal into one or more video segments, each segment having one or more video frames and a corresponding portion of said contextual data;
code for determining a plurality of regions for at least one video frame of a respective video segment;
code for processing said regions for each video segment to provide a region adjacency graph at one or more temporal instances in the respective video segment dependent upon said corresponding portion of said contextual data, said region adjacency graph representing adjacencies between regions for a corresponding frame of said respective video segment; and
code for analyzing said region adjacency graphs to interpret said digital video signal.
Specification