Visual attention model
Abstract
An improved visual attention model uses a robust adaptive segmentation algorithm to divide a current frame of a video sequence into a plurality of regions based upon both color and luminance, with each region being processed in parallel by a plurality of spatial feature algorithms including color and skin to produce respective spatial importance maps. The current frame and a previous frame are also processed to produce motion vectors for each block of the current frame, the motion vectors being compensated for camera motion, and the compensated motion vectors being converted to produce a temporal importance map. The spatial and temporal importance maps are combined using weighting based upon eye movement studies.
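The combination described in the abstract can be sketched in outline. In this pure-Python fragment the function name, the lists-of-lists map representation, and the blending constant `k` are illustrative assumptions, not values from the patent:

```python
def combine_maps(spatial_maps, weights, temporal_map, k=0.5):
    # Pixel-wise weighted sum of the per-feature spatial importance maps,
    # followed by a linear blend with the temporal importance map.
    # k is a hypothetical constant standing in for the value the patent
    # derives from eye movement studies.
    rows, cols = len(temporal_map), len(temporal_map[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            spatial = sum(w * m[r][c] for w, m in zip(weights, spatial_maps))
            out[r][c] = k * spatial + (1.0 - k) * temporal_map[r][c]
    return out
```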
20 Claims
1. An improved visual attention model of the type that segments a frame of a video sequence into regions for processing by a plurality of spatial features to produce a corresponding plurality of spatial importance maps, that compares the frame with a previous frame for processing to produce a temporal importance map, and that combines the spatial and temporal importance maps to produce a total importance map for the frame, wherein the improvement comprises the steps of:
adaptively segmenting the frame into the regions using color along with luminance;
processing the regions with a plurality of spatial features to produce the plurality of spatial importance maps;
processing the frame with the previous frame to produce the temporal importance map that is compensated for camera motion; and
combining the spatial and temporal importance maps based upon a weighting function derived from eye movement studies to produce the total importance map for the frame.

2. The visual attention model as recited in claim 1 wherein the adaptively segmenting step comprises the steps of:

splitting the frame hierarchically into the regions based upon luminance variance, color variance and size of interim regions; and
merging interim regions to form the regions when the mean luminance and color variances within the interim regions are less than respective adaptive thresholds or the change in luminance and change in color within the interim regions are less than respective thresholds.
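The split phase of the segmentation above can be illustrated with a minimal quadtree-style sketch. It assumes a single fixed grayscale variance threshold in place of the patent's adaptive luminance-and-color criteria, and omits the merge phase:

```python
def split_regions(img, r0, c0, h, w, var_thresh, min_size=2):
    # Recursively split a block while its graylevel variance exceeds
    # var_thresh (the patent uses adaptive thresholds on both luminance
    # and color variance; one fixed threshold is assumed here).
    pixels = [img[r][c] for r in range(r0, r0 + h) for c in range(c0, c0 + w)]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    if var <= var_thresh or h <= min_size or w <= min_size:
        return [(r0, c0, h, w)]          # homogeneous enough: one region
    h2, w2 = h // 2, w // 2              # otherwise split into quadrants
    return (split_regions(img, r0, c0, h2, w2, var_thresh, min_size)
            + split_regions(img, r0, c0 + w2, h2, w - w2, var_thresh, min_size)
            + split_regions(img, r0 + h2, c0, h - h2, w2, var_thresh, min_size)
            + split_regions(img, r0 + h2, c0 + w2, h - h2, w - w2,
                            var_thresh, min_size))
```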
3. The visual attention model as recited in claim 2 wherein the adaptive segmenting step further comprises the step of clipping the borders of the frame prior to the splitting step.
4. The visual attention model as recited in claim 1 wherein the spatial features comprise at least two selected from the set consisting of size, background, location, contrast, shape, color and skin.
5. The visual attention model as recited in claim 4 wherein the processing step for the contrast spatial feature is based on absolute values for the mean graylevels of a region being processed and its neighboring regions that share a 4-connected border, is limited to a constant multiplied by the number of 4-connected neighboring pixels, and takes into account Weber and deVries-Rose effects.
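The contrast step of claim 5 could be sketched as follows; the cap from the claim is kept, while the knee luminance and the exact Weber / deVries-Rose scaling are illustrative assumptions, not the patented formulation:

```python
import math

def contrast_importance(region_mean, neighbor_means, n_border_pixels, cmax=1.0):
    # Sum of absolute graylevel differences against 4-connected neighbor
    # regions, limited to cmax times the number of 4-connected border
    # pixels (per the claim). The result is scaled by a luminance-
    # dependent term: Weber-like (proportional to mean luminance) above
    # a hypothetical knee, deVries-Rose-like (proportional to its square
    # root) below it.
    raw = sum(abs(region_mean - m) for m in neighbor_means)
    raw = min(raw, cmax * n_border_pixels)
    knee = 50.0  # assumed luminance where Weber behavior takes over
    denom = region_mean if region_mean >= knee else math.sqrt(knee * region_mean)
    return raw / max(denom, 1.0)
```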
6. The visual attention model as recited in claim 4 wherein the processing step for the color spatial feature calculates the color contrast of a region being processed with respect to its background.
7. The visual attention model as recited in claim 4 wherein the processing step for the skin spatial feature uses a narrow range of color values and respective thresholds for min and max values for each element of the color values.
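The skin test of claim 7 amounts to per-channel min/max thresholds on a narrow color range. The CbCr ranges below are common rule-of-thumb values for skin chrominance, not the values specified by the patent:

```python
def is_skin(cb, cr, cb_range=(77, 127), cr_range=(133, 173)):
    # Min/max thresholds for each element of the color value; the
    # default CbCr ranges are assumed, widely used approximations.
    return (cb_range[0] <= cb <= cb_range[1]
            and cr_range[0] <= cr <= cr_range[1])
```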
8. The visual attention model as recited in claim 4 wherein the processing step for the size spatial feature comprises the step of implementing a four threshold algorithm so that the importance of regions that are too small or too large is minimized.
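A four-threshold size feature can be realized as a piecewise-linear ramp; the threshold values below are hypothetical areas in pixels, chosen only to illustrate the shape:

```python
def size_importance(area, t1=50, t2=500, t3=5000, t4=20000):
    # Importance rises between t1 and t2, holds at 1 between t2 and t3,
    # and falls between t3 and t4, so very small and very large regions
    # receive minimal importance. All four thresholds are assumed values.
    if area <= t1 or area >= t4:
        return 0.0
    if area < t2:
        return (area - t1) / (t2 - t1)
    if area <= t3:
        return 1.0
    return (t4 - area) / (t4 - t3)
```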
9. The visual attention model as recited in claim 4 wherein the processing step for the background spatial feature comprises the step of using a minimum of the number of pixels in a region that shares a four-connected border with another region or of the number of pixels in a region that also borders a truncated edge of the frame.
10. The visual attention model as recited in claim 4 wherein the processing step for the location spatial feature comprises the step of considering various zones about a central area of the frame, with weights per zone decreasing with distance from the central area.
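The location feature can be sketched with concentric rectangular zones about the frame center; the number of zones and the per-zone weights here are illustrative assumptions:

```python
def location_weight(r, c, rows, cols):
    # Assign a hypothetical weight per zone, decreasing from the
    # central area outwards toward the frame border.
    zone_weights = [1.0, 0.75, 0.5, 0.25]  # center outwards (assumed)
    # Normalized distance from the center: 0 at center, near 1 at border.
    dr = abs(r - (rows - 1) / 2) / (rows / 2)
    dc = abs(c - (cols - 1) / 2) / (cols / 2)
    zone = min(int(max(dr, dc) * len(zone_weights)), len(zone_weights) - 1)
    return zone_weights[zone]
```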
11. The visual attention model as recited in claim 4 wherein the processing step for the shape spatial feature comprises the step of reducing shape importance in regions that have many neighboring regions.
12. The visual attention model as recited in claim 1 wherein the combining step comprises the steps of:
weighting each spatial importance map according to weights determined empirically from eye movement studies to produce a resultant spatial importance map;
smoothing the resultant spatial importance map from frame to frame using a temporal smoothing algorithm to reduce noise and improve temporal consistency to produce a spatial importance map; and
combining the spatial importance map with the temporal importance map to produce the total importance map.
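The frame-to-frame smoothing step of claim 12 can be sketched as a recursive (IIR) filter; the smoothing factor `alpha` is an assumed value, not one the patent specifies:

```python
def smooth_temporal(prev_map, cur_map, alpha=0.7):
    # Blend the current resultant spatial importance map with the
    # previous frame's smoothed map to reduce noise and improve
    # temporal consistency. alpha is a hypothetical smoothing factor.
    if prev_map is None:                     # first frame: nothing to blend
        return [row[:] for row in cur_map]
    return [[alpha * c + (1.0 - alpha) * p
             for c, p in zip(crow, prow)]
            for crow, prow in zip(cur_map, prev_map)]
```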
13. The visual attention model as recited in claim 12 wherein the step of combining the spatial importance map with the temporal importance map comprises the step of linear weighting the spatial importance and temporal importance maps, the linear weighting step using a constant determined from the eye movement studies.
14. The visual attention model as recited in claim 1 wherein the temporal importance map processing step comprises the steps of:
calculating motion vectors for each block of the current frame using a hierarchical block matching algorithm;
estimating from the motion vectors parameters of camera motion;
compensating the motion vectors based upon the parameters of camera motion; and
converting the compensated motion vectors into the temporal importance map.
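The compensation step above can be illustrated with a deliberately simple stand-in for the patent's camera-motion parameter estimation: treat the camera motion as a pure pan, estimate it as the median block motion vector, and subtract it, leaving object-relative motion:

```python
def compensate_camera_motion(motion_vectors):
    # motion_vectors: list of (vx, vy) block motion vectors from a
    # block matching step. Estimating pan as the per-component median
    # is an assumed simplification; the patent estimates fuller camera
    # motion parameters (e.g. pan and zoom).
    xs = sorted(v[0] for v in motion_vectors)
    ys = sorted(v[1] for v in motion_vectors)
    mid = len(motion_vectors) // 2
    pan = (xs[mid], ys[mid])
    return [(vx - pan[0], vy - pan[1]) for vx, vy in motion_vectors]
```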
15. The visual attention model as recited in claim 14 wherein the temporal importance map processing step further comprises the step of determining a flatness for each block so that motion vectors in texturally flat areas are set to zero in the compensated motion vectors prior to the converting step.
16. The visual attention model as recited in claim 14 further comprising the step of calculating an adaptive threshold for assigning importance to a particular motion of a region over a temporal window.
17. The visual attention model as recited in claim 16 wherein the adaptive threshold calculating step includes the steps of:
assigning a lower threshold value as the adaptive threshold when there are few and slow moving regions in the frame; and
assigning a higher threshold value as the adaptive threshold when there are many and fast moving regions in the frame.
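The two assignment steps above amount to switching between a low and a high threshold based on how many regions are moving and how fast. In this sketch the lo/hi values, the "many regions" cutoff, and the speed cutoff are all illustrative assumptions:

```python
def adaptive_motion_threshold(region_speeds, lo=0.5, hi=4.0):
    # region_speeds: per-region motion magnitudes for the current frame.
    # Few, slow-moving regions -> lower threshold (small motions matter);
    # many, fast-moving regions -> higher threshold. Cutoffs are assumed.
    moving = [s for s in region_speeds if s > 0.0]
    busy = len(moving) > len(region_speeds) / 2      # "many" regions moving
    fast = bool(moving) and sum(moving) / len(moving) > 2.0
    return hi if (busy and fast) else lo
```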
18. The visual attention model as recited in claim 14 further comprising the step of assigning further importance in the total importance map to a central area of the frame when the camera motion parameters indicate camera motion selected from the group consisting of zoom and pan.
19. The visual attention model as recited in claim 14 further comprising the step of assigning further importance in the total importance map to a central area of the frame when there is very high motion in the video sequence.
20. The visual attention model as recited in claim 14 further comprising the step of assigning further importance in the total importance map to skin areas that are undergoing motion.
Specification