Systems and methods for generating a comprehensive user attention model
First Claim
1. A computer-implemented method for generating a comprehensive user attention model, the method comprising:
extracting feature components from a video data sequence;
generating attention data based on application of multiple attention models to the feature components;
integrating the attention data to create the comprehensive user attention model; and
wherein the comprehensive user attention model is represented as:
$$A = w_v \cdot M_v + w_a \cdot M_a + w_l \cdot M_l,$$
$w_v$, $w_a$, $w_l$ representing weights for linear combination, and $M_v$, $M_a$, and $M_l$ indicating normalized visual, audio, and linguistic attention models.
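The linear combination recited in claim 1 can be illustrated with a minimal sketch. The claim only requires that the three models be "normalized" and linearly weighted; the min-max normalization, the default weight values, and the function names `normalize` and `fuse_attention` below are assumptions made for illustration, not details taken from the patent.

```python
from typing import List, Sequence


def normalize(curve: Sequence[float]) -> List[float]:
    """Min-max scale a per-frame attention curve to [0, 1].

    Assumption: the claim says only "normalized"; min-max scaling is
    one common choice and is used here purely for illustration.
    """
    lo, hi = min(curve), max(curve)
    if hi == lo:
        # A flat curve carries no relative attention information.
        return [0.0 for _ in curve]
    return [(x - lo) / (hi - lo) for x in curve]


def fuse_attention(
    visual: Sequence[float],
    audio: Sequence[float],
    linguistic: Sequence[float],
    w_v: float = 0.5,  # hypothetical weight for the visual model
    w_a: float = 0.3,  # hypothetical weight for the audio model
    w_l: float = 0.2,  # hypothetical weight for the linguistic model
) -> List[float]:
    """Compute A = w_v*M_v + w_a*M_a + w_l*M_l for every frame."""
    if not (len(visual) == len(audio) == len(linguistic)):
        raise ValueError("all attention curves must cover the same frames")
    m_v, m_a, m_l = normalize(visual), normalize(audio), normalize(linguistic)
    return [w_v * v + w_a * a + w_l * l for v, a, l in zip(m_v, m_a, m_l)]
```

Each input is a per-frame attention curve of equal length, and the result is a single per-frame curve in which a higher value marks a frame that the combined model rates as more attention-worthy.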
Abstract
Systems and methods to generate an attention model for computational analysis of video data are described. In one aspect, feature components from a video data sequence are extracted. Attention data is generated by applying multiple attention models to the extracted feature components. The generated attention data is integrated into a comprehensive user attention model for the computational analysis of the video data sequence.
82 Citations
32 Claims
-
1. A computer-implemented method for generating a comprehensive user attention model, the method comprising:
extracting feature components from a video data sequence;
generating attention data based on application of multiple attention models to the feature components;
integrating the attention data to create the comprehensive user attention model; and
wherein the comprehensive user attention model is represented as:
$$A = w_v \cdot M_v + w_a \cdot M_a + w_l \cdot M_l,$$
$w_v$, $w_a$, $w_l$ representing weights for linear combination, and $M_v$, $M_a$, and $M_l$ indicating normalized visual, audio, and linguistic attention models.
wherein $w_i$, $w_j$, $w_k$ are weights in visual, audio, and linguistic attention models respectively, wherein $M_{cm}$ comprises a normalized camera attention used as a visual attention model magnifier, and wherein $S_{cm}$ comprises a magnifier switch that is based on multiple criteria.
-
-
10. The method of claim 9, wherein the multiple criteria comprise:
-
if $S_{cm} \geq 1$, the magnifier is turned on;
if $S_{cm} = 0$, the magnifier is turned off; and
wherein a large $S_{cm}$ value indicates a more powerful magnifier than a low $S_{cm}$ value.
-
-
11. A computer-implemented method for generating a comprehensive user attention model, the method comprising:
extracting feature components from a video data sequence;
generating attention data based on application of multiple attention models to the feature components;
integrating the attention data to create the comprehensive user attention model; and
wherein the multiple attention models comprise a camera attention model and one or more other visual attention models, and wherein generating the attention data further comprises multiplying a sum of the one or more other visual attention models by quantized factors to determine emphasis of the camera attention model with respect to the other visual attention model(s), the quantized factors being camera attention factors.
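The multiplication recited in claim 11 can be sketched as follows. The claim states only that the camera attention factors are quantized and that they multiply a sum of the other visual attention models; the particular factor levels, the nearest-level quantization rule, and the names `CAMERA_FACTOR_LEVELS`, `quantize_camera_factor`, and `visual_attention` are hypothetical choices made for this example.

```python
from typing import Sequence

# Hypothetical quantization levels. The claim does not list the allowed
# values; here a level above 1 emphasizes a frame and a level below 1
# de-emphasizes it.
CAMERA_FACTOR_LEVELS = (0.0, 0.5, 1.0, 1.5, 2.0)


def quantize_camera_factor(raw_factor: float) -> float:
    """Snap a raw camera attention value to the nearest allowed level."""
    return min(CAMERA_FACTOR_LEVELS, key=lambda level: abs(level - raw_factor))


def visual_attention(other_models: Sequence[float], raw_camera_factor: float) -> float:
    """Multiply the sum of the other visual models by the quantized factor.

    `other_models` holds the per-frame outputs of the non-camera visual
    models (for example motion, static, and face attention).
    """
    factor = quantize_camera_factor(raw_camera_factor)
    return factor * sum(other_models)
```

Under these assumed levels, a quantized factor of 1.0 leaves the summed visual attention unchanged, so the camera model only alters frames whose camera motion is rated above or below that neutral level.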
-
-
13. A tangible computer-readable medium storing computer-executable instructions executable by a processor to generate an attention model, the computer-executable instructions comprising instructions for:
extracting feature components from a video data sequence;
generating attention data based on application of at least visual and audio attention models to the feature components;
linearly combining the attention data to generate a generic user attention model that integrates results of the multiple visual, audio, and linguistic attention models; and
wherein the generic user attention model is represented as:
$$A = w_v \cdot M_v + w_a \cdot M_a + w_l \cdot M_l,$$
$w_v$, $w_a$, $w_l$ representing weights for linear combination, and wherein $M_v$, $M_a$, and $M_l$ represent normalized visual, audio, and linguistic attention models.
$w_i$, $w_j$, $w_k$ being weighted values of visual, audio, and linguistic attention models respectively, $M_{cm}$ representing a normalized camera attention used as a visual attention model magnifier, and $S_{cm}$ identifying a magnifier switch that is based on multiple criteria.
-
-
19. The computer-readable medium of claim 18, wherein the multiple criteria comprise:
-
if $S_{cm} \geq 1$, the magnifier is turned on;
if $S_{cm} = 0$, the magnifier is turned off; and
wherein a large $S_{cm}$ value indicates a more powerful magnifier than a low $S_{cm}$ value.
-
-
20. The computer-readable medium of claim 13, wherein the visual attention models comprise motion, static, face, and/or camera attention models.
-
21. The computer-readable medium of claim 20, wherein the camera attention model is based at least in part on the following criteria:
-
during camera zooming operations, frame importance increases temporally and is a function of zooming speed such that a first frame generated during a fast zooming operation is of higher relative importance than a second frame generated during a slower zooming operation; and
during camera panning operations, frame importance is an inverse of panning speed and a function of panning direction.
-
-
22. The computer-readable medium of claim 21, wherein frames generated during a horizontal camera panning operation are calculated to be of lesser relative importance as compared to frames generated during a vertical panning operation.
-
23. The computer-readable medium of claim 21, wherein calculated importance of a frame generated during panning or zooming operations is reduced from a higher importance to a lower importance as a function of ending the panning or zooming operation and passage of a certain period of time.
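The camera attention criteria of claims 21 through 23 can be illustrated with a minimal sketch. The claims give only qualitative relationships (importance rises with zoom duration and speed, falls with panning speed, is lower for horizontal than for vertical panning, and decays after the operation ends). Every functional form, constant, and name below (`zoom_importance`, `pan_importance`, `decayed_importance`, the direction weights, and the decay period) is an assumption chosen to satisfy those relationships, not a formula from the patent.

```python
def zoom_importance(elapsed_s: float, zoom_speed: float, gain: float = 1.0) -> float:
    """Importance that grows with time spent zooming and with zoom speed.

    Assumption: a simple product; the claims only say importance
    "increases temporally and is a function of zooming speed".
    """
    return gain * zoom_speed * elapsed_s


# Hypothetical direction weights: horizontal panning is rated lower
# than vertical panning, as recited in claim 22.
PAN_DIRECTION_WEIGHT = {"horizontal": 0.5, "vertical": 1.0}


def pan_importance(pan_speed: float, direction: str) -> float:
    """Importance that falls as panning speed rises, scaled by direction.

    Assumption: 1 / (1 + speed) stands in for "an inverse of panning
    speed" so that a stationary camera does not divide by zero.
    """
    return PAN_DIRECTION_WEIGHT[direction] / (1.0 + pan_speed)


def decayed_importance(peak: float, seconds_since_end: float, decay_s: float = 2.0) -> float:
    """Ramp importance linearly from `peak` to zero after the operation ends.

    Assumption: a linear ramp over a hypothetical `decay_s` seconds;
    claim 23 says only that importance is reduced over a period of time.
    """
    remaining = max(0.0, 1.0 - seconds_since_end / decay_s)
    return peak * remaining
```

A per-frame camera attention value could then be taken from whichever function matches the camera operation detected for that frame, with `decayed_importance` applied to frames that follow the end of a zoom or pan.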
-
24. A computing device for creating a comprehensive user attention model, the computing device comprising:
a processor;
a memory coupled to the processor, the memory comprising computer-program instructions executable by the processor for:
generating visual, audio, and linguistic attention data based on application of multiple attention models to a plurality of video data sequence feature components, the feature components comprising image sequence, audio, and text-related features;
integrating the visual, audio, and linguistic attention data to create the comprehensive user attention model;
wherein the comprehensive user attention model is a computational representation of elements of the video data sequence that attract user attention; and
wherein the computational representation is defined as:
$$A = w_v \cdot M_v + w_a \cdot M_a + w_l \cdot M_l,$$
$w_v$, $w_a$, $w_l$ representing weights for linear combination, and wherein $M_v$, $M_a$, and $M_l$ represent normalized visual, audio, and linguistic attention models.
$w_i$, $w_j$, $w_k$ being weighted values of visual, audio, and linguistic attention models respectively, $M_{cm}$ representing a normalized camera attention used as a visual attention model magnifier, and $S_{cm}$ identifying a magnifier switch that is based on multiple criteria.
-
-
29. The computing device of claim 28, wherein the multiple criteria comprise:
-
if $S_{cm} \geq 1$, the magnifier is open;
if $S_{cm} = 0$, the magnifier is closed; and
wherein a large $S_{cm}$ value indicates a more powerful magnifier than a low $S_{cm}$ value.
-
-
30. The computing device of claim 24, wherein the multiple attention models comprise a camera attention model, and wherein the computer-program instructions for generating the visual attention data generate camera attention data based at least in part on the following criteria:
-
during camera zooming operations, frame importance increases temporally and is a function of zooming speed such that a first frame generated during a fast zooming operation is of higher relative importance than a second frame generated during a slower zooming operation; and
during camera panning operations, frame importance is an inverse of panning speed and a function of panning direction.
-
-
31. The computing device of claim 30, wherein frames generated during a horizontal camera panning operation are calculated to be of lesser relative importance as compared to frames generated during a vertical panning operation.
-
32. The computing device of claim 30, wherein calculated importance of a frame generated during panning or zooming operations is reduced from a higher importance to a lower importance as a function of ending the panning or zooming operation and passage of a certain period of time.
Specification