Systems and methods for generating a comprehensive user attention model

US 7,274,741 B2
Filed: 11/01/2002
Issued: 09/25/2007
Est. Priority Date: 11/01/2002
Status: Expired due to Fees

First Claim

1. A computer-implemented method for generating a comprehensive user attention model, the method comprising:

extracting feature components from a video data sequence;

generating attention data based on application of multiple attention models to the feature components;

integrating the attention data to create the comprehensive user attention model; and

wherein the comprehensive user attention model is represented as:

A = w_v · M_v + w_a · M_a + w_l · M_l,

w_v, w_a, w_l representing weights for linear combination, and M_v, M_a, and M_l indicating normalized visual, audio, and linguistic attention models.

2 Assignments

0 Petitions

Abstract

Systems and methods to generate an attention model for computational analysis of video data are described. In one aspect, feature components from a video data sequence are extracted. Attention data is generated by applying multiple attention models to the extracted feature components. The generated attention data is integrated into a comprehensive user attention model for the computational analysis of the video data sequence.
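
The integration step described above, together with the weighted sum recited in the first claim, can be illustrated with a minimal sketch. The function names, the min-max normalization, and the default weight values are assumptions made for illustration only; the claim states only that the models are normalized and linearly combined.

```python
from typing import List, Sequence


def normalize(curve: Sequence[float]) -> List[float]:
    """Min-max scale a per-frame attention curve to [0, 1].

    Assumption: the source only says the models are "normalized";
    min-max scaling is one plausible choice, not the patent's stated method.
    """
    lo, hi = min(curve), max(curve)
    if hi == lo:
        # A constant curve carries no relative emphasis; map it to zeros.
        return [0.0 for _ in curve]
    return [(x - lo) / (hi - lo) for x in curve]


def integrate_attention(
    visual: Sequence[float],
    audio: Sequence[float],
    linguistic: Sequence[float],
    w_v: float = 0.5,  # hypothetical weight values
    w_a: float = 0.3,
    w_l: float = 0.2,
) -> List[float]:
    """Compute A = w_v * M_v + w_a * M_a + w_l * M_l for every frame."""
    if not (len(visual) == len(audio) == len(linguistic)):
        raise ValueError("all attention curves must cover the same frames")
    m_v, m_a, m_l = normalize(visual), normalize(audio), normalize(linguistic)
    return [w_v * v + w_a * a + w_l * l for v, a, l in zip(m_v, m_a, m_l)]
```

Each input is a per-frame attention curve of equal length, and the result is a single curve whose peaks mark the frames most likely to attract a viewer under the chosen weights.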

82 Citations

32 Claims

1. A computer-implemented method for generating a comprehensive user attention model, the method comprising:
- extracting feature components from a video data sequence;
  
  generating attention data based on application of multiple attention models to the feature components;
  
  integrating the attention data to create the comprehensive user attention model; and
  
  wherein the comprehensive user attention model is represented as:
  
  A = w_v · M_v + w_a · M_a + w_l · M_l,
  
  w_v, w_a, w_l representing weights for linear combination, and M_v, M_a, and M_l indicating normalized visual, audio, and linguistic attention models.
  - 2. The method of claim 1, wherein the feature components comprise image sequence, audio, and textual components.
  - 3. The method of claim 1, wherein the multiple attention models comprise a combination of visual, audio, and/or linguistic attention models.
  - 4. The method of claim 1, wherein the multiple attention models comprise a combination of static and dynamic attention models.
  - 5. The method of claim 1, wherein the multiple attention models comprise motion, static, face, and/or camera attention models.
  - 6. The method of claim 1, wherein the multiple attention models comprise saliency, speech, and/or music attention models.
  - 7. The method of claim 1, wherein the multiple attention models comprise closed caption, and/or automated speech recognition attention models.
  - 8. The method of claim 1, wherein integrating the attention data further comprises integrating the attention data via linear combination.
  - 9. The method of claim 1, wherein M_v, M_a, and M_l are defined as follows:
10. The method of claim 9, wherein the multiple criteria comprise:
- if S_cm >= 1, the magnifier is turned on;
  
  if S_cm = 0, the magnifier is turned off; and
  
  wherein a large S_cm value indicates a more powerful magnifier than a low S_cm value.

11. A computer-implemented method for generating a comprehensive user attention model, the method comprising:
- extracting feature components from a video data sequence;
  
  generating attention data based on application of multiple attention models to the feature components;
  
  integrating the attention data to create the comprehensive user attention model; and
  
  wherein the multiple attention models comprise a camera attention model and one or more other visual attention models, and wherein generating the attention data further comprises multiplying a sum of the one or more other visual attention models by quantized factors to determine emphasis of the camera attention model with respect to the other visual attention model(s), the quantized factors being camera attention factors.
  - 12. The method of claim 11, wherein quantized factor values range from zero (0) to two (2).
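
The camera weighting recited in claims 11 and 12 can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the particular quantization levels, and the dictionary-of-curves input layout are hypothetical. The claims specify only that the sum of the other visual attention models is multiplied by quantized camera attention factors ranging from 0 to 2.

```python
from typing import Dict, List, Sequence

# Hypothetical quantization levels; the claims only bound the range to [0, 2].
CAMERA_LEVELS = (0.0, 0.5, 1.0, 1.5, 2.0)


def quantize_camera_factor(raw: float, levels: Sequence[float] = CAMERA_LEVELS) -> float:
    """Clamp a raw camera score to [0, 2] and snap it to the nearest level."""
    clamped = min(max(raw, 0.0), 2.0)
    return min(levels, key=lambda q: abs(q - clamped))


def visual_attention(
    other_models: Dict[str, Sequence[float]],
    camera_factors: Sequence[float],
) -> List[float]:
    """Per frame: camera factor multiplied by the sum of the other visual models.

    A factor above 1 emphasizes the frame, a factor below 1 de-emphasizes it,
    and a factor of 0 suppresses it entirely.
    """
    n_frames = len(camera_factors)
    if any(len(curve) != n_frames for curve in other_models.values()):
        raise ValueError("every model must provide one value per frame")
    return [
        camera_factors[t] * sum(curve[t] for curve in other_models.values())
        for t in range(n_frames)
    ]
```

For example, `visual_attention({"motion": [0.2, 0.4], "face": [0.1, 0.1]}, [1.0, 2.0])` leaves the first frame's summed attention unchanged and doubles the second.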

13. A tangible computer-readable medium storing computer-executable instructions executable by a processor to generate an attention model, the computer-executable instructions comprising instructions for:
- extracting feature components from a video data sequence;
  
  generating attention data based on application of at least visual and audio attention models to the feature components;
  
  linearly combining the attention data to generate a generic user attention model that integrates results of the multiple visual, audio, and linguistic attention models; and
  
  wherein the generic user attention model is represented as:
  
  A = w_v · M_v + w_a · M_a + w_l · M_l,
  
  w_v, w_a, w_l representing weights for linear combination, and wherein M_v, M_a, and M_l represent normalized visual, audio, and linguistic attention models.
  - 14. The computer-readable medium of claim 13, wherein generating attention data is further based on application of a linguistic attention model to one or more portions of the feature components.
  - 15. The computer-readable medium of claim 13, wherein the feature components comprise image sequence, audio, and textual components.
  - 16. The computer-readable medium of claim 13, wherein the audio attention models comprise saliency, speech, and/or music attention models.
  - 17. The computer-readable medium of claim 13, wherein the linguistic attention models comprise closed caption, and/or automated speech recognition attention models.
  - 18. The computer-readable medium of claim 13, wherein M_v, M_a, and M_l are defined as follows:
19. The computer-readable medium of claim 18, wherein the multiple criteria comprise:
- if S_cm >= 1, the magnifier is turned on;
  
  if S_cm = 0, the magnifier is turned off; and
  
  wherein a large S_cm value indicates a more powerful magnifier than a low S_cm value.
20. The computer-readable medium of claim 13, wherein the visual attention models comprise motion, static, face, and/or camera attention models.
21. The computer-readable medium of claim 20, wherein the camera attention model is based at least in part on the following criteria:
- during camera zooming operations, frame importance increases temporally and is a function of zooming speed such that a first frame generated during a fast zooming operation is of higher relative importance than a second frame generated during a slower zooming operation; and
  
  during camera panning operations, frame importance is an inverse of panning speed and a function of panning direction.
22. The computer-readable medium of claim 21, wherein frames generated during a horizontal camera panning operation are calculated to be of lesser relative importance as compared to frames generated during a vertical panning operation.
23. The computer-readable medium of claim 21, wherein calculated importance of a frame generated during panning or zooming operations is reduced from a higher importance to a lower importance as a function of ending the panning or zooming operation and passage of a certain period of time.
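
The camera attention criteria in claims 21 through 23 are qualitative, so the sketch below picks simple functional forms purely for illustration. Every formula, constant, and name here is an assumption (the linear zoom ramp, the 1 / (1 + speed) pan term, the direction weights, and the linear decay window); the claims fix only the ordering relationships these functions reproduce.

```python
# Hypothetical direction weights: claim 22 only requires horizontal < vertical.
PAN_DIRECTION_WEIGHT = {"vertical": 1.0, "horizontal": 0.8}


def zoom_importance(progress: float, speed: float) -> float:
    """Importance of a frame during a zoom.

    progress: fraction of the zoom elapsed, in [0, 1] (importance rises over time).
    speed: normalized zooming speed, in [0, 1] (faster zoom gives higher importance).
    """
    return 1.0 + speed * progress


def pan_importance(speed: float, direction: str) -> float:
    """Importance of a frame during a pan: falls as panning speed rises."""
    return PAN_DIRECTION_WEIGHT[direction] / (1.0 + speed)


def decayed_importance(
    peak: float,
    baseline: float,
    seconds_since_end: float,
    decay_seconds: float = 2.0,  # hypothetical length of the decay window
) -> float:
    """Relax importance from its peak back to the baseline after the
    panning or zooming operation ends, linearly over decay_seconds."""
    if seconds_since_end >= decay_seconds:
        return baseline
    frac = max(seconds_since_end, 0.0) / decay_seconds
    return peak + (baseline - peak) * frac
```

With these forms, a fast zoom outranks a slow one at the same point in the operation, a slow pan outranks a fast one, a vertical pan outranks a horizontal pan at equal speed, and importance returns to the baseline once the decay window has passed.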

24. A computing device for creating a comprehensive user attention model, the computing device comprising:
- a processor;
  
  a memory coupled to the processor, the memory comprising computer-program instructions executable by the processor for:
  
  generating visual, audio, and linguistic attention data based on application of multiple attention models to a plurality of video data sequence feature components, the feature components comprising image sequence, audio, and text-related features;
  
  integrating the visual, audio, and linguistic attention data to create the comprehensive user attention model;
  
  wherein the comprehensive user attention model is a computational representation of elements of the video data sequence that attract user attention; and
  
  wherein the computational representation is defined as:
  
  A = w_v · M_v + w_a · M_a + w_l · M_l,
  
  w_v, w_a, w_l representing weights for linear combination, and wherein M_v, M_a, and M_l represent normalized visual, audio, and linguistic attention models.
  - 25. The computing device of claim 24, wherein the computer-program instructions for generating further comprise instructions for creating the visual attention data with motion, static, face, and/or camera attention models.
  - 26. The computing device of claim 24, wherein the computer-program instructions for generating further comprise instructions for creating the audio attention data with saliency, speech, and/or music attention models.
  - 27. The computing device of claim 24, wherein the computer-program instructions for generating further comprise instructions for creating the linguistic attention data with closed caption, and/or automated speech recognition attention models.
  - 28. The computing device of claim 24, wherein M_v, M_a, and M_l are defined as follows:
29. The computing device of claim 28, wherein the multiple criteria comprise:
- if S_cm >= 1, the magnifier is open;
  
  if S_cm = 0, the magnifier is closed; and
  
  wherein a large S_cm value indicates a more powerful magnifier than a low S_cm value.
30. The computing device of claim 24, wherein the multiple attention models comprise a camera attention model, and wherein the computer-program instructions for generating the visual attention data generate camera attention data based at least in part on the following criteria:
- during camera zooming operations, frame importance increases temporally and is a function of zooming speed such that a first frame generated during a fast zooming operation is of higher relative importance than a second frame generated during a slower zooming operation; and
  
  during camera panning operations, frame importance is an inverse of panning speed and a function of panning direction.
31. The computing device of claim 30, wherein frames generated during a horizontal camera panning operation are calculated to be of lesser relative importance as compared to frames generated during a vertical panning operation.
32. The computing device of claim 30, wherein calculated importance of a frame generated during panning or zooming operations is reduced from a higher importance to a lower importance as a function of ending the panning or zooming operation and passage of a certain period of time.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Zhang, Hong-Jiang; Lu, Lie; Ma, Yu-Fei
Primary Examiner(s): PHILIPPE, GIMS S
Application Number: US10/286,053
Publication Number: US 20040088726A1
Time in Patent Office: 1,789 Days
Field of Search: 375/240.16, 375/240.08, 375/240.09, 345/629, 345/419, 382/190, 382/156, 382/203, 707/1
US Class Current: 375/240.08
CPC Class Codes:
G06T 7/00   Image analysis
G06T 7/40   Analysis of texture depth o...
G06V 10/462   Salient features, e.g. scal...
G06V 20/40   in video content extracting...
H04H 60/56   Arrangements characterised ...
H04N 21/8453   by locking or enabling a se...
H04N 21/854   Content authoring
