Video annotation method by integrating visual features and frequent patterns

US 20080059872A1
Filed: 03/05/2007
Published: 03/06/2008
Est. Priority Date: 09/05/2006
Status: Active Grant

Abstract

A video annotation method that integrates visual features and frequent patterns is disclosed. The method combines a statistical model based on visual features with a sequential model and an association model, both constructed by data mining techniques, to annotate unknown videos automatically. By combining the three models, the method considers visual features and semantic patterns simultaneously so as to enhance the accuracy of annotation.

17 Claims

1. A video annotation method by integrating visual features and frequent patterns, comprising:
- providing a plurality of fundamental words;
  
  providing an annotated video clip, wherein said annotated video clip is composed of a plurality of first shots, and each of said first shots is composed of a plurality of first frames, and each of said first shots is corresponding to at least one first annotation word of said fundamental words;
  
  performing a data preprocessing step, said data preprocessing step comprising:
  
  selecting a plurality of first critical frames respectively with respect to said first shots from said first frames of each of said first shots;
  
  dividing each of said first critical frames into a plurality of first image blocks;
  
  respectively extracting low-level features of said first image blocks of each of said first shots, thereby obtaining a plurality of first block feature vectors of each of said first critical frames;
  
  respectively extracting low-level features of each of said first critical frames, thereby obtaining a plurality of first feature vectors of said first shots;
  
  performing a grouping step for dividing said first feature vectors into a plurality of shot groups, wherein said shot groups have a plurality of identification codes respectively;
  
  corresponding said first feature vectors to said identification codes respectively; and
  
  combining said identification codes of said shot groups as at least one first scene;
  
  building a statistical model by using said first block feature vectors and said at least one first annotation word with respect to each of said first shots in accordance with a Gaussian Mixtures Model and conditional probabilities, wherein said statistical model has a statistical probability list used for indicating the respective appearing probabilities of said fundamental words corresponding to said first block feature vectors of each of said first shots;
  
  building a sequential model, comprising:
  
  finding frequent patterns of said shot groups in said first scene in accordance with a continuous relevance algorithm, thereby obtaining a plurality of first sequential rules, wherein said first sequential rules are the sequential transaction combinations of any two identification codes arbitrarily selected in each of said at least one first scene; and
  
  building said sequential model in accordance with each of said first sequential rules and said at least one first annotation word corresponding thereto, wherein said sequential model has a sequential probability list used for indicating the respective appearing probabilities of said fundamental words corresponding to each of said first sequential rules;
  
  performing a predicting stage for inputting a second shot desired to be annotated into said statistical model and said sequential model, thereby obtaining a keyword statistical probability list and a keyword sequential probability list, wherein said keyword statistical probability list is used for indicating the respective appearing probabilities of said fundamental words corresponding to a plurality of second block feature vectors of said second shot, and said keyword sequential probability list is used for indicating the respective appearing probabilities of said fundamental words corresponding to a plurality of second sequential rules of said second shot, and said second shot belongs to a second scene and is composed of a plurality of second frames.
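
The sequential rules described in claim 1 (ordered combinations of any two identification codes within a scene) can be sketched as follows. This is a minimal illustration, not the patented procedure: the claim does not define the "continuous relevance algorithm", so plain support counting is swapped in, and the function name `sequential_rules`, the `min_support` threshold, and the sample scenes are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def sequential_rules(scenes, min_support=2):
    """Count order-preserving pairs of group codes and keep the frequent ones.

    Assumption: a rule is "frequent" when it occurs in at least
    `min_support` scenes. The claim itself does not state a threshold.
    """
    counts = Counter()
    for scene in scenes:
        # combinations() preserves input order, so (a, b) means
        # code a appears somewhere before code b in the scene.
        pairs = set(combinations(scene, 2))
        counts.update(pairs)  # each scene supports a given pair at most once
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical scenes, each an ordered list of shot-group identification codes.
scenes = [["A", "B", "C"], ["A", "C", "B"], ["B", "A", "C"]]
rules = sequential_rules(scenes)
```

Each surviving pair would then be associated with the annotation words of the shots it covers in order to build the sequential probability list; that step is omitted here because the claim does not specify how the probabilities are estimated.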
  - 2. The video annotation method of claim 1, wherein the low-level features of said first image blocks of each of said first shots and the low-level features of each of said first critical frames are selected from the group consisting of a shape descriptor, a scalable color descriptor, a homogeneous texture descriptor and any combinations thereof.
  - 3. The video annotation method of claim 1, further comprising:
    - building an association model, comprising:
      
      removing said identification codes repeated in each of said at least one first scene;
      
      sorting said identification codes in each of said at least one first scene;
      
      finding the entire frequent patterns of said shot groups in said at least one first scene in accordance with an association rules algorithm, thereby obtaining a plurality of first association rules, wherein the final item in each of said first association rules only has one single identification code; and
      
      building said association model in accordance with each of said first association rules and said at least one first annotation word corresponding thereto, wherein said association model has an associative probability list used for indicating the respective appearing probabilities of said fundamental words corresponding to each of said first associative rules; and
      
      performing said predicting stage for inputting said second shot desired to be annotated into said association model, thereby obtaining a keyword associative probability list used for indicating the respective appearing probabilities of said fundamental words corresponding to associative rules of said second shot.
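
The preprocessing and rule-mining steps of claim 3 (remove repeated identification codes, sort them, then derive rules whose final item is a single code) can be sketched as below. This is a rough illustration under stated assumptions: the claim names only "an association rules algorithm", so brute-force itemset counting is used in place of any specific algorithm, and `association_rules`, `min_support`, and `max_size` are hypothetical names and parameters.

```python
from collections import Counter
from itertools import combinations


def association_rules(scenes, min_support=2, max_size=3):
    """Derive (antecedent, consequent, support) rules with a one-code consequent."""
    # Remove repeated codes in each scene, then sort them.
    transactions = [sorted(set(scene)) for scene in scenes]

    # Count every itemset of 2..max_size codes (brute force, for illustration).
    counts = Counter()
    for items in transactions:
        for size in range(2, max_size + 1):
            counts.update(combinations(items, size))

    rules = []
    for itemset, support in counts.items():
        if support < min_support:
            continue
        # The final item of each rule is exactly one identification code.
        for consequent in itemset:
            antecedent = tuple(c for c in itemset if c != consequent)
            rules.append((antecedent, consequent, support))
    return sorted(rules)


# Hypothetical scenes with repeated and unsorted group codes.
scenes = [["B", "A", "B", "C"], ["A", "C", "A"], ["C", "B"]]
rules = association_rules(scenes)
```

Unlike the sequential rules of claim 1, order within a scene is discarded here, which is why the codes can be deduplicated and sorted before counting.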
  - 4. The video annotation method of claim 3, wherein said performing said predicting stage further comprises:
    - selecting a second critical frame from said second frames of said second shot;
      
      respectively extracting low-level features of said second critical frame, thereby obtaining a plurality of second feature vectors;
      
      performing said grouping step on said second feature vectors in accordance with a statistic distance algorithm, and corresponding said second feature vectors to said identification codes respectively;
      
      removing said identification codes repeated in said second scene;
      
      sorting said identification codes in said second scene;
      
      finding the entire frequent patterns of shot groups in said second scene in accordance with said association rules algorithm, thereby obtaining a plurality of second association rules, wherein the final item in each of said second association rules only has one single identification code; and
      
      inputting said second association rules into said association model, thereby obtaining said keyword associative probability list used for indicating the respective appearing probabilities of said fundamental words corresponding to said second associative rules regarding said second shot.
  - 5. The video annotation method of claim 4, wherein said statistic distance algorithm is a Euclidean Distance method.
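
One way to read the grouping step with a Euclidean distance, as in claims 4 and 5, is to map each new feature vector to the identification code of the closest existing shot group. The sketch below assumes each group is summarized by a centroid vector, which the claims do not state; `nearest_group` and the sample centroids are hypothetical.

```python
import math


def nearest_group(feature_vector, centroids):
    """Return the identification code whose centroid is closest in Euclidean distance.

    Assumption: `centroids` maps each identification code to a
    representative feature vector of the corresponding shot group.
    """
    return min(centroids, key=lambda code: math.dist(feature_vector, centroids[code]))


# Hypothetical two-dimensional centroids for two shot groups.
centroids = {"A": (0.0, 0.0), "B": (1.0, 1.0)}
code_for_new_shot = nearest_group((0.9, 0.8), centroids)
```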
  - 6. The video annotation method of claim 1, wherein said performing said predicting stage further comprises:
    - selecting a second critical frame from said second frames of said second shot;
      
      dividing said second critical frame into a plurality of second image blocks;
      
      respectively extracting low-level features of said second image blocks, thereby obtaining said second block feature vectors of said second critical frame of said second shot; and
      
      inputting said second block feature vectors into said statistical model, thereby obtaining said keyword statistical probability list used for indicating the respective appearing probabilities of said fundamental words corresponding to said second block feature vectors.
  - 7. The video annotation method of claim 6, wherein said second critical frame is divided into N×M units of second image blocks, wherein N and M are integers greater than 0.
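
The N×M division of a critical frame can be illustrated with a small grid split. This is only a sketch: the claim does not say whether N counts rows or columns, so N rows by M columns is assumed here, the frame is modeled as a plain list of pixel rows, and `split_into_blocks` is a hypothetical helper.

```python
def split_into_blocks(frame, n, m):
    """Split a frame (list of pixel rows) into n x m blocks in row-major order.

    Assumption: n is the number of block rows and m the number of block
    columns. Uneven sizes are handled by integer division of the edges.
    """
    height, width = len(frame), len(frame[0])
    row_edges = [i * height // n for i in range(n + 1)]
    col_edges = [j * width // m for j in range(m + 1)]
    blocks = []
    for i in range(n):
        for j in range(m):
            rows = frame[row_edges[i]:row_edges[i + 1]]
            blocks.append([row[col_edges[j]:col_edges[j + 1]] for row in rows])
    return blocks


# Hypothetical 4 x 6 "frame" whose pixel values are just running indices.
frame = [[r * 6 + c for c in range(6)] for r in range(4)]
blocks = split_into_blocks(frame, 2, 3)
```

Low-level features would then be extracted from each block separately to obtain one block feature vector per block.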
  - 8. The video annotation method of claim 6, wherein the low-level features of said second image blocks and the low-level features of said second critical frame are selected from the group consisting of a shape descriptor, a scalable color descriptor, a homogeneous texture descriptor and any combinations thereof.
  - 9. The video annotation method of claim 6, wherein said performing said predicting stage further comprises:
    - inputting at least one third shot antecedent to said second shot in said second scene, and respectively selecting at least one third critical frame of said at least one third shot;
      
      respectively extracting low-level features of said second critical frame and low-level features of said at least one third critical frame, thereby obtaining a plurality of second feature vectors;
      
      performing said grouping step on said second feature vectors in accordance with a statistic distance algorithm, and corresponding said second feature vectors to said identification codes respectively;
      
      finding frequent patterns of said shot groups in said second scene in accordance with the continuous relevance algorithm, thereby obtaining a plurality of second sequential rules, wherein said second sequential rules are the sequential transaction combinations of any two identification codes arbitrarily selected in said second scene; and
      
      inputting said second sequential rules into said sequential model, thereby obtaining said keyword sequential probability list used for indicating the respective appearing probabilities of said fundamental words corresponding to said second sequential rules with respect to said second feature vectors.
  - 10. The video annotation method of claim 9, wherein said statistic distance algorithm is a Euclidean Distance method.
  - 11. The video annotation method of claim 9, wherein the low-level features of said second critical frame and the low-level features of said at least one third critical frame are selected from the group consisting of a shape descriptor, a scalable color descriptor, a homogeneous texture descriptor and any combinations thereof.
  - 12. The video annotation method of claim 1, wherein said performing said predicting stage further comprises:
    - adding up the respective appearing probabilities of said fundamental words in said keyword statistical probability list and said keyword sequential probability list, thereby obtaining a keyword appearing probability list; and
      
      selecting at least one second annotation word from said keyword appearing probability list in accordance with a predetermined lower limit, wherein said at least one second annotation word is used as an annotation to said second shot.
  - 13. The video annotation method of claim 1, wherein said performing said predicting stage further comprises:
    - adding up the respective appearing probabilities of said fundamental words in said keyword statistical probability list and said keyword associative probability list, thereby obtaining a keyword appearing probability list; and
      
      selecting at least one second annotation word from said keyword appearing probability list in accordance with a predetermined lower limit, wherein said at least one second annotation word is used as an annotation to said second shot.
  - 14. The video annotation method of claim 1, wherein said performing said predicting stage further comprises:
    - adding up the respective appearing probabilities of said fundamental words in said keyword statistical probability list, said keyword sequential probability list and said keyword associative probability list, thereby obtaining a keyword appearing probability list; and
      
      selecting at least one second annotation word from said keyword appearing probability list in accordance with a predetermined lower limit, wherein said at least one second annotation word is used as an annotation to said second shot.
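
The fusion step shared by claims 12 to 14 (add up the per-word probabilities from the available lists, then keep the words that reach a predetermined lower limit) can be sketched as follows. The probability values, the limit of 0.5, and the name `combine_and_select` are hypothetical and chosen only for illustration.

```python
def combine_and_select(probability_lists, lower_limit):
    """Sum per-word probabilities across lists and keep words at or above the limit."""
    combined = {}
    for plist in probability_lists:
        for word, p in plist.items():
            combined[word] = combined.get(word, 0.0) + p
    # Rank by summed score, highest first, and apply the lower limit.
    ranked = sorted(combined.items(), key=lambda kv: -kv[1])
    selected = [word for word, score in ranked if score >= lower_limit]
    return combined, selected


# Hypothetical keyword probability lists for one shot to be annotated.
statistical = {"car": 0.5, "road": 0.25, "sky": 0.125}
sequential = {"car": 0.25, "road": 0.5}
associative = {"car": 0.125, "sky": 0.125}

combined, annotation = combine_and_select(
    [statistical, sequential, associative], lower_limit=0.5
)
```

Passing only two of the three lists gives the variants of claims 12 and 13; note that the summed scores are no longer probabilities, so the lower limit has to be chosen on the summed scale.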
  - 15. The video annotation method of claim 6, wherein each of said first critical frames is divided into N×M units of second image blocks, wherein N and M are integers greater than 0.
  - 16. The video annotation method of claim 1, wherein said first block feature vectors are corresponding to said at least one first annotation word, and each of said first feature vectors is corresponding to said at least one first annotation word.
  - 17. The video annotation method of claim 1, wherein said fundamental words are selected from the standard category tree provided by NIST (National Institute of Standards and Technology).

Current Assignee: National Cheng Kung University (Government of The Republic of China)
Original Assignee: National Cheng Kung University (Government of The Republic of China)
Inventors: Huang, Jhih-Hong; Tseng, Shin-Mu; Su, Ja-Hwung
Granted Patent: US 7,894,665 B2
US Class Current: 715/231
CPC Class Codes: G06F 16/7847 using low-level visual feat...