Collective media annotation using undirected random field models
First Claim
1. A method for detecting two or more concepts in a source digital video which includes one or more digital video frames comprising:
(a) segmenting the source digital video into a plurality of shots, wherein each shot includes one or more of the digital video frames;
(b) identifying a keyframe within each shot, wherein the keyframe is one of the one or more of the digital video frames;
(c) extracting low level features from the keyframe, wherein the low level features are representative of the two or more concepts, and are related in a graph of concepts, and wherein each concept is semi-automatically generated text associated with the one or more digital video frames, and wherein semi-automatically generated text is automatically generated text which has been manually revised;
(d) training a discriminative classifier for each concept using a set of the low level features, wherein the discriminative classifier is a support vector machine;
(e) building a collective annotation model combining each of the discriminative classifiers;
(f) defining in the collective annotation model one or more interaction potentials to model interdependence between related concepts;
(g) receiving a second source digital video;
(h) applying the discriminative classifiers to the second source digital video; and
(i) determining a probability of a presence or absence of the two or more concepts in the low level features extracted from the second source digital video using the collective annotation model and the defined interaction potentials.
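Steps (a) and (b) above, shot segmentation and keyframe selection, can be sketched in a few lines. The frame-difference cut threshold and the mean-frame keyframe heuristic below are illustrative assumptions, not the specific method claimed:

```python
def segment_into_shots(frames, threshold=30.0):
    """Split a frame sequence into shots at large frame-to-frame changes.

    frames: list of equal-length grayscale pixel lists.
    A new shot starts wherever the mean absolute pixel difference
    exceeds `threshold` (a hypothetical cut-detection heuristic).
    """
    shots, current = [], [frames[0]]
    for prev, cur in zip(frames, frames[1:]):
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        if diff > threshold:
            shots.append(current)
            current = []
        current.append(cur)
    shots.append(current)
    return shots


def pick_keyframe(shot):
    """Choose the frame closest to the shot's per-pixel mean as the keyframe."""
    mean = [sum(px) / len(shot) for px in zip(*shot)]

    def dist(frame):
        return sum((a - b) ** 2 for a, b in zip(frame, mean))

    return min(shot, key=dist)
```

A sequence of three dark frames followed by three bright frames would yield two shots, with one keyframe selectable from each; low-level features (step (c)) would then be extracted from those keyframes only.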
Abstract
In an embodiment, the present invention relates to a method for semantic analysis of digital multimedia. Low level features are extracted that are representative of one or more concepts, a discriminative classifier is trained for each concept using these low level features, and a collective annotation model is built from the discriminative classifiers. In various embodiments of the invention, the framework is fully generic and can be applied with any number of low-level features or discriminative classifiers. Further, the analysis makes no domain-specific assumptions, and can be applied to activity analysis or other scenarios without modification. The framework admits the inclusion of a broad class of potential functions, hence enabling multi-modal analysis and the fusion of heterogeneous information sources.
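The collective annotation model can be illustrated with a minimal sketch: per-concept unary scores (standing in for the SVM classifier outputs) are combined with pairwise interaction potentials over a concept graph, and exact presence/absence marginals are computed by enumerating all labelings. All concept names, scores, and potential values here are hypothetical, and brute-force enumeration is only feasible for a handful of concepts:

```python
import itertools
import math

# Hypothetical unary scores: per-concept discriminative classifier
# outputs (e.g. SVM decision values) for one keyframe's features.
unary = {"sky": 2.0, "outdoor": 1.5, "indoor": -1.0}

# Interaction potentials over the concept graph: positive weights
# reward co-occurring labels, negative weights penalise them.
pairwise = {("sky", "outdoor"): 1.2, ("outdoor", "indoor"): -2.0}

concepts = list(unary)


def joint_score(assign):
    """Unnormalised log-score of one presence/absence assignment."""
    s = sum(unary[c] * assign[c] for c in concepts)
    for (a, b), w in pairwise.items():
        s += w * assign[a] * assign[b]
    return s


def marginals():
    """Exact marginals P(concept present) via all 2^n labelings."""
    z = 0.0
    present = {c: 0.0 for c in concepts}
    for bits in itertools.product([0, 1], repeat=len(concepts)):
        assign = dict(zip(concepts, bits))
        weight = math.exp(joint_score(assign))
        z += weight
        for c in concepts:
            if assign[c]:
                present[c] += weight
    return {c: present[c] / z for c in concepts}
```

Note how the negative "outdoor"/"indoor" potential pulls the marginal for "indoor" down even further than its unary score alone would, which is the interdependence effect the collective model is built to capture.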
6 Citations
19 Claims
1. A method for detecting two or more concepts in a source digital video which includes one or more digital video frames comprising:
(a) segmenting the source digital video into a plurality of shots, wherein each shot includes one or more of the digital video frames;
(b) identifying a keyframe within each shot, wherein the keyframe is one of the one or more of the digital video frames;
(c) extracting low level features from the keyframe, wherein the low level features are representative of the two or more concepts, and are related in a graph of concepts, and wherein each concept is semi-automatically generated text associated with the one or more digital video frames, and wherein semi-automatically generated text is automatically generated text which has been manually revised;
(d) training a discriminative classifier for each concept using a set of the low level features, wherein the discriminative classifier is a support vector machine;
(e) building a collective annotation model combining each of the discriminative classifiers;
(f) defining in the collective annotation model one or more interaction potentials to model interdependence between related concepts;
(g) receiving a second source digital video;
(h) applying the discriminative classifiers to the second source digital video; and
(i) determining a probability of a presence or absence of the two or more concepts in the low level features extracted from the second source digital video using the collective annotation model and the defined interaction potentials.
View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 19)
17. A system to identify two or more concepts in digital media comprising:
a computer including a processing component for extracting low level features representative of the two or more concepts from a source digital video, wherein each concept is semi-automatically generated text associated with one or more digital video frames of the source digital video, and wherein semi-automatically generated text is automatically generated text which has been manually revised;
a processing component for training a discriminative classifier for each concept using a set of the low level features, wherein the discriminative classifier is a support vector machine, and wherein the two or more concepts are related in a graph of concepts;
a processing component capable of building a collective annotation model based on each of the discriminative classifiers;
one or more interaction potentials defined in the collective annotation model to model interdependence between related concepts; and
a processing component capable of identifying a probability of a presence or absence of the two or more concepts in the low level features extracted from a second source digital video using the collective annotation model and the defined interaction potentials.
18. A non-transitory machine readable medium having instructions stored thereon that when executed by a processor cause a system to:
segment a source digital video into a plurality of shots, wherein each shot includes one or more digital video frames;
identify a keyframe within each shot, wherein the keyframe is one of the one or more digital video frames;
extract low level features from the keyframe, wherein the low level features are representative of two or more concepts, and are related in a graph of concepts, and wherein each concept is semi-automatically generated text associated with one or more digital video frames, and wherein semi-automatically generated text is automatically generated text which has been manually revised;
train a discriminative classifier for each concept using a set of the low level features, wherein the discriminative classifier is a support vector machine;
build a collective annotation model based on each of the discriminative classifiers;
define in the collective annotation model one or more interaction potentials to identify related concepts;
receive a second source digital video;
apply the discriminative classifiers to the second source digital video; and
determine a probability of a presence or absence of the two or more concepts in the low level features extracted from the second source digital video using the collective annotation model and the defined interaction potentials.
Specification