Automated video interpretation system
Abstract
Digital image signal interpretation is the process of understanding the content of an image through the identification of significant objects or regions in the image and the analysis of their spatial arrangement. Traditionally, the task of image interpretation has required human analysis. This is expensive and time consuming; consequently, considerable research has been directed towards constructing automated image interpretation systems. A method of interpreting a digital video signal having contextual data is disclosed. The method comprises the steps of, firstly, segmenting the digital video signal into one or more video segments, each segment having a corresponding portion of the contextual data, and, secondly, analysing each video segment to provide a graph at one or more temporal instances in the respective video segment, dependent upon the corresponding portion of the contextual data.
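The abstract's first step, temporal segmentation, can be sketched as a toy illustration. This is not the patented algorithm: it simply cuts a list of grayscale frames wherever the mean frame-to-frame pixel difference exceeds a threshold, and every name and the threshold value here are invented for illustration.

```python
def mean_abs_diff(a, b):
    """Mean absolute pixel difference between two equally sized frames."""
    total = sum(abs(x - y) for row_a, row_b in zip(a, b)
                for x, y in zip(row_a, row_b))
    return total / (len(a) * len(a[0]))

def segment_video(frames, threshold=50.0):
    """Group consecutive frames into segments, cutting at large changes."""
    segments = [[frames[0]]]
    for prev, cur in zip(frames, frames[1:]):
        if mean_abs_diff(prev, cur) > threshold:
            segments.append([cur])      # scene cut: start a new segment
        else:
            segments[-1].append(cur)    # same shot: extend current segment
    return segments

dark = [[0, 0], [0, 0]]
bright = [[255, 255], [255, 255]]
print([len(s) for s in segment_video([dark, dark, bright, bright])])  # [2, 2]
```

A production system would use a more robust shot-boundary detector, but the output shape is the same: a list of segments, each a run of consecutive frames.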
Claims (44)
1. A method of interpreting a digital video signal, wherein said digital video signal has contextual data, said method comprising the steps of:
segmenting said digital video signal into one or more video segments, each segment having one or more video frames, and each said video frame having a corresponding portion of said contextual data;
determining a plurality of regions for a video frame of a respective video segment;
processing said regions for each video segment to provide a region adjacency graph at one or more temporal instances in the respective video segment, said region adjacency graph representing adjacencies between regions for a corresponding frame of said respective video segment; and
analyzing said region adjacency graphs to produce a corresponding labeled region adjacency graph comprising at least one semantic label, said analysis being dependent upon a corresponding portion of said contextual data, wherein said labeled region adjacency graph represents an interpretation of said digital video signal. (Dependent claims: 2-21)
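The region adjacency graph of claim 1 can be illustrated with a minimal construction, assuming a frame whose pixels already carry region labels (the patent does not prescribe this representation): two regions are adjacent when any of their pixels are 4-connected neighbours.

```python
def region_adjacency_graph(label_frame):
    """Return {region: set of adjacent regions} for a 2-D label array."""
    h, w = len(label_frame), len(label_frame[0])
    adj = {}
    for y in range(h):
        for x in range(w):
            r = label_frame[y][x]
            adj.setdefault(r, set())
            for dy, dx in ((0, 1), (1, 0)):   # right and down neighbours
                ny, nx = y + dy, x + dx
                if ny < h and nx < w:
                    s = label_frame[ny][nx]
                    if s != r:
                        adj[r].add(s)
                        adj.setdefault(s, set()).add(r)
    return adj

frame = [[1, 1, 2],
         [1, 3, 2],
         [3, 3, 2]]
print(sorted(region_adjacency_graph(frame)[1]))  # [2, 3]
```

Each node of the graph is a region; to match the claim, each node would also carry the region's features, and the graph would be built at one or more temporal instances per segment.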
audio data;
electromagnetic data;
focal point data;
exposure data;
aperture data;
operator gaze location data;
environmental data;
time lapse or sequential image data;
motion data; and
textual tokens.
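The list above enumerates the kinds of contextual data a frame may carry. One hypothetical way to bundle them per frame is a simple record; every field name here is illustrative, not from the patent.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ContextualData:
    audio: Optional[bytes] = None        # audio data
    gps: Optional[tuple] = None          # electromagnetic data (e.g. a GPS fix)
    focal_point: Optional[tuple] = None  # focal point data
    exposure: Optional[float] = None     # exposure data
    aperture: Optional[float] = None     # aperture data
    gaze: Optional[tuple] = None         # operator gaze location data
    environment: Optional[str] = None    # environmental data
    motion: Optional[tuple] = None       # motion data
    keywords: List[str] = field(default_factory=list)  # textual tokens

ctx = ContextualData(aperture=2.8, keywords=["beach", "sunset"])
print(ctx.keywords)  # ['beach', 'sunset']
```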
6. The method according to claim 5, wherein said textual tokens include textual annotation, phrases and keywords associated with at least a portion of a respective video segment.
7. The method according to claim 1, said analyzing step comprising the further sub-step of:
biasing a statistical or probabilistic interpretation model dependent upon said corresponding portion of contextual data.
8. The method according to claim 7, wherein said biasing step includes the step of selecting a predetermined application domain from a plurality of application domains dependent upon said corresponding portion of contextual data, said application domain comprising a set of semantic labels appropriate for use with said application domain.
9. The method according to claim 7, wherein said biasing step includes the step of adjusting at least one prior probability of a respective one of a plurality of semantic labels in at least one application domain.
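Claims 7 through 9 describe biasing the interpretation model by selecting an application domain from the contextual data and adjusting label priors. A toy sketch follows; the domains, labels, and weights are all invented for illustration.

```python
DOMAIN_PRIORS = {
    "beach": {"sky": 0.3, "water": 0.4, "sand": 0.3},
    "office": {"wall": 0.5, "desk": 0.3, "person": 0.2},
}

def select_domain(keywords):
    """Pick the application domain named in the contextual keywords."""
    for domain in DOMAIN_PRIORS:
        if domain in keywords:
            return domain
    return "beach"  # illustrative default domain

def biased_priors(keywords, boost_label=None, factor=2.0):
    """Return domain priors, optionally boosting one label, renormalised."""
    priors = dict(DOMAIN_PRIORS[select_domain(keywords)])
    if boost_label in priors:
        priors[boost_label] *= factor
    total = sum(priors.values())
    return {k: v / total for k, v in priors.items()}

p = biased_priors(["office", "meeting"], boost_label="person")
print(round(p["person"], 2))  # 0.33
```

The later analysis steps (claims 10 through 13) would then consume these adjusted priors rather than the unbiased ones.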
10. The method according to claim 7, wherein each said analyzing step is dependent upon said biased statistical or probabilistic interpretation model.
11. The method according to claim 10, wherein each said analyzing step is dependent upon said at least one application domain.
12. The method according to claim 7, wherein each said analyzing step includes the step of analyzing regions of said region adjacency graph using said biased statistical or probabilistic interpretation model to provide said labeled region adjacency graph.
13. The method according to claim 12, wherein said region analysis step is dependent upon adjusted prior probabilities of a plurality of labels to provide said labeled region adjacency graph.
14. The method according to claim 7, wherein the statistical or probabilistic interpretation model is a Markov Random Field.
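Claim 14 names a Markov Random Field as the interpretation model. A toy labelling of a region adjacency graph by iterated conditional modes (ICM, one common MRF inference scheme; the patent does not specify the inference method): each region repeatedly takes the label maximising its prior times its per-region evidence times a pairwise compatibility with its neighbours' current labels. All numbers here are invented.

```python
PRIOR = {"sky": 0.5, "water": 0.5}
COMPAT = {("sky", "sky"): 0.9, ("water", "water"): 0.9,
          ("sky", "water"): 0.4, ("water", "sky"): 0.4}

def icm(adjacency, unary, labels=("sky", "water"), sweeps=5):
    """adjacency: {node: set of neighbours}; unary: {node: {label: score}}."""
    state = {n: max(unary[n], key=unary[n].get) for n in adjacency}
    for _ in range(sweeps):
        for n in adjacency:
            def score(lab, n=n):
                s = PRIOR[lab] * unary[n][lab]
                for m in adjacency[n]:
                    s *= COMPAT[(lab, state[m])]
                return s
            state[n] = max(labels, key=score)
    return state

adj = {0: {1}, 1: {0, 2}, 2: {1}}
unary = {0: {"sky": 0.9, "water": 0.1},
         1: {"sky": 0.55, "water": 0.45},
         2: {"sky": 0.2, "water": 0.8}}
print(icm(adj, unary))  # {0: 'sky', 1: 'sky', 2: 'water'}
```

Biasing the model, in the sense of claims 7 through 9, amounts to altering `PRIOR` (or the compatibilities) according to the contextual data before inference.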
15. The method according to claim 1, further comprising the step of encoding said region adjacency graph to form a bitstream representation of said region adjacency graph.
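Claim 15 encodes the region adjacency graph as a bitstream. A minimal, hypothetical serialisation (the patent does not define the format): an edge count followed by (region, region) pairs, all as big-endian unsigned 16-bit integers.

```python
import struct

def encode_rag(edges):
    """Pack a list of (u, v) adjacency edges into bytes."""
    out = struct.pack(">H", len(edges))
    for u, v in edges:
        out += struct.pack(">HH", u, v)
    return out

def decode_rag(data):
    """Inverse of encode_rag: recover the edge list from bytes."""
    (n,) = struct.unpack_from(">H", data, 0)
    return [struct.unpack_from(">HH", data, 2 + 4 * i) for i in range(n)]

edges = [(1, 2), (2, 3)]
print(decode_rag(encode_rag(edges)) == edges)  # True
```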
16. The method according to claim 15, further comprising the step of:
representing motion between two successive video frames using a predetermined motion model comprising encoded parameters.
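Claim 16 represents inter-frame motion with a predetermined motion model whose parameters are encoded. A sketch using the simplest such model, a pure translation (dx, dy) estimated from matched points; the patent does not specify which motion model is used.

```python
def apply_translation(points, dx, dy):
    """Predict next-frame positions of region points under (dx, dy)."""
    return [(x + dx, y + dy) for x, y in points]

def estimate_translation(points_a, points_b):
    """Mean translation between matched point lists (least squares
    solution for a translational model)."""
    n = len(points_a)
    dx = sum(bx - ax for (ax, _), (bx, _) in zip(points_a, points_b)) / n
    dy = sum(by - ay for (_, ay), (_, by) in zip(points_a, points_b)) / n
    return dx, dy

a = [(0, 0), (2, 1)]
b = [(3, 4), (5, 5)]
print(estimate_translation(a, b))  # (3.0, 4.0)
```

The two estimated parameters are what would be encoded per region, in the manner of claim 15, rather than the pixel data itself.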
17. The method according to claim 16, further comprising the step of combining each encoded video segment and respective encoded region adjacency graph to provide a motion encoded digital video signal.
18. The method according to claim 1, further comprising the step of providing metadata associated with each video segment, wherein said metadata includes said region adjacency graphs of each video segment.
19. The method according to claim 1, wherein said analysis step is carried out on a video frame and a temporal region of interest of the respective contextual data.
20. The method according to claim 1, wherein said digital video signal is generated using a digital video recording device.
21. The method according to claim 20, wherein one or more portions of said contextual data are generated by one or more sensors of said digital video recording device.
22. An apparatus for interpreting a digital video signal, wherein said digital video signal has contextual data, said apparatus comprising:
means for segmenting said digital video signal into one or more video segments, each segment having one or more video frames, and each said video frame having a corresponding portion of said contextual data;
means for determining a plurality of regions for a video frame of a respective video segment;
means for processing said regions for each video segment to provide a region adjacency graph at one or more temporal instances in the respective video segment, said region adjacency graph representing adjacencies between regions for a corresponding frame of said respective video segment; and
means for analyzing said region adjacency graphs to produce a corresponding labeled region adjacency graph comprising at least one semantic label, said analysis being dependent upon a corresponding portion of said contextual data, wherein said labeled region adjacency graph represents an interpretation of said digital video signal. (Dependent claims: 23-31)
32. A computer program stored in a computer readable medium, said computer program being configured for interpreting a digital video signal, wherein said digital video signal has contextual data, said computer program comprising:
code for segmenting said digital video signal into one or more video segments, each segment having one or more video frames, and each said video frame having a corresponding portion of said contextual data;
code for determining a plurality of regions for at least one video frame of a respective video segment;
code for processing said regions for each video segment to provide a region adjacency graph at one or more temporal instances in the respective video segment, said region adjacency graph representing adjacencies between regions for a corresponding frame of said respective video segment; and
code for analyzing said region adjacency graphs to produce a corresponding labeled region adjacency graph comprising at least one semantic label, said analysis being dependent upon a corresponding portion of said contextual data, wherein said labeled region adjacency graph represents an interpretation of said digital video signal. (Dependent claims: 33-41)
42. A method of interpreting a digital video signal, wherein said digital video signal has contextual data, said method comprising the steps of:
segmenting said digital video signal into one or more video segments, each segment having one or more video frames and a corresponding portion of said contextual data;
determining a plurality of regions for at least one video frame of a respective video segment;
processing said regions for each video segment to provide a region adjacency graph at one or more temporal instances in the respective video segment dependent upon said corresponding portion of said contextual data, said region adjacency graph representing adjacencies between regions for a corresponding frame of said respective video segment; and
analyzing said region adjacency graphs to interpret said digital video signal.
43. An apparatus for interpreting a digital video signal, wherein said digital video signal has contextual data, said apparatus comprising:
means for segmenting said digital video signal into one or more video segments, each segment having one or more video frames and a corresponding portion of said contextual data;
means for determining a plurality of regions for at least one video frame of a respective video segment;
means for processing said regions for each video segment to provide a region adjacency graph at one or more temporal instances in the respective video segment dependent upon said corresponding portion of said contextual data, said region adjacency graph representing adjacencies between regions for a corresponding frame of said respective video segment; and
means for analyzing said region adjacency graphs to interpret said digital video signal.
44. A computer program stored in a computer readable medium, said computer program being configured for interpreting a digital video signal, wherein said digital video signal has contextual data, said computer program comprising:
code for segmenting said digital video signal into one or more video segments, each segment having one or more video frames and a corresponding portion of said contextual data;
code for determining a plurality of regions for at least one video frame of a respective video segment;
code for processing said regions for each video segment to provide a region adjacency graph at one or more temporal instances in the respective video segment dependent upon said corresponding portion of said contextual data, said region adjacency graph representing adjacencies between regions for a corresponding frame of said respective video segment; and
code for analyzing said region adjacency graphs to interpret said digital video signal.
Specification