Object-based parsing and indexing of compressed video streams

US 6,389,168 B2
Filed: 10/13/1998
Issued: 05/14/2002
Est. Priority Date: 10/13/1998
Status: Expired due to Fees

Abstract

A method and system for object-based video retrieval and indexing include a configuration detection processor for deriving quantitative attribute information for video frames in a compressed video stream. The quantitative attribute information includes object data for a video frame, including the number of objects and their orientation within the video frame and the size, shape, texture, and motion of each object. A configuration comparison processor compares object data from first and second frames to determine differences between first frame video objects and second frame video objects. The configuration comparison processor has a shot boundary detection mode in which it cooperates with a shot boundary detector to identify shot boundaries within a video sequence. In a key frame selection mode, the configuration comparison processor cooperates with a key frame selector to select key frames from the video sequence. A key instance selector communicates with the configuration comparison processor during a key instance selection mode to select key instances of video objects based on differences between first and second instances of video objects. The configuration comparison processor cooperates with a camera operation detector to identify camera operations such as zoom, tracking, and panning within the video sequence. A special effects detector cooperates with the configuration comparison processor to detect special effects video edits such as wipe, dissolve, and fade. The configuration comparison processor and a query match detector enable a user to configure object-based queries and to retrieve video sequences or video frames which include a query video object.

13 Claims

1. A method for object-based parsing and indexing compressed video streams comprising the steps of:
- identifying a first composition of first frame video objects in a first video frame of a compressed video stream, each said first frame video object in said first composition being a video representation of a physical entity that was imaged during capture of said first video frame, including assigning each of said first frame video objects at least one associated first quantitative attribute value and including determining a first orientation of said first frame video objects;
  
  identifying a second composition of second frame video objects in a second video frame of said compressed video stream, each said second frame video object in said second composition being a video representation of a physical entity that was imaged during capture of said second video frame, including assigning each second frame video object at least one associated second quantitative attribute value and including determining a second orientation of said second frame video objects;
  
  comparing at least one first quantitative attribute value to at least one second quantitative attribute value to determine if a predetermined threshold has been exceeded, said predetermined threshold being related to a difference between attribute values, including comparing said first and said second orientations; and
  
  as a response to said determination of whether said predetermined threshold has been exceeded, selectively indexing a video frame selected from a portion of said compressed video stream bounded by said first video frame and said second video frame.
2. The method of claim 1 wherein said step of assigning said first and second quantitative attribute values includes determining attribute values related to at least one of motion, shape, texture, color, and size of a first instance of a first frame video object in said first video frame and a second instance of said first frame video object in said second frame, said comparing step including determining if a difference between said first and said second quantitative attribute values exceeds a key instance threshold, said indexing step including indexing a key instance of said first video object in response to a determination that said difference between said first and said second quantitative attribute values exceeds said key instance threshold.
3. The method of claim 2 further comprising the steps of:
4. The method of claim 2 further comprising the steps of:
- receiving an image retrieval query which includes an identification of a query video object having a query quantitative attribute value;
  
  calculating a similarity value between said query quantitative attribute value and a quantitative attribute value of said indexed key instance; and
  
  presenting said similarity value in a ranking of similarity values generated by comparing said query quantitative attribute value to quantitative attribute values of other key instances.
5. The method of claim 1 wherein said comparing step includes comparing first quantitative attribute values of first frame video objects to second quantitative attribute values of second frame video objects to determine if a key frame threshold is exceeded, said selective indexing step including selectively indexing a key frame as a response to a determination that said key frame threshold has been exceeded.
6. The method of claim 1 wherein said step of identifying said first composition of said first video frame and said step of identifying said second composition of said second video frame include calculating a motion histogram at least partially based on first quantitative attribute values associated with a first occurrence of a subset of said first frame video objects in said first video frame and second quantitative attribute values associated with a second occurrence of said subset of said first frame video objects in said second video frame, the method further comprising a step of comparing said calculated motion histogram to a predetermined ideal motion histogram to determine if said video sequence which includes said first and said second video frames comprises one of a zoom camera operation, a panning camera operation, and a tracking camera operation.
7. The method of claim 6, wherein said step of calculating said motion histogram occurs after a determination of whether said video sequence bounded by said first video frame and said second video frame includes a shot boundary.
8. The method of claim 1 wherein said step of identifying said first composition of video objects includes assigning said each of said first frame video objects an object intensity, said step of identifying said second composition of video objects including assigning said each second frame video object an object intensity, said comparing step including comparing said object intensities of said first frame video objects to said object intensities of said second frame video objects to determine if a special effects video edit threshold has been exceeded.
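
As an illustration only, the comparing and selective-indexing steps of claim 1 might be sketched as follows. The claims do not fix an attribute set, a difference measure, or which frame of the bounded portion is indexed, so the `VideoObject` fields, the mean absolute difference, the treatment of orientation as a plain scalar without angle wrap-around, and the choice to index the second frame are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VideoObject:
    # Hypothetical attribute set; the claims list motion, shape, texture,
    # color, and size as possible quantitative attributes.
    object_id: int
    size: float
    orientation_deg: float


def attribute_difference(first: List[VideoObject], second: List[VideoObject]) -> float:
    """Mean absolute size-plus-orientation difference over objects found in both frames."""
    second_by_id = {obj.object_id: obj for obj in second}
    diffs = []
    for obj in first:
        match = second_by_id.get(obj.object_id)
        if match is None:
            continue  # object absent from the second frame; ignored in this sketch
        diffs.append(
            abs(obj.size - match.size)
            + abs(obj.orientation_deg - match.orientation_deg)
        )
    return sum(diffs) / len(diffs) if diffs else 0.0


def select_index_frame(
    first_idx: int,
    second_idx: int,
    first: List[VideoObject],
    second: List[VideoObject],
    threshold: float,
) -> Optional[int]:
    """Return the frame number to index, or None if the threshold is not exceeded."""
    if attribute_difference(first, second) > threshold:
        return second_idx  # assumption: index the later frame of the pair
    return None
```

A caller would run `select_index_frame` over successive frame pairs and record each non-`None` result in the index; a different threshold per mode (key instance, key frame) would correspond to the dependent claims.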

9. A method for indexing a video sequence within a compressed video stream and for video retrieval comprising the steps of:
- extracting key instances of video objects within each video shot defined by consecutive shot boundaries, said key instance extraction including the steps of;
  
  a) identifying a first set of quantitative attributes associated with a first instance of a video object in a first video frame, said first instance of a video object being a video representation of a physical entity that was imaged during capture of said first video frame, said first set of quantitative attributes including at least one of motion, size, shape, color, and texture;
  
  b) identifying a second set of quantitative attributes associated with a second instance of said video object in a corresponding second video frame, said second instance of a video object being a video representation of a physical entity that was imaged during capture of said second video frame, said second set of quantitative values including at least one of motion, size, shape, color, and texture;
  
  c) comparing said first set of quantitative attributes to said second set of quantitative attributes to determine if a difference between said first and said second set of quantitative attributes exceeds a key instance threshold; and
  
  d) indexing a key instance of said video object if said key instance threshold is exceeded;
  
  establishing said shot boundaries within said video sequence in said compressed video stream, including the steps of;
  
  a) selecting first video frames and second video frames within said compressed video stream such that each first video frame corresponds to a second video frame, thereby identifying corresponding first and second video frames;
  
  b) calculating video object quantity differentials between said first video frames and said second video frames;
  
  c) for each said corresponding first and second video frames, determining if an object quantity differential exceeds a shot boundary threshold; and
  
  d) indexing a shot boundary within each video sub-sequence defined by each said corresponding first and second video frames having an object quantity differential which exceeds said shot boundary threshold; and
  
  extracting key frames within each video shot defined by consecutive shot boundaries, including the steps of;
  
  a) for each said corresponding first and second video frames within a subset of said corresponding first and second video frames determined not to define a shot boundary, determining if one of a quantitative attribute differential and said object quantity differential exceeds a key frame threshold; and
  
  b) indexing at least one key frame for each shot having said corresponding first and second video frames determined to have one of an associated quantitative attribute differential and object quantity differential in excess of said key frame threshold.
10. The method of claim 9, further comprising the steps of:
11. The method of claim 9, wherein said compressed video stream is an MPEG-4 video stream.
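
The shot-boundary and key-frame steps of claim 9 can be pictured with a short sketch. It assumes, purely for illustration, that the "corresponding first and second video frames" are consecutive frames, that the object quantity differential is the absolute change in object count, and that the second frame of a qualifying pair is the one indexed as a key frame; none of these choices is stated in the claim.

```python
from typing import List


def object_quantity_differentials(object_counts: List[int]) -> List[int]:
    """Absolute change in object count between each consecutive frame pair."""
    return [abs(b - a) for a, b in zip(object_counts, object_counts[1:])]


def shot_boundaries(object_counts: List[int], shot_boundary_threshold: int) -> List[int]:
    """Indices i such that a shot boundary is indexed between frame i and frame i + 1."""
    diffs = object_quantity_differentials(object_counts)
    return [i for i, d in enumerate(diffs) if d > shot_boundary_threshold]


def key_frame_candidates(
    object_counts: List[int],
    shot_boundary_threshold: int,
    key_frame_threshold: int,
) -> List[int]:
    """Frames whose pair is not a shot boundary but still exceeds the key frame threshold."""
    diffs = object_quantity_differentials(object_counts)
    return [
        i + 1  # assumption: index the second frame of the pair
        for i, d in enumerate(diffs)
        if key_frame_threshold < d <= shot_boundary_threshold
    ]
```

For object counts `[3, 3, 4, 4, 9, 9]` with a shot boundary threshold of 3 and a key frame threshold of 0, this sketch places a boundary between frames 3 and 4 and marks frame 2 as a key frame candidate.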

12. A method for object-based parsing and indexing compressed video streams comprising the steps of:
- identifying a first composition of first frame video objects in a first video frame of a compressed video stream, each said first frame video object in said first composition being a video representation of a physical entity that was imaged during capture of said first video frame, including assigning each of said first frame video objects at least one associated first quantitative attribute value and including determining a first orientation of said first frame video objects;
  
  identifying a second composition of second frame video objects in a second video frame of said compressed video stream, each said second frame video object in said second composition being a video representation of a physical entity that was imaged during capture of said second video frame, including assigning each second frame video object at least one associated second quantitative attribute value;
  
  comparing at least one first quantitative attribute value to at least one second quantitative attribute value to determine if a predetermined threshold has been exceeded, said predetermined threshold being related to a difference between attribute values; and
  
  as a response to said determination of whether said predetermined threshold has been exceeded, selectively indexing a video frame selected from a portion of said compressed video stream bounded by said first video frame and said second video frame;
  
  wherein said step of identifying said first composition of said first video frame and said step of identifying said second composition of said second video frame include calculating a motion histogram at least partially based on first quantitative attribute values associated with a first occurrence of a subset of said first frame video objects in said first video frame and second quantitative attribute values associated with a second occurrence of said subset of said first frame video objects in said second video frame, the method further comprising a step of comparing said calculated motion histogram to a predetermined ideal motion histogram to determine if said video sequence which includes said first and said second video frames comprises one of a zoom camera operation, a panning camera operation, and a tracking camera operation.
13. The method of claim 12 wherein said step of calculating said motion histogram occurs after a determination of whether said video sequence bounded by said first video frame and said second video frame includes a shot boundary.
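
The motion histogram comparison recited in claims 6 and 12 could look roughly like the sketch below. The claims do not say how the histogram is binned, what the "predetermined ideal motion histogram" contains, or how the two are compared. This sketch therefore assumes a histogram of object motion directions in eight bins centered on the compass directions, an L1 distance, and caller-supplied template histograms. The template names in the usage note are hypothetical.

```python
import math
from typing import Dict, List, Tuple

NUM_BINS = 8  # assumption: eight direction bins centered on 0, 45, 90, ... degrees


def motion_histogram(
    motion_vectors: List[Tuple[float, float]], num_bins: int = NUM_BINS
) -> List[float]:
    """Normalized histogram of motion directions; zero-length vectors are skipped."""
    counts = [0] * num_bins
    for dx, dy in motion_vectors:
        if dx == 0 and dy == 0:
            continue
        angle = math.atan2(dy, dx) % (2 * math.pi)
        # Round to the nearest bin center so directions near a center share a bin.
        counts[int(round(angle / (2 * math.pi) * num_bins)) % num_bins] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * num_bins


def l1_distance(h1: List[float], h2: List[float]) -> float:
    """Sum of absolute bin-wise differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def closest_template(hist: List[float], templates: Dict[str, List[float]]) -> str:
    """Name of the template histogram with the smallest L1 distance to hist."""
    return min(templates, key=lambda name: l1_distance(hist, templates[name]))
```

As a usage example under the same assumptions, a template with all mass in one bin (say `"pan_like"`) and a uniform template (say `"zoom_like"`) could be passed to `closest_template`; objects all moving in one direction would then match the first, and objects moving outward in every direction would match the second.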

Current Assignee: Hewlett-Packard Development Company, L.P. (HP Inc.)
Original Assignee: Hewlett-Packard Company (HP Inc.)
Inventors: Altunbasak, Yucel; Zhang, HongJiang
Primary Examiner(s): Chen, Wenpeng
Application Number: US09/172,399
Time in Patent Office: 1,309 Days
Field of Search: 382/232, 382/236, 382/224, 348/231, 348/222, 348/722, 348/7, 348/416, 348/700, 345/328, 345/440, 345/723, 707/104, 375/240.08
US Class Current: 382/224
CPC Class Codes:
G06F 16/71   Indexing; Data structures t...
G06F 16/7837   using objects detected or r...
G06F 16/786   using motion, e.g. object m...
G06T 9/005   Statistical coding, e.g. Hu...
G06V 20/40   in video content extracting...
H04N 5/147   Scene change detection
