Analytics-modulated coding of surveillance video
5 Assignments
0 Petitions
Abstract
A method and apparatus for encoding surveillance video where one or more regions of interest are identified and the encoding parameter values associated with those regions are specified in accordance with intermediate outputs of a video analytics process. Such an analytics-modulated video compression approach allows the coding process to adapt dynamically based on the content of the surveillance images. In this manner, the fidelity of the region of interest is increased relative to that of a background region such that the coding efficiency is improved, including instances when no target objects appear in the scene. Better compression results can be achieved by assigning different coding priority levels to different types of detected objects.
142 Citations
33 Claims
1. A method, comprising:
receiving, at an analytics module implemented in at least one of a memory or a processing device, a video frame having a plurality of pixels;
assigning, at the analytics module and without user intervention, based on a type of a foreground object from the video frame having the plurality of pixels, a class from a plurality of predetermined classes to the foreground object;
adjusting a quantization parameter value associated with the foreground object based on a weight associated with the class assigned to the foreground object, a size of the foreground object, and a target bit rate associated with the video frame, the weight being based on a coding priority associated with the class assigned to the foreground object;
producing a plurality of DCT coefficients for pixels from the plurality of pixels of the video frame associated with the foreground object;
quantizing, at a quantization module, the DCT coefficients associated with the foreground object based on the adjusted quantization parameter value;
coding the quantized DCT coefficients associated with the foreground object to produce coded quantized DCT coefficients; and
sending, to a storage module, a representation of the coded quantized DCT coefficients.
View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
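The class-weighted quantization adjustment of claim 1 can be sketched in a few lines of Python. Everything below is an illustrative assumption, not taken from the patent: the baseline QP, the per-class weights, and the adjustment formula are placeholders chosen only to show how class priority, object size, and bit budget could jointly lower the QP (and so raise the fidelity) of a foreground object.

```python
# Hypothetical sketch of analytics-modulated QP adjustment.
# BASE_QP, CLASS_WEIGHTS, and the delta formula are assumptions for
# illustration; the patent does not specify these values.

BASE_QP = 30  # assumed encoder baseline quantization parameter

# Assumed class weights: higher coding priority -> larger weight -> lower QP.
CLASS_WEIGHTS = {"person": 1.0, "vehicle": 0.8, "animal": 0.4, "unknown": 0.2}

def adjust_qp(object_class, object_size_px, frame_area_px, target_bitrate_kbps):
    """Return a QP for a foreground object, clamped to H.264's 0..51 range."""
    weight = CLASS_WEIGHTS.get(object_class, CLASS_WEIGHTS["unknown"])
    size_fraction = object_size_px / frame_area_px
    # Assumed rule: high-priority or large objects get finer quantization,
    # tempered by how much bit budget the frame has available.
    budget_factor = min(1.0, target_bitrate_kbps / 2000.0)
    delta = round(10 * weight * budget_factor + 5 * size_fraction)
    return max(0, min(51, BASE_QP - delta))
```

For example, under these assumed weights a person occupying 10,000 pixels of a 1080p frame at 2 Mb/s would be quantized noticeably more finely (QP 20) than an unclassified object of the same size (QP 28).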
11. A method, comprising:
receiving, at an analytics module implemented in at least one of a memory or a processing device, a first video frame having a plurality of blocks of pixels and a second video frame having a plurality of blocks of pixels;
assigning, at the analytics module and without user intervention, a class, based on a type of a foreground object from the first video frame, from a plurality of predetermined classes to the foreground object, the foreground object including a block of pixels from the plurality of blocks of pixels of the first video frame, each class from the plurality of predetermined classes having associated therewith a coding priority;
identifying in the second video frame a prediction block of pixels associated with the block of pixels in the foreground object, the identifying being based on a prediction search window having a search area associated with the class assigned to the foreground object;
coding, at a coding module, the first video frame based on the identified prediction block of pixels, a size of the foreground object, and a target bit rate associated with the first video frame to produce a coded video frame; and
sending, to a storage module, a representation of the coded video frame.
View Dependent Claims (12, 13, 14, 15, 16, 17, 18)
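The class-conditioned prediction search window of claim 11 could look like the following sketch. The per-class pixel ranges are assumed values chosen only to illustrate the idea that fast-moving classes (e.g. vehicles) warrant a wider motion search than slower ones; the specification does not give these numbers.

```python
# Illustrative sketch: size a motion-prediction search window by object class.
# SEARCH_RANGE values are assumptions, not from the patent.

SEARCH_RANGE = {"vehicle": 32, "person": 16, "animal": 16, "background": 8}

def prediction_search_window(center_x, center_y, object_class,
                             frame_w, frame_h, block=16):
    """Return the (x0, y0, x1, y1) search area for a block at (center_x, center_y),
    clipped so every candidate block stays inside the frame."""
    r = SEARCH_RANGE.get(object_class, 8)
    x0 = max(0, center_x - r)
    y0 = max(0, center_y - r)
    x1 = min(frame_w - block, center_x + r)
    y1 = min(frame_h - block, center_y + r)
    return (x0, y0, x1, y1)
```

Wider windows cost more search time but track fast motion better, which is why tying the range to the assigned class can save computation on low-priority regions.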
19. A non-transitory processor-readable medium storing code representing instructions to be executed by a processor, the code comprising code to cause the processor to:
assign, without user intervention, based on a type of a foreground object from a video frame having a plurality of pixels, a class from a plurality of predetermined classes to the foreground object, each class from the plurality of predetermined classes having associated therewith a coding priority;
track motion information associated with the foreground object in a first video frame having a plurality of blocks of pixels, the foreground object including a block of pixels from the plurality of blocks of pixels of the first video frame;
identify in a second video frame having a plurality of blocks of pixels a prediction block of pixels associated with the block of pixels in the foreground object, the identification of the prediction block of pixels based on a prediction search window having a search area associated with the tracked motion information associated with the foreground object, the search area of the prediction search window being updated according to the class assigned to the foreground object;
code the first video frame based on the identified prediction block of pixels and a target bit rate associated with the first video frame to produce a coded video frame; and
send, to a storage module, a representation of the coded video frame.
View Dependent Claims (20, 21, 22)
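Claim 19 couples the search area to the object's tracked motion as well as its class. A minimal sketch, assuming a standard sum-of-absolute-differences (SAD) block search and an invented range-update rule (grow the window with tracked speed, capped per class):

```python
# Hedged sketch: motion- and class-adapted block matching.
# The range-update rule and the class caps are assumptions; SAD block
# matching itself is standard video-coding practice.

def search_range(velocity, object_class):
    """Grow the search range with tracked speed, capped per class (assumed caps)."""
    cap = {"vehicle": 48, "person": 24}.get(object_class, 16)
    return min(cap, 8 + 2 * max(abs(velocity[0]), abs(velocity[1])))

def sad(ref, cur, rx, ry, cx, cy, n):
    """Sum of absolute differences between an n x n block of `ref` at (rx, ry)
    and an n x n block of `cur` at (cx, cy)."""
    return sum(abs(ref[ry + j][rx + i] - cur[cy + j][cx + i])
               for j in range(n) for i in range(n))

def find_prediction_block(ref, cur, cx, cy, velocity, object_class, n=4):
    """Exhaustive SAD search in `ref` around (cx, cy); window set by the
    tracked velocity and assigned class. Returns (SAD, x, y) of the best match."""
    r = search_range(velocity, object_class)
    h, w = len(ref), len(ref[0])
    best = None
    for ry in range(max(0, cy - r), min(h - n, cy + r) + 1):
        for rx in range(max(0, cx - r), min(w - n, cx + r) + 1):
            cost = sad(ref, cur, rx, ry, cx, cy, n)
            if best is None or cost < best[0]:
                best = (cost, rx, ry)
    return best
```

A real encoder would use a fast search pattern (diamond, hexagon) rather than the exhaustive scan shown here; the point is only how the window bounds are derived.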
23. A method, comprising:
receiving, at an analytics module implemented in at least one of a memory or a processing device, a plurality of pictures associated with a scene;
assigning, at the analytics module and without user intervention, based on a type of a foreground object from a picture in a first group of pictures (GOP) from the plurality of pictures, a class from a plurality of predetermined classes to the foreground object, each class from the plurality of predetermined classes having associated therewith a coding priority, the first GOP (1) having a first number of frames between two intra-frames, and (2) associated with the scene at a first time;
tracking, at a tracking module, motion information associated with the foreground object over a plurality of pictures;
inserting an intra-frame picture in the first GOP based on the tracked motion information associated with the foreground object and the coding priority associated with the class assigned to the foreground object;
defining a second GOP from the plurality of pictures and associated with the scene at a second time after the first time to have a second number of frames between two intra-frames based on the foreground object leaving the scene after the first time and before the second time, the second number of frames being different than the first number of frames; and
sending, to a storage module, a representation of at least one of the first GOP or the second GOP.
View Dependent Claims (24, 25)
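The adaptive GOP logic of claim 23 reduces to two decisions: when to force an extra intra-frame, and how far apart intra-frames should sit once the scene empties. A minimal sketch, in which the thresholds and the two GOP lengths are assumed values, not figures from the specification:

```python
# Hypothetical sketch of analytics-driven GOP adaptation.
# SHORT_GOP, LONG_GOP, and both thresholds are illustrative assumptions.

SHORT_GOP = 15   # assumed I-frame spacing while priority objects are in scene
LONG_GOP = 60    # assumed spacing once the scene is empty

def needs_intra_frame(motion_magnitude, coding_priority,
                      motion_threshold=8, priority_threshold=2):
    """Force an extra I-frame for a fast-moving, high-priority object."""
    return (motion_magnitude > motion_threshold
            and coding_priority >= priority_threshold)

def gop_length(scene_has_foreground):
    """Shorter GOP while objects of interest are present, longer when empty."""
    return SHORT_GOP if scene_has_foreground else LONG_GOP
```

Lengthening the GOP for an empty scene spends fewer bits on intra-frames exactly when nothing of interest is happening, which is where the abstract's claim of improved efficiency "when no target objects appear" comes from.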
26. A method, comprising:
receiving, at an analytics module implemented in at least one of a memory or a processing device, a plurality of pictures associated with a scene;
assigning, at the analytics module and without user intervention, based on a type of a foreground object from a picture in a first group of pictures (GOP) from the plurality of pictures, a class from a plurality of predetermined classes to the foreground object, each class from the plurality of predetermined classes having associated therewith a coding priority, the first group of pictures (1) having a first number of frames between two intra-frames, and (2) associated with the scene at a first time;
tracking, at a tracking module, motion information associated with the foreground object over a plurality of pictures;
replacing a block of pixels in the foreground object with an intra-frame block of pixels based on the tracked motion information associated with the foreground object and the coding priority associated with the class assigned to the foreground object;
defining a second GOP from the plurality of pictures and associated with the scene at a second time after the first time to have a second number of frames between two intra-frames based on the foreground object leaving the scene after the first time and before the second time, the first number of frames being different than the second number of frames; and
sending, to a storage module, a representation of the second GOP.
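Claim 26 refreshes individual blocks as intra rather than inserting whole intra-frames. One way to sketch the selection step, under the assumption (mine, not the patent's) that a limited per-frame refresh budget goes to the fastest-moving foreground blocks, with low-priority classes granted only half the budget:

```python
# Illustrative block-level intra-refresh selection; the budget policy is an
# assumption chosen to show how priority and motion could drive the choice.

def blocks_to_intra_refresh(block_motion, coding_priority, budget):
    """Pick up to `budget` foreground blocks to re-code as intra, fastest
    first; classes below priority 2 get only half the budget (assumed rule).
    `block_motion` maps a block id to its tracked motion magnitude."""
    limit = budget if coding_priority >= 2 else budget // 2
    ranked = sorted(block_motion, key=block_motion.get, reverse=True)
    return ranked[:limit]
```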
27. A method, comprising:
receiving, at an analytics module implemented in at least one of a memory or a processing device, a group of pictures (GOP);
segmenting a foreground object from a background of a picture in the GOP, the foreground object of the picture having a plurality of pixels organized into a plurality of blocks of pixels, the background of the picture having a plurality of pixels organized into a plurality of blocks of pixels;
tracking motion information associated with a block of pixels from the plurality of blocks of pixels of the foreground object, a first block of pixels from the plurality of blocks of pixels of the background, and a second block of pixels from the plurality of blocks of pixels of the background;
encoding, at an encode module, the block of pixels from the plurality of blocks of pixels of the foreground object as an intra-coded block of pixels to produce an encoded intra-coded block of pixels based on (1) the motion information associated with the block of pixels from the plurality of blocks of pixels of the foreground object, (2) a size of the foreground object, and (3) a target bit rate associated with the GOP;
encoding, at the encode module, the first block of pixels from the plurality of blocks of pixels of the background as a predictive-coded block of pixels to produce an encoded predictive-coded block of pixels based on the motion information associated with the first block of pixels from the plurality of blocks of pixels of the background and the target bit rate associated with the GOP;
encoding, at the encode module, the second block of pixels from the plurality of blocks of pixels of the background as a skipped block of pixels to produce an encoded skipped block of pixels based on the motion information associated with the second block of pixels from the plurality of blocks of pixels of the background and the target bit rate associated with the GOP; and
sending, to a storage module, a representation of at least one of the encoded intra-coded block of pixels, the encoded predictive-coded block of pixels, or the encoded skipped block of pixels.
View Dependent Claims (28)
29. An apparatus, comprising:
a receive module implemented in at least one of a memory or a processing device, the receive module configured to receive a video frame having a plurality of pixels;
a segment module configured to segment, from the video frame, a foreground object from a background, the foreground object being from a plurality of foreground objects, the foreground object of the video frame having a plurality of pixels organized into a plurality of blocks of pixels, the background of the video frame having a plurality of pixels organized into a plurality of blocks of pixels;
a track module configured to track motion information associated with the block of pixels from the plurality of blocks of pixels of the foreground object, a first block of pixels from the plurality of blocks of pixels of the background, and a second block of pixels from the plurality of blocks of pixels of the background;
an encode module configured to encode the block of pixels from the plurality of blocks of pixels of the foreground object as an intra-coded macroblock based on (1) the motion information associated with the block of pixels from the plurality of blocks of pixels of the foreground object, (2) a quantity of foreground objects from the plurality of foreground objects, and (3) a target bit rate associated with the video frame to produce an encoded intra-coded macroblock;
an encode module configured to encode the first block of pixels from the plurality of blocks of pixels of the background as a predictive-coded macroblock based on the motion information associated with the first block of pixels from the plurality of blocks of pixels of the background and the target bit rate associated with the video frame to produce an encoded predictive-coded macroblock;
an encode module configured to encode the second block of pixels from the plurality of blocks of pixels of the background as at least one of a bidirectionally-predictive coded macroblock or a skipped macroblock based on the motion information associated with the second block of pixels from the plurality of blocks of pixels of the background and the target bit rate associated with the video frame to produce at least one of an encoded bidirectionally-predictive coded macroblock or an encoded skipped macroblock; and
a send module configured to send a representation of at least one of the encoded intra-coded macroblock, the encoded predictive-coded macroblock, or the encoded skipped macroblock.
View Dependent Claims (30, 31, 32, 33)
Specification