Methods, devices and systems for detecting objects in a video

US 9,646,212 B2
Filed: 08/26/2016
Issued: 05/09/2017
Est. Priority Date: 09/12/2012
Status: Active Grant


Abstract

Methods, devices and systems for performing video content analysis to detect humans or other objects of interest in a video image are disclosed. The detection of humans may be used to count a number of humans, to determine a location of each human, and/or to perform crowd analyses of monitored areas.

38 Claims

1. A method of detecting human objects with a video system, comprising:
- obtaining a first video image from a first video camera;
  
  obtaining a second video image from a second video camera;
  
  determining pixels of the first video image are first foreground pixels and a group of the first foreground pixels constitute a first foreground blob set of one or more first foreground blobs;
  
  for each of plural locations within the first video image, comparing a corresponding predetermined shape with the first foreground blob set to obtain a corresponding first probability of a human at the corresponding location, thereby obtaining plural first probabilities associated with the first video image corresponding to the plural locations within the first video image;
  
  determining pixels of the second video image are second foreground pixels and a group of the second foreground pixels constitute a second foreground blob set of one or more second foreground blobs;
  
  for each of plural locations within the second video image, comparing a corresponding predetermined shape with the second foreground blob set to obtain a corresponding second probability of a human at the corresponding location, thereby obtaining plural second probabilities associated with the second video image corresponding to the plural locations within the second video image;
  
  using the plural first probabilities associated with the first video image and the plural second probabilities associated with the second video image, determining X humans are represented by the first foreground blob set and the second foreground blob set, where X is a whole number; and
  
  providing at least one of a report, an alarm, and an event detection using the determination of the representation of X humans, wherein a size of the corresponding predetermined shapes for each of the plural locations within the first video image and each of the plural locations within the second video image is determined in response to calibration of the video system.
- Dependent claims (2 to 27):
  - 2. The method of claim 1, further comprising using the plural probabilities associated with the first video image and the plural probabilities associated with the second video image to determine a location of each of the X humans.
  - 3. The method of claim 2, wherein the determined location of each of the X humans is a location within an image plane corresponding to the first video image.
  - 4. The method of claim 2, wherein the determined location of each of the X humans is a location corresponding to the real world.
  - 5. The method of claim 1, wherein determining first foreground pixels of the first video image comprises comparing a first frame of the first video image without foreground objects with a second frame of the first video image containing the foreground objects.
  - 6. The method of claim 1, wherein the predetermined shape is the same for each of the plural locations within the first video image.
  - 7. The method of claim 1, wherein the predetermined shape for at least some of the plural locations within the first video image has a different size.
  - 8. The method of claim 1, wherein the calibration of the video system comprises:
    - for each of the plural locations within the first video image, determining an image size of a portion of the first video image corresponding to an average human size at the corresponding location to determine the size of each predetermined shape for each of the plural locations within the first video image; and
      
      for each of the plural locations within the second video image, determining an image size of a portion of the second video image corresponding to an average human size at the corresponding location to determine the size of each predetermined shape for each of the plural locations within the second video image.
  - 9. The method of claim 1, further comprising, prior to the determination of the pixels of the first video image that are first foreground pixels and prior to the determination of the pixels of the second video image that are second foreground pixels, for each of the plural locations within the first video image and for each of the plural locations within the second video image, determining the corresponding predetermined shape by estimating a foreground image part to be occupied in the corresponding video image when a human exists at the corresponding location.
  - 10. The method of claim 9, wherein estimating the foreground image part for each of the plural locations within the first video image and for each of the plural locations within the second video image is based on a projection of a model of a human in the real world onto an image plane of the corresponding video image.
  - 11. The method of claim 1, wherein each of the first video image and second video image comprises a plurality of image frames, each image frame comprising a two dimensional image having the corresponding plural locations, each of the corresponding plural locations identified by a corresponding x, y coordinate pair within the corresponding two dimensional image.
  - 12. The method of claim 11, wherein each of the plural locations within the first video image and each of the plural locations within the second video image is associated with the corresponding predetermined shape with respect to an image plane of the corresponding video image.
  - 13. The method of claim 1, further comprising, for each of the plural locations within the first video image and for each of the plural locations within the second video image, calculating a recall ratio of the corresponding predetermined shape and the corresponding foreground blob set to determine the associated corresponding probability.
  - 14. The method of claim 13, wherein for each of the plural locations within the first video image and for each of the plural locations within the second video image, calculating the recall ratio comprises determining a ratio of (a) an area comprising an overlap of an area occupied by the corresponding predetermined shape and the corresponding foreground blob set and (b) an area of the corresponding foreground blob set.
  - 15. The method of claim 1, further comprising:
    - creating a probability map comprising plural third probabilities using the plural first probabilities associated with the first video image and the plural second probabilities associated with the second video image; and
      
      determining local maximums of probabilities of the probability map.
  - 16. The method of claim 15, further comprising:
    - selecting a first subset of the plural predetermined shapes based on the third probabilities and selecting a second subset of the plural predetermined shapes based on the third probabilities; and
      
      analyzing an overlap of an area occupied by the selected first subset of the plural predetermined shapes and an area occupied by the first foreground blob set; and
      
      analyzing an overlap of an area occupied by the selected second subset of the plural predetermined shapes and an area occupied by the second foreground blob set.
  - 17. The method of claim 15, further comprising determining each of the plural third probabilities by associating a corresponding one of the first probabilities with a corresponding one of the second probabilities.
  - 18. The method of claim 15, wherein each of the plural third probabilities of the probability map is associated with a corresponding real world location.
  - 19. The method of claim 18, further comprising:
    - selecting a first real world location associated with one of the third probabilities corresponding to a local maximum of the probability map;
      
      obtaining a first predetermined shape and a second predetermined shape that each correspond to the selected first real world location.
  - 20. The method of claim 19, further comprising:
    - analyzing an amount of an overlap of an area occupied by the first predetermined shape and the first foreground blob set; and
      
      analyzing an amount of an overlap of an area occupied by the second predetermined shape and the second foreground blob set.
  - 21. The method of claim 19, further comprising:
    - calculating a first ratio of (a) an area comprising an overlap of an area occupied by the first predetermined shape and the first foreground blob set and (b) an area of the first foreground blob set, wherein the first ratio is to determine that X humans are represented by the first foreground blob set.
  - 22. The method of claim 18, further comprising calculating a precision value and a recall value for each of m locations of the real world locations corresponding to the plural third probabilities, m being an integer, each of the m locations corresponding to a local maximum of the probability map.
  - 23. The method of claim 22, wherein each of the m locations are selected sequentially 1 to m, a selection of an (m−1)th location excluding selection of an mth location that falls within a first predetermined distance of the (m−1)th location.
  - 24. The method of claim 23, wherein each of the m locations are selected sequentially 1 to m wherein the selection of a next location of the m locations comprises selecting a location based upon a bottom edge of at least one of the first video image and the second video image for those locations corresponding to a local maximum that have not been excluded.
  - 25. The method of claim 18, further comprising, for each of the real world locations corresponding to the plural third probabilities, associating one of the first probabilities and one of the second probabilities to obtain the corresponding third probability.
  - 26. The method of claim 25, wherein each of the third probabilities of the probability map is determined based on a mathematical product of one of the first probabilities and one of the second probabilities.
  - 27. The method of claim 15, wherein each of the third probabilities of the probability map is determined based on a mathematical product of one of the first probabilities and one of the second probabilities.
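
The scoring and fusion steps recited in claims 13 to 15, 23, 26 and 27 can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the representation of shapes and blobs as sets of pixel coordinates, the grid form of the probability maps, and the choice to visit peaks in order of descending probability are all assumptions made here for illustration; claim 24 recites a different ordering based on the bottom edge of the image.

```python
import math
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (row, col) of an image pixel (assumed representation)
Cell = Tuple[int, int]   # (row, col) of a probability-map cell (assumed representation)


def recall_ratio(shape: Set[Pixel], blob_set: Set[Pixel]) -> float:
    """Ratio in the form recited in claim 14: overlap of the predetermined
    shape with the foreground blob set, divided by the blob set area."""
    if not blob_set:
        return 0.0
    return len(shape & blob_set) / len(blob_set)


def fuse_by_product(first: List[List[float]],
                    second: List[List[float]]) -> List[List[float]]:
    """Third probabilities as the product of first and second probabilities
    (claims 26 and 27). Assumes both grids are already indexed by the same
    real world locations."""
    return [[a * b for a, b in zip(row1, row2)]
            for row1, row2 in zip(first, second)]


def local_maxima(pmap: List[List[float]], min_prob: float = 0.0) -> List[Cell]:
    """Cells above min_prob that are >= all of their 8-connected neighbours."""
    rows, cols = len(pmap), len(pmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = pmap[r][c]
            if value <= min_prob:
                continue
            neighbours = [pmap[rr][cc]
                          for rr in range(max(0, r - 1), min(rows, r + 2))
                          for cc in range(max(0, c - 1), min(cols, c + 2))
                          if (rr, cc) != (r, c)]
            if all(value >= n for n in neighbours):
                peaks.append((r, c))
    return peaks


def select_locations(peaks: List[Cell], pmap: List[List[float]],
                     min_dist: float) -> List[Cell]:
    """Sequential selection in the spirit of claim 23: a selected location
    excludes later candidates that fall within min_dist of it.
    Visiting order (descending probability) is an assumption."""
    selected: List[Cell] = []
    ordered = sorted(peaks, key=lambda rc: pmap[rc[0]][rc[1]], reverse=True)
    for peak in ordered:
        if all(math.dist(peak, kept) >= min_dist for kept in selected):
            selected.append(peak)
    return selected
```

In this sketch the number of locations returned by `select_locations` plays the role of a candidate count X; the claims additionally recite precision and recall checks (claim 22) that are not modelled here.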

28. A method of detecting human objects with a video system, comprising:
- obtaining a first video image from a first video camera;
  
  obtaining a second video image from a second video camera;
  
  determining pixels of the first video image are first foreground pixels and a group of the first foreground pixels constitute a first foreground blob set of one or more first foreground blobs;
  
  determining pixels of the second video image are second foreground pixels and a group of the second foreground pixels constitute a second foreground blob set of one or more second foreground blobs;
  
  for each of plural locations within the first video image, comparing a corresponding predetermined shape with the first foreground blob set and for each of plural locations within the second video image, comparing a corresponding predetermined shape with the second foreground blob set, to determine X humans are represented by the first foreground blob set and the second foreground blob set, where X is a whole number, and to determine a location of each of the X humans within the real world; and
  
  providing at least one of a report, an alarm, and an event detection using the determination of the representation of X humans, wherein a size of the corresponding predetermined shapes for each of the plural locations within the first video image and each of the plural locations within the second video image is determined in response to calibration of the video system.
- Dependent claims (29 to 36):
  - 29. The method of claim 28, further comprising detecting the existence of a crowd by reviewing at least some of the locations of the X humans.
  - 30. The method of claim 28, further comprising determining an existence of a crowd when it is determined that Y of the X humans are located within a first area of a horizontal plane of the real world.
  - 31. The method of claim 30, wherein the first area comprises a predetermined geometric shape having a predetermined area size within the real world.
  - 32. The method of claim 30, wherein the first area comprises an area defined by a circle.
  - 33. The method of claim 30, further comprising determining a crowd density within the first area.
  - 34. The method of claim 33, further comprising comparing the crowd density to a threshold and providing at least one of the report and the alarm when the crowd density exceeds the threshold.
  - 35. The method of claim 28, further comprising:
    - determining a first crowd density within a first area corresponding to a first time;
      
      determining a second crowd density within the first area corresponding to a second time; and
      
      determining a crowd gathering event in response to the first crowd density and the second crowd density.
  - 36. The method of claim 28, further comprising:
    - determining a first crowd density within a first area corresponding to a first time;
      
      determining a second crowd density within the first area corresponding to a second time; and
      
      determining a crowd dispersing event in response to the first crowd density and the second crowd density.
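
The crowd analysis recited in claims 30 to 36 can be sketched as follows. This is only an illustration under stated assumptions: the circular area follows claim 32, but the function names, the density unit (people per unit area of the ground plane), and the `min_change` threshold used to separate gathering from dispersing are hypothetical and do not come from the patent.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # real world ground-plane location (assumed units: metres)


def count_in_circle(locations: List[Point], centre: Point, radius: float) -> int:
    """Number of detected humans located within the circular first area."""
    return sum(1 for p in locations if math.dist(p, centre) <= radius)


def is_crowd(locations: List[Point], centre: Point, radius: float,
             min_people: int) -> bool:
    """Claim 30 style test: a crowd exists when at least Y (min_people)
    of the X humans are located within the first area."""
    return count_in_circle(locations, centre, radius) >= min_people


def crowd_density(locations: List[Point], centre: Point, radius: float) -> float:
    """People per unit area inside the circle (claim 33)."""
    return count_in_circle(locations, centre, radius) / (math.pi * radius ** 2)


def density_event(first_density: float, second_density: float,
                  min_change: float) -> Optional[str]:
    """Compare densities of the same area at two times (claims 35 and 36).
    min_change is a hypothetical threshold, not a value from the patent."""
    if second_density - first_density >= min_change:
        return "gathering"
    if first_density - second_density >= min_change:
        return "dispersing"
    return None
```

A report or alarm in the sense of claim 34 would then be a comparison of the value returned by `crowd_density` against an operator-chosen threshold.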

37. A video surveillance system, comprising:
- a first video source configured to provide a first video image of a real world scene;
  
  a second video source configured to provide a second video image of the real world scene;
  
  a foreground detection module configured to detect first foreground pixels of the first video image, a group of the first foreground pixels constituting a first foreground blob set of one or more first foreground blobs, second foreground pixels of the second video image, and a group of the second foreground pixels constituting a second foreground blob set of one or more second foreground blobs;
  
  a human detection module configured to determine X humans are represented by the first foreground blob set and the second foreground blob set by, for each of plural locations within the first video image, comparing a corresponding predetermined shape with the first foreground blob set and for each of plural locations within the second video image, comparing a corresponding predetermined shape with the second foreground blob set; and
  
  a response module configured to provide at least one of a report, an alarm, and an event detection using the determined representation of X humans, wherein the human detection module is configured to associate the plural locations within the first video image with corresponding ones of the plural locations within the second video image based upon determining real world locations that correspond to the plural locations within the first video image and the plural locations within the second video image.
- Dependent claim (38):
  - 38. The video surveillance system of claim 37, wherein the human detection module is configured to determine X humans are represented by the first foreground blob set and the second foreground blob set by, for each of the plural locations within the first video image, analyzing an amount of an overlapping area of the corresponding predetermined shape with the first foreground blob set and for each of the plural locations within the second video image, analyzing an amount of an overlapping area of the corresponding predetermined shape with the second foreground blob set.
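
Claim 37 recites associating locations in the two video images through the real world locations they correspond to. One possible way to picture this, offered purely as an assumption and not taken from the patent, is to map each image location onto a shared ground plane with a per-camera 3x3 homography and to pair locations that land in the same ground-plane grid cell. The homography matrices, the `cell` size and all function names below are hypothetical.

```python
import math
from typing import Dict, List, Tuple

ImagePoint = Tuple[float, float]   # (x, y) pixel coordinates
Homography = List[List[float]]     # 3x3 image-to-ground mapping (assumed model)


def to_ground(h: Homography, x: float, y: float) -> Tuple[float, float]:
    """Apply a planar homography to an image point and dehomogenise."""
    gx = h[0][0] * x + h[0][1] * y + h[0][2]
    gy = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return gx / w, gy / w


def ground_cell(h: Homography, point: ImagePoint, cell: float) -> Tuple[int, int]:
    """Quantise the ground-plane position of an image point to a grid cell."""
    gx, gy = to_ground(h, point[0], point[1])
    return math.floor(gx / cell), math.floor(gy / cell)


def associate(locs1: List[ImagePoint], h1: Homography,
              locs2: List[ImagePoint], h2: Homography,
              cell: float = 0.5) -> List[Tuple[ImagePoint, ImagePoint]]:
    """Pair locations from the two images that map to the same ground cell."""
    by_cell: Dict[Tuple[int, int], List[ImagePoint]] = {}
    for q in locs2:
        by_cell.setdefault(ground_cell(h2, q, cell), []).append(q)
    pairs = []
    for p in locs1:
        for q in by_cell.get(ground_cell(h1, p, cell), []):
            pairs.append((p, q))
    return pairs
```

With such pairs in hand, the per-image probabilities at paired locations could be combined, for example by the product recited in claims 26 and 27.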

Current Assignee: Motorola Solutions, Inc.
Original Assignee: Avigilon Fortress Corporation (Motorola Solutions, Inc.)
Inventors: Zhang, Zhong; Yin, Weihong; Venetianer, Peter
Primary Examiner(s): AZARIAN, SEYED H
Application Number: US15/247,986
Publication Number: US 20160379061A1
Time in Patent Office: 256 Days
Field of Search: 382/100, 382/103, 382/106-107, 382/117-118, 382/154, 382/162, 382/168, 382/173, 382/181, 382/189-199, 382/209, 382/219, 382/232, 382/254, 382/274, 382/276, 382/291, 382/305, 382/312, 348/143, 348/142, 348/154, 348/159, 348/86
CPC Class Codes:
G06T 2207/10016   Video; Image sequence
G06T 2207/20076   Probabilistic image processing
G06T 2207/30196   Human being; Person
G06T 2207/30232   Surveillance
G06T 7/75   involving models
G06V 20/52   Surveillance or monitoring ...
G06V 20/53   Recognition of crowd images...
G06V 20/54   of traffic, e.g. cars on th...
G06V 40/103   Static body considered as a...
