Automatic detection and tracking of multiple individuals using multiple cues

US 7,130,446 B2
Filed: 12/03/2001
Issued: 10/31/2006
Est. Priority Date: 12/03/2001
Status: Active Grant


3 Assignments

0 Petitions

Abstract

Automatic detection and tracking of multiple individuals includes receiving a frame of video and/or audio content and identifying a candidate area for a new face region in the frame. One or more hierarchical verification levels are used to verify whether a human face is in the candidate area, and an indication is made that the candidate area includes a face if the one or more hierarchical verification levels verify that a human face is in the candidate area. A plurality of audio and/or video cues are used to track each verified face in the video content from frame to frame.
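
The abstract leaves open how the verification levels interact. Dependent claims 10 and 14 through 16 describe a faster, less accurate first level, with a second, slower level consulted only when the first does not verify a face. A minimal Python sketch of that control flow follows; it assumes each level is a boolean callable, and the names `verify_hierarchically`, `coarse`, and `fine`, the candidate fields, and the thresholds are all hypothetical.

```python
from typing import Any, Callable, List, Sequence

# A verification level inspects a candidate area and returns True if it
# accepts that a human face is present (hypothetical interface).
Verifier = Callable[[Any], bool]


def verify_hierarchically(candidate: Any, levels: Sequence[Verifier]) -> bool:
    """Try each level in order, fastest/coarsest first.

    Returns True as soon as one level verifies a face, so later, slower
    levels run only when every earlier level failed to verify it.
    """
    for level in levels:
        if level(candidate):
            return True
    return False


# Usage with two hypothetical levels; `calls` records which levels ran.
calls: List[str] = []


def coarse(candidate: dict) -> bool:
    calls.append("coarse")
    return candidate["color_similarity"] > 0.8  # hypothetical cheap test


def fine(candidate: dict) -> bool:
    calls.append("fine")
    return candidate["detector_score"] > 0.5  # hypothetical expensive test


print(verify_hierarchically({"color_similarity": 0.9, "detector_score": 0.1}, [coarse, fine]))  # True
print(calls)  # ['coarse']
```

Ordering the levels from cheapest to most expensive means most candidates are settled by the fast check, and the costly check runs only on the ones it could not confirm.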

98 Citations

19 Claims

1. A method comprising:
- receiving a frame of content;
  
  automatically detecting a candidate area for a new face region in the frame, wherein detecting the candidate area comprises:
  
  determining whether there is motion at a plurality of pixels on a plurality of lines across the frame;
  
  generating a sum of frame differences for each possible segment of each of the plurality of lines;
  
  selecting, for each of the plurality of lines, the segment having the largest sum;
  
  identifying a smoothest region of the selected segments;
  
  checking whether the smoothest region resembles a human upper body; and
  
  extracting, as the candidate area, a portion of the smoothest region that resembles a human head;
  
  using one or more hierarchical verification levels to verify whether a human face is in the candidate area;
  
  indicating that the candidate area includes a face if the one or more hierarchical verification levels verify that a human face is in the candidate area; and
  
  using a plurality of cues to track each verified face in the content from frame to frame.
- Dependent claims (2 through 17):
  - 2. The method of claim 1, wherein the frame of content comprises a frame of video content.
  - 3. The method of claim 1, wherein the frame of content comprises a frame of audio content.
  - 4. The method of claim 1, wherein the frame of content comprises a frame of both video and audio content.
  - 5. The method of claim 1, further comprising repeating the automatically detecting in the event tracking of a verified face is lost.
  - 6. The method of claim 1, wherein the method further comprises receiving the frame of content from a video capture device local to a system implementing the method.
  - 7. The method of claim 1, wherein the method further comprises receiving the frame of content from a computer readable medium accessible to a system implementing the method.
  - 8. The method of claim 1, wherein automatically detecting the candidate area further comprises:
    - detecting whether there is audio in the frame, and if there is audio in the frame, then performing audio-based initialization to identify one or more candidate areas; and
      
      using, if there is neither motion nor audio in the frame, a fast face detector to identify one or more candidate areas.
  - 9. The method of claim 1, wherein determining whether there is motion comprises:
    - determining, for each of the plurality of pixels, whether a difference between an intensity value of the pixel in the frame and an intensity value of a corresponding pixel in one or more other frames exceeds a threshold value.
  - 10. The method of claim 1, wherein the one or more hierarchical verification levels include a coarse level and a fine level, wherein the coarse level can verify whether the human face is in the candidate area faster but with less accuracy than the fine level.
  - 11. The method of claim 1, wherein using one or more hierarchical verification levels comprises, as one of the levels of verification:
    - generating a color histogram of the candidate area;
      
      generating an estimated color histogram of the candidate area based on previous frames;
      
      determining a similarity value between the color histogram and the estimated color histogram; and
      
      verifying that the candidate area includes a face if the similarity value is greater than a threshold value.
  - 12. The method of claim 1, wherein indicating that the candidate area includes the face comprises recording the candidate area in a tracking list.
  - 13. The method of claim 12, wherein recording the candidate area in the tracking list comprises accessing a record corresponding to the candidate area and resetting a time since last verification of the candidate.
  - 14. The method of claim 1, wherein the one or more hierarchical verification levels include a first level and a second level, and wherein using the one or more hierarchical verification levels to verify whether the human face is in the candidate area comprises:
    - checking whether, using the first level verification, the human face is verified as in the candidate area; and
      
      using the second level verification only if the checking indicates that the human face is not verified as in the candidate area by the first level verification.
  - 15. The method of claim 1, wherein using one or more hierarchical verification levels comprises:
    - using a first verification process to determine whether the human head is in the candidate area; and
      
      if the first verification process verifies that the human head is in the candidate area, then indicating the area includes a face, and otherwise using a second verification process to determine whether the human head is in the area.
  - 16. The method of claim 15, wherein the first verification process is faster but less accurate than the second verification process.
  - 17. The method of claim 1, wherein the plurality of cues include foreground color, background color, edge intensity, motion, and audio.
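
Claims 1 and 9 together describe the motion-based search for a candidate area: threshold per-pixel frame differences, then keep, on each line, the segment with the largest sum. If the summed values were raw, non-negative differences, the whole line would always be a largest-sum segment, so some penalty for still pixels is implied. The sketch below is a minimal illustration under stated assumptions: moving pixels score +1 and still pixels score -1 (the `penalty` parameter is hypothetical), every row of a grayscale frame is treated as one line, and the best segment is found with Kadane's maximum-subarray algorithm instead of enumerating every segment. It does not attempt the later steps (smoothest region, upper-body and head checks), whose details the claims do not give.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]         # grayscale intensities, one list per row
Segment = Tuple[int, int, int]  # (start column, end column inclusive, score)


def motion_mask(curr: Frame, prev: Frame, threshold: int) -> List[List[bool]]:
    """Claim 9: a pixel has motion if its intensity differs from the
    corresponding pixel in the other frame by more than `threshold`."""
    return [
        [abs(c - p) > threshold for c, p in zip(row_c, row_p)]
        for row_c, row_p in zip(curr, prev)
    ]


def best_segment(row: List[bool], penalty: int = 1) -> Optional[Segment]:
    """Largest-sum contiguous segment of one line (Kadane's algorithm).

    Assumption: moving pixels contribute +1 and still pixels -penalty.
    Returns None when no segment has a positive sum (no motion on the line).
    """
    best: Optional[Segment] = None
    run_start, run_sum = 0, 0
    for i, moving in enumerate(row):
        value = 1 if moving else -penalty
        if run_sum <= 0:
            # A non-positive running sum can only lower the total, so restart here.
            run_start, run_sum = i, value
        else:
            run_sum += value
        if run_sum > 0 and (best is None or run_sum > best[2]):
            best = (run_start, i, run_sum)
    return best


def select_segments(curr: Frame, prev: Frame, threshold: int) -> List[Optional[Segment]]:
    """One selected segment (or None) per line of the frame."""
    return [best_segment(row) for row in motion_mask(curr, prev, threshold)]


prev = [[10, 10, 10, 10, 10],
        [10, 10, 10, 10, 10]]
curr = [[10, 80, 90, 10, 10],
        [10, 10, 70, 75, 10]]
print(select_segments(curr, prev, threshold=20))  # [(1, 2, 2), (2, 3, 2)]
```

Kadane's algorithm finds the same segment an exhaustive search over all segments would, in one pass per line rather than quadratic time.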
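
Claim 8 adds two further ways of seeding candidate areas. The sketch below shows only the branching the claim states: motion-based initialization when there is motion, audio-based initialization when there is audio, and a fast face detector when there is neither. The three initializers are hypothetical stand-ins passed in by the caller, since the claims do not define their interfaces, and the `(x, y, width, height)` box format is an assumption.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # assumed (x, y, width, height) candidate area
Initializer = Callable[[object], List[Box]]


def detect_candidates(frame: object, has_motion: bool, has_audio: bool,
                      motion_init: Initializer, audio_init: Initializer,
                      fast_face_detector: Initializer) -> List[Box]:
    """Collect candidate areas following the branching described in claim 8."""
    candidates: List[Box] = []
    if has_motion:
        candidates.extend(motion_init(frame))
    if has_audio:
        candidates.extend(audio_init(frame))
    if not has_motion and not has_audio:
        # Fallback when neither motion nor audio is available.
        candidates.extend(fast_face_detector(frame))
    return candidates


# Hypothetical stand-in initializers that return fixed boxes.
def stub_motion_init(frame: object) -> List[Box]:
    return [(10, 20, 40, 40)]


def stub_audio_init(frame: object) -> List[Box]:
    return [(200, 30, 50, 50)]


def stub_face_detector(frame: object) -> List[Box]:
    return [(120, 25, 45, 45)]


print(detect_candidates(None, False, False,
                        stub_motion_init, stub_audio_init, stub_face_detector))
# [(120, 25, 45, 45)]
```

Passing the initializers in as callables keeps the branching testable on its own, independent of any particular motion, audio, or face detector.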
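
Claim 11 verifies a candidate by comparing its color histogram with a histogram estimated from previous frames and accepting it when the similarity exceeds a threshold. The claim names neither the estimator nor the similarity measure. The sketch below assumes an exponential moving average for the estimate and the Bhattacharyya coefficient for similarity, and uses a single 8-bit channel with 8 bins for brevity; the function names, `alpha`, and the 0.8 threshold are hypothetical.

```python
import math
from typing import List, Sequence

Histogram = List[float]


def color_histogram(pixels: Sequence[int], bins: int = 8, max_value: int = 256) -> Histogram:
    """Normalized histogram of quantized 8-bit values (one channel, for brevity)."""
    hist = [0.0] * bins
    for p in pixels:
        hist[p * bins // max_value] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def update_estimate(estimate: Histogram, observed: Histogram, alpha: float = 0.3) -> Histogram:
    """Assumed estimator: exponential moving average of previous frames' histograms."""
    return [(1 - alpha) * e + alpha * o for e, o in zip(estimate, observed)]


def bhattacharyya(h1: Histogram, h2: Histogram) -> float:
    """Similarity of two normalized histograms: 1.0 if identical, 0.0 if disjoint."""
    return sum(math.sqrt(a * b) for a, b in zip(h1, h2))


def verify_by_color(candidate_pixels: Sequence[int], estimate: Histogram,
                    threshold: float = 0.8) -> bool:
    """Verify the candidate if its histogram is similar enough to the estimate."""
    return bhattacharyya(color_histogram(candidate_pixels), estimate) > threshold


# Build an estimate from two earlier frames, then test two candidates.
estimate = color_histogram([200, 205, 210, 220])
estimate = update_estimate(estimate, color_histogram([198, 207, 215, 222]))
print(verify_by_color([201, 209, 212, 218], estimate))  # True
print(verify_by_color([10, 20, 30, 40], estimate))      # False
```

A histogram test like this is cheap, which fits its likely role as one of the faster, coarser levels in the hierarchy, though the claims do not say which level it occupies.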
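
Claims 12 and 13 record a verified candidate in a tracking list and, on re-verification, access its record and reset the time since it was last verified. A minimal sketch, assuming records are keyed by an integer track id and time comes from an injectable clock. The class and method names are hypothetical, and the claims do not say how a new candidate is matched to an existing record.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Box = Tuple[int, int, int, int]  # assumed (x, y, width, height)


@dataclass
class TrackRecord:
    area: Box
    last_verified: float  # clock reading at the most recent verification


class TrackingList:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._records: Dict[int, TrackRecord] = {}
        self._next_id = 0

    def record_verified(self, area: Box, track_id: Optional[int] = None) -> int:
        """Record a verified area (claim 12).

        If `track_id` names an existing record, update it and reset its time
        since last verification (claim 13); otherwise start a new record.
        """
        now = self._clock()
        if track_id is not None and track_id in self._records:
            record = self._records[track_id]
            record.area = area
            record.last_verified = now
            return track_id
        new_id = self._next_id
        self._next_id += 1
        self._records[new_id] = TrackRecord(area, now)
        return new_id

    def time_since_verification(self, track_id: int) -> float:
        return self._clock() - self._records[track_id].last_verified


# Usage with a controllable clock instead of wall time.
now = [0.0]
faces = TrackingList(clock=lambda: now[0])
tid = faces.record_verified((10, 10, 40, 40))
now[0] = 5.0
print(faces.time_since_verification(tid))  # 5.0
faces.record_verified((12, 10, 40, 40), track_id=tid)
print(faces.time_since_verification(tid))  # 0.0
```

Injecting the clock keeps the reset behavior testable without sleeping; in a live system the default `time.monotonic` avoids jumps from wall-clock adjustments.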

18. A computer-readable storage medium comprising computer-program instructions that when executed by a processor perform acts of:
- receiving a frame of content;
  
  automatically detecting a candidate area for a new face region in the frame, wherein detecting the candidate area comprises:
  
  determining whether there is motion at a plurality of pixels on a plurality of lines across the frame;
  
  generating a sum of frame differences for each possible segment of each of the plurality of lines;
  
  selecting, for each of the plurality of lines, the segment having the largest sum;
  
  identifying a smoothest region of the selected segments;
  
  checking whether the smoothest region resembles a human upper body; and
  
  extracting, as the candidate area, a portion of the smoothest region that resembles a human head;
  
  using one or more hierarchical verification levels to verify whether a human face is in the candidate area;
  
  indicating that the candidate area includes a face if the one or more hierarchical verification levels verify that a human face is in the candidate area; and
  
  using a plurality of cues to track each verified face in the content from frame to frame.

19. A computing device comprising:
- a processor; and
  
  a memory coupled to the processor, the memory comprising computer-program instructions that when executed by the processor perform acts of:
  
  receiving a frame of content;
  
  automatically detecting a candidate area for a new face region in the frame, wherein detecting the candidate area comprises:
  
  determining whether there is motion at a plurality of pixels on a plurality of lines across the frame;
  
  generating a sum of frame differences for each possible segment of each of the plurality of lines;
  
  selecting, for each of the plurality of lines, the segment having the largest sum;
  
  identifying a smoothest region of the selected segments;
  
  checking whether the smoothest region resembles a human upper body; and
  
  extracting, as the candidate area, a portion of the smoothest region that resembles a human head;
  
  using one or more hierarchical verification levels to verify whether a human face is in the candidate area;
  
  indicating that the candidate area includes a face if the one or more hierarchical verification levels verify that a human face is in the candidate area; and
  
  using a plurality of cues to track each verified face in the content from frame to frame.
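
All three independent claims end by tracking each verified face from frame to frame with a plurality of cues, which claim 17 lists as foreground color, background color, edge intensity, motion, and audio. The claims do not state how the cues are combined. One simple scheme, assumed here and not taken from the patent, scores each candidate position by the product of per-cue likelihoods (computed as a sum of logs, treating the cues as independent) and keeps the best-scoring position. The cue functions and their values below are toy placeholders.

```python
import math
from typing import Callable, Dict, List, Tuple

Position = Tuple[int, int]
Cue = Callable[[Position], float]  # likelihood in [0, 1] that the face is at a position


def fused_log_likelihood(position: Position, cues: Dict[str, Cue], floor: float = 1e-6) -> float:
    """Sum of per-cue log-likelihoods, i.e. the log of their product.

    Assumes the cues are conditionally independent; `floor` keeps a cue
    that returns 0 from producing log(0).
    """
    return sum(math.log(max(cue(position), floor)) for cue in cues.values())


def track_step(candidates: List[Position], cues: Dict[str, Cue]) -> Position:
    """Choose the candidate position that the combined cues favor most."""
    return max(candidates, key=lambda p: fused_log_likelihood(p, cues))


# Two toy cues standing in for, e.g., foreground color and motion.
cues: Dict[str, Cue] = {
    "foreground_color": lambda p: 0.9 if p == (5, 5) else 0.3,
    "motion": lambda p: 0.8 if p[0] >= 5 else 0.1,
}
print(track_step([(0, 0), (5, 5), (6, 5)], cues))  # (5, 5)
```

Working in log space avoids numerical underflow when many small likelihoods are multiplied, and adding another cue (for example audio) is just another entry in the dictionary.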

Current Assignee
Zhigu Holdings Limited
Original Assignee
Microsoft Corporation
Inventors
Rui, Yong; Chen, Yunqiang
Primary Examiner(s)
Wu, Jingge
Assistant Examiner(s)
Goradia, Shefali Dinesh

Application Number

US10/006,927
Publication Number

US 20030103647A1
Time in Patent Office

1,793 Days
Field of Search

382/103, 382/264, 382/266, 382/115, 382/118, 382/224, 382/240, 348/14.01, 348/14.1, 348/14.08, 348/14.07, 348/14.02, 704/270
US Class Current

382/103
CPC Class Codes

G06T 2207/10016   Video; Image sequence

G06T 2207/30196   Human being; Person

G06T 2207/30201   Face

G06T 7/251   involving models

G06V 40/162   using pixel segmentation or...
