Method and system for facial recognition for a videoconference
First Claim
1. A method comprising:
- identifying a participant from audio information, wherein identifying the participant from the audio information comprises:
  - performing a feature extraction and a speaker segmentation on the audio information to determine a voice model, and
  - comparing the determined voice model from the audio information with a plurality of voice models stored in a database to identify the participant;
- identifying the participant in video information, wherein identifying the participant in the video information comprises:
  - identifying a plurality of facial images in the video information;
  - determining a one of the plurality of facial images in the video information as having the most movement as compared to others of the plurality of facial images; and
  - identifying the participant as the determined one of the plurality of facial images;
- capturing, from the video information, a plurality of images of the participant identified in the video information, wherein ones of the plurality of captured images of the participant include respective different expressions of a face of the participant and wherein other ones of the plurality of captured images of the participant include respective different illumination conditions of the face of the participant;
- associating a unique identifier with the captured plurality of images, the unique identifier corresponding to the participant identified from the audio information; and
- saving the captured plurality of images and the associated unique identifier in the database.
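The audio branch of the claim (feature extraction on the audio, then comparison of the resulting voice model against stored models) could be sketched roughly as follows. The energy-based features, the `(mean, variance)` "voice model", and the nearest-neighbor comparison are illustrative stand-ins chosen for this sketch, not the patent's actual algorithm, which would use richer features and true speaker segmentation:

```python
import math

def extract_voice_model(samples, frame_size=160):
    """Toy feature extraction: per-frame log energy summarized as a
    (mean, variance) pair standing in for a real voice model."""
    energies = []
    for i in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[i:i + frame_size]
        energy = sum(s * s for s in frame) / frame_size
        energies.append(math.log(energy + 1e-9))
    mean = sum(energies) / len(energies)
    var = sum((e - mean) ** 2 for e in energies) / len(energies)
    return (mean, var)

def identify_participant(voice_model, database):
    """Compare the determined model against stored models; the
    participant whose stored model is nearest wins."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(database, key=lambda pid: dist(voice_model, database[pid]))
```

In a real system the stored models would come from prior enrollment; here the database is simply a dict mapping participant identifiers to model tuples.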
Abstract
Videoconferencing may be provided. A participant may be identified from audio information and in video information. From the video information, a plurality of images may be captured of the participant identified in the video information. A unique identifier may be associated with the captured plurality of images. The unique identifier may correspond to the participant identified from the audio information. The captured plurality of images and the associated unique identifier may be saved in a database.
23 Claims
1. A method comprising:

- identifying a participant from audio information, wherein identifying the participant from the audio information comprises:
  - performing a feature extraction and a speaker segmentation on the audio information to determine a voice model, and
  - comparing the determined voice model from the audio information with a plurality of voice models stored in a database to identify the participant;
- identifying the participant in video information, wherein identifying the participant in the video information comprises:
  - identifying a plurality of facial images in the video information;
  - determining a one of the plurality of facial images in the video information as having the most movement as compared to others of the plurality of facial images; and
  - identifying the participant as the determined one of the plurality of facial images;
- capturing, from the video information, a plurality of images of the participant identified in the video information, wherein ones of the plurality of captured images of the participant include respective different expressions of a face of the participant and wherein other ones of the plurality of captured images of the participant include respective different illumination conditions of the face of the participant;
- associating a unique identifier with the captured plurality of images, the unique identifier corresponding to the participant identified from the audio information; and
- saving the captured plurality of images and the associated unique identifier in the database.

View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
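The video branch identifies the speaker as the facial image "having the most movement as compared to others." A minimal sketch of that selection, assuming face detection has already produced per-face bounding-box center tracks across successive frames (the track format and the center-displacement metric are assumptions of this sketch, not details from the patent):

```python
def select_active_speaker(face_tracks):
    """Pick the face whose bounding-box center moved the most across
    frames -- a simple proxy for the claim's 'most movement' test."""
    def movement(centers):
        # Sum of Manhattan displacements between consecutive frames.
        return sum(
            abs(x2 - x1) + abs(y2 - y1)
            for (x1, y1), (x2, y2) in zip(centers, centers[1:])
        )
    return max(face_tracks, key=lambda face_id: movement(face_tracks[face_id]))
```

A production system might instead measure motion inside the mouth region of each detected face, but any per-face motion score slots into the same max-selection step.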
17. An apparatus comprising:
- a memory storage; and
- a processing unit coupled to the memory storage, wherein the processing unit is operative to:
  - identify a participant from video information in a teleconference, wherein the processing unit being operative to identify the participant in the video information comprises the processing unit being operative to:
    - identify a plurality of facial images in the video information,
    - determine a one of the plurality of facial images in the video information as having the most movement as compared to others of the plurality of facial images, and
    - identify the participant as the determined one of the plurality of facial images;
  - capture, from the video information in the teleconference, a plurality of images of the participant identified in the video information, wherein ones of the plurality of captured images of the participant include respective different expressions of a face of the participant and wherein other ones of the plurality of captured images of the participant include respective different illumination conditions of the face of the participant;
  - associate a unique identifier with the captured plurality of images, the unique identifier corresponding to the participant identified from audio information in the teleconference, wherein the processing unit being operative to associate the unique identifier comprises the processing unit being operative to:
    - perform a feature extraction and a speaker segmentation on the audio information to determine a voice model, and
    - compare the determined voice model from the audio information with a plurality of voice models stored in a database to identify the participant;
  - receive participant information corresponding to the unique identifier; and
  - save the captured plurality of images and the associated participant information in a database.

View Dependent Claims (18, 19, 20)
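The final steps of the claims associate the captured images with the unique identifier and persist both in a database. A minimal sketch using SQLite; the table name, schema, and identifier format are illustrative choices, not taken from the patent:

```python
import sqlite3

def save_participant_images(conn, unique_id, images):
    """Persist each captured face image (raw bytes) under the unique
    identifier that the audio identification assigned to this
    participant."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS face_images ("
        "unique_id TEXT, capture_idx INTEGER, image BLOB)"
    )
    conn.executemany(
        "INSERT INTO face_images VALUES (?, ?, ?)",
        [(unique_id, i, img) for i, img in enumerate(images)],
    )
    conn.commit()
```

Keying every image row by the same `unique_id` is what lets later participant information (claim 17's "receive participant information corresponding to the unique identifier") be joined back to the stored face images.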
21. A non-transitory computer-readable medium that stores a set of instructions which when executed perform a method comprising:
- identifying a participant in video information, wherein identifying the participant in the video information comprises:
  - identifying a plurality of facial images in the video information;
  - determining a one of the plurality of facial images in the video information as having the most movement as compared to others of the plurality of facial images; and
  - identifying the participant as the determined one of the plurality of facial images;
- capturing, from the video information, a plurality of images of the participant identified in the video information, wherein ones of the plurality of captured images of the participant include respective different expressions of a face of the participant and wherein other ones of the plurality of captured images of the participant include respective different illumination conditions of the face of the participant;
- associating a unique identifier with the captured plurality of images, the unique identifier corresponding to the participant identified from audio information, wherein associating the unique identifier with the captured plurality of images comprises:
  - performing a feature extraction and a speaker segmentation on the audio information to determine a voice model, and
  - comparing the determined voice model from the audio information with a plurality of voice models stored in a database to identify the participant; and
- saving the captured plurality of images and the associated unique identifier in the database.

View Dependent Claims (22, 23)
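Every independent claim requires that the captured images span "different expressions" and "different illumination conditions." One crude way to honor the illumination half is to keep face crops spread across the brightness range; this heuristic, and the use of mean pixel intensity as an illumination proxy, are assumptions of the sketch (expression diversity would need something like landmark analysis, which is omitted here):

```python
def select_diverse_captures(frames, k=3):
    """From candidate face crops (2-D lists of pixel intensities),
    keep k crops evenly spaced across the mean-brightness ordering,
    a crude proxy for 'different illumination conditions'."""
    def brightness(frame):
        pixels = [p for row in frame for p in row]
        return sum(pixels) / len(pixels)
    ranked = sorted(frames, key=brightness)
    if len(ranked) <= k:
        return ranked
    # Evenly spaced picks cover the darkest through brightest captures.
    step = (len(ranked) - 1) / (k - 1)
    return [ranked[round(i * step)] for i in range(k)]
```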
Specification