Video-teleconferencing system with eye-gaze correction
Abstract
Correcting for eye-gaze in video communication devices is accomplished by blending information captured from a stereoscopic view of the conferee and generating a virtual image of the conferee. A personalized face model of the conferee is captured to track head position of the conferee. First and second video images representative of a first conferee taken from different views are concurrently captured. A head position of the first conferee is tracked from the first and second video images. Matching features and contours from the first and second video images are ascertained. The head position as well as the matching features and contours from the first and second video images are synthesized to generate a virtual image video stream of the first conferee that makes the first conferee appear to be making eye contact with a second conferee who is watching the virtual image video stream.
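The abstract's pipeline (capture two views concurrently, track the head, match features, synthesize a virtual view) can be hinted at with a deliberately naive sketch. The blending below is illustrative only and uses made-up names; the patent's actual synthesis is driven by the tracked head pose and matched features, not a plain average of the two views.

```python
import numpy as np

def synthesize_virtual_view(left_img, right_img, alpha=0.5):
    # Blend two concurrently captured views; alpha=0.5 approximates a
    # virtual camera midway between the two real ones (roughly where the
    # display, and thus the other conferee's gaze, sits). This naive
    # average is only a placeholder for the patented, match-driven
    # synthesis.
    left = left_img.astype(np.float64)
    right = right_img.astype(np.float64)
    virtual = (1.0 - alpha) * left + alpha * right
    return virtual.astype(left_img.dtype)

# toy 2x2 grayscale frames standing in for the two camera views
left = np.array([[10, 20], [30, 40]], dtype=np.uint8)
right = np.array([[30, 40], [50, 60]], dtype=np.uint8)
mid = synthesize_virtual_view(left, right)
```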
Claims
1. A method, comprising:
concurrently, capturing first and second video images representative of a first conferee taken from different views;
tracking a head position of the first conferee from the first and second video images;
ascertaining features and contours from the first video image that match features and contours from the second video image; and
synthesizing the head position as well as the features and contours from the first and second video images that match to generate a virtual image video stream of the first conferee that makes the first conferee appear to be making eye contact with a second conferee who is watching the virtual image video stream.
(Dependent claims 2-8 not shown.)
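The "ascertaining features ... that match" step of claim 1 is, at its core, a stereo correspondence search. A toy sum-of-squared-differences matcher over a single scanline sketches the idea; the function name, window size, and disparity range here are illustrative assumptions, not the patent's algorithm.

```python
import numpy as np

def match_feature(left_row, right_row, col, patch=1, max_disp=3):
    # Find the column in right_row whose neighborhood best matches the
    # patch around `col` in left_row, by sum of squared differences.
    # A minimal stand-in for the claim's feature/contour matching step.
    ref = left_row[col - patch: col + patch + 1]
    best_col, best_cost = col, np.inf
    for d in range(-max_disp, max_disp + 1):
        c = col + d
        if c - patch < 0 or c + patch + 1 > len(right_row):
            continue  # candidate window would fall off the image
        cand = right_row[c - patch: c + patch + 1]
        cost = float(np.sum((ref - cand) ** 2))
        if cost < best_cost:
            best_cost, best_col = cost, c
    return best_col

# a bright "feature" at column 2 in the left view, shifted to column 4
# in the right view
left_row = np.array([0, 0, 9, 0, 0, 0, 0], dtype=float)
right_row = np.array([0, 0, 0, 0, 9, 0, 0], dtype=float)
matched = match_feature(left_row, right_row, 2)
```

The column offset between the two matched positions is the disparity that later drives the view synthesis.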
9. A method, comprising:
storing a personalized face model of a first conferee;
concurrently, capturing first and second video images representative of the first conferee taken from different views;
evaluating the first and second video images with respect to the personalized face model of the first conferee to ascertain three dimensional information; and
synthesizing the three dimensional information to generate a virtual image video stream of the first conferee that makes the first conferee appear to be making eye contact with a second conferee who is watching the virtual image video stream.
(Dependent claims 10-15 not shown.)
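The "three dimensional information" of claim 9 is recoverable because the two views are captured concurrently from known positions. Under the standard rectified-stereo assumption (not stated in the claim itself), depth follows from disparity; this one-liner is a textbook sketch, with illustrative parameter values.

```python
def depth_from_disparity(focal_px, baseline_m, x_left, x_right):
    # Recover the depth Z of a face-model feature from its horizontal
    # positions in the two concurrently captured views. Assumes a
    # rectified stereo pair: focal length in pixels, baseline in metres.
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("feature must lie at a finite, positive depth")
    return focal_px * baseline_m / disparity

# e.g. f = 800 px, baseline = 0.2 m, feature seen at x = 420 (left)
# and x = 400 (right)
z = depth_from_disparity(800.0, 0.2, 420.0, 400.0)
```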
16. A system, comprising:
means for concurrently capturing first and second video images representative of a first conferee taken from different views;
means for tracking a head position of the first conferee from the first and second video images;
means for ascertaining features and contours from the first video image that match features and contours from the second video image; and
means for synthesizing the head position as well as the features and contours from the first and second video images that match to generate a virtual image video stream of the first conferee that makes the first conferee appear to be making eye contact with a second conferee who is watching the virtual image video stream.
(Dependent claims 17-22 not shown.)
23. A video-teleconferencing system, comprising:
a head pose tracking module, configured to receive first and second video images representative of a first conferee concurrently taken from different views and track head position of the first conferee;
a stereo module, configured to receive the first and second video images representative of the first conferee concurrently taken from different views and match non-rigid parts observed from the first and second video images; and
a view synthesis module, configured to synthesize the head position as well as the matching non-rigid parts from the first and second video images to generate a virtual image video stream of the first conferee that makes the first conferee appear to be making eye contact with a second conferee who is watching the virtual image video stream.
(Dependent claims 24-28 not shown.)
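Claim 23 names three cooperating modules. Their data flow can be sketched as a small pipeline in which each module is a pluggable callable; the class and parameter names below are hypothetical, and the lambdas stand in for real tracking, matching, and synthesis implementations.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class EyeGazePipeline:
    # Mirrors the claim's three modules: head pose tracking, stereo
    # matching, and view synthesis. The wiring is illustrative, not
    # the patent's architecture.
    track_head_pose: Callable[[Any, Any], Any]
    match_parts: Callable[[Any, Any], Any]
    synthesize_view: Callable[[Any, Any], Any]

    def process(self, left: Any, right: Any) -> Any:
        pose = self.track_head_pose(left, right)      # head pose tracking module
        matches = self.match_parts(left, right)       # stereo module
        return self.synthesize_view(pose, matches)    # view synthesis module

# toy stand-ins for each module, just to show the data flow
pipeline = EyeGazePipeline(
    track_head_pose=lambda left, right: "pose",
    match_parts=lambda left, right: "matches",
    synthesize_view=lambda pose, matches: (pose, matches),
)
result = pipeline.process(None, None)
```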
29. One or more computer-readable media having stored thereon computer-executable instructions that, when executed by one or more processors, cause the one or more processors of a computer system to:
concurrently, capture first and second video images representative of a first conferee taken from different views;
track a head position of the first conferee from the first and second video images;
ascertain features and contours from the first video image that match features and contours from the second video image; and
synthesize the head position as well as the features and contours from the first and second video images that match to generate a virtual image video stream of the first conferee that makes the first conferee appear to be making eye contact with a second conferee who is watching the virtual image video stream.
(Dependent claims 30-35 not shown.)
36. One or more computer-readable media having stored thereon computer-executable instructions that, when executed by one or more processors, cause the one or more processors of a computer system to:
store a personalized face model of a first conferee;
concurrently, capture first and second video images representative of the first conferee taken from different views;
evaluate the first and second video images with respect to the personalized face model of the first conferee to ascertain three dimensional information; and
synthesize the three dimensional information to generate a virtual image video stream of the first conferee that makes the first conferee appear to be making eye contact with a second conferee who is watching the virtual image video stream.
(Dependent claims 37-41 not shown.)
Specification