Rendered audiovisual communication
First Claim
1. A computer-implemented method, comprising:
capturing, using a first computing device, image data including a representation of a face during a period of time;
capturing, using the first computing device, audio data during the period of time;
obtaining a model corresponding to the face, the model associating a first state to first facial image data that represents a first facial expression and associating a phoneme to second facial image data that represents a second facial expression;
determining that a portion of the image data, captured within a first period of time, corresponds to the first state;
determining that a portion of the audio data, captured within a second period of time, corresponds to the phoneme;
generating rendered image data based at least in part on the first facial image data, the second facial image data, and the model; and
sending the rendered image data to a second computing device.
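The method of claim 1 can be sketched as a small pipeline: classify the captured image portion to an expression state, classify the captured audio portion to a phoneme, look both up in the model, and combine the associated facial image data into a rendered frame. Everything below is a hypothetical illustration; the patent does not specify any of these function names, data structures, or the form of the facial image data.

```python
# Hypothetical sketch of the claim-1 pipeline. All names and data
# representations here are illustrative stand-ins, not from the patent.

from dataclasses import dataclass, field

@dataclass
class FaceModel:
    # Claim 1: the model associates a first state with first facial image
    # data and a phoneme with second facial image data.
    state_images: dict = field(default_factory=dict)
    phoneme_images: dict = field(default_factory=dict)

def detect_state(image_portion):
    # Stand-in classifier: maps captured image data to an expression state.
    return image_portion["state"]

def detect_phoneme(audio_portion):
    # Stand-in recognizer: maps captured audio data to a phoneme.
    return audio_portion["phoneme"]

def render_frame(model, image_portion, audio_portion):
    state = detect_state(image_portion)        # "corresponds to the first state"
    phoneme = detect_phoneme(audio_portion)    # "corresponds to the phoneme"
    first = model.state_images[state]          # first facial image data
    second = model.phoneme_images[phoneme]     # second facial image data
    # "Generating rendered image data based at least in part on" both;
    # a trivial combination stands in for an actual renderer here.
    return {"base": first, "mouth": second}

model = FaceModel(
    state_images={"smiling": "smile_mesh"},
    phoneme_images={"AA": "open_mouth_mesh"},
)
frame = render_frame(model, {"state": "smiling"}, {"phoneme": "AA"})
# frame -> {"base": "smile_mesh", "mouth": "open_mouth_mesh"}
```

The resulting `frame` is what the claim calls "rendered image data", which is then sent to the second computing device.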
Abstract
Systems and approaches are provided to allow for rendered audiovisual communication. An electronic device can be used to capture image information relating to physical features of a user. A model can be generated from the image information, and the model may be used to render audiovisual communication information from image and audio data captured in real time. The rendered audiovisual communication data can simulate live video conferencing with substantial performance gains over conventional approaches to video conferencing. When the image capturing component of the electronic device is capable of depth imaging, stereo imaging, or other imaging techniques, the rendered audiovisual communication can be further enhanced with 3-D rendering of the user. Other aspects of audiovisual data, such as speech, background, and lighting conditions, can also be rendered or synthesized to improve audiovisual communication.
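The "substantial performance gains over conventional approaches to video conferencing" come from transmitting compact model parameters rather than full video frames. A back-of-envelope comparison illustrates the idea; the frame size, frame rate, and parameter count below are assumptions for illustration, not figures from the patent.

```python
# Illustrative bandwidth comparison (all numbers are assumptions):
# sending per-frame model parameters vs. raw uncompressed video.

FPS = 30
RAW_FRAME_BYTES = 640 * 480 * 3   # uncompressed 640x480 RGB frame
PARAMS_PER_FRAME = 50             # assumed model parameters per frame
PARAM_BYTES = 4                   # 32-bit float per parameter

video_bps = FPS * RAW_FRAME_BYTES * 8
params_bps = FPS * PARAMS_PER_FRAME * PARAM_BYTES * 8

print(f"raw video:    {video_bps / 1e6:.1f} Mbit/s")   # 221.2 Mbit/s
print(f"model params: {params_bps / 1e3:.1f} kbit/s")  # 48.0 kbit/s
```

Even against compressed video rather than raw frames, a parameter stream on this order would remain several orders of magnitude smaller, which is the premise behind rendering the face on the receiving side.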
17 Claims
1. A computer-implemented method, comprising:
capturing, using a first computing device, image data including a representation of a face during a period of time;
capturing, using the first computing device, audio data during the period of time;
obtaining a model corresponding to the face, the model associating a first state to first facial image data that represents a first facial expression and associating a phoneme to second facial image data that represents a second facial expression;
determining that a portion of the image data, captured within a first period of time, corresponds to the first state;
determining that a portion of the audio data, captured within a second period of time, corresponds to the phoneme;
generating rendered image data based at least in part on the first facial image data, the second facial image data, and the model; and
sending the rendered image data to a second computing device.
View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
10. A computing device, comprising:
a processor;
a camera;
a microphone; and
memory including instructions that, upon being executed by the processor, cause the computing device to:
capture, using the camera, image data including a representation of a face during a period of time;
capture, using the microphone, audio data during the period of time;
cause a model corresponding to the face to be obtained, the model associating a first state to first facial image data that represents a first facial expression and associating a phoneme to second facial image data that represents a second facial expression;
determine that a portion of the image data, captured within a first period of time, corresponds to the first state;
determine that a portion of the audio data, captured within a second period of time, corresponds to the phoneme;
cause rendered image data to be generated based at least in part on the first facial image data, the second facial image data, and the model; and
cause the rendered image data to be sent to a second computing device.
View Dependent Claims (11, 12, 13)
14. One or more non-transitory computer-readable storage mediums storing instructions that, upon being executed by one or more processors, cause the one or more processors to:
capture, using a first computing device, image data including a representation of a face during a first period of time;
capture, using the first computing device, first speech data corresponding to first speech of a first language during the first period of time;
obtain a model corresponding to the face based at least in part on at least one of the image data or the first speech data;
generate second speech data corresponding to a translation of the first speech of the first language to second speech of a second language;
generate rendered image data corresponding to the face based at least in part on the model and at least one of the image data or the second speech data;
send the rendered image data to a second computing device during a second period of time; and
send the second speech data to the second computing device during the second period of time.
View Dependent Claims (15, 16, 17)
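Claim 14 extends the rendering idea to cross-language communication: the first-language speech is translated, and the rendered face is driven at least in part by the translated speech, so that what is sent to the second device is a translated audio stream plus image data rendered to match it. The sketch below is a hypothetical illustration only; the translation table and viseme mapping are invented placeholders, and a real system would use an actual machine-translation and speech pipeline.

```python
# Hypothetical sketch of the claim-14 flow (translate speech, render the
# face from the translated speech). All names and tables are invented.

def translate(speech, src, dst):
    # Stand-in translator; a real system would call an MT service.
    table = {("hello", "en", "es"): "hola"}
    return table[(speech, src, dst)]

def render_from_speech(model, second_speech):
    # Drive the rendered face from the *translated* speech so lip movement
    # matches the second language (the claim's "rendered image data ...
    # based at least in part on ... the second speech data").
    return [model["visemes"][ch] for ch in second_speech if ch in model["visemes"]]

model = {"visemes": {"h": "breathy", "o": "round", "l": "tongue-up", "a": "open"}}

first_speech = "hello"
second_speech = translate(first_speech, "en", "es")   # -> "hola"
rendered = render_from_speech(model, second_speech)
# rendered -> ["breathy", "round", "tongue-up", "open"]
# Both `rendered` and `second_speech` would then be sent to the
# second computing device during the second period of time.
```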
Specification