Rendered audiovisual communication

US 9,479,736 B1
Filed: 07/27/2015
Issued: 10/25/2016
Est. Priority Date: 03/12/2013
Status: Active Grant

First Claim

Patent Images

1. A computer-implemented method, comprising:

capturing, using a first computing device, image data including a representation of a face during a period of time;

capturing, using the first computing device, audio data during the period of time;

obtaining a model corresponding to the face, the model associating a first state to first facial image data that represents a first facial expression and associating a phoneme to second facial image data that represents a second facial expression;

determining that a portion of the image data, captured within a first period of time, corresponds to the first state;

determining that a portion of the audio data, captured within a second period of time, corresponds to the phoneme;

generating rendered image data based at least in part on the first facial image data, the second facial image data, and the model; and

sending the rendered image data to a second computing device.

View all claims

1 Assignment

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

Systems and approaches are provided to allow for rendered audiovisual communication. An electronic device can be used to capture image information relating to physical features of a user. A model can be generated from the image information, and the model may be used to render audiovisual communication information from image and audio captured in real time. The rendered audiovisual communication data can simulate live video conferencing with substantial performance gains over conventional approaches to video conferencing. When the image capturing component of the electronic device is capable of depth imaging, stereo imaging, or other imaging techniques, the rendered audiovisual communication can be further enhanced with 3-D rendering of the user. Other aspects of audiovisual data, such as speech, background, and lighting conditions can also be rendered or synthesized to improve audiovisual communication.

Citations

17 Claims

1. A computer-implemented method, comprising:
- capturing, using a first computing device, image data including a representation of a face during a period of time;
  
  capturing, using the first computing device, audio data during the period of time;
  
  obtaining a model corresponding to the face, the model associating a first state to first facial image data that represents a first facial expression and associating a phoneme to second facial image data that represents a second facial expression;
  
  determining that a portion of the image data, captured within a first period of time, corresponds to the first state;
  
  determining that a portion of the audio data, captured within a second period of time, corresponds to the phoneme;
  
  generating rendered image data based at least in part on the first facial image data, the second facial image data, and the model; and
  
  sending the rendered image data to a second computing device.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
- - 2. The computer-implemented method of claim 1, further comprising:
    - capturing the first facial image data;
      
      storing, for the model, first data for associating the first state to the first facial image data;
      
      capturing the second facial image data andstoring, for the model, second data for associating the phoneme to the second facial image data.
  - 3. The computer-implemented method of claim 2, wherein the first state represents a neutral user state, and the method further comprises:
    - generating first blended image data by blending the first facial image data and the second facial image data over a duration,wherein the rendered image data is generated further based at least in part on the first blended image data.
  - 4. The computer-implemented method of claim 1, wherein the model is a previously generated model.
  - 5. The computer-implemented method of claim 1, further comprising:
    - capturing, using the first computing device, background image data,wherein the rendered image data is generated further based at least in part on the background image data.
  - 6. The computer-implemented method of claim 1, wherein the audio data includes first speech data corresponding to first speech of a first language, and the method further comprises:
    - generating second speech data corresponding to a translation of the first speech of the first language to second speech of a second language; and
      
      sending the second speech data to the second computing device within a duration the rendered image data is sent to the second computing device.
  - 7. The computer-implemented method of claim 1, wherein the audio data includes speech data corresponding to speech, and the method further comprises:
    - generating text data corresponding to a conversion of the speech to text; and
      
      sending the text data to the second computing device within a duration the rendered image data is sent to the second computing device.
  - 8. The computer-implemented method of claim 1, further comprising:
    - capturing, using the second computing device, second image data including a second representation of a second face during a second period of time;
      
      capturing, using the second computing device, second audio data during the second period of time;
      
      obtaining a second model corresponding to the second face;
      
      generating second rendered image data based at least in part on the second model and at least one of the second image data or the second audio data; and
      
      sending the second rendered image data to the first computing device.
  - 9. The computer-implemented method of claim 1, further comprising:
    - determining a display capability of the second computing device,wherein the rendered image data corresponds to a format that is supported by the display capability of the second computing device.

10. A computing device, comprising:
- a processor;
  
  a camera; and
  
  a microphone; and
  
  memory including instructions that, upon being executed by the processor, cause the computing device to;
  
  capture, using the camera, image data including a representation of a face during a period of time;
  
  capture, using the microphone, audio data during the period of time;
  
  cause a model corresponding to the face to be obtained, the model associating a first state to first facial image data that represents a first facial expression and associating a phoneme to second facial image data that represents a second facial expression;
  
  determine that a portion of the image data, captured within a first period of time, corresponds to the first state;
  
  determine that a portion of the audio data, captured within a second period of time, corresponds to the phoneme;
  
  cause rendered image data to be generated based at least in part on the first facial image data, the second facial image data, and the model; and
  
  cause the rendered image data to be sent to a second computing device.
- View Dependent Claims (11, 12, 13)
- - 11. The computing device of claim 10, wherein the instructions upon being executed further cause the computing device to:
    - capture the first facial image data;
      
      store, for the model, first data for associating the first state to the first facial image data;
      
      capture the second facial image data; and
      
      store, for the model, second data for associating the phoneme to the second facial image data.
  - 12. The computing device of claim 10, wherein the instructions upon being executed further cause the computing device to:
    - send information corresponding to the image data and the audio data to a remote server,wherein the model is obtained by the remote server, the rendered image data is generated by the remote server, and the rendered image data is sent to the second computing device by the remote server.
  - 13. The computing device of claim 10, wherein the model is obtained by the computing device, the rendered image data is generated by the computing device, and the rendered image data is sent to the second computing device by the computing device.

14. One or more non-transitory computer-readable storage mediums storing instructions that, upon being executed by one or more processors, cause the one or more processors to:
- capture, using a first computing device, image data including a representation of a face during a first period of time;
  
  capture, using the first computing device, first speech data corresponding to first speech of a first language during the first period of time;
  
  obtain a model corresponding to the face based at least in part on at least one of the image data or the first speech data;
  
  generate second speech data corresponding to a translation of the first speech of the first language to second speech of a second language;
  
  generate rendered image data corresponding to the face based at least in part on the model and at least one of the image data or the second speech data;
  
  send the rendered image data to a second computing device during a second period of time; and
  
  send the second speech data to the second computing device during the second period of time.
- View Dependent Claims (15, 16, 17)
- - 15. The one or more non-transitory computer-readable storage mediums of claim 14, wherein the instructions further cause the one or more processors to:
    - capture, using the first computing device, background image data,wherein the rendered image data is generated further based at least in part on the background image data.
  - 16. The one or more non-transitory computer-readable storage mediums of claim 14, wherein the instructions upon being executed further cause the one or more processors to:
    - generate text data corresponding to a conversion of at least one of the first speech or the second speech to text; and
      
      send the text data to the second computing device during the second period of time.
  - 17. The one or more non-transitory computer-readable storage mediums of claim 14, wherein the instructions upon being executed further cause the one or more processors to:
    - determine a display capability of the second computing device,wherein the rendered image data corresponds to a format that is supported by the display capability of the second computing device.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors
Karakotsios, Kenneth Mark
Primary Examiner(s)
TIEU, BINH KIEN

Application Number

US14/810,336
Time in Patent Office

456 Days
Field of Search

348/14.01, 348/14.02, 348/14.07, 348/14.08, 348/14.1, 348/14.12, 348/14.14, 348/14.15, 704/270, 704/276, 704/256, 704/235, 704/258
US Class Current

1/1
CPC Class Codes

G06V 40/174   Facial expression recognition

G10L 15/02   Feature extraction for spee...

G10L 15/26   Speech to text systems G10L...

G10L 2015/025   Phonemes, fenemes or fenone...

H04N 5/265   Mixing

H04N 7/157   defining a virtual conferen...

Rendered audiovisual communication

First Claim

1 Assignment

0 Petitions

Accused Products

Abstract

Citations

17 Claims

Specification

Solutions

Use Cases

Quick Links

Rendered audiovisual communication

First Claim

1 Assignment

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

Citations

17 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links