VOICE EFFECTS BASED ON FACIAL EXPRESSIONS

US 20180336716A1
Filed: 02/28/2018
Published: 11/22/2018
Est. Priority Date: 05/16/2017
Status: Abandoned Application

Abstract

Embodiments of the present disclosure can provide systems, methods, and computer-readable media for adjusting audio and/or video information of a video clip based at least in part on facial feature and/or voice feature characteristics extracted from hardware components. For example, in response to detecting a request to generate an avatar video clip of a virtual avatar, a video signal associated with a face in a field of view of a camera and an audio signal may be captured. Voice feature characteristics and facial feature characteristics may be extracted from the audio signal and the video signal, respectively. In some examples, in response to detecting a request to preview the avatar video clip, an adjusted audio signal may be generated based at least in part on the facial feature characteristics and the voice feature characteristics, and a preview of the video clip of the virtual avatar using the adjusted audio signal may be displayed.
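
The two-phase flow described in the abstract (capture and extract on a generation request, then adjust on a preview request) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: `Frame`, `AvatarClip`, `capture`, `preview`, the single `mouth_open` facial feature, the single `level` voice feature, and the `boost` rule are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Frame:
    """One captured instant (hypothetical, highly simplified)."""
    mouth_open: float  # assumed facial feature in [0, 1]
    level: float       # assumed voice feature: audio level in [0, 1]


@dataclass
class AvatarClip:
    """Features extracted during the capture phase."""
    facial: List[float] = field(default_factory=list)
    voice: List[float] = field(default_factory=list)


def capture(frames: List[Frame]) -> AvatarClip:
    """Phase 1: on a generation request, extract both feature streams."""
    clip = AvatarClip()
    for frame in frames:
        clip.facial.append(frame.mouth_open)
        clip.voice.append(frame.level)
    return clip


def preview(clip: AvatarClip, boost: float = 0.5) -> List[float]:
    """Phase 2: on a preview request, derive adjusted audio levels.

    Assumed rule: a wider mouth raises the level, clamped to 1.0.
    """
    return [
        min(1.0, v * (1.0 + boost * m))
        for m, v in zip(clip.facial, clip.voice)
    ]


clip = capture([Frame(0.0, 0.5), Frame(1.0, 0.5), Frame(1.0, 0.9)])
adjusted = preview(clip)  # [0.5, 0.75, 1.0]
```

The point of the sketch is only the ordering: the adjustment uses both feature streams and is deferred until the preview request, rather than being applied while frames are captured.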

20 Claims

1. A method, comprising:
- at an electronic device having at least a camera and a microphone:
  - displaying a virtual avatar generation interface;
  - displaying first preview content of a virtual avatar in the virtual avatar generation interface, the first preview content of the virtual avatar corresponding to realtime preview video frames of a user headshot in a field of view of the camera and associated headshot changes in an appearance;
  - while displaying the first preview content of the virtual avatar, detecting an input in the virtual avatar generation interface;
  - in response to detecting the input in the virtual avatar generation interface:
    - capturing, via the camera, a video signal associated with the user headshot during a recording session;
    - capturing, via the microphone, a user audio signal during the recording session;
    - extracting audio feature characteristics from the captured user audio signal; and
    - extracting facial feature characteristics associated with the face from the captured video signal; and
  - in response to detecting expiration of the recording session:
    - generating an adjusted audio signal from the captured audio signal based at least in part on the facial feature characteristics and the audio feature characteristics;
    - generating second preview content of the virtual avatar in the virtual avatar generation interface according to the facial feature characteristics and the adjusted audio signal; and
    - presenting the second preview content in the virtual avatar generation interface.
- - 2. The method of claim 1, further comprising storing facial feature metadata associated with the facial feature characteristics extracted from the video signal and storing audio metadata associated with the audio feature characteristics extracted from the audio signal.
  - 3. The method of claim 2, further comprising generating adjusted facial feature metadata from the facial feature metadata based at least in part on the facial feature characteristics and the audio feature characteristics.
  - 4. The method of claim 3, wherein the second preview of the virtual avatar is displayed further according to the adjusted facial metadata.
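
Claims 2 to 4 describe storing facial feature metadata and then deriving adjusted facial feature metadata from both the facial and the audio features. The claims do not say how the adjustment is computed, so the following Python sketch uses an assumed rule purely for illustration: when the audio level is high but the stored mouth opening is nearly closed, the mouth is opened to a minimum value so the avatar appears to speak. The function name, the `threshold` and `min_open` parameters, and the per-frame list layout are all hypothetical.

```python
from typing import List


def adjust_facial_metadata(
    mouth_open: List[float],
    audio_level: List[float],
    threshold: float = 0.5,
    min_open: float = 0.3,
) -> List[float]:
    """Return adjusted per-frame mouth-opening metadata.

    Assumed rule (not from the application): if a frame is loud
    (level >= threshold) but the mouth is almost closed
    (opening < min_open), raise the opening to min_open.
    """
    adjusted = []
    for opening, level in zip(mouth_open, audio_level):
        if level >= threshold and opening < min_open:
            adjusted.append(min_open)
        else:
            adjusted.append(opening)
    return adjusted


stored = [0.0, 0.8, 0.1]   # stored facial feature metadata
levels = [0.9, 0.9, 0.1]   # stored audio metadata
adjusted = adjust_facial_metadata(stored, levels)  # [0.3, 0.8, 0.1]
```

The stored metadata is left unmodified and a new list is returned, which keeps the original capture available if a different adjustment is previewed later.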

5. An electronic device, comprising:
- a camera;
- a microphone; and
- one or more processors in communication with the camera and the microphone, the one or more processors configured to:
  - while displaying a first preview of a virtual avatar, detecting an input in a virtual avatar generation interface;
  - in response to detecting the input in the virtual avatar generation interface, initiating a capture session including:
    - capturing, via the camera, a video signal associated with a face in a field of view of the camera;
    - capturing, via the microphone, an audio signal associated with the captured video signal;
    - extracting audio feature characteristics from the captured audio signal; and
    - extracting facial feature characteristics associated with the face from the captured video signal; and
  - in response to detecting expiration of the capture session:
    - generating an adjusted audio signal based at least in part on the audio feature characteristics and the facial feature characteristics; and
    - displaying a second preview of the virtual avatar in the virtual avatar generation interface according to the facial feature characteristics and the adjusted audio signal.
- - 6. The electronic device of claim 5, wherein the audio signal is further adjusted based at least in part on a type of the virtual avatar.
  - 7. The electronic device of claim 6, wherein the type of the virtual avatar is received based at least in part on an avatar type selection affordance presented in the virtual avatar generation interface.
  - 8. The electronic device of claim 6, wherein the type of the virtual avatar includes an animal type, and wherein the adjusted audio signal is generated based at least in part on a predetermined sound associated with the animal type.
  - 9. The electronic device of claim 5, wherein the one or more processors are further configured to determine whether a portion of the audio signal corresponds to the face in the field of view.
  - 10. The electronic device of claim 9, wherein the one or more processors are further configured to, in accordance with a determination that the portion of the audio signal corresponds to the face, store the portion of the audio signal for use in generating the adjusted audio signal.
  - 11. The electronic device of claim 9, wherein the one or more processors are further configured to, in accordance with a determination that the portion of the audio signal does not correspond to the face, discard at least the portion of the audio signal.
  - 12. The electronic device of claim 5, wherein the audio feature characteristics comprise features of a voice associated with the face in the field of view.
  - 13. The electronic device of claim 5, wherein the one or more processors are further configured to store facial feature metadata associated with the facial feature characteristics extracted from the video signal.
  - 14. The electronic device of claim 13, wherein the one or more processors are further configured to generate adjusted facial metadata based at least in part on the facial feature characteristics and the audio feature characteristics.
  - 15. The electronic device of claim 14, wherein the second preview of the virtual avatar is generated according to the adjusted facial metadata and the adjusted audio signal.
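
Claims 9 to 11 describe deciding whether a portion of the audio signal corresponds to the face in the field of view, then storing or discarding that portion. The claims leave the decision method open. As a rough sketch only, the Python below assumes a portion corresponds to the face when at least one detected mouth movement falls inside its time span; `AudioPortion`, `corresponds_to_face`, `filter_portions`, and the timestamp representation are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AudioPortion:
    start: float          # seconds from the start of the capture session
    end: float            # seconds, exclusive
    samples: List[float]  # raw audio samples for this portion


def corresponds_to_face(
    portion: AudioPortion,
    mouth_movement_times: List[float],
    min_hits: int = 1,
) -> bool:
    """Assumed heuristic: count mouth movements inside the portion."""
    hits = sum(
        1 for t in mouth_movement_times if portion.start <= t < portion.end
    )
    return hits >= min_hits


def filter_portions(
    portions: List[AudioPortion],
    mouth_movement_times: List[float],
) -> List[AudioPortion]:
    """Keep portions that match the face (claim 10); drop the rest (claim 11)."""
    return [
        p for p in portions if corresponds_to_face(p, mouth_movement_times)
    ]


portions = [
    AudioPortion(0.0, 1.0, [0.1, 0.2]),
    AudioPortion(1.0, 2.0, [0.3, 0.4]),  # background noise, mouth still
    AudioPortion(2.0, 3.0, [0.5, 0.6]),
]
kept = filter_portions(portions, mouth_movement_times=[0.2, 2.5])
# kept holds the portions starting at 0.0 and 2.0
```

Only the kept portions would then feed the step that generates the adjusted audio signal.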

16. A computer-readable storage medium storing computer-executable instructions that, when executed by one or more processors, configure the one or more processors to perform operations comprising:
- in response to detecting a request to generate an avatar video clip of a virtual avatar:
  - capturing, via a camera of an electronic device, a video signal associated with a face in a field of view of the camera;
  - capturing, via a microphone of the electronic device, an audio signal;
  - extracting voice feature characteristics from the captured audio signal; and
  - extracting facial feature characteristics associated with the face from the captured video signal; and
- in response to detecting a request to preview the avatar video clip:
  - generating an adjusted audio signal based at least in part on the facial feature characteristics and the voice feature characteristics; and
  - displaying a preview of the video clip of the virtual avatar using the adjusted audio signal.
- - 17. The computer-readable storage medium of claim 16, wherein the audio signal is adjusted based at least in part on a facial expression identified in the facial feature characteristics associated with the face.
  - 18. The computer-readable storage medium of claim 16, wherein the adjusted audio signal is further adjusted by inserting one or more pre-stored audio samples.
  - 19. The computer-readable storage medium of claim 16, wherein the audio signal is adjusted based at least in part on a level, pitch, duration, variable playback speed, speech spectral-formant positions, speech spectral-formant levels, instantaneous playback speed, or change in a voice associated with the face.
  - 20. The computer-readable storage medium of claim 16, wherein the one or more processors are further configured to perform the operations comprising transmitting the video clip of the virtual avatar to another electronic device.
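
Claims 17 and 19 tie the audio adjustment to an identified facial expression and to properties such as pitch, duration, and playback speed. One simple way to picture this is a naive resampler: playing samples back faster raises pitch and shortens duration together. The Python sketch below uses linear-interpolation resampling, which is a stand-in technique chosen for brevity and not something the application specifies. The expression labels and the `EXPRESSION_SPEED` values are hypothetical.

```python
from typing import Dict, List

# Hypothetical mapping from an identified expression to a playback speed.
EXPRESSION_SPEED: Dict[str, float] = {
    "smile": 1.25,   # faster, higher-pitched
    "frown": 0.8,    # slower, lower-pitched
    "neutral": 1.0,
}


def change_speed(samples: List[float], speed: float) -> List[float]:
    """Resample by linear interpolation; speed > 1 shortens the clip."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    if not samples:
        return []
    last = len(samples) - 1
    n_out = max(1, int(len(samples) / speed))
    out = []
    for i in range(n_out):
        pos = i * speed                 # fractional read position
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


def adjust_for_expression(samples: List[float], expression: str) -> List[float]:
    """Apply the speed assumed for this expression (1.0 if unknown)."""
    return change_speed(samples, EXPRESSION_SPEED.get(expression, 1.0))


fast = change_speed([0.0, 1.0, 2.0, 3.0], 2.0)  # [0.0, 2.0]
```

Because this approach couples pitch and duration, a real voice effect that changes one without the other would need a different technique; the sketch only shows how an expression label could select an audio parameter.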

Current Assignee: Apple Inc.
Original Assignee: Apple Inc.
Inventors: Ramprashad, Sean A.; Avendano, Carlos M.; Lindahl, Aram M.
Application Number: US15/908,603
Publication Number: US 20180336716A1
CPC Class Codes

G06F 3/012   Head tracking input arrange...

G06F 3/0304   Detection arrangements usin...

G06F 3/0484   for the control of specific...

G06F 3/04842   Selection of displayed obje...

G06F 3/04886   by partitioning the display...

G06T 13/40   of characters, e.g. humans,...

G06V 20/20   in augmented reality scenes

G06V 40/175   Static expression

G06V 40/176   Dynamic expression

H04L 51/04   Real-time or near real-time...

H04L 51/08   Annexed information, e.g. a...

H04L 51/10   Multimedia information

H04L 51/52   for supporting social netwo...

H04L 51/58   Message adaptation for wire...

H04M 1/72436   for text messaging, e.g. sh...

H04M 1/72439   for image or video messaging

H04M 2250/52   including functional featur...

H04N 23/611   where the recognised object...

H04N 23/63   by using electronic viewfin...

H04W 4/12   Messaging; Mailboxes; Annou...
