Techniques for providing audio and video effects
First Claim
1. A method, comprising:
- at an electronic device having at least a camera and a microphone:
displaying a virtual avatar generation interface;
displaying first preview content of a virtual avatar in the virtual avatar generation interface, the first preview content of the virtual avatar corresponding to real-time preview video frames of a user headshot in a field of view of the camera and associated headshot changes in an appearance;
while displaying the first preview content of the virtual avatar, detecting an input in the virtual avatar generation interface;
in response to detecting the input in the virtual avatar generation interface:
capturing, via the camera, a video signal associated with the user headshot during a recording session;
capturing, via the microphone, a voice audio signal during the recording session; and
in response to detecting expiration of the recording session:
transforming the voice audio signal into a first set of voice audio features, the first set of voice audio features including at least one speech formant of the voice audio signal;
identifying a feature set of a predetermined voice audio signal associated with the virtual avatar;
generating a second set of voice audio features based at least in part on the first set of voice audio features and the feature set of the predetermined voice audio signal associated with the virtual avatar, the second set of voice audio features including a modified version of the at least one speech formant of the voice audio signal; and
composing a modified voice audio signal based at least in part on the second set of voice audio features;
generating second preview content of the virtual avatar in the virtual avatar generation interface according to the video signal and the modified voice audio signal; and
presenting the second preview content in the virtual avatar generation interface.
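The claim requires transforming the voice audio signal into features that include at least one speech formant, but it does not say how formants are measured. A common textbook technique is linear predictive coding (LPC), and the sketch below uses it purely as an assumption for illustration, not as the disclosed method. It fits an all-pole model to one frame with the Levinson-Durbin recursion and reports the peaks of the resulting spectral envelope as formant estimates. The function names (`lpc_formants`, `synthetic_vowel`), the model order, and the noise-driven test signal are hypothetical choices.

```python
import cmath
import math
import random
from typing import List


def autocorrelation(x: List[float], max_lag: int) -> List[float]:
    """Biased autocorrelation r[0..max_lag] of a single frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r: List[float], order: int) -> List[float]:
    """Return a[0..order] (a[0] = 1) of the prediction-error filter
    A(z) = 1 + a1*z^-1 + ... + ap*z^-p, solved from autocorrelation r."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this order
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k  # remaining prediction-error energy
    return a


def lpc_formants(frame: List[float], fs: float, order: int = 8, n_bins: int = 512) -> List[float]:
    """Estimate formant frequencies (Hz) as local peaks of the LPC envelope 1/|A(e^jw)|."""
    n = len(frame)
    # Hamming window to reduce leakage at the frame edges.
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i, s in enumerate(frame)]
    a = levinson_durbin(autocorrelation(windowed, order), order)
    env = []
    for b in range(n_bins):
        w = math.pi * b / n_bins  # from 0 up to just below Nyquist
        A = sum(a[k] * cmath.exp(-1j * w * k) for k in range(order + 1))
        env.append(1.0 / max(abs(A), 1e-12))
    # Keep local maxima and convert bin index to Hz.
    return [b * fs / (2 * n_bins) for b in range(1, n_bins - 1)
            if env[b] > env[b - 1] and env[b] >= env[b + 1]]


def synthetic_vowel(formants: List[float], fs: float, n: int, seed: int = 0) -> List[float]:
    """Hypothetical test signal: white noise passed through one resonator per formant."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for f in formants:
        r = math.exp(-math.pi * 80.0 / fs)  # pole radius for an 80 Hz bandwidth
        c1, c2 = 2 * r * math.cos(2 * math.pi * f / fs), -r * r
        y1 = y2 = 0.0
        out = []
        for s in x:
            y = s + c1 * y1 + c2 * y2
            out.append(y)
            y2, y1 = y1, y
        x = out
    return x


sig = synthetic_vowel([700.0, 1800.0], fs=8000, n=2048)
print(lpc_formants(sig, fs=8000))
```

On real speech one would apply this to short overlapping frames (for example 20 to 30 ms) and track the lowest few peaks over time; the order of 8 suits the 8 kHz toy signal and would normally be raised for higher sample rates.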
1 Assignment
0 Petitions
Abstract
Embodiments of the present disclosure can provide systems, methods, and computer-readable medium for providing audio and/or video effects based at least in part on facial features and/or voice feature characteristics of the user. For example, video and/or an audio signal of the user may be recorded by a device. Voice audio features and facial feature characteristics may be extracted from the voice audio signal and the video, respectively. The facial features of the user may be used to modify features of a virtual avatar to emulate the facial feature characteristics of the user. The extracted voice audio features may be modified to generate an adjusted audio signal, or an audio signal may be composed from the voice audio features. The adjusted/composed audio signal may simulate the voice of the virtual avatar. A preview of the modified video/audio may be provided at the user's device.
65 Citations
20 Claims
1. A method, comprising:
at an electronic device having at least a camera and a microphone: displaying a virtual avatar generation interface; displaying first preview content of a virtual avatar in the virtual avatar generation interface, the first preview content of the virtual avatar corresponding to real-time preview video frames of a user headshot in a field of view of the camera and associated headshot changes in an appearance; while displaying the first preview content of the virtual avatar, detecting an input in the virtual avatar generation interface; in response to detecting the input in the virtual avatar generation interface: capturing, via the camera, a video signal associated with the user headshot during a recording session; capturing, via the microphone, a voice audio signal during the recording session; and in response to detecting expiration of the recording session: transforming the voice audio signal into a first set of voice audio features, the first set of voice audio features including at least one speech formant of the voice audio signal; identifying a feature set of a predetermined voice audio signal associated with the virtual avatar; generating a second set of voice audio features based at least in part on the first set of voice audio features and the feature set of the predetermined voice audio signal associated with the virtual avatar, the second set of voice audio features including a modified version of the at least one speech formant of the voice audio signal; and composing a modified voice audio signal based at least in part on the second set of voice audio features; generating second preview content of the virtual avatar in the virtual avatar generation interface according to the video signal and the modified voice audio signal; and presenting the second preview content in the virtual avatar generation interface. Dependent claims: 2, 3, 4.
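Claim 1 recites a specific ordering of events: a live preview, an input that starts a recording session, capture of video and voice during the session, and voice processing that begins only once the session expires. The sketch below models that ordering as a small state machine. It is a hypothetical illustration: the class and method names, the fixed `max_duration_s` used to decide expiration, and the string placeholders standing in for video frames are assumptions, and `transform_voice` stands in for the feature extraction, modification, and composition steps.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, List, Tuple


class Phase(Enum):
    LIVE_PREVIEW = auto()  # first preview content tracks the camera in real time
    RECORDING = auto()     # video and voice are being captured
    REVIEW = auto()        # second preview content is being presented


@dataclass
class AvatarRecordingSession:
    transform_voice: Callable[[List[float]], List[float]]
    max_duration_s: float = 10.0  # hypothetical expiration rule
    phase: Phase = Phase.LIVE_PREVIEW
    elapsed_s: float = 0.0
    video_frames: List[str] = field(default_factory=list)
    voice_audio: List[float] = field(default_factory=list)
    second_preview: Tuple[List[str], List[float]] = field(default_factory=lambda: ([], []))

    def on_input(self) -> None:
        """An input in the generation interface starts a recording session."""
        if self.phase is Phase.LIVE_PREVIEW:
            self.phase = Phase.RECORDING
            self.elapsed_s = 0.0
            self.video_frames.clear()
            self.voice_audio.clear()

    def on_tick(self, dt_s: float, frame: str, samples: List[float]) -> None:
        """Capture one chunk of video and audio; ignored outside a session."""
        if self.phase is not Phase.RECORDING:
            return
        self.video_frames.append(frame)
        self.voice_audio.extend(samples)
        self.elapsed_s += dt_s
        if self.elapsed_s >= self.max_duration_s:
            self._on_expired()

    def _on_expired(self) -> None:
        """Only after expiration is the voice transformed and the second preview built."""
        modified = self.transform_voice(list(self.voice_audio))
        self.second_preview = (list(self.video_frames), modified)
        self.phase = Phase.REVIEW
```

Keeping the voice transformation inside `_on_expired` mirrors the claim's "in response to detecting expiration" language: nothing is modified while capture is still in progress.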
5. An electronic device, comprising:
a speaker; a camera; a microphone; and one or more processors in communication with the speaker, the camera, and the microphone, the one or more processors configured to: display a virtual avatar generation interface; display first preview content of a virtual avatar in the virtual avatar generation interface, the first preview content of the virtual avatar corresponding to real-time preview video frames of a user headshot in a field of view of the camera and associated headshot changes in an appearance; while displaying the first preview content of the virtual avatar, detect an input in the virtual avatar generation interface; in response to detecting the input in the virtual avatar generation interface: capture, via the camera, a video signal associated with the user headshot during a recording session; capture, utilizing the microphone, a voice audio signal during the recording; and in response to detecting expiration of the recording session: transform the voice audio signal into a first set of voice audio features, the first set of voice audio features including a formant of the voice audio signal; identify a feature set of a predetermined voice audio signal associated with a virtual avatar; generate a second set of voice audio features based at least in part on the first set of voice audio features and the feature set of the predetermined voice audio signal associated with the virtual avatar; and compose a modified voice audio signal according to the second set of voice audio features; generate second preview content of the virtual avatar in the virtual avatar generation interface according to the video signal and the modified voice audio signal; and present the second preview content in the virtual avatar generation interface. Dependent claims: 6, 7, 8, 9, 10, 11, 12, 13, 14, 15.
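Claim 5 generates the second set of voice audio features from the user's features together with the feature set of a predetermined voice associated with the avatar, but it does not say how the two are combined. One plausible rule, assumed here purely for illustration, is to interpolate each frequency between the user's value and the avatar's target on a logarithmic scale, so pitch and formants move proportionally rather than by a fixed number of hertz. The `VoiceFeatures` type, the `alpha` weight, and the choice to leave unmatched user formants unchanged are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VoiceFeatures:
    pitch_hz: float
    formants_hz: List[float]


def blend_features(user: VoiceFeatures, avatar: VoiceFeatures, alpha: float) -> VoiceFeatures:
    """Return a second feature set moved a fraction `alpha` of the way
    (in log frequency) from the user's voice toward the avatar's voice."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")

    def geo(u: float, t: float) -> float:
        # Geometric interpolation: equal steps in log frequency.
        return math.exp((1.0 - alpha) * math.log(u) + alpha * math.log(t))

    formants = [geo(u, t) for u, t in zip(user.formants_hz, avatar.formants_hz)]
    # User formants with no avatar counterpart are kept as measured.
    formants += user.formants_hz[len(formants):]
    return VoiceFeatures(pitch_hz=geo(user.pitch_hz, avatar.pitch_hz), formants_hz=formants)


user = VoiceFeatures(pitch_hz=120.0, formants_hz=[700.0, 1200.0, 2500.0])
robot = VoiceFeatures(pitch_hz=80.0, formants_hz=[500.0, 900.0])
print(blend_features(user, robot, 0.5))
```

With `alpha = 0` the user's voice is returned (up to floating-point rounding) and with `alpha = 1` the matched features take the avatar's values, so a single parameter could control how strongly the avatar's character is imposed.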
16. A non-transitory computer-readable storage medium storing computer-executable instructions that, when executed by one or more processors, configure the one or more processors to perform operations comprising:
displaying a virtual avatar generation interface; receiving, at the virtual avatar generation interface, a selection associated with a virtual avatar, the virtual avatar being associated with particular vocal characteristics; displaying first preview content of the virtual avatar in the virtual avatar generation interface, the first preview content of the virtual avatar corresponding to real-time preview video frames of a user headshot in a field of view of the camera and associated headshot changes in an appearance; while displaying the first preview content of the virtual avatar, detecting an input in the virtual avatar generation interface; in response to detecting the input in the virtual avatar generation interface: capturing, utilizing a camera, a video signal associated with the user headshot during a recording session; capturing, utilizing a microphone and the virtual avatar generation interface, a voice audio signal during the recording session; and in response to detecting expiration of the recording session: transforming the voice audio signal of the user into a first set of voice audio features, the first set of voice audio features including at least one of speech formant of the voice audio signal; generating a second set of voice audio features based at least in part on the first set of voice audio features and the particular vocal characteristics associated with the virtual avatar; and composing a modified voice audio signal according to the second set of voice audio features; generating second preview content of the virtual avatar in the virtual avatar generation interface according to the video signal and the modified voice audio signal; and presenting the second preview content in the virtual avatar generation interface. Dependent claims: 17, 18, 19, 20.
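Claim 16 ends by composing a modified voice audio signal according to the second set of voice audio features. The claim does not specify the synthesizer, so the sketch below substitutes a classic source-filter approach as an assumption: an impulse train at the target pitch is passed through one second-order resonator per formant. It yields a buzzy, vowel-like tone rather than natural speech. The names `compose_voice` and `resonator`, the fixed 80 Hz bandwidth, and the peak normalization are illustrative choices, not details from the source.

```python
import math
from typing import List, Sequence


def resonator(x: List[float], freq_hz: float, bandwidth_hz: float, fs: float) -> List[float]:
    """Two-pole resonator: y[n] = x[n] + c1*y[n-1] + c2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / fs)  # pole radius sets the bandwidth
    c1 = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / fs)
    c2 = -r * r
    y: List[float] = []
    y1 = y2 = 0.0
    for s in x:
        out = s + c1 * y1 + c2 * y2
        y.append(out)
        y2, y1 = y1, out
    return y


def compose_voice(pitch_hz: float, formants_hz: Sequence[float], fs: float = 16000.0,
                  duration_s: float = 0.5, bandwidth_hz: float = 80.0,
                  peak: float = 0.9) -> List[float]:
    """Synthesize a vowel-like signal from a pitch and a list of formant frequencies."""
    n = int(round(duration_s * fs))
    period = max(1, int(round(fs / pitch_hz)))
    # Glottal source approximated by a unit impulse every pitch period.
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for f in formants_hz:
        if 0.0 < f < fs / 2:  # skip formants at or above Nyquist
            signal = resonator(signal, f, bandwidth_hz, fs)
    m = max((abs(s) for s in signal), default=0.0)
    if m > 0.0:
        signal = [peak * s / m for s in signal]  # normalize to the requested peak
    return signal


tone = compose_voice(pitch_hz=100.0, formants_hz=[700.0, 1200.0], fs=8000.0)
print(len(tone))
```

Combined with a feature set such as the blended pitch and formants discussed for claim 5, this would give a complete, if crude, path from modified features back to audio; a production system would more likely modify the recorded signal itself (for example with a vocoder) to keep the user's timing and timbre.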
Specification