SYSTEMS AND METHODS FOR VOICE PERSONALIZATION OF VIDEO CONTENT
Abstract
Systems and methods are disclosed for performing voice personalization of video content. The personalized media content may include a composition of a background scene having a character, head model data representing an individualized three-dimensional (3D) head model of a user, audio data simulating the user's voice, and a viseme track containing instructions for causing the individualized 3D head model to lip sync the words contained in the audio data. The audio data simulating the user's voice can be generated using a voice transformation process. In certain examples, the audio data is based on a text input entered or selected by the user (e.g., via a telephone or computer) or on a textual dialogue of a background character.
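The abstract refers to a voice transformation process without specifying one. As a minimal, illustrative sketch only (the function name and the resampling approach are assumptions, not the patented method), a naive pitch shift by linear-interpolation resampling might look like:

```python
import numpy as np

def transform_voice(samples: np.ndarray, pitch_factor: float) -> np.ndarray:
    """Naive pitch shift by resampling (an illustrative assumption;
    this also changes duration, which production voice-transformation
    systems avoid with PSOLA or phase-vocoder techniques)."""
    idx = np.arange(0.0, len(samples), pitch_factor)
    idx = idx[idx < len(samples) - 1]   # keep interpolation in bounds
    lo = idx.astype(int)
    frac = idx - lo
    # Linear interpolation between neighboring samples.
    return (1.0 - frac) * samples[lo] + frac * samples[lo + 1]
```

Raising `pitch_factor` above 1.0 shortens the output and raises the perceived pitch accordingly.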
21 Claims
1. A method for generating an audio portion of media content, the method comprising:

receiving a selection from a user of a piece of prerecorded media content, the prerecorded media content comprising a background scene having a character therein;

accessing an individualized three-dimensional (3D) head model;

accessing at least one voice sample of the user;

converting the at least one voice sample to at least one audio track;

detecting from the at least one audio track a plurality of phonemes;

creating at least one viseme track that associates the plurality of phonemes with a plurality of visemes, each of the plurality of visemes being indicative of an animated mouth movement of the individualized 3D head model;

synchronizing the at least one audio track and the at least one viseme track; and

generating personalized media content by associating the individualized 3D head model with the character of the background scene, and associating the synchronized at least one audio track and at least one viseme track with the individualized 3D head model to cause the animated mouth movement of the individualized 3D head model to correspond to the at least one audio track during playback of the personalized media content.

View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
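The phoneme-detection and viseme-track steps recited above can be sketched in outline. The phoneme labels, viseme names, and mapping table below are illustrative assumptions; the claim does not prescribe a particular inventory:

```python
from dataclasses import dataclass

# Hypothetical many-to-one phoneme-to-viseme table; real systems map a
# full phoneme inventory (e.g. ARPAbet) onto roughly a dozen visemes.
PHONEME_TO_VISEME = {
    "AA": "open", "AE": "open",
    "B": "closed", "M": "closed", "P": "closed",
    "F": "lip_teeth", "V": "lip_teeth",
    "OW": "rounded", "UW": "rounded",
    "S": "narrow", "T": "narrow",
}

@dataclass
class TimedPhoneme:
    phoneme: str
    start: float      # seconds into the audio track
    duration: float   # seconds

def create_viseme_track(phonemes):
    """Associate each detected, timed phoneme with a viseme keyframe,
    preserving the timing so the animated mouth movement stays
    synchronized with the audio track during playback."""
    return [(p.start, p.duration, PHONEME_TO_VISEME.get(p.phoneme, "neutral"))
            for p in phonemes]
```

Unmapped phonemes fall back to a neutral mouth shape, which is one simple way to keep the track total even when detection emits a label outside the table.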
12. An animation system for performing voice personalization of media content, the animation system comprising:
a piece of media content comprising a background scene having a character;

head model data representing an individualized three-dimensional (3D) head model;

audio data representing at least one voice sample of a user, the at least one voice sample corresponding to a first text;

a processor configured to receive the media content, the head model data and the audio data to generate personalized media content by:

processing the at least one voice sample to create at least one audio track;

detecting from the at least one audio track a plurality of phonemes;

creating at least one viseme track that associates the plurality of phonemes with a plurality of visemes, each of the plurality of visemes comprising instructions for a corresponding animated mouth movement of the individualized 3D head model; and

compositing the media content, the individualized 3D head model, the at least one audio track and the at least one viseme track such that the individualized 3D head model is associated with the character and such that the at least one audio track and the at least one viseme track are associated with the individualized 3D head model to cause the animated mouth movement of the individualized 3D head model to correspond to the at least one audio track during playback of the personalized media content.

View Dependent Claims (13, 14, 15, 16, 17, 18)
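For the mouth movement to correspond to the audio track during playback, the renderer must know which viseme keyframe governs the mouth at each instant. A minimal lookup over a (start_time, viseme) track, with all names assumed for illustration, could be:

```python
import bisect

def active_viseme(viseme_track, t):
    """Return the viseme to display at playback time t (seconds).
    viseme_track is a list of (start_seconds, viseme_name) keyframes
    sorted by start time; before the first keyframe the mouth is neutral."""
    starts = [start for start, _ in viseme_track]
    i = bisect.bisect_right(starts, t) - 1
    return viseme_track[i][1] if i >= 0 else "neutral"

track = [(0.0, "closed"), (0.12, "open"), (0.30, "rounded")]
```

For instance, `active_viseme(track, 0.2)` returns `"open"`, since 0.2 s falls after the 0.12 s keyframe and before the 0.30 s one.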
19. A system for animating media content, the system comprising:
means for receiving a selection of a piece of media content, the media content comprising a background scene having a character therein;

means for receiving an individualized three-dimensional (3D) head model of a user;

means for receiving at least one voice sample of the user;

means for converting the at least one voice sample to at least one audio track;

means for detecting from the at least one audio track a plurality of phonemes;

means for creating at least one viseme track that associates the plurality of phonemes with a plurality of visemes, each of the plurality of visemes being indicative of an animated mouth movement of the individualized 3D head model; and

means for generating personalized media content by associating the individualized 3D head model with the character of the background scene, and associating the at least one audio track and the at least one viseme track with the individualized 3D head model to cause the animated mouth movement of the individualized 3D head model to correspond to the at least one audio track during playback of the personalized media content.

View Dependent Claims (20, 21)
Specification