Video generation based on text
Abstract
Techniques for generating a video sequence of a person based on a text sequence are disclosed herein. Based on the received text sequence, a processing device generates the video sequence of a person to simulate visual and audible emotional expressions of the person, including using an audio model of the person's voice to generate an audio portion of the video sequence. The emotional expressions in the visual portion of the video sequence are simulated based on a priori knowledge about the person. For instance, the a priori knowledge can include photos or videos of the person captured in real life.
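The abstract describes a text-to-video pipeline: text in, a synchronized visual and audio sequence out. A minimal sketch of that flow is given below; all function names, the per-word frame granularity, and the stand-in models are illustrative assumptions, not details taken from the patent.

```python
# Hypothetical sketch of the described pipeline. The patent assumes models
# produced by machine learning analysis of real-life video of the person;
# here they are replaced with trivial stand-ins.

def generate_visual(text, prior_images):
    """Simulate visual emotional expressions: one placeholder 'frame'
    per word, nominally derived from a priori images of the person."""
    return [f"frame({word})" for word in text.split()]

def generate_audio(text, voice_model):
    """Use an audio model of the person's voice to render each word."""
    return [voice_model(word) for word in text.split()]

def generate_video(text, prior_images, voice_model):
    frames = generate_visual(text, prior_images)
    audio = generate_audio(text, voice_model)
    # Synchronize on the text: frame i is paired with the audio for word i.
    return list(zip(frames, audio))

video = generate_video("hello world", prior_images=[],
                       voice_model=lambda w: f"say({w})")
print(video)  # [('frame(hello)', 'say(hello)'), ('frame(world)', 'say(world)')]
```

The word-level pairing is only one way to realize the "synchronized based on the text sequence" language; a real system would align at the phoneme or timestamp level.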
22 Claims
1. A method comprising:

inputting a text sequence at a processing device; and

generating, by the processing device, a video sequence of a person based on the text sequence to simulate visual and audible emotional expressions of the person, including using an audio model of the person's voice to generate an audio portion of the video sequence, said generating being based on a machine learning analysis of a real life video sequence of the person.

- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
13. A method comprising:

inputting a text sequence at a processing device;

generating, by the processing device, a visual sequence of a person based on the text sequence to simulate visual emotional expressions of the person, wherein a face portion of each frame of the visual sequence is represented by a combination of a priori images of the person;

generating, by the processing device, an audio sequence of the person based on the text sequence to simulate audible emotional expressions of the person, using an audio model of the person's voice, said generating the visual sequence of the person being based on a machine learning analysis of a real life video sequence of the person; and

producing, by the processing device, a video sequence of the person by merging the visual sequence and the audio sequence, wherein the visual sequence and the audio sequence are synchronized based on the text sequence.

- View Dependent Claims (14, 15, 16, 17, 18, 19)
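Claim 13 represents the face portion of each frame as "a combination of a priori images of the person." One plausible reading of such a combination is a per-pixel weighted average of stored images; the toy illustration below assumes that interpretation, with images modeled as flat lists of pixel intensities (the format and weights are assumptions, not from the patent).

```python
def combine_images(images, weights):
    """Blend a priori images (each a flat list of pixel intensities)
    into one face frame via a per-pixel weighted average."""
    assert len(images) == len(weights)
    assert abs(sum(weights) - 1.0) < 1e-9  # convex combination
    n = len(images[0])
    return [sum(w * img[i] for img, w in zip(images, weights))
            for i in range(n)]

# Two 2x2 "images" of the person, blended 75/25 toward neutral.
neutral = [0.0, 0.0, 0.0, 0.0]
smiling = [1.0, 1.0, 1.0, 1.0]
frame = combine_images([neutral, smiling], [0.75, 0.25])
print(frame)  # [0.25, 0.25, 0.25, 0.25]
```

Varying the weights over time would let the generator interpolate between expressions, which is consistent with the claim's frame-by-frame combination language.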
20. A method comprising:

creating a text sequence, wherein the text sequence represents one or more words that a person is to utter in a video sequence to be generated using an audio model based on the person's voice, to visually and audibly represent a range of emotional expressions of the person;

identifying an indicator associated with a word within the text sequence, wherein the indicator is one in a predetermined set of indicators, each of which indicates a different emotional expression of the person;

incorporating the indicator into the text sequence; and

sending the text sequence to a device configured to generate the video sequence based on the text sequence and a machine learning analysis of a real life video sequence of the person.

- View Dependent Claims (21, 22)
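Claim 20 describes tagging a word in the text sequence with an emotion indicator drawn from a predetermined set, then sending the tagged text to the generator. The sketch below assumes an inline `word<indicator>` marker syntax and a small indicator set; both are illustrative choices, since the patent does not specify a concrete encoding.

```python
# Assumed predetermined set of indicators, each denoting a different
# emotional expression of the person (names are hypothetical).
INDICATORS = {"happy", "sad", "angry", "surprised"}

def incorporate_indicator(text, word, indicator):
    """Attach an indicator to a specific word in the text sequence,
    using an assumed inline '<indicator>' marker syntax."""
    if indicator not in INDICATORS:
        raise ValueError(f"unknown indicator: {indicator}")
    tagged = [f"{w}<{indicator}>" if w == word else w
              for w in text.split()]
    return " ".join(tagged)

msg = incorporate_indicator("I won the race", "won", "happy")
print(msg)  # I won<happy> the race
```

The tagged sequence would then be sent to the generating device, which could parse the markers to modulate both the facial expression and the voice model for the marked words.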
Specification