Video generation based on text
Abstract
Techniques for generating a video sequence of a person based on a text sequence are disclosed herein. Based on the received text sequence, a processing device generates the video sequence of a person to simulate visual and audible emotional expressions of the person, including using an audio model of the person's voice to generate an audio portion of the video sequence. The emotional expressions in the visual portion of the video sequence are simulated based on a priori knowledge about the person. For instance, the a priori knowledge can include photos or videos of the person captured in real life.
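The abstract describes a text-to-video pipeline: text in, a synchronized visual and audio sequence out. A minimal sketch of that flow is given below; all function names, the per-word frame granularity, and the stand-in models are illustrative assumptions, not details taken from the patent.

```python
# Hypothetical sketch of the described pipeline. The patent assumes models
# produced by machine learning analysis of real-life video of the person;
# here they are replaced with trivial stand-ins.

def generate_visual(text, prior_images):
    """Simulate visual emotional expressions: one placeholder 'frame'
    per word, nominally derived from a priori images of the person."""
    return [f"frame({word})" for word in text.split()]

def generate_audio(text, voice_model):
    """Use an audio model of the person's voice to render each word."""
    return [voice_model(word) for word in text.split()]

def generate_video(text, prior_images, voice_model):
    frames = generate_visual(text, prior_images)
    audio = generate_audio(text, voice_model)
    # Synchronize on the text: frame i is paired with the audio for word i.
    return list(zip(frames, audio))

video = generate_video("hello world", prior_images=[],
                       voice_model=lambda w: f"say({w})")
print(video)  # [('frame(hello)', 'say(hello)'), ('frame(world)', 'say(world)')]
```

The word-level pairing is only one way to realize the "synchronized based on the text sequence" language; a real system would align at the phoneme or timestamp level.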
22 Claims
1. A method comprising:

inputting a text sequence at a processing device; and

generating, by the processing device, a video sequence of a person based on the text sequence to simulate visual and audible emotional expressions of the person, including using an audio model of the person's voice to generate an audio portion of the video sequence, said generating being based on a machine learning analysis of a real life video sequence of the person.

- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
13. A method comprising:

inputting a text sequence at a processing device;

generating, by the processing device, a visual sequence of a person based on the text sequence to simulate visual emotional expressions of the person, wherein a face portion of each frame of the visual sequence is represented by a combination of a priori images of the person;

generating, by the processing device, an audio sequence of the person based on the text sequence to simulate audible emotional expressions of the person, using an audio model of the person's voice, said generating the visual sequence of the person being based on a machine learning analysis of a real life video sequence of the person; and

producing, by the processing device, a video sequence of the person by merging the visual sequence and the audio sequence, wherein the visual sequence and the audio sequence are synchronized based on the text sequence.

- View Dependent Claims (14, 15, 16, 17, 18, 19)
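Claim 13 represents the face portion of each frame as "a combination of a priori images of the person." One plausible reading of such a combination is a per-pixel weighted average of stored images; the toy illustration below assumes that interpretation, with images modeled as flat lists of pixel intensities (the format and weights are assumptions, not from the patent).

```python
def combine_images(images, weights):
    """Blend a priori images (each a flat list of pixel intensities)
    into one face frame via a per-pixel weighted average."""
    assert len(images) == len(weights)
    assert abs(sum(weights) - 1.0) < 1e-9  # convex combination
    n = len(images[0])
    return [sum(w * img[i] for img, w in zip(images, weights))
            for i in range(n)]

# Two 2x2 "images" of the person, blended 75/25 toward neutral.
neutral = [0.0, 0.0, 0.0, 0.0]
smiling = [1.0, 1.0, 1.0, 1.0]
frame = combine_images([neutral, smiling], [0.75, 0.25])
print(frame)  # [0.25, 0.25, 0.25, 0.25]
```

Varying the weights over time would let the generator interpolate between expressions, which is consistent with the claim's frame-by-frame combination language.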
20. A method comprising:

creating a text sequence, wherein the text sequence represents one or more words that a person is to utter in a video sequence to be generated using an audio model based on the person's voice, to visually and audibly represent a range of emotional expressions of the person;

identifying an indicator associated with a word within the text sequence, wherein the indicator is one in a predetermined set of indicators, each of which indicates a different emotional expression of the person;

incorporating the indicator into the text sequence; and

sending the text sequence to a device configured to generate the video sequence based on the text sequence and a machine learning analysis of a real life video sequence of the person.

- View Dependent Claims (21, 22)
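Claim 20 describes tagging a word in the text sequence with an emotion indicator drawn from a predetermined set, then sending the tagged text to the generator. The sketch below assumes an inline `word<indicator>` marker syntax and a small indicator set; both are illustrative choices, since the patent does not specify a concrete encoding.

```python
# Assumed predetermined set of indicators, each denoting a different
# emotional expression of the person (names are hypothetical).
INDICATORS = {"happy", "sad", "angry", "surprised"}

def incorporate_indicator(text, word, indicator):
    """Attach an indicator to a specific word in the text sequence,
    using an assumed inline '<indicator>' marker syntax."""
    if indicator not in INDICATORS:
        raise ValueError(f"unknown indicator: {indicator}")
    tagged = [f"{w}<{indicator}>" if w == word else w
              for w in text.split()]
    return " ".join(tagged)

msg = incorporate_indicator("I won the race", "won", "happy")
print(msg)  # I won<happy> the race
```

The tagged sequence would then be sent to the generating device, which could parse the markers to modulate both the facial expression and the voice model for the marked words.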
Specification