Method and system for converting text to lip-synchronized speech in real time
First Claim
1. A method for presenting information in real time, the method comprising:
- providing a plurality of rules for controlling modification of words of a sequence of words, the rules including rules to add a sound after a phrase, to replace words with words of different complexity, to remove certain verbs without replacing the verbs, and to modify words based on identification of a current expression derived from comparison of words of the sequence to be spoken;
- providing an expression store with images of a character representing different expressions of emotion for that character;
- receiving a sequence of words;
- modifying the words of the received sequence by, for each of a plurality of rules, determining whether the rule applies to words of the received sequence and, when it is determined that the rule applies, modifying the words of the received sequence in accordance with the rule;
- generating speech for the character corresponding to the modified words, the speech represented by a sequence of phonemes, including replacing phonemes with other phonemes to achieve regional effects;
- identifying expressions of emotion from the words of the received sequence;
- mapping the phonemes of the speech and the identified expressions for the character to the words of the received sequence;
- generating a sequence of images based on the images of the expression store to represent the character speaking the generated speech and having the identified expressions of emotion and to represent hands of the character moved to effect output of the modified words in a sign language, wherein the mapping to words of the received sequence is used to synchronize the movement of the lips representing the character enunciating the phonemes of the words with the image of the character exhibiting the identified expressions of emotion mapped to those words, so that the speaking of a word is synchronized with the image of the character exhibiting the expression of emotion identified from that word; and
- outputting the generated speech represented by the sequence of phonemes and the sequence of generated images to portray the character speaking the words of the modified received sequence and having the identified expressions.
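The rule-application step of claim 1 (determine whether each rule applies, then modify the word sequence accordingly) could be sketched as follows. This is an illustrative sketch only: the rule representation, the example rules, and all function names are assumptions, not taken from the patent.

```python
# Hypothetical sketch of the rule-driven word-modification step of claim 1.
# Each rule is a (predicate, transform) pair: the predicate decides whether
# the rule applies to the word sequence; the transform rewrites the sequence.

def _insert_after_phrase(words, phrase, sound):
    """Insert `sound` immediately after every occurrence of `phrase`."""
    out, i = [], 0
    while i < len(words):
        if words[i:i + len(phrase)] == phrase:
            out.extend(phrase)
            out.append(sound)
            i += len(phrase)
        else:
            out.append(words[i])
            i += 1
    return out

RULES = [
    # Add a sound after a phrase (here: an "uh" after "you know").
    (lambda words: "you know" in " ".join(words),
     lambda words: _insert_after_phrase(words, ["you", "know"], "uh")),
    # Replace a word with a word of different complexity.
    (lambda words: "utilize" in words,
     lambda words: ["use" if w == "utilize" else w for w in words]),
    # Remove a certain verb without replacing it.
    (lambda words: "is" in words,
     lambda words: [w for w in words if w != "is"]),
]

def modify_words(words):
    """Apply each rule whose predicate matches, as in the claimed method."""
    for applies, transform in RULES:
        if applies(words):
            words = transform(words)
    return words
```

For example, `modify_words("you know I utilize it".split())` triggers the first two rules, inserting the sound and simplifying the verb.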
Abstract
A method and system for presenting lip-synchronized speech corresponding to text received in real time is provided. A lip synchronization system provides an image of a character that is to be portrayed as speaking the received text. The lip synchronization system receives a sequence of text corresponding to the speech of the character and may modify the received text in various ways before synchronizing the lips. It may generate phonemes for the modified text that are adapted to certain idioms, and it identifies expressions of emotion from the received text. The lip synchronization system then generates the lip-synchronized images based on the phonemes generated from the modified text and based on the identified expressions.
24 Claims
9. A system for presenting a lip-syncing character, comprising:
- a rules store containing rules for controlling modification of words of a sequence of words, the rules including rules to add a sound after a phrase and to remove certain verbs;
- an expression store containing images of a character representing different expressions of emotion for that character;
- a modify word component that receives a sequence of words in real time and modifies the words of the sequence in accordance with the rules of the rules store;
- an identify expressions component that identifies expressions of emotion from the words of the sequence and maps the expressions of emotion to the words; and
- a lip synchronization component that inputs the modified words of the sequence, the map of expressions of emotion to the words, and the images of the character representing different expressions of emotion, and outputs in real time, as the words are received, speech corresponding to the modified words of the sequence, images of the character speaking the output speech and having the identified expressions of emotion synchronized to the speech as indicated by the map, and images of hands of the character moving to effect output of the modified words in a sign language.
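The component structure of claim 9 (rules store, expression store, and the modify, identify, and synchronization components) could be wired together as in the sketch below. All class names, the toy emotion lexicon, and the stand-in image values are hypothetical; the patent does not prescribe an implementation.

```python
# Illustrative sketch of the claimed system's components.
class RulesStore:
    def __init__(self, rules):
        self.rules = rules            # (predicate, transform) pairs

class ExpressionStore:
    def __init__(self, images):
        self.images = images          # emotion -> image of the character

class ModifyWordComponent:
    def __init__(self, rules_store):
        self.rules_store = rules_store
    def modify(self, words):
        # Apply each rule of the rules store that applies to the sequence.
        for applies, transform in self.rules_store.rules:
            if applies(words):
                words = transform(words)
        return words

class IdentifyExpressionsComponent:
    EMOTION_WORDS = {"great": "happy", "terrible": "sad"}  # toy lexicon
    def identify(self, words):
        # Map each word position to an expression of emotion (or neutral).
        return {i: self.EMOTION_WORDS.get(w, "neutral")
                for i, w in enumerate(words)}

class LipSynchronizationComponent:
    def __init__(self, expression_store):
        self.expression_store = expression_store
    def render(self, words, expression_map):
        # Emit (word, expression image) pairs standing in for the
        # synchronized speech audio and character frames.
        return [(w, self.expression_store.images[expression_map[i]])
                for i, w in enumerate(words)]
```

A usage pass: modified words flow from `ModifyWordComponent` through `IdentifyExpressionsComponent` into `LipSynchronizationComponent`, which pairs each word with the image for its mapped emotion.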
-
-
16. A computer-readable storage medium containing instructions for controlling a computer to present images of a character speaking, by a method comprising:
- providing a plurality of rules for controlling modification of words of a sequence of words, the rules including rules to add a sound after a phrase and to replace words with words of different complexity;
- providing images of a character representing different expressions of emotion of the character;
- receiving a sequence of words in real time;
- modifying the words of the sequence in accordance with the provided rules;
- after modifying the words, generating speech corresponding to the received sequence of words as modified;
- identifying expressions of emotion from the words of the received sequence of words;
- generating a sequence of images based on the provided images to represent the character speaking the generated speech and exhibiting the identified expressions of emotion, so that the speaking of a word is synchronized with an expression of emotion identified from that word, and to represent the character using a sign language to effect the output of modified words of the sequence; and
- outputting the generated speech and sequence of images to portray the character speaking the text with the identified expression of emotion.
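The speech-generation step, together with the regional phoneme substitution recited in claim 1 and the per-word mapping the claims use for synchronization, could look like the sketch below. The toy lexicon, phoneme symbols, and the non-rhotic substitution rule are all illustrative assumptions.

```python
# Hypothetical sketch of phoneme generation with a regional effect and the
# word-to-phoneme mapping used to synchronize lips with expression images.
TOY_LEXICON = {
    "car":  ["k", "aa", "r"],
    "park": ["p", "aa", "r", "k"],
    "here": ["h", "ih", "r"],
}

def to_phonemes(word):
    # Look the word up in the toy lexicon; fall back to its letters.
    return list(TOY_LEXICON.get(word, word))

def apply_regional_effect(phonemes):
    """Replace post-vocalic "r" with "ah" (a toy non-rhotic accent rule),
    i.e. replace phonemes with other phonemes to achieve a regional effect."""
    out = []
    for i, p in enumerate(phonemes):
        if p == "r" and i > 0 and phonemes[i - 1] in ("aa", "ih"):
            out.append("ah")
        else:
            out.append(p)
    return out

def map_words_to_phonemes(words):
    """Return a word -> phoneme-list map; keying phonemes by word is what
    lets lip movement for a word line up with that word's expression image."""
    return {w: apply_regional_effect(to_phonemes(w)) for w in words}
```

Because the map is keyed by word, the renderer can look up both the phonemes to enunciate and the expression image for the same word at the same time.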
Specification