Character animation
Abstract
The present invention provides a method and apparatus for generating an animated character representation. This is achieved by using marked-up data that includes both content data and presentation data. The system uses this information to generate phoneme and viseme data representing the speech to be presented by the character. The presentation data ensures that at least some variation in the character's appearance occurs automatically, beyond the visemes required to make the character appear to speak, giving the character a far more lifelike appearance.
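To make the abstract's pipeline concrete, here is a minimal sketch of the claimed flow from marked-up input to time-referenced phoneme and viseme data. Everything in it (the table names, the fixed 0.1 s phoneme duration, the "mood" key) is an invented illustration, not an implementation disclosed by the patent.

```python
# Hypothetical phoneme/viseme inventories; real systems use full tables.
PHONEME_TABLE = {"hello": ["HH", "AH", "L", "OW"]}   # word -> phonemes
VISEME_TABLE = {"HH": "rest", "AH": "open", "L": "tongue_up", "OW": "round"}

def generate(markup):
    """markup: {'content': str, 'presentation': dict} -> timed events."""
    phonemes, t = [], 0.0
    for word in markup["content"].lower().split():
        for p in PHONEME_TABLE.get(word, []):
            phonemes.append((t, p))   # phoneme keyed to the defined time-base
            t += 0.1                  # fixed duration, purely for the sketch
    # visemes inherit the phoneme time references, modified by presentation data
    mood = markup["presentation"].get("mood", "neutral")
    visemes = [(t, VISEME_TABLE[p], mood) for t, p in phonemes]
    return phonemes, visemes

print(generate({"content": "hello", "presentation": {"mood": "happy"}}))
```

Because the audio events (from the phonemes) and the image events (from the visemes) share one set of time references, synchronising their output reduces to playing both lists against the same clock.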
41 Claims
1. Apparatus for generating an animated character representation, the apparatus comprising a processing system having:
an input for receiving marked-up input data including:
i. content data representing speech to be presented; and
ii. presentation data representing the manner in which the speech is to be presented;
a processor coupled to the input for generating data according to a defined time-base, the data including:
i. phoneme data generated in accordance with the content data; and
ii. viseme data generated in accordance with the phoneme data and the presentation data;
the processor being further adapted to:
iii. generate audio data in accordance with the phoneme data;
iv. generate image data in accordance with the viseme data and the presentation data; and
v. synchronise the output of the audio and image data in accordance with the defined time-base.
Dependent claims: 2-24, 29.
2. Apparatus according to claim 1, wherein the generated data includes:
i. first viseme data generated in accordance with the phoneme data; and
ii. second viseme data generated in accordance with the first viseme data modified by presentation data;
wherein the image data is generated in accordance with the second viseme data.
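Claim 2's two-stage arrangement can be pictured as follows; the parameter names and the "mood" scaling are hypothetical stand-ins for whatever modification the presentation data specifies.

```python
# Toy viseme table: each viseme is a parameter set (names invented).
VISEMES = {"AH": {"shape": "open", "openness": 1.0},
           "M":  {"shape": "closed", "openness": 0.0}}

def first_viseme_data(phonemes, viseme_table):
    """First viseme data: a direct phoneme-to-viseme lookup (claim 2 i)."""
    return [dict(viseme_table[p]) for p in phonemes]

def second_viseme_data(first, presentation):
    """Second viseme data: the first data modified by presentation data
    (claim 2 ii). Here 'modification' just scales mouth openness."""
    gain = {"excited": 1.3, "calm": 0.8}.get(presentation.get("mood"), 1.0)
    return [{**v, "openness": v["openness"] * gain} for v in first]

second = second_viseme_data(first_viseme_data(["AH", "M"], VISEMES),
                            {"mood": "excited"})
print(second)   # the image data would be generated from this second data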
3. Apparatus according to claim 1, wherein the processor includes:
a text-to-speech processor for generating the phoneme data and the audio data;
an animation processor for generating the viseme data and the image data; and
a parser for:
i. parsing the received marked-up data;
ii. detecting predetermined content data which is to be presented in a predetermined manner;
iii. generating presentation data representative of the predetermined manner; and
iv. modifying the received marked-up data with the generated presentation data.
4. Apparatus according to claim 3, the processing system further comprising a store for storing data, the parser being coupled to the store to obtain an indication of the predetermined content data therefrom.
5. Apparatus according to claim 4, wherein the predetermined content data includes words which are names, nouns, negatives and numbers.
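Claims 3-5 describe a parser that spots predetermined content (names, nouns, negatives and numbers, per claim 5) and splices generated presentation data back into the mark-up. A toy version, with an invented tag syntax and the store reduced to a word set, might look like this:

```python
# Sketch of the claims 3-5 parser. STORE and the <emphasis/> tag are
# illustrative assumptions; the patent does not fix a tag syntax.
STORE = {"negatives": {"not", "never", "no"},   # claim 4: predetermined content
         "emphasis_tag": "<emphasis/>"}          # held in a store

def parse(marked_up_text):
    out = []
    for word in marked_up_text.split():
        if word.lower() in STORE["negatives"] or word.isdigit():
            out.append(STORE["emphasis_tag"])    # claim 3 iii/iv: generate
        out.append(word)                         # presentation data and splice
    return " ".join(out)                         # it into the marked-up data

print(parse("I did not order 3 items"))
# -> 'I did <emphasis/> not order <emphasis/> 3 items'
```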
6. Apparatus according to claim 3, wherein the text-to-speech processor includes a linguistic processor adapted to:
parse the content data;
determine the phonemes required to represent the content data; and
generate phoneme time references for each of the phonemes, the phoneme time reference indicating the time at which the respective phoneme should be presented with respect to the time base.
7. Apparatus according to claim 6, wherein the linguistic processor is further adapted to:
parse the presentation data;
generate a number of tags representing the presentation data; and
generate tag time references for each of the tags, the tag time reference indicating the time at which the respective tag should modify the manner of presentation with respect to the time base.
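Claims 6 and 7 hinge on phonemes and presentation tags both carrying time references against the same time base. A sketch of that alignment step, assuming a uniform (invented) phoneme duration:

```python
PHONE_DUR = 0.08   # assumed uniform phoneme duration in seconds (illustrative)

def time_align(words, lexicon, tags):
    """lexicon: word -> phoneme list; tags: {word_index: tag_name}.
    Returns phoneme and tag time references on one time base (claims 6-7)."""
    phoneme_times, tag_times, t = [], [], 0.0
    for i, word in enumerate(words):
        if i in tags:
            tag_times.append((t, tags[i]))       # tag time reference
        for p in lexicon[word]:
            phoneme_times.append((t, p))         # phoneme time reference
            t += PHONE_DUR
    return phoneme_times, tag_times

lex = {"no": ["N", "OW"]}
print(time_align(["no"], lex, {0: "emphasis"}))
```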
8. Apparatus according to claim 6, wherein the linguistic processor is coupled to the store to obtain an indication of the phonemes required to represent respective words.
9. Apparatus according to claim 6, wherein the text-to-speech processor includes a concatenation processor adapted to:
determine phoneme data representing each of the phonemes; and
concatenate the phoneme data in accordance with the phoneme time references to generate audio data representing the speech.
10. Apparatus according to claim 9, wherein the concatenation processor is coupled to the store to obtain the phoneme data therefrom in accordance with the determined phonemes.
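Claims 9 and 10 describe concatenating stored per-phoneme audio at the phoneme time references. A rough equivalent using sample buffers (NumPy is used purely for array arithmetic; the sample rate and the one-second tail padding are arbitrary assumptions):

```python
import numpy as np

SAMPLE_RATE = 16000

def concatenate(phoneme_times, phoneme_store):
    """phoneme_times: [(start_s, phoneme)]; phoneme_store: phoneme -> samples.
    Places each stored waveform at its time reference (claim 9); the store
    lookup stands in for claim 10. Assumes clips are under 1 s long."""
    end = phoneme_times[-1][0] + 1.0 if phoneme_times else 0.0
    audio = np.zeros(int(end * SAMPLE_RATE), dtype=np.float32)
    for start, p in phoneme_times:
        clip = phoneme_store[p]                  # claim 10: fetched from store
        i = int(start * SAMPLE_RATE)
        audio[i:i + len(clip)] += clip[: len(audio) - i]
    return audio

store = {"N": 0.1 * np.ones(800, dtype=np.float32)}   # a 50 ms stored waveform
print(concatenate([(0.0, "N"), (0.08, "N")], store).shape)
```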
11. Apparatus according to claim 6, wherein the animation processor includes a phoneme processor adapted to:
obtain the determined phonemes, and the associated phoneme time references, from the linguistic processor;
determine visemes corresponding to each of the determined phonemes; and
determine a viseme time reference for each viseme in accordance with the phoneme time reference of the corresponding phoneme.
12. Apparatus according to claim 11, wherein the phoneme processor is coupled to the store to obtain translation data therefrom, the translation data indicating a viseme associated with each of the phonemes, the phoneme processor using the translation data to determine the visemes in accordance with the determined phonemes.
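Claims 11 and 12 reduce to a lookup through translation data, with each viseme inheriting its phoneme's time reference. For illustration, with a toy translation table (real mappings cover the full phoneme inventory):

```python
# Translation data in the sense of claim 12: phoneme -> viseme.
TRANSLATION = {"P": "lips_closed", "B": "lips_closed", "M": "lips_closed",
               "AA": "jaw_open", "F": "lip_to_teeth", "V": "lip_to_teeth"}

def to_visemes(phoneme_times):
    """Each viseme keeps the time reference of its phoneme (claim 11)."""
    return [(t, TRANSLATION.get(p, "neutral")) for t, p in phoneme_times]

print(to_visemes([(0.0, "M"), (0.08, "AA"), (0.16, "P")]))
```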
13. Apparatus according to claim 12, wherein the animation processor includes a viseme processor coupled to the store, the viseme processor being adapted to obtain viseme data from the store in accordance with the determined visemes, the viseme data including a number of parameters representing the variation required from a base character image to represent the respective viseme.
14. Apparatus according to claim 13, wherein the animation processor includes at least one modification processor adapted to modify the viseme data in accordance with the presentation data.
15. Apparatus according to claim 14, wherein the or each modification processor is coupled to the store to obtain modification data therefrom, the or each modification processor using the modification data to modify the parameters of the viseme data.
16. Apparatus according to claim 14, wherein the or each modification processor is adapted to modify at least one of a specified expression, behaviour, and action.
17. Apparatus according to claim 14, wherein the or each modification processor is further adapted to modify the viseme data in accordance with pseudo-random data.
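Claims 13-17 treat viseme data as parameters describing the deviation from a base character image, which modification processors then adjust per the presentation data and perturb pseudo-randomly so the character does not repeat itself mechanically. A sketch with invented parameter names and magnitudes:

```python
import random

def modify(viseme_params, presentation, rng=None):
    """Modification processor sketch: rescales viseme parameters per the
    presentation data and adds pseudo-random variation (claims 14-17)."""
    rng = rng or random.Random(0)
    smile = 0.4 if presentation.get("expression") == "happy" else 0.0
    return {
        "mouth_open": viseme_params["mouth_open"],  # claim 13: deviation from base image
        "smile": smile,                             # claim 16: specified expression
        "brow_raise": rng.uniform(-0.05, 0.05),     # claim 17: pseudo-random data
    }

print(modify({"mouth_open": 0.7}, {"expression": "happy"}))
```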
18. Apparatus according to claim 13, wherein the animation processor further comprises an interpolation processor for interpolating the viseme data to determine the appearance of the character at times between the specified visemes.
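Claim 18's interpolation step could be as simple as linear blending of viseme parameters between consecutive time references, giving the character's pose at frame times that fall between the specified visemes:

```python
def interpolate(t, key_a, key_b):
    """key_a, key_b: (time, {param: value}) with key_a[0] <= t <= key_b[0].
    Linear blend of viseme parameters (one possible claim 18 scheme)."""
    (t0, a), (t1, b) = key_a, key_b
    w = 0.0 if t1 == t0 else (t - t0) / (t1 - t0)
    return {k: a[k] + w * (b[k] - a[k]) for k in a}

pose = interpolate(0.05, (0.0, {"mouth_open": 0.0}), (0.1, {"mouth_open": 0.8}))
print(pose)   # {'mouth_open': 0.4}
```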
19. Apparatus according to claim 18, wherein the processing system further comprises a render processor coupled to the interpolation processor for generating image data in accordance with the interpolated viseme data, the image data representing the character presenting the speech defined by the content data.
20. Apparatus according to claim 19, wherein the processing system further includes a video processor, the render processor forming part of the video processor.
21. Apparatus according to claim 20, wherein the video processor generates video data representing the animated character sequence.
22. Apparatus according to claim 1, wherein the processing system further comprises a communications network interface, which in use couples the computing device to a communications network, thereby allowing the animated character representation to be transferred to other processing systems coupled to the communications network.
23. Apparatus according to claim 22, wherein in use the input is adapted to receive marked-up data from the communications network.
24. Image and/or audio data generated using apparatus according to claim 1.
29. A method according to claim 8, wherein the method of generating the phoneme data further comprises:
using each of the determined phonemes to obtain respective phoneme data; and
concatenating the phoneme data in accordance with the phoneme time references to generate audio data representing the speech.
25. A method of generating an animated character representation using a processing system, the method comprising:
receiving marked-up input data including:
content data representing speech to be presented; and
presentation data representing the manner in which the speech is presented;
generating data according to a defined time-base, the data including:
phoneme data generated in accordance with the content data; and
viseme data generated in accordance with the phoneme data and the presentation data;
generating audio data in accordance with the phoneme data;
generating image data in accordance with the viseme data and the presentation data; and
synchronising the output of the audio and image data in accordance with the defined time-base.
Dependent claims: 26-28, 30-41.
26. A method according to claim 25, wherein the method further comprises:
i. generating first viseme data in accordance with the phoneme data; and
ii. generating second viseme data in accordance with the first viseme data modified by presentation data;
wherein the image data is generated in accordance with the second viseme data.
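The synchronisation step common to claims 1 and 25 amounts to driving audio and image output off the one defined time-base. A playback sketch (event payloads are placeholders; a real system would feed audio and render devices rather than print):

```python
import time

def play(audio_events, image_events):
    """Merge time-referenced audio and image events and emit each one when
    the shared clock reaches its time reference."""
    t0 = time.monotonic()
    for when, kind, payload in sorted(
            [(t, "audio", a) for t, a in audio_events] +
            [(t, "image", i) for t, i in image_events]):
        delay = when - (time.monotonic() - t0)
        if delay > 0:
            time.sleep(delay)            # wait for the time reference
        print(f"{when:.2f}s {kind}: {payload}")

play([(0.0, "HH.wav")], [(0.0, "rest"), (0.05, "open")])
```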
27. A method according to claim 25, wherein the method further comprises:
parsing the received marked-up data;
detecting predetermined content data which is to be presented in a predetermined manner;
generating presentation data representative of the predetermined manner; and
modifying the received marked-up data with the generated presentation data.
28. A method according to claim 27, wherein the predetermined content data includes words which are names, nouns, negatives and numbers.
30. A method according to claim 28, wherein the method of generating the viseme data comprises:
determining visemes corresponding to each of the determined phonemes;
determining a viseme time reference for each viseme in accordance with the phoneme time reference of the corresponding phoneme; and
using the visemes to obtain the viseme data.
31. A method according to claim 30, wherein the visemes are determined by accessing translation data in accordance with the determined phonemes, the translation data indicating a viseme corresponding to each phoneme.
32. A method according to claim 30, wherein the viseme data includes a number of parameters representing the variation required from a base character image to represent the respective viseme.
33. A method according to claim 30, wherein the method further comprises modifying the viseme data by modifying the parameters in accordance with the presentation data, the viseme data being modified to represent at least one of a specified expression, behaviour, and action.
34. A method according to claim 30, wherein the viseme data is further modified in accordance with pseudo-random behaviour.
35. A method according to claim 30, wherein the method further comprises interpolating the viseme data to determine the appearance of the character at times between the specified visemes.
36. A method according to claim 35, wherein the method further comprises using the interpolated viseme data to generate image data representing the character presenting the speech defined by the content data.
37. A method according to claim 25, wherein the method of generating the phoneme data comprises:
parsing the content data;
determining the phonemes required to represent the content data; and
generating phoneme time references for each of the phonemes, the phoneme time reference indicating the time at which the phoneme should be presented with respect to the time base.
38. A method according to claim 37, wherein the method further comprises:
parsing the presentation data;
generating a number of tags representing the presentation data; and
generating tag time references for each of the tags, the tag time reference indicating the time at which the respective tag should modify the manner of presentation with respect to the time base.
39. A method according to claim 37, wherein the method of determining the phonemes comprises using the parsed content data to access a dictionary, the dictionary indicating the phonemes required to represent respective words.
40. A method according to claim 37, wherein the method further comprises modifying the phoneme data in accordance with the presentation data.
41. Image data and/or audio data generated in accordance with the method of claim 25.