Audio-Visual Dialogue System and Method
Abstract
The present invention provides an audio-visual dialogue system that allows a user to create an ‘avatar’ which may be customised to look and sound a particular way. The avatar may be created to resemble, for example, a person, animal or mythical creature, and may be given a variable voice, which may be female or male. The system employs real-time voice conversion to transform any audio input, for example spoken word, into a target voice that is selected and customised by the user. The system is arranged to facially animate the avatar using a real-time lip-synching algorithm such that the generated avatar and the target voice are synchronised.
98 Claims
1-78. (canceled)
79. An audio-visual dialogue system, comprising:
an audio input device;
an audio output device;
a visual output device; and
a processor, the processor being arranged to:
  receive an input audio signal representing a source voice from the audio input device;
  perform substantially real-time voice conversion on the input audio signal to produce an output audio signal representing a target voice, wherein the output audio signal is provided to the audio output device, and wherein the real-time voice conversion process includes:
    i) decomposing the input audio signal into a set of time-varying filter characteristics and a residual excitation signal;
    ii) spectrally transforming the time-varying filter characteristics, and/or modifying the pitch of the residual excitation signal; and
    iii) synthesising the output audio signal in dependence on the transformed time-varying filter characteristics and/or the pitch modified residual excitation signal;
  generate an avatar, wherein the avatar is visually displayed on the visual output device; and
  facially animate the generated avatar, wherein the animation is synchronised with the output audio signal.
(Dependent claims: 80-97)
98. A method of audio-visual dialogue, comprising:
receiving an input audio signal representing a source voice from an audio input device;
performing substantially real-time voice conversion on the input audio signal to produce an output audio signal representing a target voice, wherein the output audio signal is provided to an audio output device, and wherein the substantially real-time voice conversion includes:
  i) decomposing the input audio signal into a set of time-varying filter characteristics and a residual excitation signal;
  ii) spectrally transforming the time-varying filter characteristics, and/or modifying the pitch of the residual excitation signal; and
  iii) synthesising the output audio signal in dependence on the transformed time-varying filter characteristics and/or the pitch modified residual excitation signal;
generating an avatar, wherein the avatar is visually displayed on a visual output device; and
facially animating the generated avatar, wherein the animation is synchronised with the output audio signal.
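For illustration only, steps i) to iii) of the voice conversion can be sketched for a single audio frame using linear predictive coding (LPC), one common way of obtaining an all-pole filter and a residual excitation. The claims do not name LPC, so that choice is an assumption, as are the function names (`lpc`, `warp_formants`, `convert_frame`), the filter order of 12, the 25 ms frame at 16 kHz, and the use of pole-angle scaling as the spectral transform. Pitch modification of the residual, the other option in step ii), is omitted from this sketch.

```python
import numpy as np


def lpc(frame, order):
    """Autocorrelation-method LPC via Levinson-Durbin; returns a with a[0] == 1."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0] + 1e-9  # small offset guards against an all-zero frame
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err
        a[1:i] = a[1:i] + k * a[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def inverse_filter(frame, a):
    """Step i): residual excitation e[n] = x[n] + sum_k a[k] * x[n-k]."""
    return np.convolve(frame, a)[: len(frame)]


def warp_formants(a, factor):
    """Step ii), assumed form: scale the angles of the complex poles by `factor`."""
    poles = np.roots(a)
    is_complex = np.abs(poles.imag) > 1e-12
    theta = np.angle(poles)
    # Real poles are left alone so the polynomial keeps real coefficients.
    new_theta = np.where(is_complex, np.clip(theta * factor, -np.pi, np.pi), theta)
    warped = np.abs(poles) * np.exp(1j * new_theta)
    return np.real(np.poly(warped))


def synthesis_filter(excitation, a):
    """Step iii): all-pole filter y[n] = e[n] - sum_k a[k] * y[n-k]."""
    order = len(a) - 1
    out = np.zeros(len(excitation) + order)  # leading zeros = zero initial state
    for n, e in enumerate(excitation):
        past = out[n : n + order][::-1]  # y[n-1], ..., y[n-order]
        out[n + order] = e - np.dot(a[1:], past)
    return out[order:]


def convert_frame(frame, order=12, factor=1.0):
    """Decompose, spectrally transform and resynthesise one frame."""
    a = lpc(frame * np.hamming(len(frame)), order)
    residual = inverse_filter(frame, a)
    return synthesis_filter(residual, warp_formants(a, factor))


# Synthetic test frame: two sinusoids plus a little noise (25 ms at 16 kHz).
rng = np.random.default_rng(0)
t = np.arange(400) / 16000.0
x = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
x = x + 0.05 * rng.standard_normal(len(t))

y_same = convert_frame(x, factor=1.0)    # no transform: should reproduce x
y_shift = convert_frame(x, factor=1.15)  # resonances moved up by about 15%
```

With `factor=1.0` the synthesis filter is the exact inverse of the analysis filter, so the frame is reproduced; any other factor moves the filter resonances while reusing the same residual. A real-time system would additionally need overlapping frames and state carried between them, which this sketch leaves out.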
Specification