Conversational interface agent
Abstract
A video rewrite technique for rendering a talking head or agent simulates a complete conversation by including a waiting or listening state in addition to a talking state. Smooth transitions are provided to and from the talking state.
Claims (11)
1. A computer readable medium having instructions, which when executed on a computer provide a user interface, the instructions comprising:

a speech synthesizer receiving input for synthesis and providing an audio output signal; and

a video rendering module receiving information related to the audio output signal, the video rendering module rendering a representation comprising a sequence of video frames of a talking head having a talking state with mouth movements in accordance with the audio output signal added to each of the frames during the talking state and a waiting state with added non-talking mouth movements during the waiting state in accordance with listening, and wherein the video rendering module returns to an earlier, preselected frame in the sequence upon reaching a selected frame in the sequence.

(Dependent claims 2, 3, 4, 5 and 6 not reproduced here.)
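The loop-back behavior recited in claim 1, where the renderer, upon reaching a selected frame, returns to an earlier, preselected frame so the waiting-state head keeps moving indefinitely, can be sketched roughly as follows. This is only an illustration, not the patented implementation; the names `frame_loop`, `loop_start` and `loop_end` are hypothetical:

```python
from itertools import islice

def frame_loop(loop_start, loop_end):
    """Yield frame indices forever: play 0..loop_end once, then cycle
    back to the earlier, preselected frame loop_start on each pass."""
    i = 0
    while True:
        yield i
        # Upon reaching the selected frame, return to the preselected one.
        i = loop_start if i >= loop_end else i + 1

# The first twelve indices show the jump from frame 7 back to frame 3.
indices = list(islice(frame_loop(loop_start=3, loop_end=7), 12))
# indices == [0, 1, 2, 3, 4, 5, 6, 7, 3, 4, 5, 6]
```

Because the generator never terminates, the head sequence can be rendered continuously for as long as the agent stays in the waiting state.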
7. A computer readable medium having instructions, which when executed on a computer provide a user interface, the instructions comprising:

a speech synthesizer receiving input for synthesis and providing an audio output signal; and

a video rendering module receiving information related to the audio output signal, the video rendering module rendering a representation comprising a sequence of video frames of a talking head having a talking state with mouth movements in accordance with the audio output signal added to each of the frames during the talking state and a waiting state with added non-talking mouth movements during the waiting state in accordance with listening,

wherein the video rendering module tracks movements of the talking head in the sequence of video frames,

wherein the video rendering module transforms affine parameters to physical movements of the talking head for each frame,

wherein the physical movements include translations and rotations of the talking head, and

wherein, for each of said plurality of frames, a mouth position corresponding to the talking state is added as a function of the physical parameters of the frame if a difference in at least one of the physical parameters between the frame and the corresponding interpolated physical parameter exceeds a selected threshold, whereas if the difference in at least one of the physical parameters between the frame and the corresponding interpolated physical parameter does not exceed the selected threshold, the mouth position corresponding to the talking state is added as a function of the interpolated physical parameters.
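Claim 7's step of transforming tracked affine parameters into physical movements (translations and rotations of the head) can be sketched as below. The decomposition assumes the tracked 2x3 affine map [[a, b, tx], [c, d, ty]] is a pure rotation plus translation; that assumption, and the function name, are illustrative rather than taken from the patent:

```python
import math

def affine_to_physical(a, b, tx, c, d, ty):
    """Decompose the affine map [[a, b, tx], [c, d, ty]] into physical
    movements: a translation (tx, ty) and an in-plane rotation angle."""
    rotation = math.atan2(c, a)  # angle of the transformed x-axis, in radians
    return tx, ty, rotation

# A head rotated by 0.5 rad and shifted by (2, 3):
theta = 0.5
params = affine_to_physical(math.cos(theta), -math.sin(theta), 2.0,
                            math.sin(theta), math.cos(theta), 3.0)
# params is approximately (2.0, 3.0, 0.5)
```

Working in physical parameters rather than raw affine coefficients makes the later per-frame threshold comparison meaningful, since translations and rotations can each be given their own tolerance.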
8. A computer readable medium having instructions, which when executed on a computer provide a user interface, the instructions comprising:

a speech synthesizer receiving input for synthesis and providing an audio output signal; and

a video rendering module receiving information related to the audio output signal, the video rendering module rendering a representation of a talking head having a talking state with mouth movements in accordance with the audio output signal and a waiting state with mouth movements in accordance with listening, the video rendering module accessing a store having a sequence of frames of the talking head and continuously rendering at least a portion of each of the frames in the sequence of frames while selectively adding a corresponding mouth position for the talking state to each of the frames in accordance with the audio output signal and in accordance with tracking movements of the talking head during the sequence of frames,

wherein the video rendering module transforms affine parameters to physical movements of the talking head for each frame,

wherein the physical movements include translations and rotations of the talking head,

wherein the mouth positions are added based upon interpolated physical movements of the talking head,

wherein for each of a plurality of frames, interpolated physical movements are calculated as a function of a corresponding preceding frame and a corresponding succeeding frame, and

wherein for each of said plurality of frames, a mouth position corresponding to the talking state is added as a function of the physical parameters of the frame if a difference in at least one of the physical parameters between the frame and the corresponding interpolated physical parameter exceeds a selected threshold, whereas if the difference in at least one of the physical parameters between the frame and the corresponding interpolated physical parameter does not exceed the selected threshold, the mouth position corresponding to the talking state is added as a function of the interpolated physical parameters.
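The interpolation and threshold test recited in claim 8 can be sketched as follows. Parameters are treated as (tx, ty, rotation) triples; the midpoint interpolation and the single shared threshold are simplifying assumptions for illustration, as are all names:

```python
def interpolate_physical(prev_params, next_params):
    """Interpolate a frame's physical parameters from its preceding and
    succeeding frames (here, a simple midpoint)."""
    return tuple((p + n) / 2.0 for p, n in zip(prev_params, next_params))

def mouth_basis(frame_params, interp_params, threshold):
    """Pick which parameters position the mouth: the frame's own tracked
    parameters if any component deviates from the interpolated value by
    more than the threshold, otherwise the smoother interpolated ones."""
    if any(abs(f - i) > threshold for f, i in zip(frame_params, interp_params)):
        return frame_params
    return interp_params

interp = interpolate_physical((0.0, 0.0, 0.0), (2.0, 4.0, 0.2))
# interp == (1.0, 2.0, 0.1)

# A large jump in tx (5.0 vs interpolated 1.0) exceeds the threshold, so the
# frame's own tracked parameters are used:
basis = mouth_basis((5.0, 2.0, 0.1), interp, threshold=1.0)
# basis == (5.0, 2.0, 0.1)
```

The effect of the rule is that mouth placement follows smoothed motion when tracking is steady, but snaps to the frame's own tracked pose when the head moves abruptly, avoiding a mouth drawn in the wrong place.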
9. A computer-implemented method for generating a talking head on a computer display to simulate a conversation, the method comprising:

continuously rendering a sequence of video frames of a talking head with each frame having mouth characteristics indicative of a non-talking state, wherein continuously rendering includes returning to an earlier, preselected frame in the sequence upon reaching a selected frame in the sequence;

tracking movements of the talking head throughout the sequence of video frames;

outputting a voice audio; and

selectively adding a corresponding mouth position to selected frames of the video sequence as a function of the voice audio and tracked movements of the talking head.
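Taken together, the steps of claim 9 amount to a render loop of roughly the following shape. Everything here is a stand-in sketch: `track_head` and `mouth_position_for` are hypothetical helpers, and voice-audio activity is reduced to a per-step boolean flag:

```python
def render_sequence(steps, loop_start, loop_end, talking,
                    track_head, mouth_position_for):
    """Render `steps` frames of the looping head sequence, adding a mouth
    position only while voice audio is active (talking[step] is True)."""
    rendered = []
    i = 0
    for step in range(steps):
        pose = track_head(i)  # tracked movement of the head in frame i
        mouth = mouth_position_for(pose) if talking[step] else None
        rendered.append((i, mouth))
        # Loop back to the earlier, preselected frame, as in claim 9.
        i = loop_start if i >= loop_end else i + 1
    return rendered

# Six steps over a 0..3 sequence that loops back to frame 1; speech is
# active on steps 1, 2 and 5 only.
frames = render_sequence(
    steps=6, loop_start=1, loop_end=3,
    talking=[False, True, True, False, False, True],
    track_head=lambda i: (i, 0.0),          # stand-in tracker: pose = (i, 0.0)
    mouth_position_for=lambda pose: pose,   # stand-in mouth placement
)
# frames == [(0, None), (1, (1, 0.0)), (2, (2, 0.0)), (3, None), (1, None), (2, (2, 0.0))]
```

Because the mouth is added per frame as a function of the tracked pose, the same stored background sequence serves both the waiting state (no mouth added) and the talking state.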
10. A computer-implemented method for generating a talking head on a computer display to simulate a conversation, the method comprising:

continuously rendering a sequence of video frames of a talking head with each frame having mouth characteristics indicative of a non-talking state;

tracking physical movements including translations and rotations of the talking head throughout the sequence of video frames, wherein tracking movements includes transforming affine parameters to physical movements of the talking head for each frame;

calculating interpolated physical movements of the talking head as a function of a corresponding preceding frame and a corresponding succeeding frame for each of a plurality of frames;

outputting a voice audio; and

selectively adding a corresponding mouth position to selected frames of the video sequence as a function of the voice audio and tracked movements of the talking head, wherein adding a mouth position includes, for each of said plurality of frames, adding a mouth position corresponding to the talking state as a function of the physical parameters of the frame if a difference in at least one of the physical parameters between the frame and the corresponding interpolated physical parameter exceeds a selected threshold, whereas if the difference in at least one of the physical parameters between the frame and the corresponding interpolated physical parameter does not exceed the selected threshold, the mouth position corresponding to the talking state is added as a function of the interpolated physical parameters.

(Dependent claim 11 not reproduced here.)
Specification