Conversational interface agent
Abstract
A video rewrite technique for rendering a talking head or agent simulates a complete conversation by including a waiting or listening state in addition to a talking state. Smooth transitions are provided to and from the talking state.
Claims (11)
1. A computer readable medium having instructions, which when executed on a computer provide a user interface, the instructions comprising:

a speech synthesizer receiving input for synthesis and providing an audio output signal; and

a video rendering module receiving information related to the audio output signal, the video rendering module rendering a representation comprising a sequence of video frames of a talking head having a talking state with mouth movements in accordance with the audio output signal added to each of the frames during the talking state and a waiting state with added non-talking mouth movements during the waiting state in accordance with listening, and wherein the video rendering module returns to an earlier, preselected frame in the sequence upon reaching a selected frame in the sequence.

(Dependent claims 2, 3, 4, 5 and 6 not reproduced here.)
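The loop-back behavior recited in claim 1, where the renderer, upon reaching a selected frame, returns to an earlier, preselected frame so the waiting-state head keeps moving indefinitely, can be sketched roughly as follows. This is only an illustration, not the patented implementation; the names `frame_loop`, `loop_start` and `loop_end` are hypothetical:

```python
from itertools import islice

def frame_loop(loop_start, loop_end):
    """Yield frame indices forever: play 0..loop_end once, then cycle
    back to the earlier, preselected frame loop_start on each pass."""
    i = 0
    while True:
        yield i
        # Upon reaching the selected frame, return to the preselected one.
        i = loop_start if i >= loop_end else i + 1

# The first twelve indices show the jump from frame 7 back to frame 3.
indices = list(islice(frame_loop(loop_start=3, loop_end=7), 12))
# indices == [0, 1, 2, 3, 4, 5, 6, 7, 3, 4, 5, 6]
```

Because the generator never terminates, the head sequence can be rendered continuously for as long as the agent stays in the waiting state.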
7. A computer readable medium having instructions, which when executed on a computer provide a user interface, the instructions comprising:

a speech synthesizer receiving input for synthesis and providing an audio output signal; and

a video rendering module receiving information related to the audio output signal, the video rendering module rendering a representation comprising a sequence of video frames of a talking head having a talking state with mouth movements in accordance with the audio output signal added to each of the frames during the talking state and a waiting state with added non-talking mouth movements during the waiting state in accordance with listening,

wherein the video rendering module tracks movements of the talking head in the sequence of video frames,

wherein the video rendering module transforms affine parameters to physical movements of the talking head for each frame,

wherein the physical movements include translations and rotations of the talking head, and

wherein, for each of said plurality of frames, a mouth position corresponding to the talking state is added as a function of the physical parameters of the frame if a difference in at least one of the physical parameters between the frame and the corresponding interpolated physical parameter exceeds a selected threshold, whereas if the difference in at least one of the physical parameters between the frame and the corresponding interpolated physical parameter does not exceed the selected threshold, the mouth position corresponding to the talking state is added as a function of the interpolated physical parameters.
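Claim 7's step of transforming tracked affine parameters into physical movements (translations and rotations of the head) can be sketched as below. The decomposition assumes the tracked 2x3 affine map [[a, b, tx], [c, d, ty]] is a pure rotation plus translation; that assumption, and the function name, are illustrative rather than taken from the patent:

```python
import math

def affine_to_physical(a, b, tx, c, d, ty):
    """Decompose the affine map [[a, b, tx], [c, d, ty]] into physical
    movements: a translation (tx, ty) and an in-plane rotation angle."""
    rotation = math.atan2(c, a)  # angle of the transformed x-axis, in radians
    return tx, ty, rotation

# A head rotated by 0.5 rad and shifted by (2, 3):
theta = 0.5
params = affine_to_physical(math.cos(theta), -math.sin(theta), 2.0,
                            math.sin(theta), math.cos(theta), 3.0)
# params is approximately (2.0, 3.0, 0.5)
```

Working in physical parameters rather than raw affine coefficients makes the later per-frame threshold comparison meaningful, since translations and rotations can each be given their own tolerance.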
8. A computer readable medium having instructions, which when executed on a computer provide a user interface, the instructions comprising:

a speech synthesizer receiving input for synthesis and providing an audio output signal; and

a video rendering module receiving information related to the audio output signal, the video rendering module rendering a representation of a talking head having a talking state with mouth movements in accordance with the audio output signal and a waiting state with mouth movements in accordance with listening, the video rendering module accessing a store having a sequence of frames of the talking head and continuously rendering at least a portion of each of the frames in the sequence of frames while selectively adding a corresponding mouth position for the talking state to each of the frames in accordance with the audio output signal and in accordance with tracking movements of the talking head during the sequence of frames,

wherein the video rendering module transforms affine parameters to physical movements of the talking head for each frame,

wherein the physical movements include translations and rotations of the talking head,

wherein the mouth positions are added based upon interpolated physical movements of the talking head,

wherein for each of a plurality of frames, interpolated physical movements are calculated as a function of a corresponding preceding frame and a corresponding succeeding frame, and

wherein for each of said plurality of frames, a mouth position corresponding to the talking state is added as a function of the physical parameters of the frame if a difference in at least one of the physical parameters between the frame and the corresponding interpolated physical parameter exceeds a selected threshold, whereas if the difference in at least one of the physical parameters between the frame and the corresponding interpolated physical parameter does not exceed the selected threshold, the mouth position corresponding to the talking state is added as a function of the interpolated physical parameters.
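The interpolation and threshold test recited in claim 8 can be sketched as follows. Parameters are treated as (tx, ty, rotation) triples; the midpoint interpolation and the single shared threshold are simplifying assumptions for illustration, as are all names:

```python
def interpolate_physical(prev_params, next_params):
    """Interpolate a frame's physical parameters from its preceding and
    succeeding frames (here, a simple midpoint)."""
    return tuple((p + n) / 2.0 for p, n in zip(prev_params, next_params))

def mouth_basis(frame_params, interp_params, threshold):
    """Pick which parameters position the mouth: the frame's own tracked
    parameters if any component deviates from the interpolated value by
    more than the threshold, otherwise the smoother interpolated ones."""
    if any(abs(f - i) > threshold for f, i in zip(frame_params, interp_params)):
        return frame_params
    return interp_params

interp = interpolate_physical((0.0, 0.0, 0.0), (2.0, 4.0, 0.2))
# interp == (1.0, 2.0, 0.1)

# A large jump in tx (5.0 vs interpolated 1.0) exceeds the threshold, so the
# frame's own tracked parameters are used:
basis = mouth_basis((5.0, 2.0, 0.1), interp, threshold=1.0)
# basis == (5.0, 2.0, 0.1)
```

The effect of the rule is that mouth placement follows smoothed motion when tracking is steady, but snaps to the frame's own tracked pose when the head moves abruptly, avoiding a mouth drawn in the wrong place.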
9. A computer-implemented method for generating a talking head on a computer display to simulate a conversation, the method comprising:

continuously rendering a sequence of video frames of a talking head with each frame having mouth characteristics indicative of a non-talking state, wherein continuously rendering includes returning to an earlier, preselected frame in the sequence upon reaching a selected frame in the sequence;

tracking movements of the talking head throughout the sequence of video frames;

outputting a voice audio; and

selectively adding a corresponding mouth position to selected frames of the video sequence as a function of the voice audio and tracked movements of the talking head.
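Taken together, the steps of claim 9 amount to a render loop of roughly the following shape. Everything here is a stand-in sketch: `track_head` and `mouth_position_for` are hypothetical helpers, and voice-audio activity is reduced to a per-step boolean flag:

```python
def render_sequence(steps, loop_start, loop_end, talking,
                    track_head, mouth_position_for):
    """Render `steps` frames of the looping head sequence, adding a mouth
    position only while voice audio is active (talking[step] is True)."""
    rendered = []
    i = 0
    for step in range(steps):
        pose = track_head(i)  # tracked movement of the head in frame i
        mouth = mouth_position_for(pose) if talking[step] else None
        rendered.append((i, mouth))
        # Loop back to the earlier, preselected frame, as in claim 9.
        i = loop_start if i >= loop_end else i + 1
    return rendered

# Six steps over a 0..3 sequence that loops back to frame 1; speech is
# active on steps 1, 2 and 5 only.
frames = render_sequence(
    steps=6, loop_start=1, loop_end=3,
    talking=[False, True, True, False, False, True],
    track_head=lambda i: (i, 0.0),          # stand-in tracker: pose = (i, 0.0)
    mouth_position_for=lambda pose: pose,   # stand-in mouth placement
)
# frames == [(0, None), (1, (1, 0.0)), (2, (2, 0.0)), (3, None), (1, None), (2, (2, 0.0))]
```

Because the mouth is added per frame as a function of the tracked pose, the same stored background sequence serves both the waiting state (no mouth added) and the talking state.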
10. A computer-implemented method for generating a talking head on a computer display to simulate a conversation, the method comprising:

continuously rendering a sequence of video frames of a talking head with each frame having mouth characteristics indicative of a non-talking state;

tracking physical movements including translations and rotations of the talking head throughout the sequence of video frames, wherein tracking movements includes transforming affine parameters to physical movements of the talking head for each frame;

calculating interpolated physical movements of the talking head as a function of a corresponding preceding frame and a corresponding succeeding frame for each of a plurality of frames;

outputting a voice audio; and

selectively adding a corresponding mouth position to selected frames of the video sequence as a function of the voice audio and tracked movements of the talking head, wherein adding a mouth position includes, for each of said plurality of frames, adding a mouth position corresponding to the talking state as a function of the physical parameters of the frame if a difference in at least one of the physical parameters between the frame and the corresponding interpolated physical parameter exceeds a selected threshold, whereas if the difference in at least one of the physical parameters between the frame and the corresponding interpolated physical parameter does not exceed the selected threshold, the mouth position corresponding to the talking state is added as a function of the interpolated physical parameters.

(Dependent claim 11 not reproduced here.)
Specification