System and method of providing conversational visual prosody for talking heads
Abstract
A system and method of controlling the movement of a virtual agent while the agent is speaking to a human user during a conversation is disclosed. The method comprises receiving speech data to be spoken by the virtual agent, performing a prosodic analysis of the speech data, selecting matching prosody patterns from a speaking database and controlling the virtual agent movement according to the selected prosody patterns.
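The abstract describes a lookup-style pipeline: analyze the prosody of the outgoing speech, match the results against a speaking database, and animate the agent from the matched patterns. Below is a minimal sketch of that flow, assuming an in-memory database and invented feature keys and gesture labels throughout; it illustrates the idea, not the patented implementation.

    # Minimal sketch of the abstract's pipeline. The database keys
    # (boundary tone, pitch accent) and gesture names are assumptions.
    SPEAKING_DB = {
        ("L-L%", "H*"): ["nod_down"],
        ("H-H%", "L*"): ["tilt_up", "eyebrow_raise"],
    }

    def animate(prosodic_events):
        """Match each analyzed prosodic event against the speaking
        database and return a time-ordered movement schedule."""
        schedule = []
        for event in prosodic_events:
            key = (event["boundary_tone"], event["accent"])
            for gesture in SPEAKING_DB.get(key, ["idle"]):
                schedule.append({"time_s": event["time_s"], "gesture": gesture})
        return sorted(schedule, key=lambda cue: cue["time_s"])

    # Example: two prosodic events from some upstream analysis step.
    events = [
        {"time_s": 2.1, "boundary_tone": "L-L%", "accent": "H*"},
        {"time_s": 0.8, "boundary_tone": "H-H%", "accent": "L*"},
    ]
    print(animate(events))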
Claims
1. A system comprising:
a processor;

a first module controlling the processor to perform a prosodic analysis and a syntactic analysis of speech data to be spoken by a virtual agent to a user, the prosodic analysis comprising analyzing speech intonations comprising loudness and accent, identifying prosodic phrase boundaries in the speech data, and identifying a type for each of the prosodic phrase boundaries;

a second module controlling the processor to determine a culture of the user based on an analysis of prosody associated with received speech from the user, the analysis being independent of an identity of the user; and

a third module controlling the processor to control movement of the virtual agent according to the prosodic analysis, the syntactic analysis, and the culture of the user and not based on a previously-stored template for controlling the movement, wherein the movement of the virtual agent at each of the prosodic phrase boundaries is selected based on the type identified for each of the prosodic phrase boundaries.

(Dependent claims 2-11 not shown.)
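Claim 1 keys the agent's movement at each prosodic phrase boundary to that boundary's identified type, with a culture determined from the user's own prosody shaping the result. The sketch below assumes ToBI-style boundary-tone labels and invented gesture names; it is an illustration under those assumptions, not the claimed implementation.

    from dataclasses import dataclass

    # Assumed mapping from ToBI-style boundary tones to head gestures.
    BOUNDARY_MOVEMENTS = {
        "L-L%": "nod_down",     # falling, declarative-final
        "H-H%": "tilt_up",      # rising, question-final
        "L-H%": "tilt_side",    # continuation rise
        "H-L%": "slight_nod",   # plateau
    }

    @dataclass
    class PhraseBoundary:
        time_s: float        # boundary position in the synthesized speech
        boundary_type: str   # e.g. a ToBI boundary-tone label

    def select_movements(boundaries, culture_scale=1.0):
        """Choose a gesture per boundary from its type; culture_scale
        stands in for the culture inferred from the user's prosody
        (e.g. smaller amplitudes where restrained motion is expected)."""
        return [
            {
                "time_s": b.time_s,
                "gesture": BOUNDARY_MOVEMENTS.get(b.boundary_type, "hold_still"),
                "amplitude": 0.5 * culture_scale,
            }
            for b in boundaries
        ]

    boundaries = [PhraseBoundary(0.8, "L-H%"), PhraseBoundary(2.1, "L-L%")]
    print(select_movements(boundaries, culture_scale=0.7))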
12. A system for controlling movement of a virtual agent on a client device while the virtual agent is speaking to a user, the system comprising a server that:
transmits speech data to be spoken by the virtual agent to the client device over a network;

generates virtual agent movement data based on a prosodic analysis, a syntactic analysis of the speech data and a culture of the user determined based on an analysis of prosody associated with received speech from the user, independent of an identity of the user, an identification of phrase boundaries in each utterance defined in the speech data and a phrase boundary type for each of the phrase boundaries, and not based on a previously-stored template for controlling the movement of the virtual agent, wherein the virtual agent movement data is configured to synchronize the movement of the virtual agent with phrase boundaries and to reflect a pitch accent associated with the phrase boundary type associated with each of the phrase boundaries; and

transmits the virtual agent movement data to the client device over the network for controlling movement of the virtual agent while the virtual agent speaks to the user.

(Dependent claim 13 not shown.)
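Claim 12 moves the analysis to a server that ships both the speech data and the synchronized movement data to the client. One way to picture the transmitted message, with the payload shape entirely assumed for illustration:

    import json

    def build_agent_message(audio_url, movement_cues, culture_scale):
        """Bundle speech data and movement data into one message; each
        cue carries the pitch accent its gesture reflects (shape is an
        assumption, not the patent's wire format)."""
        return json.dumps({
            "speech": {"audio_url": audio_url},
            "movements": [
                {
                    "time_s": cue["time_s"],
                    "gesture": cue["gesture"],
                    "pitch_accent": cue["accent"],   # e.g. "H*" or "L*"
                    "amplitude": cue["amplitude"] * culture_scale,
                }
                for cue in movement_cues
            ],
        })

    cues = [{"time_s": 0.8, "gesture": "tilt_up", "accent": "L*", "amplitude": 0.5}]
    print(build_agent_message("https://example.com/utterance.wav", cues, 0.7))

A client receiving such a message would play the audio and execute each movement cue at its timestamp, keeping the gestures aligned with the phrase boundaries.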
14. A system for controlling movement of a virtual animated entity during a transition from speaking to listening, the system comprising:
a processor;

a first module controlling the processor, as the virtual animated entity is concluding a speaking segment, to select transition movement data based at least in part on a syntactic analysis of speech to be spoken by the virtual animated entity and further based on a user culture determined by an analysis of prosody associated with received speech from a user, the analysis being independent of an identity of the user and the transition movement data not based on a previously-stored template for controlling the movement of the virtual animated entity; and

a second module controlling the processor to control the movement of the virtual animated entity from a first time the virtual animated entity has approximately finished speaking and through a second time at which the virtual animated entity stops speaking based on the user culture, wherein after the virtual animated entity stops speaking the transition movement data continues to control movement of the virtual animated entity to signal the user to speak.

(Dependent claim 15 not shown.)
16. A system for controlling movement of a virtual animated entity during a transition from talking to listening, the system comprising:
a processor;

a first module controlling the processor, approximately at an end of the virtual animated entity talking, to select transition movement data based at least in part on a syntactic analysis of speech to be spoken by the virtual animated entity and further based on a user culture determined by an analysis of prosody associated with received speech from a user, the analysis being independent of an identity of the user and the transition movement data not based on a previously-stored template for controlling movement of the virtual animated entity; and

a second module controlling the processor to control the movement of the virtual animated entity to indicate that the virtual animated entity is approximately finished talking and will soon listen for speech data from the user based on the user culture, the movement including movement after the virtual animated entity finishes talking to signal the user to speak.

(Dependent claim 17 not shown.)
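Claims 14 and 16 both cover the hand-off from speaking to listening: transition movements begin as the entity finishes talking and continue after the speech ends so the user knows it is their turn. A sketch of such a schedule, with all timings and gesture names invented for illustration:

    def transition_schedule(speech_end_s, culture_scale=1.0):
        """Schedule turn-yielding gestures around the end of speech;
        the final cues fire after speech_end_s so the entity keeps
        signaling the user to speak after it falls silent."""
        lead_in = 0.4 * culture_scale   # start transitioning before speech ends
        hold = 0.8 * culture_scale      # keep cueing the user afterward
        return [
            {"time_s": speech_end_s - lead_in, "gesture": "eyebrow_raise"},
            {"time_s": speech_end_s, "gesture": "tilt_toward_user"},
            {"time_s": speech_end_s + hold, "gesture": "return_to_neutral"},
        ]

    print(transition_schedule(3.2, culture_scale=0.7))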
Specification