System and method of providing conversational visual prosody for talking heads
Abstract
A system and method of controlling the movement of a virtual agent while the agent is speaking to a human user during a conversation is disclosed. The method comprises receiving speech data to be spoken by the virtual agent, performing a prosodic analysis of the speech data, selecting matching prosody patterns from a speaking database and controlling the virtual agent movement according to the selected prosody patterns.
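The abstract describes a lookup-style pipeline: analyze the prosody of the outgoing speech, match the results against a speaking database, and animate the agent from the matched patterns. Below is a minimal sketch of that flow, assuming an in-memory database and invented feature keys and gesture labels throughout; it illustrates the idea, not the patented implementation.

    # Minimal sketch of the abstract's pipeline. The database keys
    # (boundary tone, pitch accent) and gesture names are assumptions.
    SPEAKING_DB = {
        ("L-L%", "H*"): ["nod_down"],
        ("H-H%", "L*"): ["tilt_up", "eyebrow_raise"],
    }

    def animate(prosodic_events):
        """Match each analyzed prosodic event against the speaking
        database and return a time-ordered movement schedule."""
        schedule = []
        for event in prosodic_events:
            key = (event["boundary_tone"], event["accent"])
            for gesture in SPEAKING_DB.get(key, ["idle"]):
                schedule.append({"time_s": event["time_s"], "gesture": gesture})
        return sorted(schedule, key=lambda cue: cue["time_s"])

    # Example: two prosodic events from some upstream analysis step.
    events = [
        {"time_s": 2.1, "boundary_tone": "L-L%", "accent": "H*"},
        {"time_s": 0.8, "boundary_tone": "H-H%", "accent": "L*"},
    ]
    print(animate(events))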
Claims
1. A system comprising:
a processor;

a first module controlling the processor to perform a prosodic analysis and a syntactic analysis of speech data to be spoken by a virtual agent to a user, the prosodic analysis comprising analyzing speech intonations comprising loudness and accent, identifying prosodic phrase boundaries in the speech data, and identifying a type for each of the prosodic phrase boundaries;

a second module controlling the processor to determine a culture of the user based on an analysis of prosody associated with received speech from the user, the analysis being independent of an identity of the user; and

a third module controlling the processor to control movement of the virtual agent according to the prosodic analysis, the syntactic analysis, and the culture of the user and not based on a previously-stored template for controlling the movement, wherein the movement of the virtual agent at each of the prosodic phrase boundaries is selected based on the type identified for each of the prosodic phrase boundaries.

(Dependent claims 2-11 not shown.)
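Claim 1 keys the agent's movement at each prosodic phrase boundary to that boundary's identified type, with a culture determined from the user's own prosody shaping the result. The sketch below assumes ToBI-style boundary-tone labels and invented gesture names; it is an illustration under those assumptions, not the claimed implementation.

    from dataclasses import dataclass

    # Assumed mapping from ToBI-style boundary tones to head gestures.
    BOUNDARY_MOVEMENTS = {
        "L-L%": "nod_down",     # falling, declarative-final
        "H-H%": "tilt_up",      # rising, question-final
        "L-H%": "tilt_side",    # continuation rise
        "H-L%": "slight_nod",   # plateau
    }

    @dataclass
    class PhraseBoundary:
        time_s: float        # boundary position in the synthesized speech
        boundary_type: str   # e.g. a ToBI boundary-tone label

    def select_movements(boundaries, culture_scale=1.0):
        """Choose a gesture per boundary from its type; culture_scale
        stands in for the culture inferred from the user's prosody
        (e.g. smaller amplitudes where restrained motion is expected)."""
        return [
            {
                "time_s": b.time_s,
                "gesture": BOUNDARY_MOVEMENTS.get(b.boundary_type, "hold_still"),
                "amplitude": 0.5 * culture_scale,
            }
            for b in boundaries
        ]

    boundaries = [PhraseBoundary(0.8, "L-H%"), PhraseBoundary(2.1, "L-L%")]
    print(select_movements(boundaries, culture_scale=0.7))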
12. A system for controlling movement of a virtual agent on a client device while the virtual agent is speaking to a user, the system comprising a server that:
transmits speech data to be spoken by the virtual agent to the client device over a network;

generates virtual agent movement data based on a prosodic analysis, a syntactic analysis of the speech data and a culture of the user determined based on an analysis of prosody associated with received speech from the user, independent of an identity of the user, an identification of phrase boundaries in each utterance defined in the speech data and a phrase boundary type for each of the phrase boundaries, and not based on a previously-stored template for controlling the movement of the virtual agent, wherein the virtual agent movement data is configured to synchronize the movement of the virtual agent with phrase boundaries and to reflect a pitch accent associated with the phrase boundary type associated with each of the phrase boundaries; and

transmits the virtual agent movement data to the client device over the network for controlling movement of the virtual agent while the virtual agent speaks to the user.

(Dependent claim 13 not shown.)
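Claim 12 moves the analysis to a server that ships both the speech data and the synchronized movement data to the client. One way to picture the transmitted message, with the payload shape entirely assumed for illustration:

    import json

    def build_agent_message(audio_url, movement_cues, culture_scale):
        """Bundle speech data and movement data into one message; each
        cue carries the pitch accent its gesture reflects (shape is an
        assumption, not the patent's wire format)."""
        return json.dumps({
            "speech": {"audio_url": audio_url},
            "movements": [
                {
                    "time_s": cue["time_s"],
                    "gesture": cue["gesture"],
                    "pitch_accent": cue["accent"],   # e.g. "H*" or "L*"
                    "amplitude": cue["amplitude"] * culture_scale,
                }
                for cue in movement_cues
            ],
        })

    cues = [{"time_s": 0.8, "gesture": "tilt_up", "accent": "L*", "amplitude": 0.5}]
    print(build_agent_message("https://example.com/utterance.wav", cues, 0.7))

A client receiving such a message would play the audio and execute each movement cue at its timestamp, keeping the gestures aligned with the phrase boundaries.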
14. A system for controlling movement of a virtual animated entity during a transition from speaking to listening, the system comprising:
a processor;

a first module controlling the processor, as the virtual animated entity is concluding a speaking segment, to select transition movement data based at least in part on a syntactic analysis of speech to be spoken by the virtual animated entity and further based on a user culture determined by an analysis of prosody associated with received speech from a user, the analysis being independent of an identity of the user and the transition movement data not based on a previously-stored template for controlling the movement of the virtual animated entity; and

a second module controlling the processor to control the movement of the virtual animated entity from a first time the virtual animated entity has approximately finished speaking and through a second time at which the virtual animated entity stops speaking based on the user culture, wherein after the virtual animated entity stops speaking the transition movement data continues to control movement of the virtual animated entity to signal the user to speak.

(Dependent claim 15 not shown.)
16. A system for controlling movement of a virtual animated entity during a transition from talking to listening, the system comprising:
a processor;

a first module controlling the processor, approximately at an end of the virtual animated entity talking, to select transition movement data based at least in part on a syntactic analysis of speech to be spoken by the virtual animated entity and further based on a user culture determined by an analysis of prosody associated with received speech from a user, the analysis being independent of an identity of the user and the transition movement data not based on a previously-stored template for controlling movement of the virtual animated entity; and

a second module controlling the processor to control the movement of the virtual animated entity to indicate that the virtual animated entity is approximately finished talking and will soon listen for speech data from the user based on the user culture, the movement including movement after the virtual animated entity finishes talking to signal the user to speak.

(Dependent claim 17 not shown.)
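Claims 14 and 16 both cover the hand-off from speaking to listening: transition movements begin as the entity finishes talking and continue after the speech ends so the user knows it is their turn. A sketch of such a schedule, with all timings and gesture names invented for illustration:

    def transition_schedule(speech_end_s, culture_scale=1.0):
        """Schedule turn-yielding gestures around the end of speech;
        the final cues fire after speech_end_s so the entity keeps
        signaling the user to speak after it falls silent."""
        lead_in = 0.4 * culture_scale   # start transitioning before speech ends
        hold = 0.8 * culture_scale      # keep cueing the user afterward
        return [
            {"time_s": speech_end_s - lead_in, "gesture": "eyebrow_raise"},
            {"time_s": speech_end_s, "gesture": "tilt_toward_user"},
            {"time_s": speech_end_s + hold, "gesture": "return_to_neutral"},
        ]

    print(transition_schedule(3.2, culture_scale=0.7))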
Specification