Method and apparatus for predicting events in video conferencing and other applications

US 20020101505A1
Filed: 12/05/2000
Published: 08/01/2002
Est. Priority Date: 12/05/2000
Status: Active Grant

Abstract

Methods and apparatus are disclosed for predicting events using acoustic and visual cues. The present invention processes audio and video information to identify one or more (i) acoustic cues, such as intonation patterns, pitch and loudness, (ii) visual cues, such as gaze, facial pose, body postures, hand gestures and facial expressions, or (iii) a combination of the foregoing, that are typically associated with an event, such as behavior exhibited by a video conference participant before he or she speaks. In this manner, the present invention allows the video processing system to predict events, such as the identity of the next speaker. The predictive speaker identifier operates in a learning mode to learn the characteristic profile of each participant in terms of the concept that the participant “will speak” or “will not speak” under the presence or absence of one or more predefined visual or acoustic cues. The predictive speaker identifier operates in a predictive mode to compare the learned characteristics embodied in the characteristic profile to the audio and video information and thereby predict the next speaker.
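
The learning and predictive modes described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation disclosed in the patent: the class name `SpeakerProfile`, the cue labels, the add-one smoothing, and the treatment of cues as independent are all hypothetical choices made here for brevity.

```python
from collections import defaultdict


class SpeakerProfile:
    """Hypothetical per-participant profile; names are not from the patent text."""

    def __init__(self):
        # counts[cue][spoke] = observations in which `cue` was present and the
        # participant did (True) or did not (False) speak next.
        self.counts = defaultdict(lambda: {True: 0, False: 0})
        self.totals = {True: 0, False: 0}

    def learn(self, cues_present, spoke_next):
        """Learning mode: record which cues preceded 'will speak' / 'will not speak'."""
        self.totals[spoke_next] += 1
        for cue in cues_present:
            self.counts[cue][spoke_next] += 1

    def score(self, cues_present):
        """Predictive mode: estimate P(will speak | observed cues)."""
        weights = {}
        for label in (True, False):
            # Smoothed prior for the label (add-one smoothing is an assumption).
            w = (self.totals[label] + 1) / (sum(self.totals.values()) + 2)
            for cue in cues_present:
                # Smoothed likelihood of seeing this cue given the label.
                w *= (self.counts[cue][label] + 1) / (self.totals[label] + 2)
            weights[label] = w
        return weights[True] / (weights[True] + weights[False])


def predict_next_speaker(profiles, observations):
    """Return the participant whose profile gives the highest 'will speak' score."""
    return max(observations, key=lambda name: profiles[name].score(observations[name]))
```

In this sketch, `learn` would be called once per observation window during a training phase, and `predict_next_speaker` would be called with the cues currently detected for each participant. Only cues that are present contribute to the score; modeling the absence of a cue, which the abstract also mentions, is left out to keep the example short.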

25 Claims

1. A method for predicting an event using at least one of audio and video information, the method comprising the steps of:
- establishing a plurality of cues defining behavior characteristics that suggest that a given event; and
  
  processing at least one of said audio and video information to identify one of said cues.
  - 2. The method of claim 1, wherein said plurality of cues includes at least one cue identifying behavior that is typically exhibited by a person before said person speaks.
  - 3. The method of claim 1, wherein said plurality of cues includes at least one acoustic cue identifying behavior that is typically exhibited by a person when said person is about to end speaking.
  - 4. The method of claim 1, further comprising the step of obtaining an image of said person associated with said identified cue.
  - 5. The method of claim 1, further comprising the step of maintaining a profile for at least one person that establishes thresholds for one or more of said plurality of cues.
  - 6. The method of claim 1, wherein said cue indicates that a person is about to fall asleep.
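
As an illustration of the per-person profile recited in claim 5, the following sketch treats a cue as identified when its measured strength meets that person's threshold. The class name `CueProfile`, the cue names, the numeric scale, and the default threshold are assumptions introduced here; the claims do not specify how thresholds are represented or compared.

```python
from dataclasses import dataclass, field


@dataclass
class CueProfile:
    """Hypothetical per-person thresholds; cue names and values are illustrative."""

    thresholds: dict = field(default_factory=dict)
    default_threshold: float = 0.5  # assumed fallback for cues with no entry

    def identified_cues(self, measurements):
        """Return the cues whose measured strength meets this person's threshold."""
        return sorted(
            cue
            for cue, value in measurements.items()
            if value >= self.thresholds.get(cue, self.default_threshold)
        )
```

A profile with a high `loudness` threshold, for example, would keep a habitually loud participant from triggering that cue at levels that would be significant for a quieter one.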

7. A method for tracking a speaker in a video processing system, said video processing system processing at least one of audio and video information, the method comprising the steps of:
- processing at least one of said audio and video information to identify one of a plurality of cues defining behavior characteristics that suggest that a person is about to speak; and
  
  obtaining an image of said person associated with said identified cue.
  - 8. The method of claim 7, wherein at least one camera is focused in accordance with pan, tilt and zoom values associated with a person associated with said cue.
  - 9. The method of claim 7, wherein said plurality of cues includes at least one visual cue identifying behavior that is typically exhibited by a person before said person speaks.
  - 10. The method of claim 7, wherein said plurality of cues includes at least one acoustic cue identifying behavior that is typically exhibited by a person before said person speaks.
  - 11. The method of claim 7, wherein said event is associated with a person is about to end speaking.
  - 12. The method of claim 7, further comprising the step of obtaining an image of said person associated with said identified cue.
  - 13. The method of claim 9, wherein said visual cue includes detecting the eyes of a person looking at a current speaker.
  - 14. The method of claim 9, wherein said visual cue includes detecting the raising of a hand or finger by a person.
  - 15. The method of claim 9, wherein said visual cue includes detecting a facial pose of a person in the direction of a current speaker.
  - 16. The method of claim 9, wherein said visual cue includes detecting a nodding of the head by a person.
  - 17. The method of claim 9, wherein said visual cue includes detecting a smile by a person.
  - 18. The method of claim 9, wherein said visual cue includes detecting a person leaning forward.
  - 19. The method of claim 10, wherein said acoustic cue includes non-verbal speech suggesting a person is about to speak.
  - 20. The method of claim 7, further comprising the step of obtaining a pan-view with a camera when one of said cues indicates a person is about to end speaking.
  - 21. The method of claim 7, further comprising the step of maintaining a profile for at least one person that establishes thresholds for one or more of said plurality of cues.
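
The camera behavior recited in claims 8 and 20 can be sketched as a simple lookup. This is an assumption-laden illustration: the `PTZ` type, the `PAN_VIEW` wide-shot values, and the function `select_camera_target` are hypothetical names, and the priority given to an about-to-speak cue over an about-to-end cue is a design choice of this sketch, not something the claims state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PTZ:
    pan: float   # degrees
    tilt: float  # degrees
    zoom: float  # magnification factor


# Hypothetical wide shot used when the current speaker is about to finish.
PAN_VIEW = PTZ(pan=0.0, tilt=0.0, zoom=1.0)


def select_camera_target(presets, about_to_speak=None, current_speaker_ending=False):
    """Choose PTZ values from the identified cues, or None to leave the camera as is."""
    # Claim 8: focus on the person associated with the identified cue.
    if about_to_speak is not None and about_to_speak in presets:
        return presets[about_to_speak]
    # Claim 20: fall back to a pan view when the speaker is about to end.
    if current_speaker_ending:
        return PAN_VIEW
    return None
```

Here `presets` maps each participant to stored pan, tilt and zoom values, so the camera can be aimed before the predicted speaker actually begins talking.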

22. A system for predicting an event using at least one of audio and video information, comprising:
- a memory for storing computer readable code; and
  
  a processor operatively coupled to said memory, said processor configured to;
  
  establish a plurality of cues defining behavior characteristics that suggest that a given event; and
  
  process at least one of said audio and video information to identify one of said cues.

23. A system for tracking a speaker in a video processing system, said video processing system processing at least one of audio and video information, comprising:
- a memory for storing computer readable code; and
  
  a processor operatively coupled to said memory, said processor configured to;
  
  process at least one of said audio and video information to identify one of a plurality of cues defining behavior characteristics that suggest that a person is about to speak; and
  
  obtain an image of said person associated with said identified cue.

24. An article of manufacture for predicting an event using at least one of audio and video information, comprising:
- a computer readable medium having computer readable code means embodied thereon, said computer readable program code means comprising;
  
  a step to establish a plurality of cues defining behavior characteristics that suggest that a given event; and
  
  a step to process at least one of said audio and video information to identify one of said cues.

25. An article of manufacture for tracking a speaker in a video processing system, said video processing system processing at least one of audio and video information, comprising:
- a computer readable medium having computer readable code means embodied thereon, said computer readable program code means comprising;
  
  a step to process at least one of said audio and video information to identify one of a plurality of cues defining behavior characteristics that suggest that a person is about to speak; and
  
  a step to obtain an image of said person associated with said identified cue.

Current Assignee: Pendragon Wireless LLC (Pendrell Corporation)
Original Assignee: Philips Electronics North America Corporation (Koninklijke Philips N.V.)
Inventors: Gutta, Srinivas; Strubbe, Hugo; Colmenarez, Antonio
Granted Patent: US 6,894,714 B2
US Class Current: 348/14.07
CPC Class Codes: H04N 7/15 (Conference systems)
