System and method for determining the active talkers in a video conference
Abstract
The present invention describes a method of determining the active talker for display on a video conferencing system, including the steps of: for each participant, capturing audio data using an audio capture sensor and video data using a video capture sensor; determining the probability of active speech (pA, pB . . . pN), where the probability of active speech is a function of the probability of soft voice detection captured by the audio capture sensor and the probability of lip motion detection captured by the video capture sensor; and automatically displaying at least the participant that has the highest probability of active speech.
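The selection step in the abstract can be sketched as follows. The abstract states only that the probability of active speech is a function of the soft voice detection probability and the lip motion detection probability; it does not say which function. The product used here, along with the names `active_speech_probability` and `most_likely_talker`, is an assumption made for illustration.

```python
from typing import Dict


def active_speech_probability(p_voice: float, p_lip: float) -> float:
    """Fuse the audio and video cues into one probability of active speech.

    Assumption: a simple product of the two probabilities. The source only
    says the result is "a function of" both inputs.
    """
    return p_voice * p_lip


def most_likely_talker(voice: Dict[str, float], lips: Dict[str, float]) -> str:
    """Return the participant with the highest fused probability."""
    scores = {
        name: active_speech_probability(voice[name], lips[name])
        for name in voice
    }
    return max(scores, key=scores.get)


# Hypothetical per-participant detector outputs for one time step.
voice = {"A": 0.9, "B": 0.6, "C": 0.1}  # soft voice detection
lips = {"A": 0.2, "B": 0.8, "C": 0.1}   # lip motion detection

talker = most_likely_talker(voice, lips)
```

With these made-up inputs, participant A has the strongest audio cue but little lip motion, so the fused score favors B. Requiring both cues to agree is the point of combining them, whatever fusion function is actually used.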
22 Claims
1. A method of determining an active talker for display on a video conferencing system, the method comprising:
implementing a state machine, the state machine having an active state and a pause state for each of N participants in a video conferencing session, the state machine also having a silent state;
for each of the N participants, capturing audio data using an audio capture sensor and video data using a video capture sensor;
determining a first state of the state machine;
determining transition probabilities from the first state to other states in the state machine based on corresponding probabilities of active speech by the N participants (pA, pB . . . pN), where the probabilities of active speech are functions of a probability of soft voice detection captured by the audio capture sensor for a corresponding one of the participants and a probability of lip motion detection captured by the video capture sensor for the corresponding one of the participants;
selecting a second state of the state machine based on the transition probabilities, the second state corresponding to an active talker; and
automatically displaying at least the active talker corresponding to the second state.
Dependent claims: 2 through 20.
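The state machine recited in claim 1 can be sketched as follows, under stated assumptions. The claim fixes the state set (an active and a pause state per participant, plus one silent state) and says the transition probabilities are based on pA, pB . . . pN, but it gives no formula. The weighting rule below, the `pause_weight` parameter, and all function names are hypothetical choices made only to show the shape of the computation.

```python
from math import prod
from typing import Dict, List, Optional, Tuple

State = Tuple[str, str]  # (kind, participant); participant is "" for silent
SILENT: State = ("silent", "")


def build_states(participants: List[str]) -> List[State]:
    """One active and one pause state per participant, plus a silent state."""
    states = [SILENT]
    for name in participants:
        states.append(("active", name))
        states.append(("pause", name))
    return states


def transition_probabilities(
    current: State, p_speech: Dict[str, float], pause_weight: float = 0.5
) -> Dict[State, float]:
    """Normalized transition probabilities out of `current`.

    Assumed rule (not from the source): the weight toward a participant's
    active state is that participant's p; a pause state is reachable only
    by the participant who currently holds the floor; the weight toward
    silent is the chance that nobody is speaking.
    """
    weights: Dict[State, float] = {
        SILENT: prod(1.0 - p for p in p_speech.values())
    }
    for name, p in p_speech.items():
        weights[("active", name)] = p
        holds_floor = current[0] in ("active", "pause") and current[1] == name
        weights[("pause", name)] = pause_weight * (1.0 - p) if holds_floor else 0.0
    total = sum(weights.values())
    return {state: w / total for state, w in weights.items()}


def select_next_state(probs: Dict[State, float]) -> State:
    """Pick the most probable second state."""
    return max(probs, key=probs.get)


def talker_to_display(state: State) -> Optional[str]:
    """Participant shown for a state; None when the state is silent."""
    return state[1] or None


first_state: State = ("active", "A")
probs = transition_probabilities(first_state, {"A": 0.1, "B": 0.7})
second_state = select_next_state(probs)
```

For N participants this yields 2N + 1 states. Giving each participant a pause state lets a brief gap in speech avoid an immediate switch of the displayed talker, although how strongly pauses are favored depends entirely on the assumed weights here.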
21. A computer readable storage device or storage disk comprising computer-readable instructions which, when executed, cause a processor to at least:
implement a state machine, the state machine having an active state and a pause state for each of N participants in a video conferencing session, the state machine also having a silent state;
for each of the N participants, capture audio data using an audio capture sensor and video data using a video capture sensor;
determine a first state of the state machine;
determine transition probabilities from the first state to other states in the state machine based on corresponding probabilities of active speech by the N participants (pA*, pB* . . . pN*), where the probabilities of active speech are functions of a probability of soft voice detection captured by the audio capture sensor for a corresponding one of the participants and a probability of lip motion detection captured by the video capture sensor for the corresponding one of the participants;
select a second state of the state machine based on the transition probabilities; and
automatically display at least an active talker corresponding to the second state.
22. An apparatus for providing feedback to a participant in a video conference, the apparatus comprising:
a processor; and
a memory including computer-readable instructions which, when executed by the processor, cause the processor to at least:
implement a state machine, the state machine having an active state and a pause state for each of N participants in a video conferencing session, the state machine also having a silent state;
for each of the N participants, capture audio data using an audio capture sensor and video data using a video capture sensor;
determine a first state of the state machine;
determine transition probabilities from the first state to other states in the state machine based on corresponding probabilities of active speech by the N participants (pA*, pB* . . . pN*), where the probabilities of active speech are functions of a probability of soft voice detection captured by the audio capture sensor for a corresponding one of the participants and a probability of lip motion detection captured by the video capture sensor for the corresponding one of the participants;
select a second state of the state machine based on the transition probabilities; and
automatically display at least an active talker corresponding to the second state.
Specification