System and method for determining the active talkers in a video conference

US 9,154,730 B2
Filed: 10/16/2009
Issued: 10/06/2015
Est. Priority Date: 10/16/2009
Status: Active Grant

1 Assignment

0 Petitions

Abstract

The present invention describes a method of determining the active talker for display on a video conferencing system, including the steps of: for each participant, capturing audio data using an audio capture sensor and video data using a video capture sensor; determining the probability of active speech (p_A, p_B. . . p_N), where the probability of active speech is a function of the probability of soft voice detection captured by the audio capture sensor and the probability of lip motion detection captured by the video capture sensor; and automatically displaying at least the participant that has the highest probability of active speech.
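The fusion-and-selection step in the abstract can be sketched in Python. The patent says only that each participant's probability of active speech is a function of the soft voice-detection probability and the lip-motion probability. The weighted geometric mean below, the w_audio weight, and the helper names active_speech_probability and most_likely_talker are hypothetical choices made for illustration, not the patented implementation.

```python
from typing import Dict, Tuple


def active_speech_probability(p_voice: float, p_lip: float, w_audio: float = 0.5) -> float:
    """Fuse a soft voice-detection probability and a lip-motion probability.

    ASSUMPTION: a weighted geometric mean; the patent does not fix the function.
    """
    if not (0.0 <= p_voice <= 1.0 and 0.0 <= p_lip <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    return (p_voice ** w_audio) * (p_lip ** (1.0 - w_audio))


def most_likely_talker(detections: Dict[str, Tuple[float, float]]) -> str:
    """Return the participant with the highest fused probability of active speech.

    detections maps a participant name to (p_voice, p_lip) for the current frame.
    """
    scores = {
        name: active_speech_probability(p_voice, p_lip)
        for name, (p_voice, p_lip) in detections.items()
    }
    return max(scores, key=scores.get)


frame = {"A": (0.9, 0.8), "B": (0.6, 0.1), "C": (0.2, 0.3)}
print(most_likely_talker(frame))  # A
```

With a geometric mean, a participant who is audible but whose lips barely move (B above) is scored low, which is one way to discount off-camera noise; an arithmetic mean would be a softer alternative.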

22 Claims

1. A method of determining an active talker for display on a video conferencing system, the method comprising:
- implementing a state machine, the state machine having an active state and a pause state for each of N participants in a video conferencing session, the state machine also having a silent state;
  
  for each of the N participants, capturing audio data using an audio capture sensor and video data using a video capture sensor;
  
  determining a first state of the state machine;
  
  determining transition probabilities from the first state to other states in the state machine based on corresponding probabilities of active speech by the N participants (p_A, p_B. . . p_N), where the probabilities of active speech are functions of a probability of soft voice detection captured by the audio capture sensor for a corresponding one of the participants and a probability of lip motion detection captured by the video capture sensor for the corresponding one of the participants;
  
  selecting a second state of the state machine based on the transition probabilities, the second state corresponding to an active talker; and
  
  automatically displaying at least the active talker corresponding to the second state.
  - 2. The method recited in claim 1 further including denoising the probabilities of active speech by applying a smoothing filter to generate smoothed probability values, wherein the smoothed probability values are represented by (p_A*, p_B* . . . p_N*).
  - 3. The method recited in claim 2, wherein the smoothing filter is a nonlinear filter.
  - 4. The method recited in claim 2, wherein the smoothing filter is a linear filter.
  - 5. The method recited in claim 2, further including applying a median filter to an output of the smoothing filter.
  - 6. The method recited in claim 2, further including determining a maximum value of the smoothed probability values (p_A*, p_B* . . . p_N*).
  - 7. The method recited in claim 6, further including comparing the maximum value of the smoothed probability values (p_A*, p_B* . . . p_N*) to a threshold value.
  - 8. The method recited in claim 7, further including, when the maximum value of the smoothed probability values (p_A*, p_B* . . . p_N*) is greater than the threshold value, setting the state machine to one of the active states that corresponds to the maximum value of the smoothed probability values.
  - 9. The method recited in claim 8, further including setting the state machine to the silent state when the maximum value of the smoothed probability values (p_A*, p_B* . . . p_N*) is less than the threshold value.
  - 10. The method recited in claim 2, further including ranking the smoothed probability values (p_A*, p_B* . . . p_N*).
  - 11. The method recited in claim 10, further including comparing a lowest ranked one of the smoothed probability values (p_A*, p_B* . . . p_N*) to a threshold value.
  - 12. The method recited in claim 11, further including setting the state machine to one of the active states that corresponds to the lowest ranked participant when the lowest ranked one of the smoothed probability values (p_A*, p_B* . . . p_N*) is greater than the threshold value.
  - 13. The method recited in claim 12, further including setting the state machine to the silent state when the lowest ranked one of the smoothed probability values (p_A*, p_B* . . . p_N*) is less than the threshold value.
  - 14. The method recited in claim 2, further including determining the probabilities of p*=(p_A*, p_B* . . . p_N*), given all possible states Pr(p*|s) where s is all of the possible 2N+1 states.
  - 15. The method recited in claim 14, further including finding a sequence of the states using dynamic programming.
  - 16. The method recited in claim 15, wherein finding the sequence of the states using dynamic programming includes using a truncated Viterbi algorithm.
  - 17. The method recited in claim 16, wherein the transition probabilities are modulated based on a duration in the first state.
  - 18. The method as defined in claim 1, wherein the state machine is one state machine having (1) N active states corresponding to the N participants, (2) N inactive states corresponding to the N participants, and (3) the silent state, N being at least three.
  - 19. The method as defined in claim 18, wherein the transition probabilities include N+1 probabilities corresponding to the N active states and the silent state.
  - 20. The method as defined in claim 1, wherein the determining of the transition probabilities includes determining three or more transition probabilities from the first state.
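Dependent claims 2 through 9 describe a frame-by-frame decision: smooth each participant's probability, median-filter the smoothed output, take the maximum p*, and compare it to a threshold to choose either that participant's active state or the silent state. The Python sketch below assumes a moving-average linear filter, a short median filter, and a fixed threshold of 0.5; the window lengths, the threshold, the class name SmoothedTalkerSelector, and the "active:<name>" state labels are all hypothetical. It does not model the pause states.

```python
from collections import deque
from statistics import median
from typing import Deque, Dict, List


class SmoothedTalkerSelector:
    """Hypothetical sketch of claims 2-9: smooth p per participant, then threshold max p*."""

    def __init__(self, participants: List[str], threshold: float = 0.5,
                 avg_len: int = 5, med_len: int = 3) -> None:
        self.threshold = threshold
        # Recent raw probabilities: the input to the moving-average filter.
        self.raw: Dict[str, Deque[float]] = {p: deque(maxlen=avg_len) for p in participants}
        # Recent moving-average outputs: the input to the median filter.
        self.avg: Dict[str, Deque[float]] = {p: deque(maxlen=med_len) for p in participants}

    def update(self, p: Dict[str, float]) -> Dict[str, float]:
        """Feed one frame of raw probabilities and return the smoothed values p*."""
        smoothed: Dict[str, float] = {}
        for name, value in p.items():
            self.raw[name].append(value)
            # Linear smoothing filter (claim 4): moving average of recent frames.
            self.avg[name].append(sum(self.raw[name]) / len(self.raw[name]))
            # Median filter applied to the smoothing filter's output (claim 5).
            smoothed[name] = median(self.avg[name])
        return smoothed

    def decide(self, smoothed: Dict[str, float]) -> str:
        """Claims 6-9: active state of the argmax participant if max p* > threshold."""
        best = max(smoothed, key=smoothed.get)
        if smoothed[best] > self.threshold:
            return f"active:{best}"
        return "silent"  # a value exactly at the threshold is treated as silent here


selector = SmoothedTalkerSelector(["A", "B", "C"])
for _ in range(4):
    state = selector.decide(selector.update({"A": 0.9, "B": 0.2, "C": 0.1}))
print(state)  # active:A
```

Both filter stages trade a few frames of latency for fewer spurious switches between displayed talkers; the ranking variant of claims 10 through 13 would replace the max step with a sort.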

21. A computer readable storage device or storage disk comprising computer-readable instructions which, when executed, cause a processor to at least:
- implement a state machine, the state machine having an active state and a pause state for each of N participants in a video conferencing session, the state machine also having a silent state;
  
  for each of the N participants, capture audio data using an audio capture sensor and video data using a video capture sensor;
  
  determine a first state of the state machine;
  
  determine transition probabilities from the first state to other states in the state machine based on corresponding probabilities of active speech by the N participants (p_A*, p_B* . . . p_N*), where the probabilities of active speech are functions of a probability of soft voice detection captured by the audio capture sensor for a corresponding one of the participants and a probability of lip motion detection captured by the video capture sensor for the corresponding one of the participants;
  
  select a second state of the state machine based on the transition probabilities; and
  
  automatically display at least an active talker corresponding to the second state.

22. An apparatus for providing feedback to a participant in a video conference, the apparatus comprising:
- a processor; and
  
  a memory including computer-readable instructions which, when executed by the processor, cause the processor to at least:
  
  implement a state machine, the state machine having an active state and a pause state for each of N participants in a video conferencing session, the state machine also having a silent state;
  
  for each of the N participants, capture audio data using an audio capture sensor and video data using a video capture sensor;
  
  determine a first state of the state machine;
  
  determine transition probabilities from the first state to other states in the state machine based on corresponding probabilities of active speech by the N participants (p_A*, p_B* . . . p_N*), where the probabilities of active speech are functions of a probability of soft voice detection captured by the audio capture sensor for a corresponding one of the participants and a probability of lip motion detection captured by the video capture sensor for the corresponding one of the participants;
  
  select a second state of the state machine based on the transition probabilities; and
  
  automatically display at least an active talker corresponding to the second state.
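Claims 1, 21 and 22 select the next state of the state machine from transition probabilities, and dependent claims 14 through 16 describe scoring each of the 2N+1 states with Pr(p*|s) and recovering a state sequence by dynamic programming with a truncated Viterbi algorithm. The Python sketch below is one possible setup under stated assumptions, not the patented implementation: the emission model, the transition values (stay, pause_stay), the uniform start distribution, and all function names are hypothetical, and truncation is approximated by decoding only the most recent window of frames. The duration-dependent modulation of claim 17 is not modeled.

```python
import math
from typing import List, Sequence

EPS = 1e-6  # keeps log() finite for probabilities of exactly 0 or 1


def state_names(n: int) -> List[str]:
    """The 2N+1 states of claim 18: N active, N pause, and one silent state."""
    return [f"active{k}" for k in range(n)] + [f"pause{k}" for k in range(n)] + ["silent"]


def transition_matrix(n: int, stay: float = 0.9, pause_stay: float = 0.5) -> List[List[float]]:
    """Hypothetical transition probabilities; every row sums to 1."""
    size, silent = 2 * n + 1, 2 * n
    t = [[0.0] * size for _ in range(size)]
    for k in range(n):
        active, pause = k, n + k
        t[active][active], t[active][pause] = stay, 1.0 - stay  # keep talking or pause
        t[pause][pause] = pause_stay
        for j in list(range(n)) + [silent]:  # N+1 exits (claim 19): any active state or silent
            t[pause][j] = (1.0 - pause_stay) / (n + 1)
    t[silent][silent] = stay
    for j in range(n):
        t[silent][j] = (1.0 - stay) / n
    return t


def log_emission(p: Sequence[float]) -> List[float]:
    """Hypothetical log Pr(p* | s): active k means only participant k is speaking."""
    n = len(p)
    clipped = [min(max(x, EPS), 1.0 - EPS) for x in p]
    log_yes = [math.log(x) for x in clipped]
    log_no = [math.log(1.0 - x) for x in clipped]
    quiet = sum(log_no)  # nobody speaking
    active = [quiet - log_no[k] + log_yes[k] for k in range(n)]
    return active + [quiet] * n + [quiet]  # pause and silent differ only in transitions


def viterbi(frames: Sequence[Sequence[float]], stay: float = 0.9,
            pause_stay: float = 0.5) -> List[str]:
    """Most likely state sequence for a list of smoothed probability vectors p*."""
    n = len(frames[0])
    names = state_names(n)
    size = len(names)
    log_t = [[math.log(x) if x > 0 else -math.inf for x in row]
             for row in transition_matrix(n, stay, pause_stay)]
    delta = [math.log(1.0 / size) + e for e in log_emission(frames[0])]  # uniform start
    back: List[List[int]] = []
    for p in frames[1:]:
        emis = log_emission(p)
        # Best predecessor of each state j, then the updated path scores.
        ptr = [max(range(size), key=lambda i: delta[i] + log_t[i][j]) for j in range(size)]
        delta = [delta[ptr[j]] + log_t[ptr[j]][j] + emis[j] for j in range(size)]
        back.append(ptr)
    state = max(range(size), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(back):  # trace the best path backwards
        state = ptr[state]
        path.append(state)
    return [names[s] for s in reversed(path)]


def truncated_decode(frames: Sequence[Sequence[float]], window: int = 30) -> str:
    """Decode only the last `window` frames and report the current state."""
    return viterbi(frames[-window:])[-1]


speech = [[0.9, 0.1, 0.1]] * 5 + [[0.05, 0.05, 0.05]] * 8
print(viterbi(speech))
```

With these hypothetical values, pause and silent share the same emission score, so the pause state acts as a short grace period: a brief gap leaves the talker in its pause state (from which it can return to active), while a longer gap drifts to silent because its self-transition probability (stay) exceeds pause_stay.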

Current Assignee: Hewlett-Packard Development Company, L.P. (HP Inc.)
Original Assignee: Hewlett-Packard Development Company, L.P. (HP Inc.)
Inventors: Lee, Bowon; Mukherjee, Debargha
Primary Examiner(s): Dorvil, Richemond
Assistant Examiner(s): Villena, Mark
Application Number: US12/580,958
Publication Number: US 20110093273A1
Time in Patent Office: 2,181 Days
Field of Search: 704/270, 715/811, 715/767
US Class Current: 1/1

CPC Class Codes

G10L 15/24   Speech recognition using no...

G10L 25/78   Detection of presence or ab...

H04L 65/4038   with floor control

H04N 21/4394   involving operations for an...

H04N 21/44008   involving operations for an...

H04N 21/440245   the reformatting operation ...

H04N 21/4788   communicating with other us...

H04N 7/15   Conference systems
