Method and apparatus for focus-of-attention control
Abstract
Disclosed are methods for automatically generating commands to transform a video sequence based on information regarding speaking participants derived from the audio and video signals. The audio stream is analyzed to detect individual speakers and the video is optionally analyzed to detect lip movement to determine a probability that a detected participant is speaking. Commands are then generated to transform the video stream consistent with the identified speaker.
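The abstract's pipeline (a per-participant speaking probability from audio, optionally refined by lip-movement analysis of the video, fused to pick the current speaker) can be sketched as follows. This is a minimal illustration, not the patent's implementation; the `Participant` class, the fusion weights, and the 0.5 threshold are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Participant:
    name: str
    audio_prob: float  # speaking probability from audio analysis (assumed given)
    video_prob: float  # speaking probability from lip-movement analysis (assumed given)

def fuse_speaking_probability(p: Participant, audio_weight: float = 0.6) -> float:
    """Combine audio and video speaking evidence; the weighting is illustrative."""
    return audio_weight * p.audio_prob + (1 - audio_weight) * p.video_prob

def select_speaker(participants, threshold=0.5):
    """Return the most likely current speaker, or None if nobody clears the threshold."""
    best = max(participants, key=fuse_speaking_probability)
    return best if fuse_speaking_probability(best) >= threshold else None

participants = [
    Participant("alice", audio_prob=0.9, video_prob=0.8),
    Participant("bob", audio_prob=0.2, video_prob=0.1),
]
speaker = select_speaker(participants)
print(speaker.name)  # alice
```

With these numbers, alice's fused probability is 0.86 versus bob's 0.16, so alice is selected as the speaker driving the subsequent video transformation.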
20 Claims
1. A method for virtual camera control implemented by one or more computing devices, comprising:
acquiring at one or more computing devices a media stream having an audio component and a video component;
processing the video component to detect one or more video participants;
processing the video component to determine a video speaking state indicating a probability that at least one of the one or more detected video participants is currently speaking;
processing the audio component to detect one or more audio participants;
processing the audio component to determine an audio speaking state indicating a probability that at least one of the one or more detected audio participants is currently speaking;
identifying a person and a speaking state, associated with the person, based on the video speaking state and the audio speaking state; and
applying at least one video transformation to the video component based at least in part on the identified person and speaking state.
Dependent claims: 2, 3, 4, 5, 6.
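One plausible form of the "video transformation" applied to the identified speaker is a virtual pan/zoom crop centered on that person. The function below is a hypothetical sketch under that assumption; the face bounding box, zoom factor, and edge clamping are illustrative, not taken from the claims.

```python
def virtual_camera_crop(frame_w, frame_h, face_box, zoom=2.0):
    """Compute a crop rectangle (x, y, w, h) centered on the speaker's face box.

    face_box is (x, y, w, h) in pixels; the crop keeps the frame's aspect ratio
    and is clamped so it never extends past the frame edges.
    """
    cx = face_box[0] + face_box[2] / 2  # face center, horizontal
    cy = face_box[1] + face_box[3] / 2  # face center, vertical
    crop_w, crop_h = frame_w / zoom, frame_h / zoom
    x = min(max(cx - crop_w / 2, 0), frame_w - crop_w)
    y = min(max(cy - crop_h / 2, 0), frame_h - crop_h)
    return (int(x), int(y), int(crop_w), int(crop_h))

print(virtual_camera_crop(1920, 1080, (900, 400, 120, 160)))  # (480, 210, 960, 540)
```

Scaling the resulting crop back to full resolution simulates a camera panning and zooming toward the active speaker without any physical camera motion.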
7. An apparatus for virtual camera control implemented by one or more computing devices, comprising:
a memory; and
a processor operative to retrieve instructions from the memory and execute them to:
acquire at one or more computing devices a media stream having an audio component and a video component;
process the video component to detect one or more video participants;
process the video component to determine a video speaking state indicating a probability that at least one of the one or more detected video participants is currently speaking;
process the audio component to detect one or more audio participants;
process the audio component to determine an audio speaking state indicating a probability that at least one of the one or more detected audio participants is currently speaking;
identify a person and a speaking state, associated with the person, based on the video speaking state and the audio speaking state; and
apply at least one video transformation to the video component based at least in part on the identified person and speaking state.
Dependent claims: 8, 9, 10, 11, 12.
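The apparatus claim pairs stored instructions with a processor that applies the transformation whenever the identified speaking state is active. A toy sketch of that control structure follows; the class name, the callable transform, and the per-frame speaking flags are all illustrative assumptions.

```python
class VirtualCameraController:
    """Sketch of the claimed apparatus: a transformation (the 'instructions')
    held by the object and applied per frame when a speaker is identified."""

    def __init__(self, transform):
        self.transform = transform  # e.g. a crop/zoom function applied to a frame

    def process(self, frames, speaking_states):
        """Apply the transform to each frame whose speaking state is active."""
        return [
            self.transform(frame) if speaking else frame
            for frame, speaking in zip(frames, speaking_states)
        ]

# Illustrative use with strings standing in for frames:
controller = VirtualCameraController(lambda frame: frame.upper())
print(controller.process(["a", "b"], [True, False]))  # ['A', 'b']
```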
13. A method for virtual camera control implemented by one or more computing devices, comprising:
acquiring at one or more computing devices a media stream having an audio component and a video component;
processing the video component to detect one or more video participants;
processing the video component to detect a location of at least one of the one or more video participants;
processing the audio component to detect a location of at least one of one or more audio participants;
processing the audio component to determine an audio speaking state for the at least one of the one or more audio participants, the audio speaking state indicating a probability that the at least one of the one or more audio participants is currently speaking;
identifying a location of a person and a speaking state, associated with the person, based on the audio speaking state and the location of the at least one of the one or more audio participants, and the location of the at least one of the one or more video participants; and
applying at least one video transformation to the video component based at least in part on the identified location of the person and the speaking state.
Dependent claims: 14, 15, 16, 17, 18, 19, 20.
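Claim 13 fuses an audio-derived location with the locations of video-detected participants. One common way to do this, used here purely as an illustrative assumption, is to project an audio direction-of-arrival angle into image coordinates and match it to the nearest detected face.

```python
def match_audio_to_video(audio_angle_deg, face_centers, frame_w, fov_deg=90.0):
    """Map an audio direction-of-arrival angle to the nearest detected face center.

    Assumes a linear mapping in which angles spanning +/- fov_deg/2 cover the
    frame width; audio_angle_deg = 0 points along the camera axis. All names
    and the mapping itself are illustrative.
    """
    # Convert the audio angle to a horizontal pixel coordinate.
    x_audio = frame_w / 2 + (audio_angle_deg / fov_deg) * frame_w
    # Pick the face whose horizontal position best matches the audio direction.
    return min(face_centers, key=lambda c: abs(c[0] - x_audio))

faces = [(400, 300), (1400, 320)]
print(match_audio_to_video(-15.0, faces, frame_w=1920))  # (400, 300)
```

The matched face location then drives the transformation of claim 13's final step, e.g. a crop centered on that position.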
Specification