Method and apparatus for using face detection information to improve speaker segmentation
Abstract
In one embodiment, a method includes obtaining media that includes a video stream and an audio stream. The method also includes detecting a number of faces visible in the video stream, and performing a speaker segmentation on the media. Performing the speaker segmentation on the media includes utilizing the number of faces visible in the video stream to augment the speaker segmentation.
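One way to picture how a face count might augment a speaker segmentation is sketched below. This is a minimal illustration under stated assumptions, not the method of the specification: it assumes an audio-only segmenter has already produced one speaker label per frame, and it treats the number of visible faces as an upper bound on the number of distinct speakers. The function names `constrain_speakers` and `to_segments`, the per-frame label format, and the merging rule are all hypothetical.

```python
from collections import Counter
from itertools import groupby
from typing import List, Tuple


def constrain_speakers(labels: List[str], num_faces: int) -> List[str]:
    """Keep at most `num_faces` distinct speaker labels (hypothetical rule).

    Assumption: the number of visible faces is an upper bound on the
    number of speakers. Labels outside the `num_faces` most frequent
    ones are merged into a neighboring kept speaker.
    """
    if num_faces < 1:
        # No visible faces: leave the audio-only result unchanged.
        return list(labels)
    keep = {label for label, _ in Counter(labels).most_common(num_faces)}
    out: List[str] = []
    for label in labels:
        if label in keep:
            out.append(label)
        elif out:
            # Merge a dropped label into the preceding kept speaker.
            out.append(out[-1])
        else:
            # Leading frames with a dropped label take the first kept label.
            out.append(next(l for l in labels if l in keep))
    return out


def to_segments(labels: List[str]) -> List[Tuple[int, int, str]]:
    """Collapse per-frame labels into (start, end_exclusive, speaker) runs."""
    segments = []
    start = 0
    for speaker, run in groupby(labels):
        length = len(list(run))
        segments.append((start, start + length, speaker))
        start += length
    return segments


if __name__ == "__main__":
    frame_labels = ["A", "A", "B", "A", "C", "C", "C", "A"]
    print(to_segments(constrain_speakers(frame_labels, num_faces=2)))
```

In this sketch, the short-lived label "B" is absorbed because only two faces are visible, leaving segments for "A" and "C" only. A real system could use the face count differently, for example as a prior rather than a hard cap.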
17 Claims
1. A method comprising:
obtaining media, the media including a video stream and an audio stream through an input/output (I/O) interface of a computing system;
detecting a number of faces visible in the video stream; and
performing a speaker segmentation on the media, wherein performing the speaker segmentation includes utilizing the number of faces visible in the video stream to augment the speaker segmentation, the speaker segmentation being performed by the computing system.
Dependent claims: 2, 3, 4, 5
6. A tangible, non-transitory computer-readable medium comprising computer program code, the computer program code, when executed, configured to:
obtain media, the media including a video stream and an audio stream;
detect a number of faces visible in the video stream; and
perform a speaker segmentation on the media, wherein the computer program code configured to perform the speaker segmentation includes computer program code operable to utilize the number of faces visible in the video stream to augment the speaker segmentation.
Dependent claims: 7, 8, 9, 10
11. An apparatus comprising:
a face detection arrangement, the face detection arrangement being configured to process a video component of media to identify a number of faces in the video component; and
a speaker segmentation arrangement, the speaker segmentation arrangement being configured to process an audio component of the media to identify a speaker change in the audio component, wherein the speaker segmentation arrangement is configured to use the number of faces in the video component when processing the audio component to identify the speaker change; and
a processor, wherein the face detection arrangement and the speaker segmentation arrangement are embodied as logic on a tangible, non-transitory computer-readable medium, and wherein the logic is arranged to be executed by the processor.
Dependent claims: 12, 13, 14, 15, 16, 17
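The speaker-change step recited for the apparatus can be illustrated with a small sketch. It is offered only as an assumption-laden example, since the claims do not state how the face count enters the decision. Here each audio window is assumed to come with a dissimilarity score against the previous window (how that score is computed is left out), and the decision threshold is raised when exactly one face is visible, on the hypothetical premise that a speaker change is less likely then. The function name `detect_speaker_changes` and the parameters `base_threshold` and `single_face_penalty` are illustrative, not taken from the source.

```python
from typing import List, Sequence


def detect_speaker_changes(
    distances: Sequence[float],
    face_counts: Sequence[int],
    base_threshold: float = 0.5,
    single_face_penalty: float = 0.25,
) -> List[int]:
    """Return indices of windows flagged as speaker changes.

    distances[i]   : assumed dissimilarity between window i and window i - 1.
    face_counts[i] : number of faces detected in the video during window i.
    Hypothetical rule: with exactly one visible face, require a larger
    audio dissimilarity before declaring a change.
    """
    if len(distances) != len(face_counts):
        raise ValueError("distances and face_counts must have equal length")
    changes: List[int] = []
    for index, (distance, faces) in enumerate(zip(distances, face_counts)):
        threshold = base_threshold
        if faces == 1:
            threshold += single_face_penalty
        if distance > threshold:
            changes.append(index)
    return changes


if __name__ == "__main__":
    print(detect_speaker_changes([0.2, 0.6, 0.6, 0.9], [1, 1, 3, 1]))
```

With these example inputs, the score of 0.6 is ignored while one face is visible but flagged when three faces are visible, which shows the face count changing the audio decision without replacing it.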
Specification