Techniques for separating and evaluating audio and video source data
1 Assignment
Abstract
Methods, systems, and apparatus are provided to separate and evaluate audio and video. Audio and video are captured; the audio is evaluated to detect one or more speakers speaking. Visual features are associated with the speakers speaking. The audio and video are separated and corresponding portions of the audio are mapped to the visual features for purposes of isolating audio associated with each speaker and for purposes of filtering out noise associated with the audio.
74 Citations
28 Claims
1. A method, comprising:
    electronically capturing visual features associated with a speaker speaking;
    electronically capturing audio;
    matching selective portions of the audio with the visual features; and
    identifying the remaining portions of the audio as potential noise not associated with the speaker speaking.
    (Dependent claims: 2, 3, 4, 5, 6, 7)
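The claim recites steps, not an implementation. As a minimal illustrative sketch (the function name, the frame representation, and the per-frame boolean mouth-movement flags are assumptions, not from the patent), time-aligned audio frames can be matched against visual mouth-movement features, with the unmatched remainder flagged as potential noise:

```python
def label_audio_frames(mouth_moving, audio_frames):
    """Match audio frames to visual mouth-movement features.

    mouth_moving: one boolean per time-aligned frame (True = the video
                  shows the speaker's mouth moving in that frame).
    audio_frames: the audio, framed at the same rate (one entry per frame).

    Returns (speech, noise): indices of audio frames matched to the
    speaker, and the remaining indices flagged as potential noise.
    """
    assert len(mouth_moving) == len(audio_frames), "streams must be time-aligned"
    speech = [i for i, moving in enumerate(mouth_moving) if moving]
    noise = [i for i, moving in enumerate(mouth_moving) if not moving]
    return speech, noise

# Toy example: six frames, mouth moving during frames 1-3.
moving = [False, True, True, True, False, False]
audio = [[0.0] * 160 for _ in range(6)]   # six frames of 160 samples each
speech_idx, noise_idx = label_audio_frames(moving, audio)
print(speech_idx)  # frames matched to the speaker
print(noise_idx)   # remaining frames: potential noise
```

In practice the mouth-movement flags would come from a video analysis stage and the frame rate of both streams would have to agree; the gating itself is this simple index split.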
8. A method, comprising:
    monitoring an electronic video of a first speaker and a second speaker;
    concurrently capturing audio associated with the first and second speaker speaking;
    analyzing the video to detect when the first and second speakers are moving their respective mouths; and
    matching portions of the captured audio to the first speaker and other portions to the second speaker based on the analysis.
    (Dependent claims: 9, 10, 11, 12, 13, 14)
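One way to picture this claim's matching step, purely as a sketch under the assumption that per-speaker mouth-movement flags have already been extracted from the video, is to attribute each time-aligned audio frame to whichever speaker's mouth is moving:

```python
def assign_frames(mouth1, mouth2):
    """Attribute each time-aligned audio frame to a speaker.

    mouth1, mouth2: one boolean per frame for the first and second
    speaker, True where that speaker's mouth is moving.

    Returns a label per frame: 'speaker1', 'speaker2', 'both'
    (overlapping speech), or 'none' (neither mouth moving).
    """
    labels = []
    for m1, m2 in zip(mouth1, mouth2):
        if m1 and m2:
            labels.append("both")
        elif m1:
            labels.append("speaker1")
        elif m2:
            labels.append("speaker2")
        else:
            labels.append("none")
    return labels

print(assign_frames([True, True, False, False],
                    [False, True, True, False]))
```

The 'both' and 'none' cases fall out naturally: overlapping speech cannot be attributed by mouth movement alone, and frames where neither mouth moves are candidates for noise.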
15. A system, comprising:
    a camera;
    a microphone; and
    a processing device, wherein the camera captures video of a speaker and communicates the video to the processing device, the microphone captures audio associated with the speaker and an environment of the speaker and communicates the audio to the processing device, and the processing device includes instructions that identify visual features of the video where the speaker is speaking and use time dependencies to match portions of the audio to those visual features.
    (Dependent claims: 16, 17, 18, 19)
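The "time dependencies" element can be read as aligning the two streams in time before matching. The sketch below is hypothetical (the signal names, the use of cross-correlation, and the small-lag search are assumptions; the patent does not prescribe a particular alignment method): it scores candidate lags by correlating a mouth-motion signal against an audio-energy signal and picks the best offset.

```python
def best_lag(mouth_motion, audio_energy, max_lag=5):
    """Estimate the time offset between a per-frame mouth-motion signal
    and a per-frame audio-energy signal by maximising their correlation
    over small lags. Audio portions can then be matched to the visual
    features at that offset.
    """
    def corr(lag):
        # Sum of products of overlapping samples at this lag.
        return sum(mouth_motion[i] * audio_energy[i + lag]
                   for i in range(len(mouth_motion))
                   if 0 <= i + lag < len(audio_energy))
    return max(range(-max_lag, max_lag + 1), key=corr)

# Toy example: audio energy lags the mouth motion by two frames.
mouth = [0, 1, 1, 1, 0, 0, 0]
energy = [0, 0, 0, 1, 1, 1, 0]
print(best_lag(mouth, energy))  # prints 2
```

A production system would correlate over longer windows and re-estimate the lag periodically, since capture pipelines can drift.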
20. A machine-accessible medium having associated instructions which, when accessed, result in a machine performing:
    separating audio and video associated with a speaker speaking;
    identifying visual features from the video that indicate a mouth of the speaker is moving or not moving; and
    associating portions of the audio with selective ones of the visual features that indicate the mouth is moving.
    (Dependent claims: 21, 22, 23, 24)
25. An apparatus, residing in a computer-accessible medium, comprising:
    face detection logic;
    mouth detection logic; and
    audio-video matching logic, wherein the face detection logic detects a face of a speaker within a video, the mouth detection logic detects and monitors movement and non-movement of a mouth included within the face of the video, and the audio-video matching logic matches portions of captured audio with any movements identified by the mouth detection logic.
    (Dependent claims: 26, 27, 28)
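The three logic components of this claim compose naturally as a pipeline. The stub classes below are hypothetical (the dict-based frame representation, method names, and pass-through detectors are all assumptions; a real system would use an actual face detector such as a Haar cascade or a CNN) and only illustrate how face detection, mouth detection, and audio-video matching fit together:

```python
class FaceDetector:
    """Stand-in for the face detection logic: returns a face region
    per video frame, or None when no face is found."""
    def detect(self, frame):
        return frame.get("face")

class MouthDetector:
    """Stand-in for the mouth detection logic: reports whether the
    mouth within a detected face is moving."""
    def is_moving(self, face):
        return bool(face) and face.get("mouth_moving", False)

class AudioVideoMatcher:
    """Matches captured audio frames to the video frames in which the
    mouth detection logic reports movement."""
    def __init__(self, faces, mouths):
        self.faces = faces
        self.mouths = mouths

    def match(self, video_frames, audio_frames):
        return [a for v, a in zip(video_frames, audio_frames)
                if self.mouths.is_moving(self.faces.detect(v))]

# Toy usage: three frames, mouth moving only in the first.
video = [{"face": {"mouth_moving": True}},
         {"face": {"mouth_moving": False}},
         {}]                              # no face detected
audio = ["a0", "a1", "a2"]
matcher = AudioVideoMatcher(FaceDetector(), MouthDetector())
matched = matcher.match(video, audio)
print(matched)  # prints ['a0']
```

Keeping the three stages behind separate interfaces mirrors the claim's structure and lets each stage be replaced independently.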
Specification