Audio and depth based sound source localization
First Claim
1. A system comprising:
a microphone array having microphones that are responsive to sounds generated from within an environment to produce input microphone signals;
a depth sensor that is configured to capture a depth image of at least a portion of the environment; and
operating logic configured to perform acts comprising:
analyzing one or more of the input microphone signals to detect a sound that has been produced from a source position;
analyzing the one or more of the input microphone signals to determine differences in arrival times of the sound at the microphones of the microphone array;
analyzing the differences in arrival times to determine a first estimate of the source position of the sound, the first estimate of the source position of the sound including a first confidence level based on a signal-to-noise ratio associated with the one or more of the input microphone signals;
analyzing at least a portion of the depth image that encompasses the source position to detect a human body part in the depth image;
determining a position of the human body part based at least in part on the depth image, the position of the human body part being associated with a second confidence level based at least in part on an accuracy of the position; and
determining a second estimate of the source position based at least in part on a weighted average of the position of the human body part and the first estimate of the source position of the sound, the weighted average based at least in part on the first confidence level and the second confidence level.
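The final element of the claim, a weighted average driven by two confidence levels, can be illustrated with a minimal Python sketch. The claim does not say how the signal-to-noise ratio is turned into a confidence level or how the weights are normalized, so the linear clamp in `snr_to_confidence`, its `floor_db` and `ceil_db` bounds, and the `Estimate` type are all hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    """A 3-D position in meters with a non-negative confidence level."""
    position: tuple[float, float, float]
    confidence: float


def snr_to_confidence(snr_db: float, floor_db: float = 0.0, ceil_db: float = 30.0) -> float:
    """Map an SNR in dB to a confidence in [0, 1].

    Assumption: a linear ramp clamped between floor_db and ceil_db.
    The claim only states that the confidence is based on the SNR.
    """
    scaled = (snr_db - floor_db) / (ceil_db - floor_db)
    return max(0.0, min(1.0, scaled))


def fuse_estimates(audio: Estimate, body_part: Estimate) -> tuple[float, float, float]:
    """Return the confidence-weighted average of the two positions."""
    total = audio.confidence + body_part.confidence
    if total <= 0.0:
        raise ValueError("at least one confidence level must be positive")
    w_audio = audio.confidence / total
    w_body = body_part.confidence / total
    return tuple(
        w_audio * a + w_body * b
        for a, b in zip(audio.position, body_part.position)
    )


# Example: a noisy audio estimate is pulled toward a more trusted depth-based position.
audio_estimate = Estimate(position=(1.0, 2.0, 0.0), confidence=0.25)
depth_estimate = Estimate(position=(2.0, 2.0, 0.0), confidence=0.75)
fused = fuse_estimates(audio_estimate, depth_estimate)
```

With these illustrative weights, the fused position lies three quarters of the way from the audio-based estimate toward the depth-based position of the body part.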
Abstract
A system may utilize sound localization techniques, such as time-difference-of-arrival techniques, to estimate an audio-based sound source position from which a sound originates. An optical image or depth map of an area containing the sound source location may then be captured and analyzed to detect an object that is known or expected to have produced the sound. The position of the object may also be determined based on the analysis of the optical image or depth map. The position of the sound source may then be determined based at least in part on the position of the detected object or on a combination of the audio-based sound source position and the determined position of the object.
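As a rough illustration of the time-difference-of-arrival idea mentioned in the abstract, the following Python sketch estimates the delay between two microphone signals by brute-force cross-correlation and converts it to a bearing. The abstract does not specify a particular technique, so everything here is an assumption: the two-microphone far-field geometry, the speed-of-sound constant, and the hypothetical helper names `estimate_delay_samples` and `bearing_from_delay`.

```python
import math

SPEED_OF_SOUND = 343.0  # meters per second; assumed room-temperature value


def estimate_delay_samples(ref, other, max_lag):
    """Return the lag, in samples, by which `other` trails `ref`.

    Uses plain time-domain cross-correlation over lags in
    [-max_lag, max_lag]; a positive result means the sound
    reached the `ref` microphone first.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, sample in enumerate(ref):
            j = i + lag
            if 0 <= j < len(other):
                score += sample * other[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def bearing_from_delay(delay_s, mic_spacing_m):
    """Convert a delay in seconds to an angle in radians from broadside.

    Assumes a far-field source and two microphones mic_spacing_m apart.
    The ratio is clamped so measurement noise cannot push it outside
    the domain of asin.
    """
    ratio = SPEED_OF_SOUND * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))
    return math.asin(ratio)


# Example: an impulse that reaches the second microphone 3 samples later.
mic_a = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
mic_b = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
lag = estimate_delay_samples(mic_a, mic_b, max_lag=4)
```

A single microphone pair yields only a bearing. Estimating a position, as the abstract describes, would need delays from additional microphone pairs, which this sketch does not attempt.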
22 Claims
1. A system as recited in the first claim above. Dependent claims: 2, 3, 4
5. A method comprising:
receiving multiple input microphone signals;
analyzing one or more of the multiple input microphone signals to determine a first estimate of a source position of a sound produced by an object;
capturing a depth image of an area that encompasses the first estimate of the source position of the sound;
determining a position of the object based at least in part on the depth image; and
determining a second estimate of the source position of the sound based at least in part on a weighted average of the first estimate of the source position of the sound and the position of the object, the weighted average based at least in part on a first confidence level associated with the first estimate of the source position of the sound and a second confidence level associated with the position of the object, the first confidence level based at least in part on a signal-to-noise ratio associated with the one or more of the multiple input microphone signals.
Dependent claims: 6, 7, 8, 9, 10, 11, 12, 13
14. A method comprising:
analyzing one or more input microphone signals to determine a first estimate of a source position of a sound emitted in an environment;
capturing an image that encompasses the first estimate of the source position of the sound;
analyzing the image to detect a position of a source of the sound in the image; and
determining a second estimate of the source position of the sound based at least in part on a weighted average of the first estimate of the source position of the sound and the position of the source of the sound in the image, the weighted average based at least in part on a first confidence level associated with the first estimate of the source position of the sound and a second confidence level associated with the position of the source of the sound in the image, the first confidence level based at least in part on a signal-to-noise ratio associated with the one or more input microphone signals.
Dependent claims: 15, 16, 17, 18, 19, 20, 21, 22
Specification