VOICE-BODY IDENTITY CORRELATION
Abstract
A system and method are disclosed for tracking image and audio data over time to automatically identify a person based on a correlation of their voice with their body in a multi-user game or multimedia setting.
20 Claims
1. In a multi-user application starting with an unknown set of users, a method of identifying a correlation between a user and user voice, the method comprising the steps of:
(a) receiving a plurality of images of objects within a field of view of a video capture component taken over a plurality of time periods;
(b) determining whether the images received in said step (a) include one or more users;
(c) receiving audio within the range of a microphone array for a plurality of time periods;
(d) determining whether the audio received in said step (c) includes one or more human voices; and
(e) correlating a voice identified in said step (d) to a user of the one or more users within the field of view based on a plurality of samplings of determined positions of the user in different images and determined source locations of the voice at different times.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
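Step (e) of claim 1 does not say how the repeated samplings are combined. As a minimal sketch, assuming each sampling is reduced to one horizontal angle per detected user (from the images) and one direction-of-arrival estimate for the voice (from the microphone array), both in degrees in a shared frame, a simple majority vote across samplings could look like the following. `Sample`, `nearest_user`, `correlate_voice`, `tolerance_deg` and `min_samples` are hypothetical names, and voting is an illustrative choice, not the technique stated in the patent.

```python
# Hypothetical sketch: correlate a voice to a user by voting over samplings.
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Sample:
    """One sampling: user positions from an image plus the voice direction.

    Assumed representation: horizontal angles in degrees, in a frame shared
    by the camera and the microphone array. The field of view is assumed
    narrow enough that angle wrap-around can be ignored.
    """
    user_angles: Dict[str, float]  # user id -> angle of that user's body
    voice_angle: float             # estimated direction of arrival of the voice


def nearest_user(sample: Sample, tolerance_deg: float) -> Optional[str]:
    """Return the user closest to the voice direction, or None if nobody
    lies within tolerance_deg of it."""
    best_id, best_err = None, tolerance_deg
    for user_id, angle in sample.user_angles.items():
        err = abs(angle - sample.voice_angle)
        if err <= best_err:
            best_id, best_err = user_id, err
    return best_id


def correlate_voice(samples: List[Sample], tolerance_deg: float = 10.0,
                    min_samples: int = 3) -> Optional[str]:
    """Correlate one voice with one user by voting over several samplings."""
    votes: Counter = Counter()
    for sample in samples:
        user_id = nearest_user(sample, tolerance_deg)
        if user_id is not None:
            votes[user_id] += 1
    if not votes or sum(votes.values()) < min_samples:
        return None  # too few usable samplings; keep collecting
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: the samplings do not single out one user
    return ranked[0][0]


if __name__ == "__main__":
    # Both users move between samplings; the voice keeps following "bob".
    samples = [
        Sample({"alice": -20.0, "bob": 15.0}, voice_angle=12.0),
        Sample({"alice": -5.0, "bob": 30.0}, voice_angle=28.0),
        Sample({"alice": 10.0, "bob": -25.0}, voice_angle=-22.0),
    ]
    print(correlate_voice(samples))  # bob
```

Requiring several samplings before answering mirrors the claim's reliance on "a plurality of samplings": a single frame in which two users stand near the voice direction cannot settle the question, but users who move independently soon separate.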
11. In a multi-user application where correlation of a voice to a user may require more than a single sampling of the voice and user location, a method of identifying a correlation between a user and user voice, the method comprising the steps of:
(a) receiving a plurality of images of objects within a field of view of a video capture component taken over a plurality of time periods;
(b) determining whether the images received in said step (a) include one or more users;
(c) receiving audio within the range of a microphone array for a plurality of time periods covering the plurality of images;
(d) determining whether the audio received in said step (c) includes one or more human voices;
(e) performing an initial sampling examining a location of one or more users with respect to an image capture component and a location of a voice with respect to an audio capture component, the initial sampling determining the voice is correlated to a user of the one or more users above a threshold confidence level; and
(f) performing additional samplings examining locations of the one or more users with respect to the image capture component and locations of the voice with respect to the audio capture component, the additional samplings confirming the correlation of the voice with the user or the additional samplings reducing a likelihood that the voice is correlated to the user.
Dependent claims: 12, 13, 14, 15, 16
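Claim 11 separates an initial sampling that clears a threshold confidence level from additional samplings that either confirm the correlation or reduce its likelihood, but it does not fix how confidence is computed. The sketch below assumes a Bayesian update over the users in view, with a Gaussian likelihood of the measured voice angle around each user's observed angle. `update_posterior`, `sample_and_confirm`, `sigma_deg` and the 0.8 default for `threshold` are illustrative assumptions, not values from the patent, and the sketch also assumes the same users are visible in every sampling.

```python
# Hypothetical sketch: threshold an initial sampling, then revise it.
import math
from typing import Dict, List, Optional, Tuple

# One sampling: (user id -> user angle from the image, voice angle from audio).
Sampling = Tuple[Dict[str, float], float]


def update_posterior(prior: Dict[str, float], user_angles: Dict[str, float],
                     voice_angle: float, sigma_deg: float) -> Dict[str, float]:
    """Bayes update: P(user | voice) is proportional to P(voice | user) * P(user).

    P(voice | user) is modelled, as an assumption, by a Gaussian centred on
    the user's observed angle with standard deviation sigma_deg. Every user
    in the prior must appear in user_angles.
    """
    weights = {
        uid: p * math.exp(-0.5 * ((voice_angle - user_angles[uid]) / sigma_deg) ** 2)
        for uid, p in prior.items()
    }
    total = sum(weights.values())
    if total == 0.0:
        return dict(prior)  # no user explains this voice; ignore the sampling
    return {uid: w / total for uid, w in weights.items()}


def sample_and_confirm(samplings: List[Sampling], threshold: float = 0.8,
                       sigma_deg: float = 8.0) -> Tuple[Optional[str], List[float]]:
    """The initial sampling proposes a user; later samplings revise that choice.

    Returns (candidate, confidences). candidate is None when the initial
    sampling does not reach the threshold. Otherwise confidences holds the
    candidate's posterior after each sampling: a rising series confirms the
    correlation, a falling one reduces its likelihood.
    """
    first_angles, _ = samplings[0]
    posterior = {uid: 1.0 / len(first_angles) for uid in first_angles}
    candidate: Optional[str] = None
    confidences: List[float] = []
    for i, (user_angles, voice_angle) in enumerate(samplings):
        posterior = update_posterior(posterior, user_angles, voice_angle, sigma_deg)
        if i == 0:
            best = max(posterior, key=posterior.get)
            if posterior[best] < threshold:
                return None, [posterior[best]]
            candidate = best
        confidences.append(posterior[candidate])
    return candidate, confidences


if __name__ == "__main__":
    positions = {"alice": -20.0, "bob": 15.0}
    # Initial sampling points at bob; the second voice arrives from alice's side.
    print(sample_and_confirm([(positions, 12.0), (positions, -22.0)]))
```

In that usage example the initial sampling clears the threshold for `"bob"`, and the second sampling drives his posterior well below it, which corresponds to the "reducing a likelihood" branch of step (f).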
17. A system for correlating a voice to user in a multi-user application, the system comprising:
an image camera component capable of providing a depth image of one or more users in a field of view of the image camera component;
a microphone array capable of receiving audio within range of the microphone array, the microphone array capable of localizing a source of a voice to within a first tolerance; and
a computing environment in communication with both the image capture component and microphone array, the computing environment capable of distinguishing between different users in the field of view to a second tolerance, the first and second tolerances at times preventing correlation of the voice to a user of the one or more users after an initial sampling of data from the image camera and data from the microphone array, the computing environment further performing additional samplings of data from the image camera and data from the microphone array, the additional samplings allowing the correlation of the voice with the user or the additional samplings reducing a likelihood that the voice is correlated to the user.
Dependent claims: 18, 19, 20
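Claim 17 says only that the audio localization tolerance and the user-discrimination tolerance can leave the voice unassigned after an initial sampling, and that further samplings either allow the correlation or weaken it. One simple way to picture this, offered as an illustrative assumption rather than the claimed implementation, is to keep the set of users whose position lies within the combined tolerance of the voice direction in every sampling so far; once the users move apart, the set shrinks to one. The names `candidates` and `correlate_with_tolerances` and the 5 and 3 degree defaults are hypothetical.

```python
# Hypothetical sketch: resolve tolerance-limited ambiguity over samplings.
from typing import Dict, Iterable, Optional, Set, Tuple


def candidates(user_angles: Dict[str, float], voice_angle: float,
               audio_tol: float, video_tol: float) -> Set[str]:
    """Users who could plausibly be the voice source in one sampling.

    A user qualifies if the voice direction falls within the combined
    tolerance: audio localization error plus user-position error (degrees).
    """
    reach = audio_tol + video_tol
    return {uid for uid, a in user_angles.items() if abs(a - voice_angle) <= reach}


def correlate_with_tolerances(samplings: Iterable[Tuple[Dict[str, float], float]],
                              audio_tol: float = 5.0,
                              video_tol: float = 3.0) -> Optional[str]:
    """Intersect per-sampling candidate sets until exactly one user remains."""
    remaining: Optional[Set[str]] = None
    for user_angles, voice_angle in samplings:
        found = candidates(user_angles, voice_angle, audio_tol, video_tol)
        remaining = found if remaining is None else remaining & found
        if len(remaining) == 1:
            return next(iter(remaining))
        if not remaining:
            return None  # no single user is consistent with every sampling
    return None  # still ambiguous (or no samplings at all)


if __name__ == "__main__":
    # First sampling: alice and bob stand close together, so both fit.
    s1 = ({"alice": 0.0, "bob": 6.0}, 3.0)
    # Second sampling: they have separated; only bob is near the voice.
    s2 = ({"alice": -10.0, "bob": 12.0}, 11.0)
    print(correlate_with_tolerances([s1]))      # None
    print(correlate_with_tolerances([s1, s2]))  # bob
```

Set intersection is deliberately strict: a single bad localization can eliminate the true speaker. A probabilistic update, which lowers a user's likelihood instead of discarding the user, is the more forgiving alternative.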
Specification