Voice-body identity correlation
First Claim
1. A method of identifying a correlation between a user and user voice, the method comprising the steps of:
(a) receiving images of one or more users taken over a plurality of time periods;
(b) receiving audio of voices for a plurality of time periods; and
(c) correlating a voice identified in said step (b) to a user of the one or more users based on a plurality of samplings of determined positions of the user in different images and determined source locations of the voice at different times, said step (c) comprising the step of performing a first sampling of the plurality of samplings to derive a scored confidence level of an association between the voice and a user, the scored confidence level obtained by examining one or more of the following factors;
i. how close the estimated position of the voice source is to the one or more users;
ii. the number of voices which are being heard;
iii. the closeness of the one or more users to an estimated source of the voice;
iv. whether the source of the voice is estimated to be centered within a field of view of the image or closer to edges of the field of view.
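A minimal sketch of how the factors in step (c) might be combined into a single score for one sampling. The claim does not specify any formula, so the `Sampling` type, the `confidence` function, the exponential closeness term, the 30-degree half field of view and the way the factors are multiplied together are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Sampling:
    """One sampling; all field names are hypothetical, not from the patent."""
    user_angles: dict        # user id -> bearing in degrees from the camera axis
    voice_angle: float       # estimated bearing of the voice source, same convention
    num_voices: int          # number of voices heard during this sampling
    half_fov: float = 30.0   # assumed half-width of the field of view, in degrees


def confidence(s: Sampling, user_id: str, scale: float = 10.0) -> float:
    """Return an illustrative score in [0, 1] for the voice belonging to user_id."""
    # Factors i and iii: closeness of the voice source to each user.
    closeness = {u: math.exp(-abs(a - s.voice_angle) / scale)
                 for u, a in s.user_angles.items()}
    # How much of the total closeness this user holds relative to the others.
    share = closeness[user_id] / sum(closeness.values())
    # Factor ii: more simultaneous voices means less certainty.
    voices = 1.0 / max(1, s.num_voices)
    # Factor iv: sources near the edge of the field of view are trusted less.
    centred = 1.0 - 0.5 * min(1.0, abs(s.voice_angle) / s.half_fov)
    return closeness[user_id] * share * voices * centred


two_users = Sampling({"A": -5.0, "B": 20.0}, voice_angle=-4.0, num_voices=1)
print(confidence(two_users, "A") > confidence(two_users, "B"))
```

In this sketch user A scores higher than user B because the voice source lies about one degree from A and 24 degrees from B. Any real weighting would need tuning against measured data.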
2 Assignments
0 Petitions
Abstract
A system and method are disclosed for tracking image and audio data over time to automatically identify a person based on a correlation of their voice with their body in a multi-user game or multimedia setting.
199 Citations
18 Claims
1. A method of identifying a correlation between a user and user voice, set out in full under First Claim above. Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
10. In a multi-user application where correlation of a voice to a user may require more than a single sampling of the voice and user location, a method of identifying a correlation between a user and user voice, the method comprising the steps of:
(a) receiving a plurality of images including one or more users;
(b) receiving audio for a plurality of time periods covering the plurality of images;
(c) performing an initial sampling examining a location of one or more users and a location of a voice, the initial sampling determining the voice is ambiguously correlated to a user of the one or more users; and
(d) performing subsequent samplings, the subsequent samplings removing ambiguity as to the correlation determined in said step (c) or confirming the ambiguity to remove the correlation determined in said step (c).
Dependent claims: 11, 12, 13, 14, 15.
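The initial and subsequent samplings of claim 10 can be sketched as a candidate set that shrinks over time. This is only one possible reading. The 1-D positions in metres, the `tolerance` threshold and the function names `candidates` and `resolve` are assumptions and do not come from the patent.

```python
def candidates(user_pos, voice_pos, tolerance):
    """Users whose position lies within `tolerance` of the voice source."""
    return {u for u, p in user_pos.items() if abs(p - voice_pos) <= tolerance}


def resolve(samplings, tolerance=0.5):
    """Return the single user the voice correlates to, or None.

    `samplings` is a list of (user_positions, voice_position) pairs, where
    positions are hypothetical 1-D coordinates in metres.
    """
    if not samplings:
        return None
    users, voice = samplings[0]
    # Step (c): the initial sampling may leave several users as candidates.
    remaining = candidates(users, voice, tolerance)
    # Step (d): each later sampling keeps only users still near the voice.
    for users, voice in samplings[1:]:
        if len(remaining) <= 1:
            break
        remaining &= candidates(users, voice, tolerance)
    # Ambiguity removed: exactly one user is left. Otherwise drop the correlation.
    return next(iter(remaining)) if len(remaining) == 1 else None


samplings = [
    ({"A": 1.0, "B": 1.2}, 1.1),   # A and B stand together: ambiguous
    ({"A": 1.0, "B": 3.0}, 1.0),   # B has moved away: only A fits
]
print(resolve(samplings))
```

If the users never separate, the candidate set stays ambiguous and the sketch returns `None`, which corresponds to removing the correlation.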
16. A system for correlating a voice to user in a multi-user application, the system comprising:
an image camera component capable of providing a depth image of one or more users in a field of view of the image camera component;
a microphone array capable of receiving audio within range of the microphone array, the microphone array capable of localizing a source of a voice; and
a computing environment in communication with both the image capture component and microphone array, the computing environment capable of distinguishing between different users in the field of view, the computing environment further performing additional samplings of data from the image camera and data from the microphone array, the additional samplings confirming the correlation of the voice with the user or the additional samplings reducing a likelihood that the voice is correlated to the user, wherein the computing environment distinguishes between different users in the field of view by detecting locations of joints of the one or more users.
Dependent claims: 17, 18.
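As a rough illustration of how detected joint locations could be compared against a voice source localized by the microphone array, the sketch below converts each user's head joint into a bearing and picks the nearest user. The joint names, the (x, y, z) camera-space layout in metres and the helper functions are all hypothetical. The claim does not say how joints are represented or matched.

```python
import math


def centroid(joints):
    """Mean (x, y, z) of all joints; used when no head joint is available."""
    points = list(joints.values())
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def user_bearings(skeletons):
    """Map each user id to a bearing in degrees from the camera axis."""
    bearings = {}
    for user_id, joints in skeletons.items():
        # Assumed layout: x is to the right, y is up, z is depth from the camera.
        x, _y, z = joints["head"] if "head" in joints else centroid(joints)
        bearings[user_id] = math.degrees(math.atan2(x, z))
    return bearings


def nearest_user(skeletons, voice_bearing):
    """User whose bearing is closest to the microphone array's estimate."""
    bearings = user_bearings(skeletons)
    return min(bearings, key=lambda u: abs(bearings[u] - voice_bearing))


skeletons = {
    "left": {"head": (-1.0, 1.6, 2.0)},
    "right": {"head": (1.0, 1.6, 2.0)},
}
print(nearest_user(skeletons, voice_bearing=20.0))
```

A single nearest-user match like this corresponds to one sampling only. The claim relies on additional samplings to confirm the correlation or reduce its likelihood.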
Specification