Speaker detection and tracking using audiovisual data
Abstract
Object tracking includes an audio model that receives at least two audio input signals and a video model that receives a video input. The audio model and the video model employ probabilistic generative models which are combined to facilitate object tracking. Expectation maximization can be employed to modify trainable parameters of the audio model and the video model.
20 Claims
1. One or more processor-accessible storage devices comprising processor-executable instructions for object tracking that, when executed, direct a device to perform actions comprising:
receiving at least two audio input signals associated with an object;
receiving a video input signal associated with the object;
modeling a location of the object based at least in part on the at least two audio input signals and the video input signal; and
calculating an error in modeling the location of the object based at least in part on:
a precision matrix of an approximation error modeled by a zero-mean Gaussian;
a product of a vertical position of the object and a difference in vertical position of a first audio input device and a second audio input device; and
a product of a horizontal position of the object and a difference in horizontal position of the first audio input device and the second audio input device.
Dependent claims: 2, 3, 4, 5, 6.
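The two products in claim 1 have a simple geometric origin. For an object at (x, y, z) and two microphones at (x1, y1, z0) and (x2, y2, z0) that share the same depth coordinate z0, the z terms cancel and |o - m1|^2 - |o - m2|^2 = (x1^2 + y1^2) - (x2^2 + y2^2) - 2[x(x1 - x2) + y(y1 - y2)]. Dividing by the sum of the two ranges gives the exact range difference, and dividing that by the speed of sound gives the inter-microphone time delay. If the sum of ranges is replaced by a constant 2R (a far-field assumption), the range difference becomes linear in the two products, and the leftover residual is the kind of approximation error that claim 1 models with a zero-mean Gaussian. The sketch below is illustrative only: the coordinate convention, the function names, and the parameter `mean_range` are assumptions, not the patent's formulation.

```python
import numpy as np


def range_difference(obj, mic1, mic2):
    """Exact difference of distances |obj - mic1| - |obj - mic2| for 3-D points."""
    o, a, b = (np.asarray(p, dtype=float) for p in (obj, mic1, mic2))
    return float(np.linalg.norm(o - a) - np.linalg.norm(o - b))


def range_difference_linear(obj, mic1, mic2, mean_range):
    """Approximate range difference built from the two products named in claim 1.

    Hypothetical helper. Assumes both microphones share the same depth (z)
    coordinate, so the z terms cancel, and replaces the sum of the two ranges
    by the constant 2 * mean_range (far-field assumption).
    """
    if mic1[2] != mic2[2]:
        raise ValueError("sketch assumes microphones at the same depth")
    x, y = obj[0], obj[1]
    x1, y1 = mic1[0], mic1[1]
    x2, y2 = mic2[0], mic2[1]
    horizontal = x * (x1 - x2)  # horizontal object position times mic x-difference
    vertical = y * (y1 - y2)    # vertical object position times mic y-difference
    const = (x1**2 + y1**2) - (x2**2 + y2**2)
    squared_range_diff = const - 2.0 * (horizontal + vertical)
    return squared_range_diff / (2.0 * mean_range)


# Hypothetical geometry in metres: object about 3 m in front of the array.
obj = (0.4, 0.2, 3.0)
mic1 = (-0.1, 0.05, 0.0)
mic2 = (0.1, -0.05, 0.0)
exact = range_difference(obj, mic1, mic2)
approx = range_difference_linear(obj, mic1, mic2, mean_range=3.0)
# Residual of the linear approximation, i.e. the kind of error modeled as Gaussian.
approximation_error = exact - approx
```

Passing the true half-sum of the two ranges as `mean_range` makes the "linear" form exact, which is a quick check that the identity is right; any constant `mean_range` trades that exactness for linearity in x and y.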
7. An object tracker system, comprising:
a processor that executes the following computer-executable components stored on a computer-readable medium:
an audio model component that models an original audio signal of an object;
a video model component that models a location of the object; and
an audio video tracker component that models the location of the object based at least in part on the audio model and the video model,
wherein the audio video tracker provides an output associated with the location of the object based at least in part on a linear mapping that approximates the location of the object,
wherein error in approximating the location of the object is modeled by a zero-mean Gaussian distribution associated with a precision matrix, and
wherein the zero-mean Gaussian distribution associated with the precision matrix is based at least in part on a product of a horizontal position of the object and a difference in horizontal position of a first audio input device and a second audio input device.
Dependent claims: 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18.
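Claim 7 models the error of a linear location mapping as a zero-mean Gaussian with a precision matrix (the inverse of the covariance). A minimal sketch of evaluating that error term, assuming a hypothetical mapping `prediction = A @ location + b`, is the Gaussian log-density of the residual. The names `A`, `b`, `observed`, and `location` are illustrative and do not come from the patent; for example, `observed` might be an audio-derived time delay and `location` a video-derived (x, y) position.

```python
import numpy as np


def gaussian_error_logpdf(observed, A, b, location, precision):
    """Log-density of r = observed - (A @ location + b) under a zero-mean
    Gaussian with the given symmetric precision matrix (hypothetical helper)."""
    r = np.asarray(observed, dtype=float) - (
        np.asarray(A, dtype=float) @ np.asarray(location, dtype=float)
        + np.asarray(b, dtype=float)
    )
    P = np.asarray(precision, dtype=float)
    # Cholesky raises LinAlgError unless P is positive definite.
    L = np.linalg.cholesky(P)
    log_det_P = 2.0 * np.sum(np.log(np.diag(L)))
    d = r.shape[0]
    # log N(r; 0, P^-1) = 0.5 log|P| - 0.5 r^T P r - (d/2) log(2 pi)
    return float(0.5 * log_det_P - 0.5 * (r @ P @ r) - 0.5 * d * np.log(2.0 * np.pi))
```

Working with the precision matrix directly avoids a matrix inversion: the density needs only the quadratic form r^T P r and log det P, and the Cholesky factor supplies the latter while also rejecting matrices that are not positive definite.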
19. One or more processor-accessible storage devices comprising processor-executable instructions for object tracking that, when executed, direct a device to perform actions comprising:
updating a posterior distribution over unobserved variables of a probabilistic generative audio model and a probabilistic generative video model;
providing the probabilistic generative audio model with trainable parameters;
providing the probabilistic generative video model with trainable parameters;
updating trainable parameters of the probabilistic generative audio model and the probabilistic generative video model;
determining a location of an object at least in part by combining the probabilistic generative audio model and the probabilistic generative video model using a probabilistic generative model, wherein an error in determining the location of the object is based at least in part on at least one of:
a product of a vertical position of the object and a difference in vertical position of a first audio input device and a second audio input device; and
a product of a horizontal position of the object and a difference in horizontal position of the first audio input device and the second audio input device; and
providing an output associated with the location of the object.
Dependent claim: 20.
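Claim 19 describes an expectation-maximization (EM) style loop: update the posterior over unobserved variables (E-step), then update trainable parameters (M-step). The sketch below runs that loop on a deliberately small, hypothetical linear-Gaussian model in which one latent horizontal position per frame is observed by a video measurement and by an audio time-delay measurement. It is not the patent's audio or video model; the model, the function name `em_audio_video`, the fixed values of `mu0`, `lam0`, and `nu_v`, and the synthetic data are all assumptions chosen so that both steps have closed forms.

```python
import numpy as np


def em_audio_video(v, tau, mu0=0.0, lam0=1.0, nu_v=100.0, n_iter=200):
    """Toy EM for a scalar latent position l_n seen by two sensors.

    Hypothetical model, for illustration only:
      l_n   ~ N(mu0, 1/lam0)                  latent horizontal position
      v_n   ~ N(l_n, 1/nu_v)                  video position measurement
      tau_n ~ N(alpha * l_n + beta, 1/nu_a)   audio time-delay measurement
    Trainable parameters: alpha, beta, nu_a (mu0, lam0, nu_v held fixed).
    """
    v = np.asarray(v, dtype=float)
    tau = np.asarray(tau, dtype=float)
    n = v.size
    alpha, beta, nu_a = 1.0, 0.0, 1.0  # initial guesses
    for _ in range(n_iter):
        # E-step: each posterior over l_n is Gaussian with a shared precision.
        post_var = 1.0 / (lam0 + nu_v + nu_a * alpha**2)
        post_mean = (lam0 * mu0 + nu_v * v + nu_a * alpha * (tau - beta)) * post_var
        El, El2 = post_mean, post_mean**2 + post_var  # E[l_n], E[l_n^2]
        # M-step for alpha, beta: normal equations on expected statistics.
        lhs = np.array([[El2.sum(), El.sum()], [El.sum(), n]])
        rhs = np.array([(tau * El).sum(), tau.sum()])
        alpha, beta = np.linalg.solve(lhs, rhs)
        # M-step for nu_a: inverse of the expected squared audio residual.
        resid2 = (tau - beta) ** 2 - 2.0 * alpha * (tau - beta) * El + alpha**2 * El2
        nu_a = 1.0 / resid2.mean()
    return alpha, beta, nu_a, post_mean  # post_mean is from the last E-step


# Synthetic data drawn from the same hypothetical model.
rng = np.random.default_rng(0)
l_true = rng.normal(0.0, 1.0, 5000)
v_obs = l_true + rng.normal(0.0, 0.1, l_true.size)                # nu_v = 100
tau_obs = 2.0 * l_true + 0.5 + rng.normal(0.0, 0.2, l_true.size)  # nu_a = 25
alpha, beta, nu_a, post_mean = em_audio_video(v_obs, tau_obs)
```

Because both measurements are linear in the latent position with Gaussian noise, the E-step is exact, and the M-step for `alpha` and `beta` reduces to a 2x2 linear solve on expected sufficient statistics.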
Specification