Speaker detection and tracking using audiovisual data
Abstract
A system and method facilitating object tracking is provided. The system includes an audio model that receives at least two audio input signals and a video model that receives a video input. The audio model and the video model employ probabilistic generative models which are combined to facilitate object tracking. Expectation maximization can be employed to modify trainable parameters of the audio model and the video model.
24 Claims
1. An object tracker system, comprising:
- an audio model that models an original audio signal of an object, a time delay between at least two audio input signals and a variability component of the original audio signal, the audio model employing a probabilistic generative model, and employing, at least in part, the following equations:
$p(r) = \pi_r$,
$p(a \mid r) = N(a \mid 0, \eta_r)$,
$p(x_1 \mid a) = N(x_1 \mid \lambda_1 a, \nu_1)$,
$p(x_2 \mid a, \tau) = N(x_2 \mid \lambda_2 L_\tau a, \nu_2)$,
where $r$ is a variability component of the original audio signal, $\pi_r$ is a prior probability parameter of $r$, $a$ is the original audio signal of the object, $x_1$ is a first audio input signal, $x_2$ is a second audio input signal, $\tau$ is the time delay between $x_1$ and $x_2$, $\lambda_1$ is an attenuation parameter associated with $x_1$, $\lambda_2$ is an attenuation parameter associated with $x_2$, $\eta_r$ is a precision matrix parameter associated with $r$, $\nu_1$ is a precision matrix parameter associated with additive noise of $x_1$, $\nu_2$ is a precision matrix parameter associated with additive noise of $x_2$, and $L_\tau$ denotes a temporal shift operator;
- a video model that models a location of the object, an original image of the object and a variability component of the original image, the video model employing a probabilistic generative model, the video model receiving a video input; and
- an audio video tracker that models the location of the object based, at least in part, upon the audio model and the video model, the audio video tracker providing an output associated with the location of the object.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16.
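
As an illustration of the audio model equations in claim 1, the following minimal Python sketch samples two microphone signals from the generative model and then computes a posterior over the time delay. It is not the patented implementation. It assumes a single variability component r, scalar (isotropic) precisions, a circular shift for the temporal shift operator, and a flat prior over candidate delays. Under those assumptions, integrating out the original signal a leaves a delay posterior that depends only on the cross-correlation between the two inputs. The names `shift`, `sample_audio` and `delay_posterior` are hypothetical.

```python
import math
import random


def shift(signal, tau):
    """Circular temporal shift (assumed form of L_tau): out[n] = signal[(n - tau) % N]."""
    n = len(signal)
    return [signal[(i - tau) % n] for i in range(n)]


def sample_audio(n, tau, lam1, lam2, eta, nu1, nu2, rng):
    """Draw a, x1, x2 from the model with scalar precisions eta, nu1, nu2."""
    a = [rng.gauss(0.0, eta ** -0.5) for _ in range(n)]            # a ~ N(0, 1/eta)
    x1 = [lam1 * s + rng.gauss(0.0, nu1 ** -0.5) for s in a]       # x1 ~ N(lam1 a, 1/nu1)
    x2 = [lam2 * s + rng.gauss(0.0, nu2 ** -0.5)                   # x2 ~ N(lam2 L_tau a, 1/nu2)
          for s in shift(a, tau)]
    return a, x1, x2


def delay_posterior(x1, x2, taus, lam1, lam2, eta, nu1, nu2):
    """Posterior over candidate delays with a marginalized out (flat prior on tau)."""
    # Only the cross term of the marginal likelihood depends on tau.
    gain = lam1 * nu1 * lam2 * nu2 / (eta + lam1 ** 2 * nu1 + lam2 ** 2 * nu2)
    scores = []
    for tau in taus:
        undone = shift(x2, -tau)  # apply the inverse shift to x2
        scores.append(gain * sum(u * v for u, v in zip(x1, undone)))
    peak = max(scores)            # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]
```

Calling `sample_audio` with a known delay and passing the resulting `x1` and `x2` to `delay_posterior` over a range of candidate delays should concentrate the posterior on the true delay when the noise precisions are high relative to the signal precision.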
17. A method for object tracking, comprising:
- updating a posterior distribution over unobserved variables of an audio model and a video model;
- updating trainable parameters of the audio model and the video model;
- employing, at least in part, the following equations in the audio model:
$p(r) = \pi_r$,
$p(a \mid r) = N(a \mid 0, \eta_r)$,
$p(x_1 \mid a) = N(x_1 \mid \lambda_1 a, \nu_1)$,
$p(x_2 \mid a, \tau) = N(x_2 \mid \lambda_2 L_\tau a, \nu_2)$,
where $r$ is a variability component of the original audio signal, $\pi_r$ is a prior probability parameter of $r$, $a$ is the original audio signal of the object, $x_1$ is a first audio input signal, $x_2$ is a second audio input signal, $\tau$ is the time delay between $x_1$ and $x_2$, $\lambda_1$ is an attenuation parameter associated with $x_1$, $\lambda_2$ is an attenuation parameter associated with $x_2$, $\eta_r$ is a precision matrix parameter associated with $r$, $\nu_1$ is a precision matrix parameter associated with additive noise of $x_1$, $\nu_2$ is a precision matrix parameter associated with additive noise of $x_2$, and $L_\tau$ denotes a temporal shift operator; and
- providing an output associated with a location of an object.
Dependent claims: 18, 19.
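
The two update steps in claim 17 follow the pattern of expectation maximization, which the abstract names as one way to modify the trainable parameters. The Python sketch below shows that alternation on a deliberately reduced toy problem, not on the full audio-video model. It assumes the original signal a is observed directly as scalar samples, so the only unobserved variable is r and the only trainable parameters are the priors π_r and scalar precisions η_r from p(r) and p(a|r). The function names are hypothetical, and the sketch omits guards against empty components.

```python
import math
import random


def e_step(samples, pis, etas):
    """Posterior q(r | a_n) under p(r) = pi_r and p(a | r) = N(a | 0, 1/eta_r)."""
    posts = []
    for a in samples:
        logs = [math.log(p) + 0.5 * math.log(e) - 0.5 * e * a * a
                for p, e in zip(pis, etas)]
        peak = max(logs)  # subtract the maximum for numerical stability
        weights = [math.exp(v - peak) for v in logs]
        total = sum(weights)
        posts.append([w / total for w in weights])
    return posts


def m_step(samples, posts):
    """Re-estimate pi_r and eta_r from the posterior responsibilities."""
    k = len(posts[0])
    counts = [sum(q[r] for q in posts) for r in range(k)]
    pis = [c / len(samples) for c in counts]
    etas = [counts[r] / sum(q[r] * a * a for q, a in zip(posts, samples))
            for r in range(k)]
    return pis, etas


def log_likelihood(samples, pis, etas):
    """Log-likelihood of the samples under the zero-mean mixture."""
    total = 0.0
    for a in samples:
        total += math.log(sum(
            p * math.sqrt(e / (2.0 * math.pi)) * math.exp(-0.5 * e * a * a)
            for p, e in zip(pis, etas)))
    return total
```

Alternating `e_step` and `m_step` should never decrease `log_likelihood`, which is a convenient check when extending the sketch toward the full model with unobserved a, τ and video variables.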
20. A data packet transmitted between two or more computer components that facilitates object tracking, the data packet comprising:
- a first data field comprising information associated with a horizontal location of an object; and
- a second data field comprising information associated with a vertical location of the object, the horizontal location and the vertical location being based, at least in part, upon an object tracker system receiving at least two audio signal inputs and a video input signal;
wherein the object tracker system comprises at least an audio model employing, at least in part, the following equations:
$p(r) = \pi_r$,
$p(a \mid r) = N(a \mid 0, \eta_r)$,
$p(x_1 \mid a) = N(x_1 \mid \lambda_1 a, \nu_1)$,
$p(x_2 \mid a, \tau) = N(x_2 \mid \lambda_2 L_\tau a, \nu_2)$,
where $r$ is a variability component of the original audio signal, $\pi_r$ is a prior probability parameter of $r$, $a$ is the original audio signal of the object, $x_1$ is a first audio input signal, $x_2$ is a second audio input signal, $\tau$ is the time delay between $x_1$ and $x_2$, $\lambda_1$ is an attenuation parameter associated with $x_1$, $\lambda_2$ is an attenuation parameter associated with $x_2$, $\eta_r$ is a precision matrix parameter associated with $r$, $\nu_1$ is a precision matrix parameter associated with additive noise of $x_1$, $\nu_2$ is a precision matrix parameter associated with additive noise of $x_2$, and $L_\tau$ denotes a temporal shift operator.
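
One possible shape for the two-field data packet of claim 20 is sketched below in Python. The claim does not specify a wire format, so the layout here (two little-endian signed 32-bit integers, read as pixel coordinates) is purely an assumption, and `LocationPacket` is a hypothetical name.

```python
import struct
from dataclasses import dataclass

# Assumed wire layout: two little-endian signed 32-bit integers.
_FORMAT = "<ii"


@dataclass(frozen=True)
class LocationPacket:
    horizontal: int  # first data field: horizontal location of the object
    vertical: int    # second data field: vertical location of the object

    def to_bytes(self) -> bytes:
        """Serialize both fields into an 8-byte payload."""
        return struct.pack(_FORMAT, self.horizontal, self.vertical)

    @classmethod
    def from_bytes(cls, payload: bytes) -> "LocationPacket":
        """Rebuild a packet from a payload produced by to_bytes."""
        horizontal, vertical = struct.unpack(_FORMAT, payload)
        return cls(horizontal, vertical)
```

A tracker component could emit `LocationPacket(x, y).to_bytes()` for each frame, and a consumer could recover the fields with `LocationPacket.from_bytes`.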
21. A computer readable medium storing computer executable components of an object tracker system, comprising:
- an audio model component that models an original audio signal of an object, a time delay between at least two audio input signals and a variability component of the original audio signal, the audio model employing a probabilistic generative model;
- a video model component that models a location of the object, an original image of the object and a variability component of the original image, the video model employing a probabilistic generative model, the video model receiving a video input; and
- employing, at least in part, the following equations:
$p(s) = \pi_s$,
$p(\nu \mid s) = N(\nu \mid \mu_s, \phi_s)$,
$p(y \mid \nu, l) = N(y \mid G_l \nu, \psi)$,
where $\pi_s$ is a prior probability parameter of $s$, $y$ is the video input signal, $l$ is the location of the object, $\nu$ is the original image of the object, $\mu_s$ is a mean parameter associated with $s$, $\phi_s$ is a precision matrix parameter associated with $s$, $\psi$ is a precision matrix parameter associated with additive noise of $y$, and $G_l$ denotes a shift operator; and
- an audio video tracker component that models the location of the object based, at least in part, upon the audio model and the video model, the audio video tracker providing an output associated with the location of the object.
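
To make the video model equations in claim 21 concrete, the Python sketch below computes a posterior over the location l for a one-dimensional "image". It is a simplified illustration only. It assumes a single variability component s with mean template μ_s, scalar precisions φ_s and ψ, a circular shift for the shift operator, and a flat prior over locations. With the original image integrated out, the observation given l is Gaussian around the shifted template with combined precision φψ/(φ+ψ). The names `shift` and `location_posterior` are hypothetical.

```python
import math


def shift(template, loc):
    """Circular shift (assumed form of G_l): out[i] = template[(i - loc) % N]."""
    n = len(template)
    return [template[(i - loc) % n] for i in range(n)]


def location_posterior(y, mu, phi, psi):
    """Posterior over locations with the original image marginalized out."""
    # Variances add when the latent image is integrated out: 1/rho = 1/phi + 1/psi.
    rho = phi * psi / (phi + psi)
    logs = []
    for loc in range(len(y)):
        predicted = shift(mu, loc)
        squared_error = sum((obs - pred) ** 2 for obs, pred in zip(y, predicted))
        logs.append(-0.5 * rho * squared_error)
    peak = max(logs)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]
```

The index with the largest posterior value is the most probable location under these assumptions. Extending the sketch to two-dimensional frames would replace `shift` with a horizontal and vertical translation.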
22. A means for modeling audio that models an original audio signal of an object, a time delay between at least two audio input signals and a variability component of the original audio signal, the means for modeling audio employing a probabilistic generative model, and employing, at least in part, the following equations in the audio model:
$p(r) = \pi_r$,
$p(a \mid r) = N(a \mid 0, \eta_r)$,
$p(x_1 \mid a) = N(x_1 \mid \lambda_1 a, \nu_1)$,
$p(x_2 \mid a, \tau) = N(x_2 \mid \lambda_2 L_\tau a, \nu_2)$,
where $r$ is a variability component of the original audio signal, $\pi_r$ is a prior probability parameter of $r$, $a$ is the original audio signal of the object, $x_1$ is a first audio input signal, $x_2$ is a second audio input signal, $\tau$ is the time delay between $x_1$ and $x_2$, $\lambda_1$ is an attenuation parameter associated with $x_1$, $\lambda_2$ is an attenuation parameter associated with $x_2$, $\eta_r$ is a precision matrix parameter associated with $r$, $\nu_1$ is a precision matrix parameter associated with additive noise of $x_1$, $\nu_2$ is a precision matrix parameter associated with additive noise of $x_2$, and $L_\tau$ denotes a temporal shift operator;
- means for modeling video that models a location of the object, an original image of the object and a variability component of the original image, the means for modeling video employing a probabilistic generative model; and
- means for tracking the location of the object based, at least in part, upon the means for modeling audio and the means for modeling video, the means for tracking the location of the object providing an output associated with the location of the object.
23. An object tracker system, comprising:
- an audio model that models an original audio signal of an object, a time delay between at least two audio input signals and a variability component of the original audio signal, the audio model employing a probabilistic generative model;
- a video model that models a location of the object, an original image of the object, a variability component of the original image and a background image, the video model employing a probabilistic generative model, the video model receiving a video input, and employing, at least in part, the following equations:
$p(s) = \pi_s$,
$p(\nu \mid s) = N(\nu \mid \mu_s, \phi_s)$,
$p(y \mid \nu, l) = N(y \mid G_l \nu, \psi)$,
where $\pi_s$ is a prior probability parameter of $s$, $y$ is the video input signal, $l$ is the location of the object, $\nu$ is the original image of the object, $\mu_s$ is a mean parameter associated with $s$, $\phi_s$ is a precision matrix parameter associated with $s$, $\psi$ is a precision matrix parameter associated with additive noise of $y$, and $G_l$ denotes a shift operator; and
- an audio video tracker that models the location of the object based, at least in part, upon the audio model and the video model, the audio video tracker providing an output associated with the location of the object.
24. An object tracker system, comprising:
- an audio model that models an original audio signal of an object, a time delay between at least two audio input signals, a variability component of the original audio signal and a previous original audio signal of the object, the audio model employing a probabilistic generative model, and employing, at least in part, the following equations:
$p(r) = \pi_r$,
$p(a \mid r) = N(a \mid 0, \eta_r)$,
$p(x_1 \mid a) = N(x_1 \mid \lambda_1 a, \nu_1)$,
$p(x_2 \mid a, \tau) = N(x_2 \mid \lambda_2 L_\tau a, \nu_2)$,
where $r$ is a variability component of the original audio signal, $\pi_r$ is a prior probability parameter of $r$, $a$ is the original audio signal of the object, $x_1$ is a first audio input signal, $x_2$ is a second audio input signal, $\tau$ is the time delay between $x_1$ and $x_2$, $\lambda_1$ is an attenuation parameter associated with $x_1$, $\lambda_2$ is an attenuation parameter associated with $x_2$, $\eta_r$ is a precision matrix parameter associated with $r$, $\nu_1$ is a precision matrix parameter associated with additive noise of $x_1$, $\nu_2$ is a precision matrix parameter associated with additive noise of $x_2$, and $L_\tau$ denotes a temporal shift operator;
- a video model that models a location of the object, an original image of the object and a variability component of the original image, the video model employing a probabilistic generative model, the video model receiving a video input; and
- an audio video tracker that models the location of the object based, at least in part, upon the audio model, the video model and a previous location of the object, the audio video tracker providing an output associated with the location of the object.
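
Claim 24 lets the tracker depend on a previous location of the object but does not state how. One common way to use such a dependence is a single step of recursive Bayesian filtering, sketched below in Python. This is an assumed technique, not the method recited in the claim: the previous belief over locations is spread by a Gaussian random-walk transition and then reweighted by a per-location likelihood from the current frame. The names `transition_row` and `filter_step`, and the `precision` parameter of the transition, are hypothetical.

```python
import math


def transition_row(prev_loc, n, precision):
    """Normalized p(loc | prev_loc): assumed Gaussian random walk on n locations."""
    weights = [math.exp(-0.5 * precision * (loc - prev_loc) ** 2) for loc in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def filter_step(prev_belief, likelihood, precision):
    """Combine the previous belief over locations with the current-frame likelihood."""
    n = len(prev_belief)
    predicted = [0.0] * n
    # Prediction: sum over previous locations of belief times transition probability.
    for prev_loc, mass in enumerate(prev_belief):
        row = transition_row(prev_loc, n, precision)
        for loc in range(n):
            predicted[loc] += mass * row[loc]
    # Correction: reweight by the likelihood of the current observation and normalize.
    unnormalized = [lk * pr for lk, pr in zip(likelihood, predicted)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]
```

When the current frame supports two locations equally, the transition term favors the one nearer the previous location, which is the practical benefit of conditioning on it.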
Specification