System and method for identifying objects in video
Abstract
A method for processing digital media is described. In one example embodiment, the method may include detecting an unknown object in a video frame, receiving inputs representing probable identities of the unknown object in the video frame from various sources, and associating each input with the unknown object detected in the video frame. The received inputs may be processed and compared with reference data and, based on the comparison, probable identities of the object associated with each input may be derived. The method may further include retrieving, from historical data, a likelihood of the input to match the unknown object and producing weights corresponding to the inputs, fusing the inputs and the relative weight associated with each input, and identifying the unknown object based on a comparison of the weighted distances from the unknown identity to a reference identity. The relative weights are chosen from the historical data to maximize the correct recognition rate based on the history of recognitions and manual verification results.
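The abstract says the relative weights come from historical recognitions and manual verification results, but does not say how. As a minimal sketch only, the snippet below assumes the simplest possible rule: each input type is weighted in proportion to how often it was right in the verified history. The function name `weights_from_history`, the input-type labels, and the sample history are all hypothetical and are not taken from the patent.

```python
from collections import defaultdict

def weights_from_history(history):
    """Turn verified recognition history into relative weights.

    history: iterable of (input_type, was_correct) pairs, where was_correct
    is the outcome of a manual verification (assumed data layout).
    Returns a dict input_type -> weight, with the weights summing to 1.
    """
    counts = defaultdict(lambda: [0, 0])  # input_type -> [correct, total]
    for input_type, was_correct in history:
        counts[input_type][0] += int(was_correct)
        counts[input_type][1] += 1

    # Historical accuracy of each input type.
    accuracy = {t: correct / total for t, (correct, total) in counts.items()}

    norm = sum(accuracy.values())
    if norm == 0:
        # No input type was ever right: fall back to equal weights.
        return {t: 1 / len(accuracy) for t in accuracy}
    return {t: a / norm for t, a in accuracy.items()}

# Hypothetical verified history for three input types.
history = [
    ("face", True), ("face", True), ("face", False), ("face", True),
    ("speech", True), ("speech", False),
    ("on_screen_text", True), ("on_screen_text", True),
]
weights = weights_from_history(history)
```

Normalized accuracy is a stand-in here. The abstract speaks of choosing weights that maximize the correct recognition rate, which would normally call for an optimization over the history rather than this direct ratio.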
25 Claims
1. A method for identifying objects in a video, the method comprising:
detecting a first input probable to identify an object in one or more video frames in a video stream of the video, the first input being an image of the object;
determining one or more second inputs probable to identify the object in the video frames, wherein the second inputs comprise additional data extracted from at least one of the video stream and an accompanying audio stream of the video;
associating the second inputs with the object;
obtaining distance values between each input and a plurality of reference objects, wherein a distance value indicates a closeness of an input to an identity of a reference object;
responsive to obtaining distance values for an input, associating a relative weight with the input based on the likelihood of the input to identify the object as a reference object;
calculating joint distance values between the object and the reference objects, wherein a joint distance value is a weighted transformation of distance values between a plurality of inputs and a reference object;
comparing the joint distance values calculated for the object; and
identifying the object as a reference object based on the comparing.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
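The last four steps of claim 1 (distance values, relative weights, joint distance values, comparison) can be illustrated with a short sketch. The claim only requires the joint distance to be a "weighted transformation" of the per-input distances; the code below assumes the simplest such transformation, a weighted sum, and picks the reference object with the smallest joint distance. The function names, input types, reference names, and numbers are hypothetical.

```python
def joint_distances(distances, weights):
    """Combine per-input distances into one joint distance per reference.

    distances: input_type -> {reference -> distance}
    weights:   input_type -> relative weight
    Assumes every input type has a distance to every reference object.
    """
    joint = {}
    for input_type, per_reference in distances.items():
        w = weights[input_type]
        for reference, d in per_reference.items():
            # Weighted sum: one assumed form of "weighted transformation".
            joint[reference] = joint.get(reference, 0.0) + w * d
    return joint

def identify(distances, weights):
    """Return the reference with the smallest joint distance, plus all values."""
    joint = joint_distances(distances, weights)
    return min(joint, key=joint.get), joint

# Hypothetical distances from three inputs to two reference objects.
distances = {
    "face": {"alice": 0.2, "bob": 0.6},
    "speech": {"alice": 0.5, "bob": 0.3},
    "on_screen_text": {"alice": 0.1, "bob": 0.9},
}
weights = {"face": 0.5, "speech": 0.2, "on_screen_text": 0.3}

best, joint = identify(distances, weights)
```

In this example the speech input alone favors "bob", but its lower weight means the joint distance still selects "alice", which is the point of fusing several inputs rather than trusting any single one.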
16. A system for identifying objects in a video, the system comprising:
a buffered frame sequence processor to process a plurality of video frames in a video stream of the video;
a facial context extraction processor to detect and extract a first input probable to identify an object in one or more video frames in the plurality of video frames, the first input being an image of the object;
extraction processors to detect and extract one or more second inputs probable to identify the object in the video frames, wherein the second inputs comprise additional data extracted from at least one of the video stream and an accompanying audio stream of the video;
an associating module to associate the second inputs with the object detected in the video frames and to associate a relative weight with the input based on the likelihood of the input to identify the object as a reference object;
a computing module to obtain values of a distance function from the first and second inputs to reference objects, wherein a distance function value indicates a closeness of an input to an identity of a reference object, and to obtain values of a joint distance function from the object to the reference objects, wherein a joint distance function value is a weighted transformation of distance values between a plurality of inputs and a reference object;
a comparing module to compare the values of the joint distance function for the object; and
an identification module to identify the object as a reference object based on the comparing.
Dependent claims: 17, 18, 19, 20, 21, 22, 23
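Claim 16 names the front-end components (a buffered frame sequence processor and several extraction processors) without describing their internals. The sketch below is one assumed arrangement, not the patented design: a fixed-size buffer of recent frames is handed to pluggable extractors, so that a second input found in a nearby frame can be collected together with the image of the object. The class name, the frame layout (plain dicts), and the extractor functions are hypothetical.

```python
from collections import deque

class BufferedFrameProcessor:
    """Keeps the most recent frames and runs every extractor on that window."""

    def __init__(self, extractors, size=3):
        self.frames = deque(maxlen=size)  # oldest frames drop out automatically
        self.extractors = extractors      # name -> callable(window) -> input or None

    def push(self, frame):
        """Add a frame and return the inputs found in the current window."""
        self.frames.append(frame)
        window = list(self.frames)
        inputs = {}
        for name, extract in self.extractors.items():
            value = extract(window)
            if value is not None:
                inputs[name] = value
        return inputs

def face_extractor(window):
    # First input: the face image in the newest frame, if any.
    return window[-1]["face"]

def caption_extractor(window):
    # Second input: the most recent on-screen caption anywhere in the buffer.
    return next((f["caption"] for f in reversed(window) if f["caption"]), None)

processor = BufferedFrameProcessor(
    {"face": face_extractor, "caption": caption_extractor}
)

# Toy frames: the caption appears one frame before the face does.
first = processor.push({"face": None, "caption": "Jane Doe"})
second = processor.push({"face": "crop_2", "caption": None})
```

Here `second` holds both the face crop and the caption from the earlier frame, so the two inputs can be associated with the same object before any distances are computed.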
24. A method for identifying objects in a video, the method comprising:
means of detecting a first input probable to identify an object in one or more video frames in a video stream of the video, the first input being an image of the object;
means of determining one or more second inputs probable to identify the object in the video frames, wherein the second inputs comprise additional data extracted from at least one of the video stream and an accompanying audio stream of the video;
means of associating the second inputs with the object;
means of obtaining distance values between each input and a plurality of reference objects, wherein a distance value indicates a closeness of an input to an identity of a reference object;
means of associating a relative weight with an input based on the likelihood of the input to identify the object as a reference object, responsive to obtaining distance values of the input;
means of calculating joint distance values between the object and the reference objects, wherein a joint distance value is a weighted transformation of distance values between a plurality of inputs and a reference object;
means of comparing the joint distance values calculated for the object; and
means of identifying the object as a reference object based on the comparing.
25. A non-transitory machine-readable medium comprising instructions, which when implemented by one or more processors perform the following operations:
detect a first input probable to identify an object in one or more video frames in a video stream of the video, the first input being an image of the object;
determine one or more second inputs probable to identify the object in the video frames, wherein the second inputs comprise additional data extracted from at least one of the video stream and an accompanying audio stream of the video;
associate the second inputs with the object;
obtain distance values between each input and a plurality of reference objects, wherein a distance value indicates a closeness of an input to an identity of a reference object;
responsive to obtaining distance values for an input, associate a relative weight with the input based on the likelihood of the input to identify the object as a reference object;
calculate joint distance values between the object and the reference objects, wherein a joint distance value is a weighted transformation of distance values between a plurality of inputs and a reference object;
compare the joint distance values calculated for the object; and
identify the object as a reference object based on the comparing.
Specification