Object recognition and database population for video indexing
Abstract
A method for processing digital media is described. The method, in one example embodiment, includes identification of objects in a video stream by detecting, for each video frame, an object in the video frame and selectively associating the object with an object cluster. The method may further include comparing the object in the object cluster to a reference object and selectively associating object data of the reference object with all objects within the object cluster based on the comparing. The method may further include manually associating the object data of the reference object with all objects within the object cluster having no associated reference object and populating a reference database with the reference object for the object cluster.
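The clustering and labeling flow summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: objects are assumed to be already reduced to numeric feature vectors, Euclidean distance and a fixed threshold stand in for whatever comparison the method actually uses, and every name here (`ObjectCluster`, `assign_to_cluster`, `label_clusters`, `label_manually`) is hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Vector = Tuple[float, ...]  # assumed feature-vector form of a detected object


@dataclass
class ObjectCluster:
    members: List[Vector] = field(default_factory=list)
    label: Optional[str] = None  # object data taken from a reference object


def assign_to_cluster(obj: Vector, clusters: List[ObjectCluster],
                      threshold: float) -> ObjectCluster:
    """Add obj to the nearest cluster within threshold, else start a new one."""
    best, best_d = None, threshold
    for cluster in clusters:
        # Distance to a cluster = distance to its closest member (an assumption).
        d = min(math.dist(obj, member) for member in cluster.members)
        if d <= best_d:
            best, best_d = cluster, d
    if best is None:
        best = ObjectCluster()
        clusters.append(best)
    best.members.append(obj)
    return best


def label_clusters(clusters: List[ObjectCluster],
                   reference_db: Dict[str, Vector], threshold: float) -> None:
    """Label each unlabeled cluster that has a member close to a reference."""
    for cluster in clusters:
        if cluster.label is not None:
            continue
        for name, ref in reference_db.items():
            if any(math.dist(m, ref) <= threshold for m in cluster.members):
                cluster.label = name  # applies to all objects in the cluster
                break


def label_manually(cluster: ObjectCluster, name: str,
                   reference_db: Dict[str, Vector]) -> None:
    """Manual fallback: label the cluster and add a reference for it."""
    cluster.label = name
    reference_db.setdefault(name, cluster.members[0])
```

Storing the label on the cluster rather than on each object mirrors the abstract's point that one successful comparison associates the reference data with every object in the cluster.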
25 Claims
1. A method of processing a video stream including a plurality of video frames, the method comprising:
detecting appearances of a person in one or more of the plurality of video frames in the video stream;
responsive to detecting an appearance of the person in a video frame:
extracting an image of the person from the video frame;
identifying first metadata from at least one of a video frame or audio track associated with the plurality of video frames; and
associating the first metadata with the image of the person;
determining a distance between the extracted image and at least one image from an object cluster;
associating the image with the object cluster responsive to the distance, the object cluster comprising images of the person;
comparing at least one image from the object cluster to a reference image of a known person;
determining whether the image from the object cluster is of the known person based on the comparison; and
associating second metadata with the object cluster, the second metadata identifying the object cluster as comprising images of the known person.

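The two kinds of metadata in claim 1 can be pictured with a small data model. This is only a sketch under assumptions: the claim does not say what the first metadata contains, so on-screen text and transcript text are used as stand-ins, and all names (`Appearance`, `Cluster`, `record_appearance`, `tag_cluster`, `index_entries`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Appearance:
    frame_index: int
    image_id: str  # handle to the image extracted from the frame
    first_metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Cluster:
    appearances: List[Appearance] = field(default_factory=list)
    second_metadata: Dict[str, str] = field(default_factory=dict)


def record_appearance(frame_index: int, image_id: str,
                      frame_text: Optional[str],
                      audio_text: Optional[str]) -> Appearance:
    """Attach first metadata, taken from the frame or the audio track."""
    meta: Dict[str, str] = {}
    if frame_text:
        meta["frame_text"] = frame_text  # assumed: text visible in the frame
    if audio_text:
        meta["audio_text"] = audio_text  # assumed: text from the audio track
    return Appearance(frame_index, image_id, meta)


def tag_cluster(cluster: Cluster, known_person: str) -> None:
    """Attach second metadata identifying the whole cluster."""
    cluster.second_metadata["known_person"] = known_person


def index_entries(cluster: Cluster) -> List[Tuple[str, int]]:
    """List (person, frame index) pairs for an identified cluster."""
    person = cluster.second_metadata.get("known_person")
    if person is None:
        return []
    return [(person, a.frame_index) for a in cluster.appearances]
```

First metadata stays with each extracted image, while second metadata is attached once per cluster, so identifying a cluster makes every appearance in it searchable by person.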
16. A system for processing a video stream including a plurality of video frames, the system comprising:
a non-transitory computer-readable storage medium storing executable computer program instructions that when executed by one or more processors cause the processors to:
detect appearances of a person in one or more of the plurality of video frames in the video stream;
responsive to detecting an appearance of the person in a video frame:
extract an image of the person from the video frame;
identify first metadata from at least one of a video frame or audio track associated with the plurality of video frames; and
associate the first metadata with the image of the person;
determine a distance between the extracted image and at least one image from an object cluster;
associate the extracted facial image with the object cluster responsive to the distance, the object cluster comprising images of the person;
compare at least one image from the object cluster to a reference image of a known person;
determine whether the image from the object cluster is of the known person based on the comparison; and
associate second metadata with the object cluster, the second metadata identifying the object cluster as comprising images of the known person.

24. A method to process a video stream including a plurality of video frames, the method comprising:
means for detecting appearances of a person in one or more of the plurality of video frames in the video stream, and responsive to detecting an appearance of the person in a video frame:
extracting an image of the person from the video frame;
identifying first metadata from at least one of a video frame or audio track associated with the plurality of video frames; and
associating the first metadata with the image of the person;
means for determining a distance between the extracted image and at least one image from an object cluster;
means for associating the extracted facial image with the object cluster responsive to the distance, the object cluster comprising images of the person;
means for comparing at least one image from the object cluster to a reference image of a known person, and determining whether the image from the object cluster is of the known person based on the comparison; and
means for associating second metadata with the object cluster, the second metadata identifying the object cluster as comprising images of the known person.

25. A non-transitory computer-readable storage medium storing executable computer program instructions that when executed by one or more processors cause the processors to:
detect appearances of a person in one or more of the plurality of video frames in the video stream;
responsive to detecting an appearance of the person in a video frame:
extract an image of the person from the video frame;
identify first metadata from at least one of a video frame or audio track associated with the plurality of video frames; and
associate the first metadata with the image of the person;
determine a distance between the extracted image and at least one image from an object cluster;
associate the extracted facial image with the object cluster responsive to the distance, the object cluster comprising images of the person;
compare at least one image from the object cluster to a reference image of a known person;
determine whether the image from the object cluster is of the known person based on the comparison; and
associate second metadata with the object cluster, the second metadata identifying the object cluster as comprising images of the known person.

Specification