Semantic parsing of objects in video
Abstract
Techniques, systems, and computer program products for parsing objects in a video are provided herein. A method includes producing and storing a plurality of versions of an image of an object derived from a video input, wherein each version of said image has a different resolution of said image; computing an appearance score at each of a plurality of regions on the lowest resolution version of said image for a plurality of semantic attributes with associated parts for said object, said appearance score denoting a probability of each semantic attribute appearing in the region; analyzing increasingly higher resolution versions than the lowest resolution version to compute a resolution context score for each region in the lowest resolution version; and ascertaining an optimized configuration of body parts and associated semantic attributes in the lowest resolution version, said ascertaining utilizing the appearance scores and the resolution context scores.
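The multi-resolution steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the 2x2 averaging used to produce lower-resolution versions, the use of pixel variance as a stand-in for "finer spatial structure", and the names `downsample`, `build_versions`, `region_pixels`, and `resolution_context_score` are all assumptions made for illustration. Images are plain lists of grayscale rows, and a region is an `(x, y, w, h)` tuple in lowest-resolution coordinates.

```python
from statistics import pvariance


def downsample(img):
    """Halve the resolution by averaging each 2x2 block (even side lengths assumed)."""
    h, w = len(img), len(img[0])
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]


def build_versions(img, levels):
    """Return `levels` versions of the cropped image, lowest resolution first."""
    versions = [img]
    for _ in range(levels - 1):
        versions.append(downsample(versions[-1]))
    return versions[::-1]


def region_pixels(img, region, scale):
    """Collect the pixels covered by `region` after scaling its coordinates by `scale`."""
    x, y, w, h = (v * scale for v in region)
    return [img[r][c] for r in range(y, y + h) for c in range(x, x + w)]


def resolution_context_score(versions, region):
    """Assumed proxy: average extra pixel variance that the higher-resolution
    versions show in this region compared with the lowest-resolution version."""
    base = pvariance(region_pixels(versions[0], region, 1))
    gains = []
    for level, img in enumerate(versions[1:], start=1):
        detail = pvariance(region_pixels(img, region, 2 ** level))
        gains.append(max(detail - base, 0.0))
    return sum(gains) / len(gains) if gains else 0.0


# A 4x4 checkerboard loses all structure when averaged down to 2x2,
# so the region scores high; a flat image scores zero.
checker = [[(x + y) % 2 for x in range(4)] for y in range(4)]
flat = [[3] * 4 for _ in range(4)]
checker_score = resolution_context_score(build_versions(checker, 2), (0, 0, 2, 2))
flat_score = resolution_context_score(build_versions(flat, 2), (0, 0, 2, 2))
```

In this sketch a region scores high only when detail that is invisible at the lowest resolution appears at higher resolutions, which is one plausible reading of "finer spatial structure"; an appearance score per region would come from separately trained attribute detectors, which are not sketched here.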
16 Claims
1. A method comprising:
producing and storing a plurality of versions of an image of an object derived from a video input, said image cropped from said video input, and wherein each version of said image has a different resolution of said image of said object;
computing an appearance score at each of a plurality of regions on the lowest resolution version of said versions of said image for a plurality of semantic attributes with associated parts for said object, said appearance score for at least one semantic attribute of the plurality of semantic attributes for each region denoting a probability of each semantic attribute of the at least one semantic attribute appearing in the region;
analyzing increasingly higher resolution versions than the lowest resolution version to compute a resolution context score for each region in the lowest resolution version, said resolution context score being indicative of an extent to which finer spatial structure exists in the increasingly higher resolution versions than in the lowest resolution version for each region; and
ascertaining an optimized configuration of body parts and associated semantic attributes in the lowest resolution version, said ascertaining utilizing the appearance scores and the resolution context scores in the regions in the lowest resolution version;
computing a geometric score for each region of said plurality of regions on the lowest resolution version, said geometric score computing a probability of a region matching stored reference data for a reference object corresponding to the detected object with respect to angles and distances among the plurality of regions, and
displaying and/or storing said optimized configuration of body parts and associated semantic attributes.
Dependent claims: 2, 3, 4, 5, 6

7. A computer readable storage device having a computer program product comprising:
A computer readable program code embodied in the storage device, said computer readable program code containing instructions that perform a method for estimating parts and attributes of an object in video, said method comprising;
producing and storing a plurality of versions of an image of an object derived from a video input, said image cropped from said video input, and wherein each version of said image has a different resolution of said image of said object;
computing an appearance score at each of a plurality of regions on the lowest resolution version of said versions of said image for a plurality of semantic attributes with associated parts for said object, said appearance score for at least one semantic attribute of the plurality of semantic attributes for each region denoting a probability of each semantic attribute of the at least one semantic attribute appearing in the region;
analyzing increasingly higher resolution versions than the lowest resolution version to compute a resolution context score for each region in the lowest resolution version, said resolution context score being indicative of an extent to which finer spatial structure exists in the increasingly higher resolution versions than in the lowest resolution version for each region; and
ascertaining an optimized configuration of body parts and associated semantic attributes in the lowest resolution version, said ascertaining utilizing the appearance scores and the resolution context scores in the regions in the lowest resolution version;
computing a geometric score for each region of said plurality of regions on the lowest resolution version, said geometric score computing a probability of a region matching stored reference data for a reference object corresponding to the detected object with respect to angles and distances among the plurality of regions and
displaying and/or storing said optimized configuration of body parts and associated semantic attributes.
Dependent claims: 8, 9, 10, 11, 12

13. A computer system comprising a processor and a computer readable memory unit coupled to the processor, said computer readable memory unit containing instructions that when run by the processor implement a method for estimating parts and attributes of an object in video, said method comprising:
producing and storing a plurality of versions of an image of an object derived from a video input, said image cropped from said video input, and wherein each version of said image has a different resolution of said image of said object;
computing an appearance score at each of a plurality of regions on the lowest resolution version of said versions of said image for a plurality of semantic attributes with associated parts for said object, said appearance score for at least one semantic attribute of the plurality of semantic attributes for each region denoting a probability of each semantic attribute of the at least one semantic attribute appearing in the region;
computing a geometric score for each region of said plurality of regions on the lowest resolution version, said geometric score computing a probability of a region matching stored reference data for a reference object corresponding to the detected object with respect to angles and distances among the plurality of regions; and
analyzing increasingly higher resolution versions than the lowest resolution version to compute a resolution context score for each region in the lowest resolution version, said resolution context score being indicative of an extent to which finer spatial structure exists in the increasingly higher resolution versions than in the lowest resolution version for each region; and
ascertaining an optimized configuration of body parts and associated semantic attributes in the lowest resolution version, said ascertaining utilizing the appearance scores and the resolution context scores in the regions in the lowest resolution version.
Dependent claims: 14

15. A process for supporting computer infrastructure, said process comprising providing at least one support service for at least one of creating, integrating, hosting, maintaining, and deploying computer-readable code in a computer system, wherein the code in combination with the computing system is capable of performing a method for estimating parts and attributes of an object in video, said method comprising:
producing and storing a plurality of versions of an image of an object derived from a video input, said image cropped from said video input, and wherein each version of said image has a different resolution of said image of said object;
computing an appearance score at each of a plurality of regions on the lowest resolution version of said versions of said image for a plurality of semantic attributes with associated parts for said object, said appearance score for at least one semantic attribute of the plurality of semantic attributes for each region denoting a probability of each semantic attribute of the at least one semantic attribute appearing in the region;
computing a geometric score for each region of said plurality of regions on the lowest resolution version, said geometric score computing a probability of a region matching stored reference data for a reference object corresponding to the detected object with respect to angles and distances among the plurality of regions; and
analyzing increasingly higher resolution versions than the lowest resolution version to compute a resolution context score for each region in the lowest resolution version, said resolution context score being indicative of an extent to which finer spatial structure exists in the increasingly higher resolution versions than in the lowest resolution version for each region; and
ascertaining an optimized configuration of body parts and associated semantic attributes in the lowest resolution version, said ascertaining utilizing the appearance scores and the resolution context scores in the regions in the lowest resolution version.
Dependent claims: 16

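The geometric score and the selection of an optimized configuration recited in the independent claims can be sketched as follows. This is a hedged illustration only: the Gaussian-style penalty on distance and angle errors, the tolerances `sigma_dist` and `sigma_angle`, the equal `weights`, the exhaustive search, and the names `geometric_score` and `best_configuration` are assumptions, not details taken from the claims. Each body part is assumed to have a short list of candidate regions, each carrying a center plus precomputed appearance and resolution context scores.

```python
import math
from itertools import product


def geometric_score(i, centers, ref_centers, sigma_dist=2.0, sigma_angle=0.5):
    """Score in (0, 1] for region i: how well its distances and angles to the
    other regions match the stored reference layout (assumed Gaussian penalty)."""
    penalty, count = 0.0, 0
    for j in range(len(centers)):
        if j == i:
            continue
        dx, dy = centers[j][0] - centers[i][0], centers[j][1] - centers[i][1]
        rx, ry = ref_centers[j][0] - ref_centers[i][0], ref_centers[j][1] - ref_centers[i][1]
        dist_err = math.hypot(dx, dy) - math.hypot(rx, ry)
        ang_err = math.atan2(dy, dx) - math.atan2(ry, rx)
        ang_err = math.atan2(math.sin(ang_err), math.cos(ang_err))  # wrap to [-pi, pi]
        penalty += (dist_err / sigma_dist) ** 2 + (ang_err / sigma_angle) ** 2
        count += 1
    return math.exp(-0.5 * penalty / count) if count else 1.0


def best_configuration(candidates, ref_centers, weights=(1.0, 1.0, 1.0)):
    """Try every combination of one candidate region per part and keep the one
    with the highest weighted sum of appearance, context, and geometric scores."""
    w_app, w_ctx, w_geo = weights
    best_total, best_choice = -math.inf, None
    for choice in product(*candidates):
        centers = [c["center"] for c in choice]
        total = 0.0
        for i, c in enumerate(choice):
            total += (w_app * c["appearance"] + w_ctx * c["context"]
                      + w_geo * geometric_score(i, centers, ref_centers))
        if total > best_total:
            best_total, best_choice = total, choice
    return best_choice, best_total


# Hypothetical reference layout: head above torso, in lowest-resolution pixels.
ref_centers = [(5, 2), (5, 8)]
candidates = [
    [  # head candidates
        {"center": (5, 2), "appearance": 0.9, "context": 0.1},
        {"center": (5, 9), "appearance": 0.2, "context": 0.0},
    ],
    [  # torso candidates
        {"center": (5, 8), "appearance": 0.8, "context": 0.2},
    ],
]
choice, total = best_configuration(candidates, ref_centers)
```

Exhaustive search is used here only because it is easy to verify on a handful of candidates; it grows exponentially with the number of parts, so a practical system would need a more efficient optimizer.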
Specification