Semantic Parsing of Objects in Video

US 20130177249A1
Filed: 03/04/2013
Published: 07/11/2013
Est. Priority Date: 07/28/2010
Status: Active Grant

Abstract

Techniques, systems, and computer program products for parsing objects in a video are provided herein. A method includes producing and storing a plurality of versions of an image of an object derived from a video input, wherein each version of said image has a different resolution of said image; computing an appearance score at each of a plurality of regions on the lowest resolution version of said image for a plurality of semantic attributes with associated parts for said object, said appearance score denoting a probability of each semantic attribute appearing in the region; analyzing increasingly higher resolution versions than the lowest resolution version to compute a resolution context score for each region in the lowest resolution version; and ascertaining an optimized configuration of body parts and associated semantic attributes in the lowest resolution version, said ascertaining utilizing the appearance scores and the resolution context scores.
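
The first step of the method, producing several versions of the cropped object image at different resolutions, can be pictured with a small sketch. This is only an illustration under stated assumptions: the listing does not say how the versions are generated, so the 2x2 average pooling, the function names `downsample_by_two` and `build_versions`, and the list-of-lists image type are all hypothetical.

```python
from typing import List

Image = List[List[float]]  # grayscale pixel grid, row-major (assumed representation)


def downsample_by_two(image: Image) -> Image:
    """Halve width and height by averaging each 2x2 block of pixels."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def build_versions(cropped: Image, levels: int) -> List[Image]:
    """Return `levels` versions of the crop, ordered lowest resolution first."""
    versions = [cropped]
    for _ in range(levels - 1):
        versions.append(downsample_by_two(versions[-1]))
    return versions[::-1]


# A 4x4 stand-in for an object crop taken from a video frame.
crop = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
versions = build_versions(crop, levels=3)
print([(len(v), len(v[0])) for v in versions])  # [(1, 1), (2, 2), (4, 4)]
```

Ordering the list lowest resolution first mirrors the abstract, where scoring starts on the lowest resolution version and the higher resolution versions are consulted afterwards.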

20 Claims

1. A method comprising:
- producing and storing a plurality of versions of an image of an object derived from a video input, said image cropped from said video input, and wherein each version of said image has a different resolution of said image of said object;
  
  computing an appearance score at each of a plurality of regions on the lowest resolution version of said versions of said image for a plurality of semantic attributes with associated parts for said object, said appearance score for at least one semantic attribute of the plurality of semantic attributes for each region denoting a probability of each semantic attribute of the at least one semantic attribute appearing in the region;
  
  analyzing increasingly higher resolution versions than the lowest resolution version to compute a resolution context score for each region in the lowest resolution version, said resolution context score being indicative of an extent to which finer spatial structure exists in the increasingly higher resolution versions than in the lowest resolution version for each region; and
  
  ascertaining an optimized configuration of body parts and associated semantic attributes in the lowest resolution version, said ascertaining utilizing the appearance scores and the resolution context scores in the regions in the lowest resolution version.
  - 2. The method of claim 1, comprising:
    - displaying and/or storing said optimized configuration of body parts and associated semantic attributes.
  - 3. The method of claim 1, comprising:
    - computing a geometric score for each region of said plurality of regions on the lowest resolution version, said geometric score computing a probability of a region matching stored reference data for a reference object corresponding to the detected object with respect to angles and distances among the plurality of regions.
  - 4. The method of claim 3, wherein the resolution context score for the lower resolution version of said image is computed as a weighted average score computed from a plurality of scores for a next higher resolution version of said higher resolution versions of said image.
  - 5. The method of claim 4, wherein said plurality of scores for said next higher resolution version of said image comprise appearance scores and geometric scores.
  - 6. The method of claim 4, wherein said plurality of scores for said next higher resolution version of said image comprise appearance scores, geometric scores and resolution context scores.
  - 7. The method of claim 6, wherein said weighted average score for the next higher resolution version of the image is computed using the following formula divided by I:
  - 8. The method of claim 7, comprising:
    - storing and/or displaying output of at least one portion of said image in at least one version of said higher level versions of said image with spatial information on semantic attributes and associated parts.
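
Claims 4 to 6 describe the resolution context score of a region as a weighted average of scores computed on the next higher resolution version. The sketch below shows one way such an average could be formed. It is not the formula of claim 7, which is not reproduced in this listing; the weights, the class `SubRegionScores`, and the function `resolution_context_score` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SubRegionScores:
    """Scores for one sub-region in the next higher resolution version."""
    appearance: float
    geometric: float
    resolution_context: float = 0.0


def resolution_context_score(
    sub_regions: Sequence[SubRegionScores],
    w_appearance: float = 1.0,  # hypothetical weight
    w_geometric: float = 1.0,   # hypothetical weight
    w_context: float = 0.0,     # 0.0: appearance + geometric only (claim 5 style);
                                # > 0: also uses finer context scores (claim 6 style)
) -> float:
    """Weighted average of the finer-level scores, normalized by weights and count."""
    weight_sum = w_appearance + w_geometric + w_context
    if not sub_regions or weight_sum == 0.0:
        return 0.0
    total = sum(
        w_appearance * s.appearance
        + w_geometric * s.geometric
        + w_context * s.resolution_context
        for s in sub_regions
    )
    return total / (weight_sum * len(sub_regions))


finer = [SubRegionScores(0.75, 0.25), SubRegionScores(0.5, 0.5)]
score = resolution_context_score(finer)
print(score)  # 0.5
```

Setting `w_context` above zero makes the computation recursive in spirit: each finer sub-region would carry its own context score, obtained from the version above it.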

9. A computer program product comprising:
- a computer readable storage medium having computer readable program code embodied in the storage medium, said computer readable program code containing instructions that perform a method for estimating parts and attributes of an object in video, said method comprising:
  
  producing and storing a plurality of versions of an image of an object derived from a video input, said image cropped from said video input, and wherein each version of said image has a different resolution of said image of said object;
  
  computing an appearance score at each of a plurality of regions on the lowest resolution version of said versions of said image for a plurality of semantic attributes with associated parts for said object, said appearance score for at least one semantic attribute of the plurality of semantic attributes for each region denoting a probability of each semantic attribute of the at least one semantic attribute appearing in the region;
  
  analyzing increasingly higher resolution versions than the lowest resolution version to compute a resolution context score for each region in the lowest resolution version, said resolution context score being indicative of an extent to which finer spatial structure exists in the increasingly higher resolution versions than in the lowest resolution version for each region; and
  
  ascertaining an optimized configuration of body parts and associated semantic attributes in the lowest resolution version, said ascertaining utilizing the appearance scores and the resolution context scores in the regions in the lowest resolution version.
  - 10. The computer program product of claim 9, said method comprising:
    - displaying and/or storing said optimized configuration of body parts and associated semantic attributes.
  - 11. The computer program product of claim 9, said method comprising:
    - computing a geometric score for each region of said plurality of regions on the lowest resolution version, said geometric score computing a probability of a region matching stored reference data for a reference object corresponding to the detected object with respect to angles and distances among the plurality of regions.
  - 12. The computer program product of claim 11, wherein the resolution context score for the lower resolution version of said image is computed as a weighted average score computed from a plurality of scores for a next higher resolution version of said higher resolution versions of said image.
  - 13. The computer program product of claim 12, wherein said plurality of scores for said next higher resolution version of said image comprise appearance scores and geometric scores.
  - 14. The computer program product of claim 12, wherein said plurality of scores for said next higher resolution version of said image comprise appearance scores, geometric scores and resolution context scores.
  - 15. The computer program product of claim 14, wherein said weighted average score for the next higher resolution version of the image is computed using the following formula divided by I:
  - 16. The computer program product of claim 15, said method comprising:
    - storing and/or displaying output of at least one portion of said image in at least one version of said higher level versions of said image with spatial information on semantic attributes and associated parts.
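
The geometric score of claims 3 and 11 compares angles and distances among regions with stored reference data. A minimal sketch of that idea for a single pair of regions follows. The Gaussian penalty, the tolerance parameters `sigma_distance` and `sigma_angle`, and the function name `geometric_score` are assumptions made for illustration and are not taken from the patent.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) center of a region, in pixels


def geometric_score(
    center_a: Point,
    center_b: Point,
    ref_distance: float,          # stored reference distance between the two parts
    ref_angle: float,             # stored reference angle, in radians
    sigma_distance: float = 10.0,  # hypothetical tolerance, in pixels
    sigma_angle: float = 0.5,      # hypothetical tolerance, in radians
) -> float:
    """Return a value in (0, 1] that is 1.0 when the pair matches the reference."""
    dx, dy = center_b[0] - center_a[0], center_b[1] - center_a[1]
    distance = math.hypot(dx, dy)
    angle = math.atan2(dy, dx)
    # Wrap the angular difference into [-pi, pi] so 359 degrees counts as -1 degree.
    d_angle = math.atan2(math.sin(angle - ref_angle), math.cos(angle - ref_angle))
    distance_term = math.exp(-0.5 * ((distance - ref_distance) / sigma_distance) ** 2)
    angle_term = math.exp(-0.5 * (d_angle / sigma_angle) ** 2)
    return distance_term * angle_term


# Reference: the second part sits 30 pixels from the first at an angle of pi/2.
aligned = geometric_score((0.0, 0.0), (0.0, 30.0), 30.0, math.pi / 2)
rotated = geometric_score((0.0, 0.0), (30.0, 0.0), 30.0, math.pi / 2)
print(aligned > rotated)  # True
```

A full implementation would combine such pairwise terms over all regions of a candidate configuration; the pairwise form is kept here only to keep the example short.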

17. A computer system comprising a processor and a computer readable memory unit coupled to the processor, said computer readable memory unit containing instructions that when run by the processor implement a method for estimating parts and attributes of an object in video, said method comprising:
- producing and storing a plurality of versions of an image of an object derived from a video input, said image cropped from said video input, and wherein each version of said image has a different resolution of said image of said object;
  
  computing an appearance score at each of a plurality of regions on the lowest resolution version of said versions of said image for a plurality of semantic attributes with associated parts for said object, said appearance score for at least one semantic attribute of the plurality of semantic attributes for each region denoting a probability of each semantic attribute of the at least one semantic attribute appearing in the region;
  
  analyzing increasingly higher resolution versions than the lowest resolution version to compute a resolution context score for each region in the lowest resolution version, said resolution context score being indicative of an extent to which finer spatial structure exists in the increasingly higher resolution versions than in the lowest resolution version for each region; and
  
  ascertaining an optimized configuration of body parts and associated semantic attributes in the lowest resolution version, said ascertaining utilizing the appearance scores and the resolution context scores in the regions in the lowest resolution version.
  - 18. The system of claim 17, said method comprising:
    - displaying and/or storing said optimized configuration of body parts and associated semantic attributes.

19. A process for supporting computer infrastructure, said process comprising providing at least one support service for at least one of creating, integrating, hosting, maintaining, and deploying computer-readable code in a computer system, wherein the code in combination with the computing system is capable of performing a method for estimating parts and attributes of an object in video, said method comprising:
- producing and storing a plurality of versions of an image of an object derived from a video input, said image cropped from said video input, and wherein each version of said image has a different resolution of said image of said object;
  
  computing an appearance score at each of a plurality of regions on the lowest resolution version of said versions of said image for a plurality of semantic attributes with associated parts for said object, said appearance score for at least one semantic attribute of the plurality of semantic attributes for each region denoting a probability of each semantic attribute of the at least one semantic attribute appearing in the region;
  
  analyzing increasingly higher resolution versions than the lowest resolution version to compute a resolution context score for each region in the lowest resolution version, said resolution context score being indicative of an extent to which finer spatial structure exists in the increasingly higher resolution versions than in the lowest resolution version for each region; and
  
  ascertaining an optimized configuration of body parts and associated semantic attributes in the lowest resolution version, said ascertaining utilizing the appearance scores and the resolution context scores in the regions in the lowest resolution version.
  - 20. The process of claim 19, said method comprising:
    - displaying and/or storing said optimized configuration of body parts and associated semantic attributes.
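
The final step common to all independent claims, ascertaining an optimized configuration of body parts and semantic attributes from the appearance and resolution context scores, can be sketched as a search over part-to-region assignments. The brute-force search below is swapped in purely for illustration and is not stated to be the patent's optimization procedure; the part names, attribute names, score values, and the function `best_configuration` are all hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Tuple

# appearance[(part, region)] maps each candidate attribute to its probability.
Appearance = Dict[Tuple[str, str], Dict[str, float]]


def best_configuration(
    parts: List[str],
    regions: List[str],
    appearance: Appearance,
    context: Dict[str, float],  # resolution context score per region
) -> Tuple[Dict[str, Tuple[str, str]], float]:
    """Try every assignment of distinct regions to parts and keep the best total."""
    best_total, best_assignment = float("-inf"), {}
    for chosen in permutations(regions, len(parts)):
        total, assignment = 0.0, {}
        for part, region in zip(parts, chosen):
            # Pick the most probable attribute for this part in this region.
            attribute, score = max(
                appearance[(part, region)].items(), key=lambda kv: kv[1]
            )
            total += score + context[region]
            assignment[part] = (region, attribute)
        if total > best_total:
            best_total, best_assignment = total, assignment
    return best_assignment, best_total


parts = ["head", "torso"]
regions = ["r0", "r1", "r2"]
appearance = {
    ("head", "r0"): {"bald": 0.75, "hat": 0.25},
    ("head", "r1"): {"bald": 0.25, "hat": 0.25},
    ("head", "r2"): {"bald": 0.5},
    ("torso", "r0"): {"red shirt": 0.25},
    ("torso", "r1"): {"red shirt": 0.5, "blue shirt": 0.25},
    ("torso", "r2"): {"red shirt": 0.5},
}
context = {"r0": 0.5, "r1": 0.25, "r2": 0.0}

assignment, total = best_configuration(parts, regions, appearance, context)
print(assignment, total)
```

Exhaustive search grows factorially with the number of regions, so it is only suitable for a toy example; a geometric score, if used, would be added to `total` for each candidate assignment.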

Current Assignee
Kyndryl Incorporated (Kyndryl Holdings, Inc.)
Original Assignee
International Business Machines Corporation
Inventors
Brown, Lisa Marie; Feris, Rogerio Schmidt; Hampapur, Arun; Vaquero, Daniel Andre

Granted Patent

US 8,588,533 B2
US Class Current

382/195
CPC Class Codes

G06F 18/22   Matching criteria, e.g. pro...

G06V 10/426   Graphical representations

G06V 20/10   Terrestrial scenes scenes u...

G06V 20/41   Higher-level, semantic clus...

G06V 20/70   Labelling scene content, e....

G06V 30/2504   Coarse or fine approaches, ...

G06V 40/103   Static body considered as a...
