SCALABLE ATTRIBUTE-DRIVEN IMAGE RETRIEVAL AND RE-RANKING

US 20150154229A1
Filed: 11/26/2014
Published: 06/04/2015
Est. Priority Date: 11/29/2013
Status: Active Grant

Abstract

Retrieval of images of objects from a large-scale database of object images, based on a query image. The database may, for example, contain images of objects such as faces, vehicles, people and luggage. Semantic attributes, such as doors or windows in the case of vehicles, are used as high-level semantic cues to determine identities of objects in the images. Salient visual characteristics of the images are labeled with attribute information, and a transformation is learned so as to transform the labeled visual characteristics into a discrimination vector that discriminates between the labels. A similarity metric is learned using the discrimination vectors, such that different images depicting the same object are determined to be close while those depicting different objects are determined to be far apart. Candidates are retrieved based on a query image, and a re-ranking step may be applied to improve results. Validation experiments are described.

49 Claims

1. A method for building a database of searchable images comprising:
- extracting visual features from regions of multiple images in a first set of labeled images and applying the labels to the extracted visual features, wherein the regions from which visual features are extracted have salient visual characteristics;
  
  learning a transformation that uses the labels of the labeled visual features to transform the visual features into a discrimination vector, wherein the transformation is learned such that the discrimination vector discriminates between the labels;
  
  extracting visual features from multiple images in a second set of images different from the labeled images, wherein the regions from which the visual features are extracted have salient visual characteristics;
  
  applying the learned transformation to the visual features extracted from the second set of images so as to transform the visual features into respective discrimination vectors for each image in the second set of images; and
  
  storing the labeled images in the first set of images and the images from the second set of images in a database in association with the respective discrimination vectors for each such image, wherein each such image is stored for retrieval by a search which at least in part uses the associated discrimination vectors.
  - 2. The method according to claim 1, wherein the learned transformation is learned by application of a supervised learning analysis which reduces dimensionality of the extracted visual features while preserving local neighborhood structure in the regions from which the visual features are extracted.
  - 3. The method according to claim 2, wherein the supervised learning analysis uses at least local Fisher discriminant analysis (LFDA).
  - 4. The method according to claim 1, wherein the labeled images are labeled manually.
  - 5. The method according to claim 1, wherein some of the labeled images are labeled manually and others of the labeled images are labeled by trained classification engines which are trained to label the images automatically.
  - 6. The method according to claim 1, wherein the regions from which visual features are extracted are determined automatically so as to accentuate salient visual characteristics of facial images.
  - 7. The method according to claim 6, wherein the images are face images and the regions include more than one of at least an eye region, a nose region, a mouth region, a hair region and an ear region.
  - 8. The method according to claim 1, wherein the object images comprise images in a same class of objects, wherein the class of objects is selected from a group consisting essentially of faces, vehicles, people and luggage.
  - 9. The method according to claim 1, wherein the images are face images and the regions from which salient visual characteristics are extracted include more than one of at least an eye region, a nose region, a mouth region, a hair region and an ear region.
  - 10. The method according to claim 1, wherein the images are vehicle images and the regions from which salient visual characteristics are extracted include more than one of at least a wheel region, a door region, a window region, a headlight region, a grille region, a side mirror region and a bumper region.
  - 11. The method according to claim 1, wherein the images are images of people and the regions from which salient visual characteristics are extracted include more than one of at least a head region, an arm region, a leg region, a torso region, a face region and a hair region.
  - 12. The method according to claim 1, wherein the images are images of luggage and the regions from which salient visual characteristics are extracted include more than one of at least a base region, a wheel region, a handle region, a corner region, and a luggage tag region.
  - 13. The method according to claim 1, wherein the method is applied for identification and discrimination of objects in a surveillance system.
  - 14. An apparatus for building a database of searchable images, comprising:
    - an interface to a large-scale database of multiple images;
      
      memory for storing computer-executable process steps; and
      
      one or more processors for executing the computer-executable process steps stored in the memory;
      
      wherein the computer-executable process steps include steps for causing the apparatus to perform the method according to claim 1.
  - 15. A non-transitory computer-readable memory medium which retrievably stores computer-executable process steps for causing a computer to perform the method according to claim 1.
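
The database-building steps of claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it substitutes a plain two-class Fisher discriminant on 2-D features for the LFDA analysis named in claim 3, it assumes region features have already been extracted, and the names `fisher_direction`, `project`, and `build_database` are hypothetical.

```python
from statistics import mean


def fisher_direction(features, labels):
    """Two-class Fisher discriminant for 2-D features: w = Sw^-1 (m1 - m0)."""
    groups = {0: [], 1: []}
    for x, y in zip(features, labels):
        groups[y].append(x)
    means = {c: [mean(col) for col in zip(*xs)] for c, xs in groups.items()}

    # Within-class scatter matrix (2x2), summed over both classes.
    sw = [[0.0, 0.0], [0.0, 0.0]]
    for c, xs in groups.items():
        for x in xs:
            d = [x[0] - means[c][0], x[1] - means[c][1]]
            for i in range(2):
                for j in range(2):
                    sw[i][j] += d[i] * d[j]
    # Small ridge term keeps the matrix invertible on degenerate data.
    sw[0][0] += 1e-6
    sw[1][1] += 1e-6

    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [means[1][0] - means[0][0], means[1][1] - means[0][1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


def project(w, x):
    """Map a feature to a (here one-dimensional) discrimination vector."""
    return [w[0] * x[0] + w[1] * x[1]]


def build_database(labeled, unlabeled):
    """labeled: (image_id, feature, label) triples; unlabeled: (image_id, feature) pairs."""
    w = fisher_direction([f for _, f, _ in labeled], [l for _, _, l in labeled])
    db = {image_id: project(w, f) for image_id, f, _ in labeled}
    for image_id, f in unlabeled:
        db[image_id] = project(w, f)  # same learned transformation, second image set
    return w, db
```

The transformation is learned once from the labeled set and then reused unchanged for the second set, so every stored image carries a vector in the same discriminative space.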

16. A method for retrieval of images from a searchable database based on similarity to a query image, comprising:
- extracting visual features from regions of the query image, wherein the regions from which the visual features are extracted have salient visual characteristics;
  
  applying a learned transformation to the extracted visual features so as to transform the extracted visual features into a discrimination vector, wherein the learned transformation is learned by using labels of a labeled database of visual features to learn a transformation of the visual features into discrimination vectors that discriminate between the labels;
  
  generating an image similarity measure between the discrimination vector for the query image and a discrimination vector for multiple images in the searchable database, wherein the similarity measure is generated using a calculation learned from a database of multiple images labeled with identities of labelable objects represented in the multiple images, and wherein the calculation measures whether the objects represented in the images are the same objects or are different objects; and
  
  obtaining a candidate list of images in the searchable database that are similar to the query image based at least in part on the similarity measure.
  - 17. The method according to claim 16, wherein the database of multiple images is comprised of a database of face images and the objects are comprised of individuals whose face images are included in the database, and wherein the calculation by which the similarity measure is generated measures whether the individuals represented in the images are the same individual or are different individuals.
  - 18. The method according to claim 16, wherein the similarity measure calculates a binary vector using the discrimination vector.
  - 19. The method according to claim 16, wherein the similarity measure applies a covariance matrix to the discrimination vector for the query image and to the discrimination vector for images in the searchable database, so as to provide the measurement of whether the objects represented in the images are the same object or are different objects.
  - 20. The method according to claim 16, further comprising the step of re-ranking the candidates in the candidate list based at least in part on similarity of salient visual features in the images.
  - 21. The method according to claim 20, wherein re-ranking is based at least in part on a weighted combination of similarity of salient visual features in the images and the visual features extracted from the regions of the images having salient visual characteristics.
  - 22. The method according to claim 16, wherein the object images comprise images in a same class of objects, wherein the class of objects is selected from a group consisting essentially of faces, vehicles, people and luggage.
  - 23. The method according to claim 16, wherein the images are face images and the regions from which salient visual characteristics are extracted include more than one of at least an eye region, a nose region, a mouth region, a hair region and an ear region.
  - 24. The method according to claim 16, wherein the images are vehicle images and the regions from which salient visual characteristics are extracted include more than one of at least a wheel region, a door region, a window region, a headlight region, a grille region, a side mirror region and a bumper region.
  - 25. The method according to claim 16, wherein the images are images of people and the regions from which salient visual characteristics are extracted include more than one of at least a head region, an arm region, a leg region, a torso region, a face region and a hair region.
  - 26. The method according to claim 16, wherein the images are images of luggage and the regions from which salient visual characteristics are extracted include more than one of at least a base region, a wheel region, a handle region, a corner region, and a luggage tag region.
  - 27. The method according to claim 16, wherein the method is applied for identification and discrimination of objects in a surveillance system.
  - 28. An apparatus for retrieval of images from a searchable database based on similarity to a query image, comprising:
    - an interface to a large-scale database of multiple images;
      
      memory for storing computer-executable process steps; and
      
      one or more processors for executing the computer-executable process steps stored in the memory;
      
      wherein the computer-executable process steps include steps for causing the apparatus to perform the method according to claim 16.
  - 29. A non-transitory computer-readable memory medium which retrievably stores computer-executable process steps for causing a computer to perform the method according to claim 16.
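
The similarity measure and candidate list of claims 16, 18 and 19 can be illustrated with a short sketch. The assumptions are loud ones: the learned calculation is modeled as a quadratic form with a matrix `m` (for example an inverse covariance, giving a Mahalanobis-style distance), and the binary vector of claim 18 is modeled as simple sign thresholding compared by Hamming distance. The claims do not fix either choice, and all function names are hypothetical.

```python
def learned_distance(u, v, m):
    """Quadratic form (u - v)^T M (u - v) for a square matrix M."""
    d = [a - b for a, b in zip(u, v)]
    md = [sum(m[i][j] * d[j] for j in range(len(d))) for i in range(len(d))]
    return sum(di * mdi for di, mdi in zip(d, md))


def candidate_list(query_vec, database, m, k):
    """Return the ids of the k database vectors closest to the query under M."""
    ranked = sorted(
        database,
        key=lambda image_id: learned_distance(query_vec, database[image_id], m),
    )
    return ranked[:k]


def binarize(vec):
    """Assumed binarization: 1 where the component is positive, else 0."""
    return [1 if x > 0 else 0 for x in vec]


def hamming(a, b):
    """Number of positions at which two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))
```

With `m = [[1.0, 0.0], [0.0, 0.01]]`, differences along the second dimension are heavily discounted, so a vector that is far away in Euclidean terms can still rank ahead of a nearer one. That is the practical effect of learning the calculation from identity-labeled images rather than using a fixed distance.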

30. A method for comparing objects in images, the method comprising:
- obtaining a plurality of respective low-level features and a plurality of respective attribute scores for a plurality of reference object images;
  
  generating a refined low-level feature transformation based at least in part on the plurality of respective low-level features from more than one region of the object and the plurality of respective attribute scores; and
  
  generating an object-similarity measure of a first object image and a second object image based at least in part on low-level features of the first object image, on low-level features of the second object image, and on the refined low-level feature transformation.
  - 31. An apparatus for comparing objects in images, comprising:
    - an interface to a large-scale database of multiple images;
      
      memory for storing computer-executable process steps; and
      
      one or more processors for executing the computer-executable process steps stored in the memory;
      
      wherein the computer-executable process steps include steps for causing the apparatus to perform the method according to claim 30.
  - 32. A non-transitory computer-readable memory medium which retrievably stores computer-executable process steps for causing a computer to perform the method according to claim 30.

33. A method for retrieval of objects in images, the method comprising:
- obtaining a plurality of respective low-level features and a plurality of respective attribute scores for a plurality of reference object images;
  
  generating a refined low-level feature transformation based at least in part on the plurality of respective low-level features from more than one region of the object and the plurality of respective attribute scores;
  
  generating an object-similarity measure of a first object image and a second object image based at least in part on low-level features of the first object image, low-level features of the second object image, and the refined low-level feature transformation;
  
  retrieving a subset of images from a plurality of images wherein the subset of images are retrieved based on the respective object-similarity measures of a third object image and a plurality of fourth object images; and
  
  ranking the subset of images based at least in part on the respective object-similarity measure of the third object image and one or more of the subset images and based at least in part on a low-level feature similarity of the third object image and the one or more of the subset images.
  - 34. An apparatus for retrieval of objects in images, comprising:
    - an interface to a large-scale database of multiple images;
      
      memory for storing computer-executable process steps; and
      
      one or more processors for executing the computer-executable process steps stored in the memory;
      
      wherein the computer-executable process steps include steps for causing the apparatus to perform the method according to claim 33.
  - 35. A non-transitory computer-readable memory medium which retrievably stores computer-executable process steps for causing a computer to perform the method according to claim 33.
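
The ranking step of claim 33 combines an object-similarity measure with a low-level feature similarity. One way this might look is a weighted sum of the two distances after rescaling each to a common range. The weighting parameter `alpha`, the min-max rescaling, and the function names are assumptions made for illustration; the claim only requires that both measures contribute.

```python
def min_max(scores):
    """Rescale a dict of scores to [0, 1] so two distance types are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
    return {k: (v - lo) / span for k, v in scores.items()}


def rerank(candidates, attr_dist, low_dist, alpha=0.5):
    """Order candidates by alpha * attribute distance + (1 - alpha) * low-level distance."""
    attr = min_max({c: attr_dist[c] for c in candidates})
    low = min_max({c: low_dist[c] for c in candidates})
    return sorted(candidates, key=lambda c: alpha * attr[c] + (1.0 - alpha) * low[c])
```

Setting `alpha` to 1.0 ranks purely on the attribute-based measure and 0.0 purely on low-level features; intermediate values trade one off against the other.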

36. A method for creating an attribute similarity metric, comprising:
- receiving a plurality of identified object images wherein identities of the object images are such that at least some of the images of the same object are labeled with the same identifier;
  
  receiving a plurality of attributes describing object images associated with an identifier;
  
  extracting a plurality of respective low-level region features from a plurality of regions of the images;
  
  learning an attribute subspace mapping based at least in part on the plurality of the respective low-level region features from the plurality of image regions and on the attribute labels for a plurality of attributes;
  
  mapping the plurality of low-level region features to a respective plurality of subspace region features based at least in part on the attribute subspace mapping; and
  
  creating an attribute similarity measure based at least in part on the plurality of subspace region features and on the identifier of the individual object in the image.
  - 37. The method according to claim 36, wherein the attribute similarity metric is used to measure the similarity of two objects wherein the identifier of one or more of the objects is unknown.
  - 38. The method according to claim 36, wherein learning an attribute subspace mapping comprises an LDA subspace construction using the plurality of attributes from the identifier associated with the object image.
  - 39. The method according to claim 36, wherein the attribute similarity metric is based at least in part on a distance metric using a metric learning technique.
  - 40. An apparatus for creating an attribute similarity metric, comprising:
    - an interface to a large-scale database of multiple images;
      
      memory for storing computer-executable process steps; and
      
      one or more processors for executing the computer-executable process steps stored in the memory;
      
      wherein the computer-executable process steps include steps for causing the apparatus to perform the method according to claim 36.
  - 41. A non-transitory computer-readable memory medium which retrievably stores computer-executable process steps for causing a computer to perform the method according to claim 36.
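
Claims 36 and 39 describe creating a similarity measure from subspace features and object identifiers using a metric learning technique, without naming one. The sketch below is a deliberately simple stand-in, not the technique of the patent: it learns one weight per dimension, down-weighting dimensions that vary a lot between images of the same identity. The names `learn_diagonal_metric` and `metric_distance` and the `eps` smoothing value are hypothetical.

```python
from itertools import combinations


def learn_diagonal_metric(vectors, identities, eps=0.01):
    """Weight each dimension by the inverse of its mean squared same-identity difference."""
    dims = len(vectors[0])
    sums = [0.0] * dims
    pairs = 0
    for i, j in combinations(range(len(vectors)), 2):
        if identities[i] == identities[j]:
            pairs += 1
            for d in range(dims):
                sums[d] += (vectors[i][d] - vectors[j][d]) ** 2
    if pairs == 0:
        raise ValueError("need at least one pair of images sharing an identifier")
    # Dimensions that are stable within an identity receive large weights.
    return [1.0 / (s / pairs + eps) for s in sums]


def metric_distance(u, v, weights):
    """Weighted squared distance; usable even when an image's identifier is unknown."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, u, v))
```

Once learned, the weights depend only on the feature vectors of the two images being compared, which is what allows the measure to be applied to objects whose identifier is unknown, as in claim 37.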

42. A method for retrieval of images from a large-scale database of images based on a query image, comprising:
- accessing a low level feature transformation, a low dimensional projection into a semantic attribute subspace, and a distance metric;
  
  applying the low level feature transformation to the query image so as to extract low level features representative of the query image;
  
  obtaining a candidate set of images from the large-scale database of images based at least in part on similarity of the low level features for the query image to low level features of the images in the large-scale database of images;
  
  applying the low dimensional projection to the query image so as to obtain a semantic attribute projection of the query image; and
  
  ranking the candidate images based at least in part on similarity of the semantic attribute projection for the query image to semantic attribute projections of the images in the large-scale database of images so as to result in a ranked retrieval of images, wherein similarity of the semantic attribute projection is measured by the distance metric.
  - 43. The method according to claim 42, wherein the low dimensional projection into a semantic attribute subspace, and the distance metric, are both learned in a training phase from the large-scale database of images labeled with semantic attributes.
  - 44. The method according to claim 43, wherein the low dimensional projection into a semantic attribute subspace uses linear discriminant analysis (LDA) from part-based HOG features with human attributes as LDA class labels.
  - 45. The method according to claim 42, further comprising object detection and alignment of the query image using algorithms that are also applied to the images in the large-scale database of images during the training phase.
  - 46. The method according to claim 42, wherein ranking is also based at least in part on similarity of the low level features for the query image to low level features of the candidate images, using a weighting factor that applies respective weights to similarity of the semantic attribute projection and to similarity of the low level features.
  - 47. The method according to claim 46, wherein the weighting factor is learned in a training phase from the large-scale database of images labeled with semantic attributes.
  - 48. An apparatus for retrieval of images from a large-scale database of images based on a query image, comprising:
    - an interface to a large-scale database of multiple images;
      
      memory for storing computer-executable process steps; and
      
      one or more processors for executing the computer-executable process steps stored in the memory;
      
      wherein the computer-executable process steps include steps for causing the apparatus to perform the method according to claim 42.
  - 49. A non-transitory computer-readable memory medium which retrievably stores computer-executable process steps for causing a computer to perform the method according to claim 42.
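
The two-stage retrieval of claim 42 can be sketched as a coarse shortlist on low-level features followed by ranking in the attribute subspace. The sketch assumes low-level features have already been extracted for the query and the database, models the low dimensional projection as a matrix `p`, and takes the distance metric as a function argument. The parameters `n_candidates` and `top_k` and all function names are hypothetical.

```python
import math


def euclidean(u, v):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def project(p, x):
    """Apply a projection matrix p (one row per output dimension) to vector x."""
    return [sum(pij * xj for pij, xj in zip(row, x)) for row in p]


def retrieve(query, database, p, metric, n_candidates, top_k):
    """database maps image_id -> low-level feature vector."""
    # Stage 1: shortlist by low-level feature similarity.
    shortlist = sorted(database, key=lambda i: euclidean(query, database[i]))
    shortlist = shortlist[:n_candidates]
    # Stage 2: rank the shortlist in the semantic attribute subspace.
    q_attr = project(p, query)
    ranked = sorted(shortlist, key=lambda i: metric(q_attr, project(p, database[i])))
    return ranked[:top_k]
```

In practice the database projections would be computed once and stored rather than recomputed per query; they are recomputed here only to keep the example short.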

Current Assignee: Canon Kabushiki Kaisha (Canon Inc.)
Original Assignee: Canon Kabushiki Kaisha (Canon Inc.)
Inventors: Zhang, Liyan; Denney, Bradley Scott; Dusberger, Dariusz; An, Le; Zou, Changjian
Granted Patent: US 10,120,879 B2
US Class Current: 1/1
CPC Class Codes

G06F 16/583   using metadata automaticall...

G06F 18/2132   based on discrimination cri...

G06F 18/2178   based on feedback of a supe...

G06F 18/24137   Distances to cluster centroïds

G06V 10/764   using classification, e.g. ...

G06V 10/7784   based on feedback from supe...

G06V 40/171   Local features and componen...
