AUTOMATIC IMAGE ANNOTATION USING SEMANTIC DISTANCE LEARNING

US 20090313294A1
Filed: 06/11/2008
Published: 12/17/2009
Est. Priority Date: 06/11/2008
Status: Active Grant

Abstract

Images are automatically annotated using semantic distance learning. Training images are manually annotated and partitioned into semantic clusters. Semantic distance functions (SDFs) are learned for the clusters. The SDF for each cluster is used to compute semantic distance scores between a new image and each image in the cluster. The scores for each cluster are used to generate a ranking list which ranks each image in the cluster according to its semantic distance from the new image. An association probability is estimated for each cluster which specifies the probability of the new image being semantically associated with the cluster. Cluster-specific probabilistic annotations for the new image are generated from the manual annotations for the images in each cluster. The association probabilities and cluster-specific probabilistic annotations for all the clusters are used to generate final annotations for the new image.
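
The last two stages described in the abstract, ranking a cluster's training images by their distance from the new image and merging the per-cluster results, can be illustrated with a minimal Python sketch. It is not the patented method: `euclidean` is only a stand-in for the learned SDF, whose form is not reproduced in this text, and `combine_annotations` assumes a probability-weighted sum of the cluster-specific vectors. All function names are hypothetical.

```python
import numpy as np

def rank_cluster(new_feat, cluster_feats, distance):
    """Rank one cluster's training images nearest-first for a new image.

    Returns the image indices in ranked order and the matching sorted scores.
    """
    scores = np.array([distance(new_feat, x) for x in cluster_feats])
    order = np.argsort(scores, kind="stable")
    return order, scores[order]

def combine_annotations(p, w_clusters):
    """ASSUMPTION: final vector = sum over clusters of p(k) * w^(k)."""
    p = np.asarray(p, dtype=float)                    # shape (H,)
    w_clusters = np.asarray(w_clusters, dtype=float)  # shape (H, vocabulary size)
    return p @ w_clusters

def euclidean(a, b):
    """Placeholder distance; the learned SDF would be used here instead."""
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

# Toy cluster of three training images with 2-D feature vectors.
cluster = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]])
order, ranked_scores = rank_cluster(np.array([0.9, 0.0]), cluster, euclidean)

# Two clusters, three keywords; each w^(k) already has L-1 norm one.
w = combine_annotations([0.75, 0.25], [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]])
```

In this toy run the third training image is ranked first because it lies closest to the new image, and the combined vector weights the first cluster's keyword most heavily because that cluster has the larger association probability.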

20 Claims

1. A computer-implemented process for automatically annotating a new image, comprising using a computing device to perform the following process actions:
- inputting a set T of training images, wherein the new image is not in T;
  
  manually annotating each training image in T with a vector of keyword annotations;
  
  partitioning T into a plurality of semantic clusters T^(k) of training images, wherein k is a variable which uniquely identifies each cluster, T^(k) comprises training images that are semantically similar, and each training image is partitioned into a single cluster;
  
  for each semantic cluster T^(k) of training images,
  - learning a semantic distance function (SDF) f^(k) for T^(k),
  - utilizing f^(k) to compute a pair-wise feature-based semantic distance score between the new image and each training image in T^(k), resulting in a set of pair-wise feature-based semantic distance scores for T^(k), wherein each feature-based score in the set specifies a metric for an intuitive semantic distance between the new image and a particular training image in T^(k),
  - utilizing the set of pair-wise feature-based semantic distance scores for T^(k) to generate a ranking list for T^(k), wherein said list ranks each training image in T^(k) according to its intuitive semantic distance from the new image,
  - estimating a cluster association probability p(k) for T^(k), wherein p(k) specifies a probability of the new image being semantically associated with T^(k), and
  - probabilistically propagating the vector of keyword annotations for each training image in T^(k) to the new image, resulting in a cluster-specific vector w^(k) of probabilistic annotations for the new image; and
  
  utilizing p(k) and w^(k) for all the semantic clusters of training images to generate a vector w of final keyword annotations for the new image.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
- - 2. The process of claim 1, wherein the vector of keyword annotations for each training image serves as a metadata tag for the image, said vector comprising one or more textual keywords, wherein the keywords are drawn from a prescribed vocabulary of keywords, and each keyword describes a different low-level visual feature in the image.
  - 3. The process of claim 2, wherein the prescribed vocabulary of keywords comprises a Corel keyword database.
  - 4. The process of claim 2, wherein the vector of keyword annotations for each training image comprises between one and five different keywords.
  - 5. The process of claim 1, wherein the semantics of each training image are assumed to be represented by the vector of keyword annotations for said image, and the process action of partitioning T into a plurality of semantic clusters of training images comprises actions of:
    - computing a pair-wise annotation-based semantic distance score SD( ) between every possible pair of training images in T, wherein each score SD( ) specifies a metric for the intuitive semantic distance between a particular pair of training images in T; and
      
      utilizing said scores SD( ) to partition the training images in T into H different semantic clusters of training images.
  - 6. The process of claim 5, wherein the pair-wise annotation-based semantic distance score SD( ) for a particular pair of training images in T is given by the equation
  - 7. The process of claim 6, wherein a Jiang and Conrath (JCN) keyword similarity measure JCN( ) is employed to compute SD(a_i, b_j), and SD(a_i, b_j) is given by the equation
  - 8. The process of claim 5, wherein the process action of utilizing said scores SD( ) to partition the training images in T into H different semantic clusters of training images comprises actions of:
    - utilizing a constant shift embedding framework to embed the training images in T into a Euclidean vector space; and
      
      utilizing an x-means algorithm to group the embedded training images into the H different semantic clusters of training images based upon said scores SD( ), wherein the x-means algorithm automatically determines an optimal value for H.
  - 9. The process of claim 1, wherein T is given by the equation T = {x₁, x₂, . . . , x_n}, n is a total number of training images in T, x_i is a feature vector for the i-th training image comprising the low-level visual features contained within said image, T^(k) is given by the equation T^(k) = {x_i^(k)}, i = 1, . . . , n_k, wherein n_k is a number of training images in T^(k), and the process action of learning a semantic distance function (SDF) f^(k) for T^(k) comprises actions of:
    - generating a set of relaxed relative comparison constraints for T^(k), wherein the set is given by {(x_a, x_b, x_c)} and (x_a, x_b, x_c) is a subset of all the possible triples of training images in T^(k) which satisfies either a first condition in which the intuitive semantic distance between x_a and x_c is greater than said distance between x_a and x_b, or a second condition in which the intuitive semantic distance between x_a and x_c equals said distance between x_a and x_b but the difference between the features in x_a and the features in x_c is greater than the difference between the features in x_a and the features in x_b;
      
      randomly sampling a prescribed number m of the constraints from the set, resulting in a subset of relaxed relative comparison constraints for T^(k) indexed by i = 1, . . . , m;
      
      training m different pair-wise SDFs {f₁^(k), . . . , f_m^(k)} for T^(k), wherein each pair-wise SDF f_i^(k) is trained using the corresponding sampled constraints; and
      
      generating f^(k) by computing an average of the m different pair-wise SDFs {f₁^(k), . . . , f_m^(k)}.
  - 10. The process of claim 9, wherein each pair-wise SDF f_i^(k) is given by the equation
  - 11. The process of claim 10, wherein the diagonal matrix W is computed using the quadratic programming algorithm
  - 12. The process of claim 10, wherein the process action of estimating a cluster association probability p(k) for T^(k) comprises actions of:
    - generating a probability density function (PDF) which estimates the visual features in the training images in T^(k); and
      
      utilizing the PDF to estimate the cluster association probability p(k).
  - 13. The process of claim 1, wherein the process action of probabilistically propagating the vector of keyword annotations for each training image in T^(k) to the new image comprises actions of:
    - utilizing the ranking list for T^(k) to rank the vectors of keyword annotations for all the training images in T^(k), resulting in a set of ranked keyword annotations for T^(k) given by {t₁^(k), t₂^(k), . . . , t_{n_k}^(k)}, wherein n_k is a total number of training images in T^(k) and t_i^(k) is the vector of keyword annotations for the i-th training image in the ranking list;
      
      utilizing the ranking list for T^(k) to rank the pair-wise feature-based semantic distance scores between the new image and each training image in T^(k), resulting in a set of ranked pair-wise feature-based semantic distance scores for T^(k) given by {d₁^(k), d₂^(k), . . . , d_{n_k}^(k)}, wherein d_i^(k) is the pair-wise feature-based semantic distance score between the new image and the i-th training image in the ranking list; and
      
      computing the cluster-specific vector w^(k) of probabilistic annotations as
  - 14. The process of claim 13, wherein α^(k) is set such that
  - 15. The process of claim 13, wherein w^(k) is normalized such that the L-1 norm of w^(k) is one.
  - 16. The process of claim 1, wherein the plurality of semantic clusters of training images comprises H different clusters, the vector w of final keyword annotations for the new image is given by the equation
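
The embedding step in claim 8 can be illustrated with a short NumPy sketch of constant shift embedding. It assumes a symmetric dissimilarity matrix with a zero diagonal, which the pair-wise scores SD( ) would supply; the function name and the toy matrix are hypothetical, and the x-means grouping that follows in the claim is not shown.

```python
import numpy as np

def constant_shift_embedding(D):
    """Embed items with symmetric, zero-diagonal dissimilarities D.

    Returns (X, shift). Rows of X are the embedded vectors, and the squared
    Euclidean distance between any two distinct rows equals D[i, j] + shift.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    Q = np.eye(n) - np.ones((n, n)) / n   # centering projection
    S = -0.5 * Q @ D @ Q                  # centered similarity matrix
    lam_min = np.linalg.eigvalsh(S)[0]    # smallest eigenvalue, never above zero
    S_shifted = S - lam_min * np.eye(n)   # shifted matrix is positive semidefinite
    vals, vecs = np.linalg.eigh(S_shifted)
    vals = np.clip(vals, 0.0, None)       # guard against round-off
    X = vecs * np.sqrt(vals)              # Gram matrix of X equals S_shifted
    return X, -2.0 * lam_min

# Toy dissimilarities that no Euclidean configuration can realise as
# squared distances, so a strictly positive shift is required.
D = np.array([[0.0, 1.0, 5.0],
              [1.0, 0.0, 1.0],
              [5.0, 1.0, 0.0]])
X, shift = constant_shift_embedding(D)
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
```

Because every off-diagonal dissimilarity is shifted by the same constant, the relative ordering of the pairs is unchanged, which is why a clustering algorithm that operates on vectors can then be applied to the rows of `X`.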

17. A computer-implemented process for comparing the annotation precision of two different automatic image annotation (AIA) algorithms, comprising using a computing device to perform the following process actions:
- inputting a set T of images;
  
  manually applying ground-truth keyword annotations to each image in T, wherein T comprises a total number of images given by n;
  
  utilizing a first AIA algorithm to automatically generate first keyword annotations for each image in T;
  
  utilizing a second AIA algorithm to automatically generate second keyword annotations for each image in T;
  
  computing a first pair-wise semantic distance score SD( ) for each image in T, wherein said first score SD( ) specifies a metric for a semantic distance between the first keyword annotations and the ground-truth keyword annotations;
  
  computing a second pair-wise semantic distance score SD( ) for each image in T, wherein said second score SD( ) specifies a metric for the semantic distance between the second keyword annotations and the ground-truth keyword annotations; and
  
  generating a semantic relative comparative score (RCS) which compares the annotation precision of the first and second AIA algorithms by first determining a number of images in T for which the first score SD( ) is less than the second score SD( ), and then dividing said number of images by n.
- View Dependent Claims (18, 19)
- - 18. The process of claim 17, wherein whenever the semantic RCS is greater than 0.5 the annotation precision of AIA algorithm 1 is greater than that of AIA algorithm 2, whenever the semantic RCS is less than 0.5 the annotation precision of AIA algorithm 2 is greater than that of AIA algorithm 1, and whenever the semantic RCS is equal to 0.5 the annotation precision of AIA algorithms 1 and 2 is the same.
  - 19. The process of claim 17, wherein the pair-wise semantic distance score SD( ) for each image in T is given by the equation
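
The semantic RCS of claim 17 is a simple ratio, so a small Python sketch may help. It takes the two lists of per-image SD( ) scores as given, since the SD( ) equation itself is not reproduced in this text; the function name and the sample scores are hypothetical.

```python
def semantic_rcs(first_scores, second_scores):
    """Fraction of images for which the first algorithm's score is lower.

    first_scores[i] and second_scores[i] are the semantic distances between
    each algorithm's annotations and the ground truth for image i.
    """
    if len(first_scores) != len(second_scores) or not first_scores:
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(first_scores)
    wins = sum(1 for a, b in zip(first_scores, second_scores) if a < b)
    return wins / n

# Four images: the first algorithm is closer to the ground truth on three.
rcs = semantic_rcs([0.2, 0.5, 0.9, 0.1], [0.4, 0.3, 1.0, 0.6])
```

Here `rcs` is 0.75, which by claim 18 indicates that the first algorithm is the more precise of the two. Note that under the literal wording of claim 17 an image with equal scores is not counted in the numerator.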

20. A computer-implemented process for automatically annotating a new image, comprising using a computing device to perform the following process actions:
- inputting a set T of training images, wherein the new image is not in T;
  
  manually annotating each training image in T with a vector of annotations comprising one or more textual keywords, wherein each keyword describes a different low-level visual feature in the image;
  
  computing a pair-wise annotation-based semantic distance score between every possible pair of training images in T;
  
  utilizing a constant shift embedding framework to embed the training images in T into a Euclidean vector space;
  
  utilizing an x-means algorithm to group the embedded training images into H different semantic clusters T^(k) of training images based upon the annotation-based scores, wherein k is a variable which uniquely identifies each cluster;
  
  for each semantic cluster T^(k) of training images,
  - generating a set of relaxed relative comparison constraints for T^(k), wherein T^(k) is given by the equation T^(k) = {x_i^(k)}, i = 1, . . . , n_k, x_i^(k) is a feature vector for the i-th training image in T^(k), n_k is a number of training images in T^(k), the set of constraints is given by {(x_a, x_b, x_c)}, and (x_a, x_b, x_c) is a subset of all the possible triples of training images in T^(k) which satisfies either a first condition in which the intuitive semantic distance between x_a and x_c is greater than said distance between x_a and x_b, or a second condition in which the intuitive semantic distance between x_a and x_c equals said distance between x_a and x_b but the difference between the features in x_a and the features in x_c is greater than the difference between the features in x_a and the features in x_b,
  - randomly sampling a prescribed number m of the constraints from the set, resulting in a subset of relaxed relative comparison constraints for T^(k) indexed by i = 1, . . . , m,
  - training m different pair-wise semantic distance functions (SDFs) {f₁^(k), . . . , f_m^(k)} for T^(k), wherein each pair-wise SDF f_i^(k) is trained using the corresponding sampled constraints,
  - generating an SDF f^(k) for T^(k) by computing an average of the m different pair-wise SDFs {f₁^(k), . . . , f_m^(k)},
  - utilizing f^(k) to compute a pair-wise feature-based semantic distance score between the new image and each training image in T^(k), resulting in a set of pair-wise feature-based semantic distance scores for T^(k),
  - utilizing the set of pair-wise feature-based semantic distance scores for T^(k) to generate a ranking list for T^(k), wherein said list ranks each training image in T^(k) according to its intuitive semantic distance from the new image,
  - generating a probability density function (PDF) which estimates the visual features in the training images in T^(k),
  - utilizing the PDF to estimate a cluster association probability p(k) for T^(k), wherein p(k) specifies a probability of the new image being semantically associated with T^(k),
  - utilizing the ranking list for T^(k) to rank the vectors of annotations for all the training images in T^(k), resulting in a set of ranked annotations for T^(k) given by {t₁^(k), t₂^(k), . . . , t_{n_k}^(k)}, wherein t_i^(k) is the vector of annotations for the i-th training image in the ranking list,
  - utilizing the ranking list for T^(k) to rank the pair-wise feature-based semantic distance scores between the new image and each training image in T^(k), resulting in a set of ranked pair-wise feature-based semantic distance scores for T^(k) given by {d₁^(k), d₂^(k), . . . , d_{n_k}^(k)}, wherein d_i^(k) is the pair-wise feature-based semantic distance score between the new image and the i-th training image in the ranking list,
  - computing a cluster-specific vector w^(k) of probabilistic annotations for the new image as
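
The two conditions that define a relaxed relative comparison constraint, and the random sampling that follows, can be sketched in Python as below. This is an illustration under stated assumptions rather than the claimed implementation: the "difference between the features" is taken to be the Euclidean distance between feature vectors, the semantic distances are supplied as a precomputed matrix, and the function names and toy data are hypothetical. Training the pair-wise SDFs is not shown because their equation is not reproduced in this text.

```python
import random
import numpy as np

def relaxed_constraints(sem_dist, feats):
    """List every triple (a, b, c) of distinct images meeting either condition.

    Condition 1: c is semantically farther from a than b is.
    Condition 2: b and c are semantically equidistant from a, but c's
    features differ from a's more than b's do (ASSUMED: Euclidean distance).
    """
    sem_dist = np.asarray(sem_dist, dtype=float)
    feats = np.asarray(feats, dtype=float)
    n = len(feats)
    feat_dist = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    triples = []
    for a in range(n):
        for b in range(n):
            for c in range(n):
                if len({a, b, c}) < 3:
                    continue  # a, b and c must be three different images
                farther = sem_dist[a, c] > sem_dist[a, b]
                tie_broken = (sem_dist[a, c] == sem_dist[a, b]
                              and feat_dist[a, c] > feat_dist[a, b])
                if farther or tie_broken:
                    triples.append((a, b, c))
    return triples

def sample_constraints(triples, m, seed=0):
    """Randomly draw up to m constraints without replacement."""
    rng = random.Random(seed)
    return rng.sample(triples, min(m, len(triples)))

# Toy cluster: three images with 1-D features and hand-made semantic distances.
sem = [[0, 1, 2],
       [1, 0, 1],
       [2, 1, 0]]
features = [[0.0], [1.0], [3.0]]
triples = relaxed_constraints(sem, features)
subset = sample_constraints(triples, m=2)
```

In the toy data, the triple (1, 0, 2) is admitted only through the second condition: images 0 and 2 are semantically equidistant from image 1, but image 2's features lie farther away.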

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Wang, Yong; Mei, Tao; Hua, Xian-Sheng; Li, Shipeng

Granted Patent: US 7,890,512 B2
US Class Current: 1/1
CPC Class Codes:
G06F 16/51 Indexing; Data structures t...
G06F 16/58 Retrieval characterised by ...
