AUTOMATIC IMAGE ANNOTATION USING SEMANTIC DISTANCE LEARNING
Abstract
Images are automatically annotated using semantic distance learning. Training images are manually annotated and partitioned into semantic clusters. Semantic distance functions (SDFs) are learned for the clusters. The SDF for each cluster is used to compute semantic distance scores between a new image and each image in the cluster. The scores for each cluster are used to generate a ranking list which ranks each image in the cluster according to its semantic distance from the new image. An association probability is estimated for each cluster which specifies the probability of the new image being semantically associated with the cluster. Cluster-specific probabilistic annotations for the new image are generated from the manual annotations for the images in each cluster. The association probabilities and cluster-specific probabilistic annotations for all the clusters are used to generate final annotations for the new image.
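The per-cluster ranking step described in the abstract can be sketched as follows. This is a minimal illustration and not the patented implementation: the function name `rank_cluster`, the toy feature vectors, and the use of squared Euclidean distance as a stand-in for a learned SDF are all assumptions made for the example.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def rank_cluster(
    new_image: Vector,
    cluster: Sequence[Vector],
    sdf: Callable[[Vector, Vector], float],
) -> List[Tuple[int, float]]:
    """Score every training image in one cluster against the new image and
    return (index, score) pairs ordered from smallest to largest distance."""
    scores = [(i, sdf(new_image, x)) for i, x in enumerate(cluster)]
    return sorted(scores, key=lambda pair: pair[1])


def squared_euclidean(a: Vector, b: Vector) -> float:
    # Hypothetical stand-in for a learned semantic distance function f(k).
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


# Toy feature vectors for three training images in a single cluster.
cluster = [[0.0, 0.0], [1.0, 1.0], [0.2, 0.1]]
ranking = rank_cluster([0.0, 0.1], cluster, squared_euclidean)
```

Here `ranking` lists the training images by increasing distance from the new image, which is the role the ranking list plays for each cluster.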
20 Claims
1. A computer-implemented process for automatically annotating a new image, comprising using a computing device to perform the following process actions:
inputting a set T of training images, wherein the new image is not in T;
manually annotating each training image in T with a vector of keyword annotations;
partitioning T into a plurality of semantic clusters T(k) of training images, wherein k is a variable which uniquely identifies each cluster, T(k) comprises training images that are semantically similar, and each training image is partitioned into a single cluster;
for each semantic cluster T(k) of training images,
  learning a semantic distance function (SDF) f(k) for T(k),
  utilizing f(k) to compute a pair-wise feature-based semantic distance score between the new image and each training image in T(k), resulting in a set of pair-wise feature-based semantic distance scores for T(k), wherein each feature-based score in the set specifies a metric for an intuitive semantic distance between the new image and a particular training image in T(k),
  utilizing the set of pair-wise feature-based semantic distance scores for T(k) to generate a ranking list for T(k), wherein said list ranks each training image in T(k) according to its intuitive semantic distance from the new image,
  estimating a cluster association probability p(k) for T(k), wherein p(k) specifies a probability of the new image being semantically associated with T(k), and
  probabilistically propagating the vector of keyword annotations for each training image in T(k) to the new image, resulting in a cluster-specific vector w(k) of probabilistic annotations for the new image; and
utilizing p(k) and w(k) for all the semantic clusters of training images to generate a vector w of final keyword annotations for the new image.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16.
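Claim 1 states that p(k) and w(k) are combined into a final vector w but does not give the formula in this text. One plausible reading, shown here purely as an assumption, is a probability-weighted sum of the cluster-specific vectors. The function name `fuse_annotations` and the toy numbers are hypothetical.

```python
from typing import List, Sequence


def fuse_annotations(
    p: Sequence[float], w_by_cluster: Sequence[Sequence[float]]
) -> List[float]:
    """Combine cluster-specific annotation vectors w(k), weighting each by
    its cluster association probability p(k). The weighted sum is an
    assumed combination rule, not one quoted from the claim."""
    vocab_size = len(w_by_cluster[0])
    w = [0.0] * vocab_size
    for p_k, w_k in zip(p, w_by_cluster):
        for j, value in enumerate(w_k):
            w[j] += p_k * value
    return w


# Two clusters, three keywords in the vocabulary.
p = [0.75, 0.25]
w_k = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
w = fuse_annotations(p, w_k)  # [0.75, 0.25, 0.5]
```

Under this reading, a keyword receives a high final weight only when it is strongly supported within clusters that the new image is likely to belong to.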
17. A computer-implemented process for comparing the annotation precision of two different automatic image annotation (AIA) algorithms, comprising using a computing device to perform the following process actions:
inputting a set T of images;
manually applying ground-truth keyword annotations to each image in T, wherein T comprises a total number of images given by n;
utilizing a first AIA algorithm to automatically generate first keyword annotations for each image in T;
utilizing a second AIA algorithm to automatically generate second keyword annotations for each image in T;
computing a first pair-wise semantic distance score SD( ) for each image in T, wherein said first score SD( ) specifies a metric for a semantic distance between the first keyword annotations and the ground-truth keyword annotations;
computing a second pair-wise semantic distance score SD( ) for each image in T, wherein said second score SD( ) specifies a metric for the semantic distance between the second keyword annotations and the ground-truth keyword annotations; and
generating a semantic relative comparative score (RCS) which compares the annotation precision of the first and second AIA algorithms by first determining a number of images in T for which the first score SD( ) is less than the second score SD( ), and then dividing said number of images by n.
Dependent claims: 18, 19.
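The RCS computation in claim 17 reduces to a count and a division, which the following sketch illustrates. The function name `relative_comparative_score` and the sample scores are hypothetical; how the per-image semantic distance scores are obtained is outside the scope of the example.

```python
from typing import Sequence


def relative_comparative_score(
    sd_first: Sequence[float], sd_second: Sequence[float]
) -> float:
    """Fraction of images for which the first algorithm's semantic distance
    to the ground truth is strictly less than the second algorithm's."""
    if len(sd_first) != len(sd_second):
        raise ValueError("both score lists must cover the same n images")
    n = len(sd_first)
    if n == 0:
        raise ValueError("the image set T must not be empty")
    wins = sum(1 for a, b in zip(sd_first, sd_second) if a < b)
    return wins / n


# Per-image distances to the ground-truth annotations for n = 4 images.
sd1 = [0.25, 0.5, 0.75, 0.1]
sd2 = [0.5, 0.5, 0.25, 0.9]
rcs = relative_comparative_score(sd1, sd2)  # 2 of 4 images -> 0.5
```

Because the claim uses a strict "less than" comparison, images on which the two algorithms tie do not count toward the first algorithm.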
20. A computer-implemented process for automatically annotating a new image, comprising using a computing device to perform the following process actions:
inputting a set T of training images, wherein the new image is not in T;
manually annotating each training image in T with a vector of annotations comprising one or more textual keywords, wherein each keyword describes a different low-level visual feature in the image;
computing a pair-wise annotation-based semantic distance score between every possible pair of training images in T;
utilizing a constant shift embedding framework to embed the training images in T into a Euclidean vector space;
utilizing an x-means algorithm to group the embedded training images into H different semantic clusters T(k) of training images based upon the annotation-based scores, wherein k is a variable which uniquely identifies each cluster T(k);
for each semantic cluster T(k) of training images,
  generating a set of relaxed relative comparison constraints for T(k), wherein T(k) is given by the equation T(k) = {x_i(k)}, i = 1, . . . , n_k, x_i(k) is a feature vector for the i-th training image in T(k), n_k is a number of training images in T(k), the set of constraints is given by the equation {(x_a, x_b, x_c)}, and (x_a, x_b, x_c) is a subset of all the possible triples of training images in T(k) which satisfies either,
    a first condition in which the intuitive semantic distance between x_a and x_c is greater than said distance between x_a and x_b, or
    a second condition in which the intuitive semantic distance between x_a and x_c equals said distance between x_a and x_b but the difference between the features in x_a and the features in x_c is greater than the difference between the features in x_a and the features in x_b,
  randomly sampling a prescribed number m of the constraints from said set, resulting in a subset of relaxed relative comparison constraints for T(k) indexed by i (i = 1, . . . , m),
  training m different pair-wise semantic distance functions (SDFs) {f_1(k), . . . , f_m(k)} for T(k), wherein each pair-wise SDF f_i(k) is trained using the subset with index i,
  generating an SDF f(k) for T(k) by computing an average of the m different pair-wise SDFs {f_1(k), . . . , f_m(k)},
  utilizing f(k) to compute a pair-wise feature-based semantic distance score between the new image and each training image in T(k), resulting in a set of pair-wise feature-based semantic distance scores for T(k),
  utilizing the set of pair-wise feature-based semantic distance scores for T(k) to generate a ranking list for T(k), wherein said list ranks each training image in T(k) according to its intuitive semantic distance from the new image,
  generating a probability density function (PDF) which estimates the visual features in the training images in T(k),
  utilizing the PDF to estimate a cluster association probability p(k) for T(k), wherein p(k) specifies a probability of the new image being semantically associated with T(k),
  utilizing the ranking list for T(k) to rank the vectors of annotations for all the training images in T(k), resulting in a set of ranked annotations for T(k) given by {t_1(k), t_2(k), . . . , t_{n_k}(k)}, wherein t_i(k) is the vector of annotations for the i-th training image in the ranking list,
  utilizing the ranking list for T(k) to rank the pair-wise feature-based semantic distance scores between the new image and each training image in T(k), resulting in a set of ranked pair-wise feature-based semantic distance scores for T(k) given by {d_1(k), d_2(k), . . . , d_{n_k}(k)}, wherein d_i(k) is the pair-wise feature-based semantic distance score between the new image and the i-th training image in the ranking list,
  computing a cluster-specific vector w(k) of probabilistic annotations for the new image as
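The two conditions that define the relaxed relative comparison constraints in claim 20, and the random sampling of m constraints, can be illustrated with a short sketch. Several parts are assumptions made for the example: the function name `relaxed_triples`, the use of Euclidean distance as the "difference between the features", and the toy semantic distance matrix.

```python
import math
import random
from itertools import permutations
from typing import List, Sequence, Tuple


def relaxed_triples(
    features: Sequence[Sequence[float]],
    sem_dist: Sequence[Sequence[float]],
) -> List[Tuple[int, int, int]]:
    """Return index triples (a, b, c) of images in one cluster that satisfy
    either condition of the relaxed relative comparison constraints."""
    triples = []
    for a, b, c in permutations(range(len(features)), 3):
        s_ab, s_ac = sem_dist[a][b], sem_dist[a][c]
        if s_ac > s_ab:
            # First condition: c is semantically farther from a than b is.
            triples.append((a, b, c))
        elif s_ac == s_ab and (
            # Assumed feature difference: Euclidean distance between vectors.
            math.dist(features[a], features[c])
            > math.dist(features[a], features[b])
        ):
            # Second condition: semantic tie, broken by feature difference.
            triples.append((a, b, c))
    return triples


# Toy cluster: images 0 and 1 share annotations, image 2 differs from both.
features = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
sem_dist = [[0, 0, 1], [0, 0, 1], [1, 1, 0]]
triples = relaxed_triples(features, sem_dist)

# Randomly sample a prescribed number m of the constraints.
m = 2
sampled = random.Random(0).sample(triples, m)
```

Enumerating all ordered triples grows cubically with the cluster size, so a sketch like this is only practical for small clusters.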
Specification