Classifying images
First Claim
1. A method performed by one or more computers, the method comprising:
- obtaining data that associates each term in a vocabulary of terms with a respective high-dimensional representation of the term, wherein the high-dimensional representation of the term is a numeric representation of the term in a high-dimensional space, and wherein the vocabulary of terms comprises a plurality of object category labels; and
- training a modified visual recognition system on a plurality of training images, wherein the modified visual recognition system includes a neural network having multiple layers, wherein each of the plurality of training images is associated with a respective known category label from the plurality of object category labels, wherein the modified visual recognition system is configured to, for each of the training images, receive the training image and to output a high-dimensional representation in the high-dimensional space for the training image, and wherein the training comprises:
- performing multiple iterations of a training procedure to minimize a loss function to determine trained values of parameters of the neural network, wherein the loss function satisfies, for each of the training images:
\[ \text{loss}(\text{image}, \text{label}) = \sum_{j \neq \text{label}} \max\left[0,\ \text{margin} - t_{\text{label}} \cdot \text{representation} + t_{j} \cdot \text{representation}\right] \]
wherein:
- image is the training image,
- label is a known category label for the training image,
- representation is a current iteration high-dimensional representation for the training image in a current iteration of the training procedure,
- $t_{\text{label}}$ is a high-dimensional representation of the known label,
- $t_{j}$ is a high-dimensional representation of an object category label j in the vocabulary of terms other than the known label, and
- margin is a constant value.
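The loss recited in the claim can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names `dot` and `ranking_loss`, the two-dimensional toy embeddings, and the margin value of 0.1 are hypothetical and do not come from the patent, which only requires the margin to be a constant.

```python
from typing import Dict, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def ranking_loss(
    representation: Sequence[float],
    label: str,
    label_embeddings: Dict[str, Sequence[float]],
    margin: float = 0.1,  # assumed value; the claim only says "a constant"
) -> float:
    """Sum over j != label of max[0, margin - t_label.r + t_j.r]."""
    t_label = label_embeddings[label]
    total = 0.0
    for j, t_j in label_embeddings.items():
        if j == label:
            continue  # the sum excludes the known label itself
        total += max(
            0.0,
            margin - dot(t_label, representation) + dot(t_j, representation),
        )
    return total


# Toy vocabulary of object category labels (hypothetical values).
embeddings = {"cat": [1.0, 0.0], "dog": [0.0, 1.0], "car": [-1.0, 0.0]}

# A representation aligned with "cat" incurs no loss for the label "cat".
good = ranking_loss([1.0, 0.0], "cat", embeddings)
# A representation aligned with "dog" is penalized for the label "cat".
bad = ranking_loss([0.0, 1.0], "cat", embeddings)
```

Each term of the sum is zero once the representation scores the known label higher than label j by at least the margin, so only labels that violate the margin contribute to the loss.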
2 Assignments
0 Petitions
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classifying images. One of the methods includes obtaining data that associates each of a plurality of object category labels with a respective high-dimensional representation of the object category label, wherein the high-dimensional representation of the object category label is a numeric representation of the object category label in a high-dimensional space; receiving an input image; processing the input image using one or more core layers to generate an alternative representation of the input image; processing the alternative representation of the input image using a transformation layer to determine a high-dimensional representation for the input image; selecting, from the high-dimensional representations associated with the object category labels, a closest high-dimensional representation to the high-dimensional representation for the input image; and selecting the category label associated with the closest high-dimensional representation as a predicted label for the input image.
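The prediction steps in the abstract can be sketched as follows. Several things here are assumptions made for illustration only: the core layers are replaced by a precomputed feature vector, the transformation layer is modeled as a single matrix multiplication, and "closest" is interpreted as highest cosine similarity. The abstract does not fix any of these choices, and all names and values are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is zero)."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0


def transformation_layer(
    features: Sequence[float], matrix: Sequence[Sequence[float]]
) -> List[float]:
    """Assumed linear map from core-layer features into the label space."""
    return [sum(w * x for w, x in zip(row, features)) for row in matrix]


def predict_label(
    core_features: Sequence[float],
    matrix: Sequence[Sequence[float]],
    label_embeddings: Dict[str, Sequence[float]],
) -> str:
    """Return the label whose embedding is closest to the image's."""
    representation = transformation_layer(core_features, matrix)
    return max(
        label_embeddings,
        key=lambda lab: cosine_similarity(label_embeddings[lab], representation),
    )


# Hypothetical label embeddings and transformation matrix.
label_embeddings = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
matrix = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

# Stand-in for the output of the core layers on some input image.
core_features = [0.2, 0.9, 0.5]
predicted = predict_label(core_features, matrix, label_embeddings)
```

Because the prediction is a nearest-neighbor lookup in the embedding space rather than a fixed-size softmax, the set of candidate labels can in principle be changed by editing `label_embeddings` without altering the network.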
12 Citations
17 Claims
1. A method performed by one or more computers, the method comprising:
- obtaining data that associates each term in a vocabulary of terms with a respective high-dimensional representation of the term, wherein the high-dimensional representation of the term is a numeric representation of the term in a high-dimensional space, and wherein the vocabulary of terms comprises a plurality of object category labels; and
- training a modified visual recognition system on a plurality of training images, wherein the modified visual recognition system includes a neural network having multiple layers, wherein each of the plurality of training images is associated with a respective known category label from the plurality of object category labels, wherein the modified visual recognition system is configured to, for each of the training images, receive the training image and to output a high-dimensional representation in the high-dimensional space for the training image, and wherein the training comprises:
- performing multiple iterations of a training procedure to minimize a loss function to determine trained values of parameters of the neural network, wherein the loss function satisfies, for each of the training images:
\[ \text{loss}(\text{image}, \text{label}) = \sum_{j \neq \text{label}} \max\left[0,\ \text{margin} - t_{\text{label}} \cdot \text{representation} + t_{j} \cdot \text{representation}\right] \]
wherein:
- image is the training image,
- label is a known category label for the training image,
- representation is a current iteration high-dimensional representation for the training image in a current iteration of the training procedure,
- $t_{\text{label}}$ is a high-dimensional representation of the known label,
- $t_{j}$ is a high-dimensional representation of an object category label j in the vocabulary of terms other than the known label, and
- margin is a constant value.
Dependent claims: 2, 3, 4, 5, 6, 7
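One iteration of a training procedure that reduces this loss could look like the sketch below. It is a deliberately simplified illustration, not the claimed procedure: it assumes plain subgradient descent with a fixed learning rate, updates only a single linear layer (here called `matrix`) instead of all layers of a neural network, and uses one hypothetical training image given as a feature vector. The claim does not specify the optimizer, and every name and value is an assumption.

```python
from typing import Dict, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def transform(matrix: Sequence[Sequence[float]], features: Sequence[float]) -> List[float]:
    """Assumed linear layer producing the image representation."""
    return [dot(row, features) for row in matrix]


def ranking_loss(representation, label, label_embeddings, margin):
    """Sum over j != label of max[0, margin - t_label.r + t_j.r]."""
    t_label = label_embeddings[label]
    return sum(
        max(0.0, margin - dot(t_label, representation) + dot(t_j, representation))
        for j, t_j in label_embeddings.items()
        if j != label
    )


def train_step(matrix, features, label, label_embeddings, margin=0.1, lr=0.1):
    """Return an updated matrix after one subgradient descent step."""
    representation = transform(matrix, features)
    t_label = label_embeddings[label]
    grad_r = [0.0] * len(representation)
    for j, t_j in label_embeddings.items():
        if j == label:
            continue
        # Only labels that violate the margin contribute a subgradient.
        if margin - dot(t_label, representation) + dot(t_j, representation) > 0:
            for k in range(len(grad_r)):
                grad_r[k] += t_j[k] - t_label[k]
    # Chain rule: d loss / d matrix[k][i] = grad_r[k] * features[i].
    return [
        [w - lr * grad_r[k] * x for w, x in zip(row, features)]
        for k, row in enumerate(matrix)
    ]


# Hypothetical data: two labels, one training image with known label "cat".
label_embeddings: Dict[str, List[float]] = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
features = [1.0, 0.5]
matrix = [[0.0, 0.0], [0.0, 0.0]]

losses = []
for _ in range(5):  # "multiple iterations of a training procedure"
    losses.append(ranking_loss(transform(matrix, features), "cat", label_embeddings, 0.1))
    matrix = train_step(matrix, features, "cat", label_embeddings)
final_loss = ranking_loss(transform(matrix, features), "cat", label_embeddings, 0.1)
```

In this toy setting the loss starts above zero and stops changing once the representation satisfies the margin for every other label, since the subgradient is then zero.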
8. A system comprising one or more computers and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform operations comprising:
- obtaining data that associates each term in a vocabulary of terms with a respective high-dimensional representation of the term, wherein the high-dimensional representation of the term is a numeric representation of the term in a high-dimensional space, and wherein the vocabulary of terms comprises a plurality of object category labels; and
- training a modified visual recognition system on a plurality of training images, wherein the modified visual recognition system includes a neural network having multiple layers, wherein each of the plurality of training images is associated with a respective known category label from the plurality of object category labels, wherein the modified visual recognition system is configured to, for each of the training images, receive the training image and to output a high-dimensional representation in the high-dimensional space for the training image, and wherein the training comprises:
- performing multiple iterations of a training procedure to minimize a loss function to determine trained values of parameters of the neural network, wherein the loss function satisfies, for each of the training images:
\[ \text{loss}(\text{image}, \text{label}) = \sum_{j \neq \text{label}} \max\left[0,\ \text{margin} - t_{\text{label}} \cdot \text{representation} + t_{j} \cdot \text{representation}\right] \]
wherein:
- image is the training image,
- label is a known category label for the training image,
- representation is a current iteration high-dimensional representation for the training image in a current iteration of the training procedure,
- $t_{\text{label}}$ is a high-dimensional representation of the known label,
- $t_{j}$ is a high-dimensional representation of an object category label j in the vocabulary of terms other than the known label, and
- margin is a constant value.
Dependent claims: 9, 10, 11, 12, 13, 14
15. One or more non-transitory computer storage media storing instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
- obtaining data that associates each term in a vocabulary of terms with a respective high-dimensional representation of the term, wherein the high-dimensional representation of the term is a numeric representation of the term in a high-dimensional space, and wherein the vocabulary of terms comprises a plurality of object category labels; and
- training a modified visual recognition system on a plurality of training images, wherein the modified visual recognition system includes a neural network having multiple layers, wherein each of the plurality of training images is associated with a respective known category label from the plurality of object category labels, wherein the modified visual recognition system is configured to, for each of the training images, receive the training image and to output a high-dimensional representation in the high-dimensional space for the training image, and wherein the training comprises:
- performing multiple iterations of a training procedure to minimize a loss function to determine trained values of parameters of the neural network, wherein the loss function satisfies, for each of the training images:
\[ \text{loss}(\text{image}, \text{label}) = \sum_{j \neq \text{label}} \max\left[0,\ \text{margin} - t_{\text{label}} \cdot \text{representation} + t_{j} \cdot \text{representation}\right] \]
wherein:
- image is the training image,
- label is a known category label for the training image,
- representation is a current iteration high-dimensional representation for the training image in a current iteration of the training procedure,
- $t_{\text{label}}$ is a high-dimensional representation of the known label,
- $t_{j}$ is a high-dimensional representation of an object category label j in the vocabulary of terms other than the known label, and
- margin is a constant value.
Dependent claims: 16, 17
Specification