Classifying images

US 10,127,475 B1
Filed: 09/22/2016
Issued: 11/13/2018
Est. Priority Date: 05/31/2013
Status: Active Grant

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for classifying images. One of the methods includes obtaining data that associates each of a plurality of object category labels with a respective high-dimensional representation of the object category label, wherein the high-dimensional representation of the object category label is a numeric representation of the object category label in a high-dimensional space; receiving an input image; processing the input image using one or more core layers to generate an alternative representation of the input image; processing the alternative representation of the input image using a transformation layer to determine a high-dimensional representation for the input image; selecting, from the high-dimensional representations associated with the object category labels, a closest high-dimensional representation to the high-dimensional representation for the input image; and selecting the category label associated with the closest high-dimensional representation as a predicted label for the input image.
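
The inference steps in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the layer functions, the toy weights, the label vectors, and the use of cosine similarity as the notion of "closest" are all assumptions chosen for the example.

```python
import math

def relu_layer(weights, x):
    """Stand-in for one core layer: a linear map followed by ReLU (assumed)."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]

def linear_layer(weights, x):
    """Stand-in for the transformation layer: a plain linear map (assumed)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0

def predict_label(image, core_layers, transformation, label_vectors):
    """Core layers -> transformation layer -> closest label representation."""
    hidden = image
    for weights in core_layers:
        hidden = relu_layer(weights, hidden)          # alternative representation
    representation = linear_layer(transformation, hidden)
    return max(label_vectors,
               key=lambda name: cosine(label_vectors[name], representation))

# Hypothetical toy values: a 2-feature "image" mapped into a 3-dimensional label space.
core_layers = [[[1.0, 0.0], [0.0, 1.0]]]
transformation = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
label_vectors = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]}

predicted = predict_label([0.9, 0.1], core_layers, transformation, label_vectors)
print(predicted)
```

In a real system the core layers would be a deep network and the label vectors would come from a separately obtained term-to-representation mapping; the sketch only shows how the pieces connect.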

17 Claims

1. A method performed by one or more computers, the method comprising:
- obtaining data that associates each term in a vocabulary of terms with a respective high-dimensional representation of the term, wherein the high-dimensional representation of the term is a numeric representation of the term in a high-dimensional space, and wherein the vocabulary of terms comprises a plurality of object category labels; and
  
  training a modified visual recognition system on a plurality of training images, wherein the modified visual recognition system includes a neural network having multiple layers, wherein each of the plurality of training images is associated with a respective known category label from the plurality of object category labels, wherein the modified visual recognition system is configured to, for each of the training images, receive the training image and to output a high-dimensional representation in the high-dimensional space for the training image, and wherein the training comprises:
  
  performing multiple iterations of a training procedure to minimize a loss function to determine trained values of parameters of the neural network, wherein the loss function satisfies, for each of the training images:
  
  $$\mathrm{loss}(\mathrm{image}, \mathrm{label}) = \sum_{j \neq \mathrm{label}} \max\left[0,\ \mathrm{margin} - t_{\mathrm{label}} \cdot \mathrm{representation} + t_j \cdot \mathrm{representation}\right]$$
  
  wherein:
  
  image is the training image,
  label is a known category label for the training image,
  representation is a current iteration high-dimensional representation for the training image in a current iteration of the training procedure,
  t_label is a high-dimensional representation of the known label,
  t_j is a high-dimensional representation of an object category label j in the vocabulary of terms other than the known label, and
  margin is a constant value.
  - 2. The method of claim 1, wherein the modified visual recognition system is configured to, for each of the training images, receive the training image and to select an object category label having a high-dimensional representation that is closest to the high-dimensional representation for the training image as an object category label for the training image, wherein the object category label for the training image is a category label for an object category that the modified visual recognition system determines an object pictured in the training image belongs to, wherein the neural network of the modified visual recognition system comprises two or more core layers and a transformation layer, wherein the two or more core layers are configured to receive the training image and generate an alternative representation of the training image, and wherein the transformation layer is configured to receive the alternative representation of the training image from the two or more core layers and generate the high-dimensional representation for the training image.
  - 3. The method of claim 2, further comprising:
    - training an initial visual recognition system on the plurality of training images to determine pre-trained values of parameters of the two or more core layers, wherein the initial visual recognition system includes a neural network having multiple layers, wherein the initial visual recognition system is configured to, for each of the training images, receive the training image and predict a respective score for each of a plurality of object categories, wherein the respective score for each of the plurality of object categories represents a predicted likelihood that the training image contains an image of an object from the object category, wherein the initial visual recognition system comprises the two or more core layers, which the initial visual recognition system has in common with the modified visual recognition system, and a classifier layer, wherein the classifier layer is configured to receive the alternative representation of the training image and generate the respective scores for the training image, and wherein the initial visual recognition system does not include the transformation layer of the modified visual recognition system; and
      
      training the modified visual recognition system after training the initial visual recognition system, wherein training the modified visual recognition system comprises further training the two or more core layers.
  - 4. The method of claim 3, wherein training the modified visual recognition system comprises generating trained values of parameters of the two or more core layers and of parameters of the transformation layer from the pre-trained values of the parameters of the two or more core layers and initial values of the parameters of the transformation layer.
  - 5. The method of claim 2, wherein selecting an object category label having a high-dimensional representation that is closest to the high-dimensional representation for the training image comprises selecting an object category label that is associated with a high-dimensional representation that has a largest cosine similarity value with the high-dimensional representation for the training image.
  - 6. The method of claim 1, wherein training the modified visual recognition system comprises training the modified visual recognition system to produce, for each of the training images, a higher cosine similarity between the high-dimensional representation for the training image and the high-dimensional representation of a label for a corresponding known object category for the training image than between the predicted high-dimensional representation for the training image and representations of other terms in the vocabulary.
  - 7. The method of claim 1, wherein obtaining the data that associates each term in a vocabulary of terms with a respective high-dimensional representation of the term comprises:
    - training a machine learning system configured to process each term in the vocabulary of terms to obtain the respective high-dimensional representation of the term in the vocabulary and to associate each term in the vocabulary with the respective high-dimensional representation of the term.
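
The loss function recited in claim 1 can be worked through numerically. The sketch below is one plausible reading of the formula: the function name, the toy term vectors, and the margin value are assumptions made for illustration, and the vocabulary is reduced to three labels.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))

def hinge_rank_loss(representation, label, term_vectors, margin):
    """Sum over every label j other than the known label of
    max(0, margin - t_label . representation + t_j . representation)."""
    t_label = term_vectors[label]
    total = 0.0
    for j, t_j in term_vectors.items():
        if j == label:
            continue  # the sum excludes the known label itself
        total += max(0.0, margin - dot(t_label, representation) + dot(t_j, representation))
    return total

# Hypothetical 2-dimensional term representations.
term_vectors = {"cat": [1.0, 0.0], "dog": [0.0, 1.0], "car": [-1.0, 0.0]}

# An image representation that sits halfway between "cat" and "dog".
loss = hinge_rank_loss([0.5, 0.5], "cat", term_vectors, margin=0.25)
print(loss)  # 0.25: only "dog" violates the margin; "car" contributes 0
```

Only labels whose dot product with the image representation comes within the margin of the known label's dot product contribute to the sum, so a representation that is well aligned with its known label (for example `[1.0, 0.0]` for "cat") yields a loss of zero.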

8. A system comprising one or more computers and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform operations comprising:
- obtaining data that associates each term in a vocabulary of terms with a respective high-dimensional representation of the term, wherein the high-dimensional representation of the term is a numeric representation of the term in a high-dimensional space, and wherein the vocabulary of terms comprises a plurality of object category labels; and
  
  training a modified visual recognition system on a plurality of training images, wherein the modified visual recognition system includes a neural network having multiple layers, wherein each of the plurality of training images is associated with a respective known category label from the plurality of object category labels, wherein the modified visual recognition system is configured to, for each of the training images, receive the training image and to output a high-dimensional representation in the high-dimensional space for the training image, and wherein the training comprises:
  
  performing multiple iterations of a training procedure to minimize a loss function to determine trained values of parameters of the neural network, wherein the loss function satisfies, for each of the training images:
  
  $$\mathrm{loss}(\mathrm{image}, \mathrm{label}) = \sum_{j \neq \mathrm{label}} \max\left[0,\ \mathrm{margin} - t_{\mathrm{label}} \cdot \mathrm{representation} + t_j \cdot \mathrm{representation}\right]$$
  
  wherein:
  
  image is the training image,
  label is a known category label for the training image,
  representation is a current iteration high-dimensional representation for the training image in a current iteration of the training procedure,
  t_label is a high-dimensional representation of the known label,
  t_j is a high-dimensional representation of an object category label j in the vocabulary of terms other than the known label, and
  margin is a constant value.
  - 9. The system of claim 8, wherein the modified visual recognition system is configured to, for each of the training images, receive the training image and to select an object category label having a high-dimensional representation that is closest to the high-dimensional representation for the training image as an object category label for the training image, wherein the object category label for the training image is a category label for an object category that the modified visual recognition system determines an object pictured in the training image belongs to, wherein the neural network of the modified visual recognition system comprises two or more core layers and a transformation layer, wherein the two or more core layers are configured to receive the training image and generate an alternative representation of the training image, and wherein the transformation layer is configured to receive the alternative representation of the training image from the two or more core layers and generate the high-dimensional representation for the training image.
  - 10. The system of claim 9, the operations further comprising:
    - training an initial visual recognition system on the plurality of training images to determine pre-trained values of parameters of the two or more core layers, wherein the initial visual recognition system includes a neural network having multiple layers, wherein the initial visual recognition system is configured to, for each of the training images, receive the training image and predict a respective score for each of a plurality of object categories, wherein the respective score for each of the plurality of object categories represents a predicted likelihood that the training image contains an image of an object from the object category, wherein the initial visual recognition system comprises the two or more core layers, which the initial visual recognition system has in common with the modified visual recognition system, and a classifier layer, wherein the classifier layer is configured to receive the alternative representation of the training image and generate the respective scores for the training image, and wherein the initial visual recognition system does not include the transformation layer of the modified visual recognition system; and
      
      training the modified visual recognition system after the training the initial visual recognition system, wherein training the modified recognition system comprises further training the two or more core layers.
  - 11. The system of claim 10, wherein training the modified visual recognition system comprises generating trained values of parameters of the two or more core layers and of parameters of the transformation layer from the pre-trained values of the parameters of the two or more core layers and initial values of the parameters of the transformation layer.
  - 12. The system of claim 9, wherein selecting an object category label having a high-dimensional representation that is closest to the high-dimensional representation for the training image comprises selecting an object category label that is associated with a high-dimensional representation that has a largest cosine similarity value with the high-dimensional representation for the training image.
  - 13. The system of claim 8, wherein training the modified visual recognition system comprises training the modified visual recognition system to produce, for each of the training images, a higher cosine similarity between the high-dimensional representation for the training image and the high-dimensional representation of a label for a corresponding known object category for the training image than between the predicted high-dimensional representation for the training image and representations of other terms in the vocabulary.
  - 14. The system of claim 8, wherein obtaining the data that associates each term in a vocabulary of terms with a respective high-dimensional representation of the term comprises:
    - training a machine learning system configured to process each term in the vocabulary of terms to obtain the respective high-dimensional representation of the term in the vocabulary and to associate each term in the vocabulary with the respective high-dimensional representation of the term.
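
The two-stage arrangement in claims 3 and 4 (mirrored in claims 10 and 11) can be pictured as a parameter handoff: the modified system starts from the pre-trained core-layer values and fresh transformation-layer values, and the classifier layer is left behind. The class names, field names, and placeholder parameter values below are hypothetical and serve only to show the structure.

```python
import copy
from dataclasses import dataclass

@dataclass
class InitialSystem:
    """Core layers plus a classifier layer that scores object categories."""
    core: list         # pre-trained core-layer parameters (placeholder values)
    classifier: list   # classifier-layer parameters, not reused later

@dataclass
class ModifiedSystem:
    """The same core layers plus a transformation layer; no classifier layer."""
    core: list
    transformation: list

def build_modified(initial, transformation_init):
    """Start the modified system from pre-trained core values and initial
    transformation values; both are then trained further."""
    return ModifiedSystem(core=copy.deepcopy(initial.core),
                          transformation=transformation_init)

# Hypothetical parameter values standing in for real weight tensors.
initial = InitialSystem(core=[[0.3, -0.2], [0.1, 0.4]],
                        classifier=[[1.0, 0.0], [0.0, 1.0]])
modified = build_modified(initial, transformation_init=[[0.0, 0.0], [0.0, 0.0]])
print(modified)
```

Copying the core parameters rather than sharing them keeps the initial system unchanged while the modified system's core layers continue to train, which is one reasonable design choice among several.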

15. One or more non-transitory computer storage media storing instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
- obtaining data that associates each term in a vocabulary of terms with a respective high-dimensional representation of the term, wherein the high-dimensional representation of the term is a numeric representation of the term in a high-dimensional space, and wherein the vocabulary of terms comprises a plurality of object category labels; and
  
  training a modified visual recognition system on a plurality of training images, wherein the modified visual recognition system includes a neural network having multiple layers, wherein each of the plurality of training images is associated with a respective known category label from the plurality of object category labels, wherein the modified visual recognition system is configured to, for each of the training images, receive the training image and to output a high-dimensional representation in the high-dimensional space for the training image, and wherein the training comprises:
  
  performing multiple iterations of a training procedure to minimize a loss function to determine trained values of parameters of the neural network, wherein the loss function satisfies, for each of the training images:
  
  $$\mathrm{loss}(\mathrm{image}, \mathrm{label}) = \sum_{j \neq \mathrm{label}} \max\left[0,\ \mathrm{margin} - t_{\mathrm{label}} \cdot \mathrm{representation} + t_j \cdot \mathrm{representation}\right]$$
  
  wherein:
  
  image is the training image,
  label is a known category label for the training image,
  representation is a current iteration high-dimensional representation for the training image in a current iteration of the training procedure,
  t_label is a high-dimensional representation of the known label,
  t_j is a high-dimensional representation of an object category label j in the vocabulary of terms other than the known label, and
  margin is a constant value.
  - 16. The computer storage media of claim 15, wherein training the modified visual recognition system comprises training the modified visual recognition system to produce, for each of the training images, a higher cosine similarity between the high-dimensional representation for the training image and the high-dimensional representation of a label for a corresponding known object category for the training image than between the predicted high-dimensional representation for the training image and representations of other terms in the vocabulary.
  - 17. The computer storage media of claim 15, wherein obtaining the data that associates each term in a vocabulary of terms with a respective high-dimensional representation of the term comprises:
    - training a machine learning system configured to process each term in the vocabulary of terms to obtain the respective high-dimensional representation of the term in the vocabulary and to associate each term in the vocabulary with the respective high-dimensional representation of the term.
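
The "multiple iterations of a training procedure" recited in the independent claims could, under simplifying assumptions, look like the gradient-descent loop below. The sketch trains only a single linear transformation layer `M` (so the image representation is `M` applied to a fixed feature vector) and keeps the term vectors fixed; the claims cover a multi-layer neural network, so this is a reduced illustration. The learning rate, margin, and toy data are assumed values.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def matvec(M, x):
    return [dot(row, x) for row in M]

def loss_and_grad(M, x, label, term_vectors, margin):
    """Hinge rank loss for one example and its gradient with respect to M."""
    r = matvec(M, x)                      # current-iteration representation
    t_label = term_vectors[label]
    loss = 0.0
    grad = [[0.0] * len(x) for _ in M]
    for j, t_j in term_vectors.items():
        if j == label:
            continue
        violation = margin - dot(t_label, r) + dot(t_j, r)
        if violation > 0:                 # only violated terms carry gradient
            loss += violation
            for a in range(len(M)):
                for b in range(len(x)):
                    grad[a][b] += (t_j[a] - t_label[a]) * x[b]
    return loss, grad

def total_loss(M, examples, term_vectors, margin):
    return sum(loss_and_grad(M, x, y, term_vectors, margin)[0] for x, y in examples)

def train(M, examples, term_vectors, margin=0.1, lr=0.1, iterations=50):
    """Plain stochastic gradient descent over the examples (assumed optimizer)."""
    for _ in range(iterations):
        for x, label in examples:
            _, grad = loss_and_grad(M, x, label, term_vectors, margin)
            M = [[m - lr * g for m, g in zip(m_row, g_row)]
                 for m_row, g_row in zip(M, grad)]
    return M

# Hypothetical toy data: two labels, two feature vectors.
term_vectors = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
examples = [([1.0, 0.0], "cat"), ([0.0, 1.0], "dog")]
M0 = [[0.0, 0.0], [0.0, 0.0]]

before = total_loss(M0, examples, term_vectors, margin=0.1)
after = total_loss(train(M0, examples, term_vectors), examples, term_vectors, margin=0.1)
print(before, after)
```

On this toy data the loss starts above zero and falls to zero once every known label beats the other label by at least the margin, after which the updates stop because no term in the sum is violated.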

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google LLC (Alphabet Inc.)
Inventors: Corrado, Gregory S.; Dean, Jeffrey A.; Bengio, Samy; Frome, Andrea L.; Shlens, Jonathon
Primary Examiner(s): Potts, Ryan P
Application Number: US15/273,572
Time in Patent Office: 782 Days
Field of Search: None
US Class Current
CPC Class Codes

G06F 18/213   Feature extraction, e.g. by...

G06F 18/214   Generating training pattern...

G06F 18/22   Matching criteria, e.g. pro...

G06F 18/2413   based on distances to train...

G06N 3/04   Architecture, e.g. intercon...

G06N 3/084   Backpropagation, e.g. using...

G06T 2207/20084   Artificial neural networks ...

G06V 10/761   Proximity, similarity or di...

G06V 10/764   using classification, e.g. ...

G06V 10/7715   Feature extraction, e.g. by...

G06V 10/774   Generating sets of training...

G06V 10/82   using neural networks
