LATENT EMBEDDINGS FOR WORD IMAGES AND THEIR SEMANTICS
First Claim
1. A semantic comparison method, comprising:
providing training word images labeled with concepts;
with the training word images and their labels, learning a first embedding function for embedding word images in a semantic subspace into which the concepts are embedded with a second embedding function;
receiving a query comprising at least one test word image or at least one concept;
where the query comprises at least one test word image, generating a representation of each of the at least one test word image, comprising embedding the test word image in the semantic subspace with the first embedding function;
where the query comprises at least one concept, providing a representation of the at least one concept generated by embedding each of the at least one concept with the second embedding function;
computing a comparison between:
a) at least one of the test word image representations, and
b) at least one of:
at least one of the concept representations, and
another of the test word image representations; and
outputting information based on the comparison.
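The claimed flow — embed a test word image with the first embedding function, embed a concept with the second, then compare the two representations in the shared semantic subspace — can be sketched as follows. The functions `phi` and `psi` and the fixed random projection matrices are hypothetical stand-ins for the learned embedding functions, not the patent's actual model, and cosine similarity is used as one plausible comparison.

```python
import numpy as np

# Hypothetical stand-ins for the two learned embedding functions:
# phi() embeds a word-image feature vector, psi() embeds a concept vector.
# Fixed random projections are used purely for illustration.
rng = np.random.default_rng(0)
W_img = rng.standard_normal((8, 16))   # first embedding function (word images)
W_con = rng.standard_normal((8, 12))   # second embedding function (concepts)

def phi(image_features):
    """Embed a word image into the common semantic subspace."""
    v = W_img @ image_features
    return v / np.linalg.norm(v)

def psi(concept_features):
    """Embed a concept into the same semantic subspace."""
    v = W_con @ concept_features
    return v / np.linalg.norm(v)

def compare(a, b):
    """Cosine similarity between two embedded (unit-norm) representations."""
    return float(a @ b)

img = phi(rng.standard_normal(16))     # representation of a test word image
con = psi(rng.standard_normal(12))     # representation of a concept
score = compare(img, con)              # higher = more semantically compatible
```

Because both representations live in the same subspace, the same `compare` call also serves for word-image-to-word-image comparison, as the claim's clause b) allows.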
Abstract
A system and method enable semantic comparisons to be made between word images and concepts. Training word images and their concept labels are used to learn parameters of a neural network for embedding word images and concepts in a semantic subspace in which comparisons can be made between word images and concepts without the need for transcribing the text content of the word image. The training of the neural network aims to minimize a ranking loss over the training set, in which non-relevant concepts that are ranked more highly than relevant ones for a given image penalize the ranking loss.
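The ranking loss described above — non-relevant concepts ranked above relevant ones penalize the objective — can be illustrated with a minimal hinge-style pairwise formulation. This is a sketch under assumptions: the function name, the margin term, and the exact pairwise form are not specified by the abstract.

```python
import numpy as np

def ranking_loss(image_emb, concept_embs, relevant, margin=1.0):
    """Hinge-style ranking loss: each non-relevant concept scored above
    (or within `margin` of) a relevant concept adds to the loss."""
    scores = concept_embs @ image_emb        # compatibility of each concept
    loss = 0.0
    for r in np.flatnonzero(relevant):       # indices of relevant concepts
        for n in np.flatnonzero(~relevant):  # indices of non-relevant ones
            loss += max(0.0, margin - scores[r] + scores[n])
    return loss

# Toy example: 3 embedded concepts, only the first is relevant.
image_emb = np.array([1.0, 0.0])
concepts = np.array([[2.0, 0.0],    # relevant, score 2.0
                     [0.5, 0.0],    # non-relevant, score 0.5
                     [1.5, 0.0]])   # non-relevant, score 1.5
relevant = np.array([True, False, False])
loss = ranking_loss(image_emb, concepts, relevant)
```

Only the pair (relevant score 2.0, non-relevant score 1.5) falls inside the margin, contributing 1.0 - 2.0 + 1.5 = 0.5, so `loss` is 0.5; a non-relevant concept scored far below every relevant one contributes nothing.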
107 Citations
20 Claims
1. A semantic comparison method, comprising:

providing training word images labeled with concepts;
with the training word images and their labels, learning a first embedding function for embedding word images in a semantic subspace into which the concepts are embedded with a second embedding function;
receiving a query comprising at least one test word image or at least one concept;
where the query comprises at least one test word image, generating a representation of each of the at least one test word image, comprising embedding the test word image in the semantic subspace with the first embedding function;
where the query comprises at least one concept, providing a representation of the at least one concept generated by embedding each of the at least one concept with the second embedding function;
computing a comparison between:
a) at least one of the test word image representations, and
b) at least one of:
at least one of the concept representations, and
another of the test word image representations; and
outputting information based on the comparison.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17.

18. A semantic comparison system, comprising:

memory which stores a neural network having parameters which have been trained with training word images labeled with concepts from a set of concepts, each of the concepts corresponding to a set of entity names, the training word images each being an image of one of the set of entity names for the concept with which it is labeled, the neural network having been trained to embed the training word images and the concepts into a common semantic space with a ranking loss objective function which favors the concepts that are relevant to a word image being ranked, by the neural network, ahead of those that are not relevant;
a comparison component for computing a compatibility between a word image and a concept which have both been embedded in the common semantic space using the trained neural network;
an output component which outputs information based on the comparison; and
a processor in communication with the memory which implements the comparison component and the output component.
Dependent claim: 19.

20. A semantic comparison method, comprising:

providing a neural network having parameters which have been learned with training word images labeled with concepts from a set of concepts, the neural network having been trained to embed the training word images and the concepts into a common semantic space with a ranking loss objective function which favors the concepts that are relevant to a training word image being ranked, by the neural network, ahead of those that are not relevant;
computing a compatibility between a word image and a concept which have both been embedded in the common semantic space using the trained neural network; and
outputting information based on the compatibility computation.

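To see how minimizing such a ranking loss drives the learned parameters, here is a single subgradient step on a bilinear compatibility s(x, c) = c · (W x) for one (relevant, non-relevant) concept pair. Everything here — the bilinear form in place of the patent's neural network, the learning rate, and the update rule — is an illustrative assumption.

```python
import numpy as np

def sgd_step(W, x, c_rel, c_neg, margin=1.0, lr=0.1):
    """One subgradient step on the hinge ranking loss
    max(0, margin - c_rel·(Wx) + c_neg·(Wx)) for a single pair."""
    emb = W @ x
    violation = margin - c_rel @ emb + c_neg @ emb
    if violation > 0:                         # pair is mis-ranked within margin
        W += lr * np.outer(c_rel - c_neg, x)  # push the relevant concept ahead
    return W

rng = np.random.default_rng(1)
W = rng.standard_normal((4, 6)) * 0.01        # near-zero initialization
x = rng.standard_normal(6)                    # a word-image feature vector
c_rel = rng.standard_normal(4)                # a relevant concept embedding
c_neg = rng.standard_normal(4)                # a non-relevant concept embedding

for _ in range(100):
    W = sgd_step(W, x, c_rel, c_neg)
```

Updates stop once the relevant concept outranks the non-relevant one by at least the margin, which is exactly the ranking behavior the objective function of claims 18 and 20 favors.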
Specification