Privacy-preserving text to image matching

US 9,367,763 B1
Filed: 01/12/2015
Issued: 06/14/2016
Est. Priority Date: 01/12/2015
Status: Active Grant


4 Assignments

0 Petitions

Abstract

A method for text-to-image matching includes generating representations of text images, such as license plate images, by embedding each text image into a first vectorial space with a first embedding function. With a second embedding function, a character string, such as a license plate number to be matched, is embedded into a second vectorial space to generate a character string representation. A compatibility is computed between the character string representation and one or more of the text image representations to identify a matching one. The compatibility is computed with a function that uses a transformation which is learned on a training set of labeled images. The learning uses a loss function that aggregates a text-to-image-loss and an image-to-text loss over the training set. The image-to-text loss penalizes the transformation when it correctly ranks a pair of character string representations, given an image representation corresponding to one of them.
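The matching step described in the abstract can be illustrated with a minimal NumPy sketch. It assumes the bilinear form φ(x)^T W φ(y) recited in claims 4 and 17 and the maximization of claim 7. The function names, dimensions, and random data are hypothetical and stand in for real embeddings and a learned transformation.

```python
import numpy as np

def compatibility_scores(image_reps, string_rep, W):
    """Bilinear compatibility phi(x)^T W phi(y) for every stored image.

    image_reps: (n, d) array, one text image representation per row.
    string_rep: (D,) array, the embedded query character string.
    W:          (d, D) transformation (compatibility matrix).
    """
    return image_reps @ W @ string_rep

def best_match(image_reps, string_rep, W):
    """Index of the stored image whose compatibility is highest."""
    return int(np.argmax(compatibility_scores(image_reps, string_rep, W)))

# Hypothetical toy data: 5 stored images, d = 8, D = 4.
rng = np.random.default_rng(0)
image_reps = rng.normal(size=(5, 8))
W = rng.normal(size=(8, 4))
string_rep = rng.normal(size=4)

scores = compatibility_scores(image_reps, string_rep, W)
match = best_match(image_reps, string_rep, W)
```

Only the image representations and the query string's representation are needed at matching time in this sketch; the stored text images themselves are not recognized or decoded.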

50 Citations

22 Claims

1. A method for text-to-image matching comprising:
- storing a set of text image representations, each text image representation having been generated by embedding a respective text image into a first vectorial space with a first embedding function;
  
  with a second embedding function, embedding a character string into a second vectorial space to generate a character string representation;
  
  for each of at least some of the text image representations, computing a compatibility between the character string representation and the text image representation, comprising computing a function of the text image representation, character string representation, and a transformation, the transformation having been derived by minimizing a loss function on a set of labeled training images, the loss function including a text-to-image-loss and an image-to-text loss; and
  
  identifying a matching text image based on the computed compatibilities, wherein at least one of the embedding and the computing of the compatibility is performed with a processor.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
- - 2. The method of claim 1, wherein the loss function further includes a term which is not used in the computing of the compatibility.
  - 3. The method of claim 1, wherein the text-to-image-loss penalizes incorrect ranking of images, given a label corresponding to only one of the images and the image-to-text loss penalizes correct ranking of labels, given an image corresponding to only one of the labels.
  - 4. The method of claim 1, wherein the compatibility function is a function of φ(x)^T W φ(y), where φ(x) represents the text image representation, φ(y) represents the character string representation, and W represents the transformation.
  - 5. The method of claim 1, further comprising outputting information based on the identified text image.
  - 6. The method of claim 5, wherein the output information is derived from metadata associated with the text image.
  - 7. The method of claim 1, wherein the matching text image is identified by identifying the text image for which the compatibility function is maximized.
  - 8. The method of claim 1, wherein the transformation is a d×D dimensional compatibility matrix, where D is the dimensionality of the character string representation and d is the dimensionality of the text image representations.
  - 9. The method of claim 8, wherein d is larger than D.
  - 10. The method of claim 1, wherein the text image representations are Fisher vectors.
  - 11. The method of claim 1, wherein the character string representations are derived by partitioning the character string into a plurality of regions at each of a plurality of levels, generating a representation of each region based on the characters in the region, and computing a representation of the character string based on the region representations.
  - 12. The method of claim 1, wherein the text images are license plate images and the character strings are license plate numbers.
  - 13. A computer program product comprising a non-transitory recording medium storing instructions which when executed by a computer, perform the method of claim 1.
  - 14. A system comprising memory which stores instructions for performing the method of claim 1 and a processor in communication with the memory for executing the instructions.
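The character string embedding of claim 11 can be sketched as follows. This is one possible reading of the claim, not the patented implementation: the alphabet, the choice of levels (1, 2, and 4 regions), the overlap weighting, and the final L2 normalization are all assumptions made for illustration, and `embed_string` is a hypothetical name. The sketch expects a non-empty string drawn from the assumed alphabet.

```python
import numpy as np

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"  # assumed alphabet

def embed_string(s, levels=(1, 2, 4)):
    """Concatenate per-region character histograms over several levels."""
    n = len(s)
    parts = []
    for num_regions in levels:
        for r in range(num_regions):
            # Region r at this level covers [lo, hi) of the normalized string.
            lo, hi = r / num_regions, (r + 1) / num_regions
            hist = np.zeros(len(ALPHABET))
            for i, ch in enumerate(s):
                # Character i occupies the interval [i/n, (i+1)/n).
                overlap = min(hi, (i + 1) / n) - max(lo, i / n)
                if overlap > 0:
                    hist[ALPHABET.index(ch)] += overlap
            parts.append(hist)
    vec = np.concatenate(parts)
    return vec / np.linalg.norm(vec)  # L2 normalization (an assumption)

rep = embed_string("ABC123")
```

With three levels and a 36-character alphabet, the representation has (1 + 2 + 4) × 36 = 252 dimensions. The finer levels encode character position, so strings that are permutations of each other receive different representations.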

15. A method for learning a compatibility function for use in a text-to-image matching system comprising:
- for each of a set of training images, each labeled with a respective character string, generating a text image representation by embedding the training image into a first vectorial space with a first embedding function;
  
  with a second embedding function, embedding the character strings into a second vectorial space to generate character string representations;
  
  learning a compatibility function which is a function of a compatibility matrix, a character string representation of a character string to be matched, and a text image representation for one of a set of text images, the learning including identifying a compatibility matrix which minimizes a loss function on the set of labeled training images, the loss function aggregating a text-to-image-loss and an image-to-text loss; and
  
  outputting the learned compatibility function including the compatibility matrix which minimizes the loss function, wherein at least one of the embedding and the learning of the compatibility function is performed with a processor.
- View Dependent Claims (16, 17, 18, 19, 20)
- - 16. The method of claim 15, wherein the learned compatibility function enables text-to-image matching to be performed with higher accuracy than image-to-text matching.
  - 17. The method of claim 15, wherein the compatibility function is of the form F(x,y;W)=φ(x)^T W φ(y), where φ(x) represents the text image representation, φ(y) represents the character string representation, and W represents the compatibility matrix.
  - 18. The method of claim 15, wherein the text-to-image loss is an aggregate of losses of the form max(0, 1 + Δ^T W φ(y_i)) and the image-to-text loss is an aggregate of losses of the form max(0, 1 + φ(x_i)^T W Δ̂), where Δ represents a difference between two of the image representations and φ(y_i) is a character string representation for a label of one of them and Δ̂ represents a difference between the character string representations of two of the labels and φ(x_i) is a text image representation of a text image having one of the labels.
  - 19. The method of claim 15, wherein the loss function minimizes the empirical loss over the training set as a function of a weighted sum of the text-to-image loss and the image-to-text loss.
  - 20. The method of claim 19, wherein the loss function further includes a term which is not used in the compatibility function.
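The two hinge losses of claim 18 and the weighted sum of claim 19 can be sketched in NumPy as below. The claims do not fix the sign conventions of the difference terms, so the argument order here is an assumption: Δ is taken as the non-matching image minus the matching image, and Δ̂ as the true label minus the other label, which is consistent with claim 3's statement that the image-to-text loss penalizes correct ranking of labels. The function names and the weight `alpha` are hypothetical.

```python
import numpy as np

def text_to_image_loss(W, x_pos, x_neg, y):
    """Hinge loss max(0, 1 + delta^T W phi(y)).

    Assumption: delta = x_neg - x_pos, so the loss is zero once the image
    matching label y outscores the other image by a margin of at least 1.
    """
    delta = x_neg - x_pos
    return max(0.0, 1.0 + float(delta @ W @ y))

def image_to_text_loss(W, x, y_true, y_other):
    """Hinge loss max(0, 1 + phi(x)^T W delta_hat).

    Assumption: delta_hat = y_true - y_other, so the loss grows when the
    true label of image x is ranked above the other label.
    """
    delta_hat = y_true - y_other
    return max(0.0, 1.0 + float(x @ W @ delta_hat))

def combined_loss(W, t2i_triplets, i2t_triplets, alpha=0.5):
    """Weighted sum of the averaged losses; alpha is a hypothetical weight."""
    t2i = np.mean([text_to_image_loss(W, *t) for t in t2i_triplets])
    i2t = np.mean([image_to_text_loss(W, *t) for t in i2t_triplets])
    return alpha * t2i + (1.0 - alpha) * i2t

# Toy check with an identity matrix and one-hot vectors.
W = np.eye(2)
a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
total = combined_loss(W, [(a, b, a)], [(a, a, b)])
```

In this toy case the text-to-image term is zero because the matching image already wins by the margin, while the image-to-text term is positive because the true label outranks the other one. Minimizing the combined loss would therefore trade off the two ranking directions according to `alpha`.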

21. A system comprising:
- a text image representation generator which, for each of a set of training images, each labeled with a respective character string, generates a text image representation by embedding the training image into a first vectorial space with a first embedding function;
  
  a text string representation generator which embeds the character strings into a second vectorial space with a second embedding function to generate character string representations;
  
  a training component which learns a compatibility function which is a function of a compatibility matrix, a character string representation of a character string to be matched, and a text image representation for one of a set of text images, the learning including identifying a compatibility matrix which minimizes a loss function on the set of labeled training images, the loss function aggregating, over the set of images, a text-to-image-loss and an image-to-text loss;
  
  a processor which implements the text image representation generator, text string representation generator, and training component.
- View Dependent Claims (22)
- - 22. The system of claim 21, further comprising:
    - a comparator which, given a representation of a text string to be matched, performs a comparison between the representation of the text string to be matched and each of a set of image representations using the learned compatibility function.

Current Assignee
Conduent Business Services, LLC (Conduent, Inc.)
Original Assignee
Xerox Corporation (Xerox Holdings Corp.)
Inventors
Gordo Soldevila, Albert; Perronnin, Florent C.
Primary Examiner(s)
Goradia, Shefali

Application Number

US14/594,321
Time in Patent Office

519 Days
Field of Search

None
US Class Current

1/1
CPC Class Codes

G06F 16/24578   using ranking

G06F 16/50   of still image data

G06F 16/5846   using extracted text

G06F 18/213   Feature extraction, e.g. by...

G06F 21/60   Protecting data

G06F 21/6245   Protecting personal data, e...

G06V 10/7715   Feature extraction, e.g. by...

G06V 20/62   Text, e.g. of license plate...
