Tunable multi-part perceptual image hashing

US 9,628,805 B2
Filed: 05/19/2015
Issued: 04/18/2017
Est. Priority Date: 05/20/2014
Status: Active Grant

3 Assignments

0 Petitions

Abstract

Systems and methods generate a perceptual image hash of an image. The perceptual image hash can be generated from multiple features extracted from a DCT transformation of the image. The perceptual image hash can be compared to other perceptual image hash values using a weighted Hamming distance function.
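
The hash-generation half of the abstract can be illustrated with a minimal sketch. It assumes an 8x8 grayscale block, a naive orthonormal DCT-II, and only two of the claimed features (sign and magnitude). The function names, the `size` parameter, and the mean-magnitude threshold are illustrative assumptions, not details taken from the patent.

```python
import math


def dct_2d(block):
    """Naive orthonormal 2-D DCT-II of a square block (list of rows)."""
    n = len(block)

    def scale(k):
        return math.sqrt((1.0 if k == 0 else 2.0) / n)

    def basis(i, k):
        return math.cos((2 * i + 1) * k * math.pi / (2 * n))

    return [
        [
            scale(u) * scale(v) * sum(
                block[x][y] * basis(x, u) * basis(y, v)
                for x in range(n) for y in range(n)
            )
            for v in range(n)
        ]
        for u in range(n)
    ]


def encode_hash(dct, size=4):
    """Encode sign bits, then magnitude bits, for the low-frequency corner.

    The DC term (0, 0) is skipped. `size` and the mean-magnitude threshold
    are illustrative choices, not values taken from the patent.
    """
    coeffs = [dct[u][v] for u in range(size) for v in range(size) if (u, v) != (0, 0)]
    mean_mag = sum(abs(c) for c in coeffs) / len(coeffs)
    sign_bits = "".join("1" if c >= 0 else "0" for c in coeffs)
    mag_bits = "".join("1" if abs(c) > mean_mag else "0" for c in coeffs)
    return sign_bits + mag_bits


# Usage: hash a synthetic 8x8 grayscale "icon" (a diagonal gradient).
icon = [[float(16 * (x + y)) for y in range(8)] for x in range(8)]
icon_hash = encode_hash(dct_2d(icon))
print(len(icon_hash), icon_hash)
```

Grouping all sign bits before all magnitude bits keeps each feature at a known position in the string, which is what a position-weighted comparison would rely on.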

16 Citations

30 Claims

1. A method comprising:
- receiving first image data, wherein said first image data comprises a first icon;
  
  performing a discrete cosine transformation (DCT) on at least a portion of the first image data to create a DCT matrix;
  
  determining a plurality of features from coefficients of a plurality of areas of the DCT matrix, wherein the features comprise a sign of a coefficient, a magnitude of the coefficient, a neighbor variance of the coefficient, and a differential between a magnitude of the coefficient and a reference average magnitude;
  
  encoding the plurality of features of the coefficients into a first hash string; and
  
  determining a weighted distance between the first hash string and a second hash string associated with a second icon for use in determining whether the first icon is a suspicious icon that is potentially associated with malware.
- Dependent claims (2, 3, 4, 5, 6, 19, 20, 21, 22):
  - 2. The method of claim 1, wherein the neighbor variance of the coefficient is determined according to an area of the portion of the DCT matrix.
  - 3. The method of claim 1, further comprising:
    - uniting a first area of the plurality of areas with a second area of the plurality of areas to create a temporary area;
      
      wherein the magnitude of the coefficient is determined based, at least in part, on statistical values computed from the temporary area.
  - 4. The method of claim 1, further comprising:
    - determining a plurality of DCT matrices for a plurality of reference images; and
      
      determining a mean value for each corresponding coefficient of the plurality of DCT matrices;
      
      wherein the reference average magnitude comprises the mean value.
  - 5. The method of claim 1, wherein the weighted distance comprises a weighted Hamming distance that is weighted according to a position of a feature encoded in the first hash string and the second hash string.
  - 6. The method of claim 1, further comprising:
    - composing a constant image pattern with the first image data, wherein the DCT transformation is performed on the first image data composed with the constant image pattern.
  - 19. The method of claim 1, wherein the icon comprises a first image domain and the weighted distance function utilizes adjustable weights, said method further comprising tuning the weights to fit a second image domain.
  - 20. The method of claim 19, said method further comprising adjusting the adjustable weights to shift preferences to particular traits of an image in a given frequency range.
  - 21. The method of claim 20, said method further comprising, based upon a priority of said features, using a plurality of sets of adjustable weights.
  - 22. The method of claim 21, said method further comprising using a different set of said sets of adjustable weights for each of a plurality of perceptual similarity determination passes.
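
Claims 5 and 19 to 22 describe a position-weighted Hamming distance whose weights can be retuned and swapped between comparison passes. The sketch below is one possible reading. The 8-bit hash layout, the weight values, the thresholds, and the rule that a candidate must stay within the threshold on every pass are all assumptions made for illustration.

```python
def weighted_hamming(h1, h2, weights):
    """Sum the weight of every bit position where the two hashes differ."""
    if not (len(h1) == len(h2) == len(weights)):
        raise ValueError("hashes and weights must have equal length")
    return sum(w for a, b, w in zip(h1, h2, weights) if a != b)


def multi_pass_match(candidate, reference, passes):
    """Return True only if every (weights, threshold) pass stays within its threshold."""
    return all(
        weighted_hamming(candidate, reference, weights) <= threshold
        for weights, threshold in passes
    )


# Hypothetical 8-bit layout: bits 0-3 are sign features, bits 4-7 magnitude features.
PASSES = [
    ([3, 3, 3, 3, 0, 0, 0, 0], 3),  # pass 1: only sign bits count
    ([1, 1, 1, 1, 2, 2, 2, 2], 4),  # pass 2: magnitude bits weigh more
]

reference = "10110100"
print(multi_pass_match("10110111", reference, PASSES))  # two magnitude bits differ
print(multi_pass_match("01110100", reference, PASSES))  # two sign bits differ
```

Retuning for a different image domain would then amount to replacing the weight vectors and thresholds in `PASSES` without touching the hash itself.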

7. A non-transitory machine-readable medium having stored thereon instructions, that when executed by one or more processors of a device, cause the device to:
- receive first image data, wherein said first image data comprises a first icon;
  
  perform a discrete cosine transformation (DCT) on at least a portion of the first image data to create a DCT matrix;
  
  determine a plurality of features from coefficients of a plurality of areas of the DCT matrix, wherein the features comprise a sign of a coefficient, a magnitude of the coefficient, a neighbor variance of the coefficient, and a differential between a magnitude of the coefficient and a reference average magnitude;
  
  encode the plurality of features of the coefficients into a first hash string; and
  
  determine a weighted distance between the first hash string and a second hash string associated with a second icon for use in determining whether the first icon is a suspicious icon that is potentially associated with malware.
- Dependent claims (8, 9, 10, 11, 12, 23, 24, 25, 26):
  - 8. The non-transitory machine-readable medium of claim 7, wherein the neighbor variance of the coefficient is determined according to an area of the portion of the DCT matrix.
  - 9. The non-transitory machine-readable medium of claim 7, wherein the instructions further include instructions to cause the device to:
    - unite a first area of the plurality of areas with a second area of the plurality of areas to create a temporary area;
      
      wherein the magnitude of the coefficient is determined based, at least in part, on statistical values computed from the temporary area.
  - 10. The non-transitory machine-readable medium of claim 7, wherein the instructions further include instructions to cause the device to:
    - determine a plurality of DCT matrices for a plurality of reference images; and
      
      determine a mean value for each corresponding coefficient of the plurality of DCT matrices;
      
      wherein the reference average magnitude comprises the mean value.
  - 11. The non-transitory machine-readable medium of claim 7, wherein the weighted distance comprises a weighted Hamming distance that is weighted according to a position of a feature encoded in the first hash string and the second hash string.
  - 12. The non-transitory machine-readable medium of claim 7, wherein the instructions further include instructions to cause the device to:
    - compose a constant image pattern with the first image data, wherein the DCT transformation is performed on the first image data composed with the constant image pattern.
  - 23. The non-transitory machine-readable medium of claim 7, wherein the icon comprises a first image domain and the weighted distance function utilizes adjustable weights, wherein the instructions further include instructions to tune the weights to fit a second image domain.
  - 24. The non-transitory machine-readable medium of claim 23, wherein the instructions further include instructions to adjust the adjustable weights to shift preferences to particular traits of an image in a given frequency range.
  - 25. The non-transitory machine-readable medium of claim 24, wherein the instructions further include instructions to, based upon a priority of said features, use a plurality of sets of adjustable weights.
  - 26. The non-transitory machine-readable medium of claim 25, wherein the instructions further include instructions to use a different set of said sets of adjustable weights for each of a plurality of perceptual similarity determination passes.
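
The reference average magnitude of claims 4, 10, and 16 is a per-coefficient mean over the DCT matrices of a set of reference images. A minimal sketch follows, assuming the mean is taken over absolute values and that the differential feature is reduced to a single above/below bit. Both choices, and the function names, are assumptions rather than details stated in the claims.

```python
def reference_average_magnitude(dct_matrices):
    """Mean absolute value of each corresponding coefficient across matrices."""
    count = len(dct_matrices)
    rows, cols = len(dct_matrices[0]), len(dct_matrices[0][0])
    return [
        [sum(abs(m[u][v]) for m in dct_matrices) / count for v in range(cols)]
        for u in range(rows)
    ]


def differential_bits(dct, reference):
    """One bit per coefficient: '1' if its magnitude exceeds the reference mean."""
    return "".join(
        "1" if abs(c) > r else "0"
        for row, ref_row in zip(dct, reference)
        for c, r in zip(row, ref_row)
    )


# Usage with toy 2x2 "DCT matrices" standing in for real reference images.
refs = [
    [[1.0, -2.0], [3.0, -4.0]],
    [[3.0, -4.0], [-5.0, 8.0]],
]
ref_avg = reference_average_magnitude(refs)
print(ref_avg)
print(differential_bits([[2.5, -1.0], [-4.5, 6.0]], ref_avg))
```

Because the reference table is computed once from a corpus, it can be stored alongside the weights and reused for every icon that is hashed.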

13. An apparatus comprising:
- one or more processors;
  
  a non-transitory machine-readable medium coupled to the one or more processors; and
  
  a perceptual image hash unit executable by the one or more processors and configured to:
  
  receive first image data, wherein said first image data comprises a first icon,
  
  perform a discrete cosine transformation (DCT) on at least a portion of the first image data to create a DCT matrix,
  
  determine a plurality of features from coefficients of a plurality of areas of the DCT matrix, wherein the features comprise a sign of a coefficient, a magnitude of the coefficient, a neighbor variance of the coefficient, and a differential between a magnitude of the coefficient and a reference average magnitude,
  
  encode the plurality of features of the coefficients into a first hash string, and
  
  a detection engine configured to determine a weighted distance between the first hash string and a second hash string associated with a second icon for use in determining whether the first icon is a suspicious icon that is potentially associated with malware.
- Dependent claims (14, 15, 16, 17, 18, 27, 28, 29, 30):
  - 14. The apparatus of claim 13, wherein the neighbor variance of the coefficient is determined according to an area of the portion of the DCT matrix.
  - 15. The apparatus of claim 13, wherein the perceptual image hash unit is further configured to:
    - unite a first area of the plurality of areas with a second area of the plurality of areas to create a temporary area;
      
      wherein the magnitude of the coefficient is determined based, at least in part, on statistical values computed from the temporary area.
  - 16. The apparatus of claim 13, wherein the non-transitory machine-readable medium includes instructions to cause the apparatus to:
    - determine a plurality of DCT matrices for a plurality of reference images; and
      
      determine a mean value for each corresponding coefficient of the plurality of DCT matrices;
      
      wherein the reference average magnitude comprises the mean value.
  - 17. The apparatus of claim 13, wherein the weighted distance comprises a weighted Hamming distance that is weighted according to a position of a feature encoded in the first hash string and the second hash string.
  - 18. The apparatus of claim 13, wherein the perceptual image hash unit is further configured to:
    - compose a constant image pattern with the first image data, wherein the DCT transformation is performed on the first image data composed with the constant image pattern.
  - 27. The apparatus of claim 13, wherein the icon comprises a first image domain and the weighted distance function utilizes adjustable weights, wherein the weights are tuned to fit a second image domain.
  - 28. The apparatus of claim 27, wherein the adjustable weights are adjusted to shift preferences to particular traits of an image in a given frequency range.
  - 29. The apparatus of claim 28, wherein, based upon a priority of said features, a plurality of sets of adjustable weights is used.
  - 30. The apparatus of claim 29, wherein a different set of said sets of adjustable weights is used for each of a plurality of perceptual similarity determination passes.
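
Claims 6, 12, and 18 compose a constant image pattern with the icon before the DCT is taken. The claims do not say how the composition is done. The sketch below shows one plausible reading only, in which a grayscale icon with an alpha channel is alpha-blended over a fixed checkerboard so that transparent regions produce deterministic pixel values. The checkerboard, its gray levels, and the blend formula are assumptions.

```python
def checker_pattern(n, low=64.0, high=192.0):
    """Fixed n x n checkerboard used as the constant background (assumed pattern)."""
    return [[high if (x + y) % 2 == 0 else low for y in range(n)] for x in range(n)]


def compose(gray, alpha, pattern):
    """Alpha-blend icon pixels over the pattern: out = a * gray + (1 - a) * pattern."""
    return [
        [a * g + (1.0 - a) * p for g, a, p in zip(g_row, a_row, p_row)]
        for g_row, a_row, p_row in zip(gray, alpha, pattern)
    ]


# Usage: a 2x2 icon whose right column is half transparent.
pattern = checker_pattern(2)
gray = [[100.0, 100.0], [100.0, 100.0]]
alpha = [[1.0, 0.5], [1.0, 0.5]]
composed = compose(gray, alpha, pattern)
print(composed)
```

The composed block would then be passed to the DCT step in place of the raw icon pixels.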

Current Assignee: Avast Software SRO (Gen Digital Inc.)
Original Assignee: Avast Software SRO (Gen Digital Inc.)
Inventors: Smarda, Martin; Sramek, Pavel
Primary Examiner(s): Liu, Li

Application Number: US14/716,685
Publication Number: US 20150339829A1
Time in Patent Office: 700 Days

CPC Class Codes

G06F 16/583   using metadata automaticall...

G06F 21/564   by virus signature recognition

G06F 21/565   by checking file integrity

G06F 3/0481   based on specific propertie...

G06F 3/04817   using icons graphical or vi...

H04N 1/32283   Hashing

H04N 19/126   Details of normalisation or...

H04N 19/625   using discrete cosine trans...
