Using hash signatures of DOM objects to identify website similarity

  • US 9,386,037 B1
  • Filed: 11/11/2015
  • Issued: 07/05/2016
  • Est. Priority Date: 09/16/2015
  • Status: Active Grant
First Claim

1. A method for determining a similarity between two websites, the method comprising, at a computer system:

    receiving website information from a web server corresponding to a website;

    rendering a document object model (DOM) object of the website using the website information;

    separating content within the DOM object into a plurality of data portions, each of the plurality of data portions having a fixed length;

    generating, by a hardware processor of the computer system, a hash signature of the DOM object by:

    applying a predetermined number of hashing functions to each of the plurality of data portions, wherein the predetermined number of hashing functions are generated using a common seed value, and wherein applying the predetermined number of hashing functions results in a predetermined number of values for each of the plurality of data portions; and

    selecting, using a selection policy, a predetermined number of hashed data portions of the plurality of hashed data portions, wherein the predetermined number of hashed data portions are selected to create a hash signature of the DOM object;

    comparing the hash signature of the DOM object to a known hash signature of a DOM object associated with a known website having a first classification, wherein comparing the hash signature of the DOM object to the known hash signature of the DOM object associated with the known website includes comparing each of the plurality of hashed data portions to a plurality of known hashed data portions of the known hash signature;

    calculating a similarity measurement between the hash signature of the DOM object and the known hash signature of the DOM object associated with the known website;

    comparing the similarity measurement to a threshold; and

    determining that the website has the first classification based on the similarity measurement exceeding the threshold.
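
The steps recited in the claim resemble a MinHash-style comparison, and a minimal sketch may help make them concrete. The code below is an illustration under stated assumptions, not the patented implementation: the fixed-length portions are taken as overlapping character windows, the hashing functions are salted BLAKE2b digests whose salts come from one seeded generator, the selection policy keeps the minimum value per hashing function, and the similarity measurement is the fraction of matching signature positions. All function names, the default portion length, and the threshold are hypothetical.

```python
import hashlib
import random


def make_salts(seed: int, count: int) -> list[bytes]:
    """Derive `count` salts (one per hashing function) from a common seed."""
    rng = random.Random(seed)
    return [rng.getrandbits(64).to_bytes(8, "big") for _ in range(count)]


def portions(content: str, length: int) -> set[str]:
    """Split content into fixed-length portions.

    Assumption: overlapping windows (shingles). Content no longer than
    `length` is kept as a single portion.
    """
    if len(content) <= length:
        return {content}
    return {content[i:i + length] for i in range(len(content) - length + 1)}


def salted_hash(portion: str, salt: bytes) -> int:
    """One hashing function: BLAKE2b with a per-function salt, as a 64-bit int."""
    digest = hashlib.blake2b(portion.encode("utf-8"), digest_size=8, salt=salt)
    return int.from_bytes(digest.digest(), "big")


def signature(content: str, salts: list[bytes], length: int = 8) -> list[int]:
    """Hash every portion with every function; keep the minimum per function.

    Assumption: "minimum value" stands in for the claim's selection policy.
    """
    parts = portions(content, length)
    return [min(salted_hash(p, salt) for p in parts) for salt in salts]


def similarity(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of signature positions on which the two signatures agree."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def has_classification(sig: list[int], known_sig: list[int], threshold: float) -> bool:
    """True when the similarity measurement exceeds the threshold."""
    return similarity(sig, known_sig) > threshold


if __name__ == "__main__":
    salts = make_salts(seed=2015, count=64)
    known = signature("<html><body><form action='/login'>...</form></body></html>", salts)
    candidate = signature("<html><body><form action='/login'>...</form></body></html>", salts)
    print(has_classification(candidate, known, threshold=0.8))
```

In this sketch the rendered DOM content is passed in as a plain string; retrieving the page and rendering the DOM are left out. Both signatures must be built from the same seed and the same number of hashing functions, otherwise the position-by-position comparison is meaningless.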
