Discrete wavelet transform method for document structure similarity

  • US 9,405,750 B2
  • Filed: 10/31/2011
  • Issued: 08/02/2016
  • Est. Priority Date: 10/31/2011
  • Status: Active Grant
First Claim

1. A method for determining document structure similarity, comprising:

  • segmenting, by a computing device, path sequences of Document Object Model (DOM) trees from a number of web pages into B components;

    determining path signals corresponding to the path sequences based on a count of the occurrences of particular paths in the Bth component, wherein determining path signals comprises weighting the path signals based on path sequence characteristics of a DOM tree;

    transforming unique path signals into discrete wavelet signals;

    analyzing the discrete wavelet signals at multiple DOM tree resolution level, wherein analyzing the discrete wavelet signals comprises:

    computing a distance value for every common signal path of two DOM trees; and

    summing the distance values as a final tree distance for each of the two DOM trees; and

    outputting a document structure similarity decision based on the analyses of the discrete wavelet signals.
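
The steps recited in the claim can be illustrated with a minimal sketch. It is not the patented implementation, and it rests on several assumptions that the claim does not state: the path sequence is a document-order list of root-to-node path strings, B is a power of two, the wavelet is the Haar wavelet, the per-level distance is Euclidean, and the decision is a simple threshold. The function names (`path_signals`, `haar_dwt`, `tree_distance`, `is_similar`) and the `threshold` value are hypothetical, and the weighting step of the claim is omitted.

```python
import math
from collections import defaultdict


def path_signals(paths, B):
    """Split a path sequence into B components and, for each unique path,
    count its occurrences in every component (one signal per path)."""
    n = len(paths)
    signals = defaultdict(lambda: [0.0] * B)
    for i, path in enumerate(paths):
        component = min(i * B // n, B - 1)  # index of the component holding position i
        signals[path][component] += 1.0
    return dict(signals)


def haar_dwt(signal):
    """Full Haar decomposition of a signal whose length is a power of two.
    Returns coefficient lists ordered from coarsest to finest resolution."""
    levels = []
    approx = list(signal)
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        levels.append([(a - b) / math.sqrt(2) for a, b in pairs])  # detail coefficients
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]        # next coarser approximation
    levels.append(approx)
    return levels[::-1]


def tree_distance(paths_a, paths_b, B=4):
    """Sum, over the paths common to both trees, of the per-level
    Euclidean distances between their wavelet coefficients."""
    sig_a, sig_b = path_signals(paths_a, B), path_signals(paths_b, B)
    total = 0.0
    for path in sig_a.keys() & sig_b.keys():
        for level_a, level_b in zip(haar_dwt(sig_a[path]), haar_dwt(sig_b[path])):
            total += math.sqrt(sum((x - y) ** 2 for x, y in zip(level_a, level_b)))
    return total


def is_similar(paths_a, paths_b, B=4, threshold=1.0):
    """Hypothetical decision rule: similar when the tree distance is small."""
    return tree_distance(paths_a, paths_b, B) <= threshold
```

Because only common paths contribute to the sum, two trees with no paths in common get a distance of zero in this sketch. A practical system would presumably handle that case separately, for example through the weighting step that is left out here.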
