DOCUMENT OBJECT MODEL (DOM) BASED PAGE UNIQUENESS DETECTION

  • US 20120005211A1
  • Filed: 06/23/2011
  • Published: 01/05/2012
  • Est. Priority Date: 06/30/2010
  • Status: Active Grant
First Claim

1. A method comprising:

  • receiving a hypertext markup language (HTML) page at a computer;

  • identifying HTML page elements in response to the receiving, the HTML page elements comprising parent nodes, the parent nodes comprising child nodes;

  • processing each of the HTML page elements, the processing comprising:

      • grouping the child nodes by parent node into a group of child nodes;

      • detecting patterns in the group of child nodes in response to the grouping;

      • reducing the group of child nodes to text strings in response to the detecting; and

      • storing the text strings as text values in the parent nodes; and

  • generating a unique identifier (ID) of the HTML page in response to the processing.
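
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The claim does not say what counts as a "pattern", how the text strings are formed, or how the ID is derived, so the choices below are assumptions made only for illustration: a pattern is a run of consecutive sibling subtrees that reduce to the same string, text content and attributes are ignored, and the ID is a SHA-1 hash of the root's reduced string. The names `Node`, `TreeBuilder`, `reduce_node`, and `page_id` are hypothetical.

```python
import hashlib
from html.parser import HTMLParser
from itertools import groupby

# Elements that never have children or a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    """One HTML page element: a parent node holding its child nodes."""

    def __init__(self, tag):
        self.tag = tag
        self.children = []
        self.text_value = ""  # filled in by reduce_node()


class TreeBuilder(HTMLParser):
    """Builds a simple element tree; text and attributes are ignored."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break


def reduce_node(node):
    """Reduce a node's group of children to a text string, bottom-up."""
    # Grouping: the children of this parent form one group.
    child_strings = [reduce_node(child) for child in node.children]
    # Pattern detection (assumed): runs of identical consecutive siblings.
    parts = []
    for string, run in groupby(child_strings):
        repeated = len(list(run)) > 1
        parts.append(string + "*" if repeated else string)
    # Reduction and storage: keep the string as the parent's text value.
    node.text_value = ",".join(parts)
    return f"{node.tag}({node.text_value})"


def page_id(html):
    """Return a hex ID derived from the reduced structure of the page."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    reduced = reduce_node(builder.root)
    return hashlib.sha1(reduced.encode("utf-8")).hexdigest()
```

Under these assumptions, two pages that differ only in their text or in how many times a repeated element occurs (for example, a list with two items versus three) receive the same ID, while a page with a different element structure receives a different one. A single occurrence is not treated as a repetition, so a one-item list and a two-item list still differ.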
