Document object model (DOM) based page uniqueness detection
Abstract
DOM based unique ID generation, including receiving a hypertext markup language (HTML) page at a computer, and identifying HTML page elements in response to the receiving, the HTML page elements comprising parent nodes, the parent nodes comprising child nodes. The method further comprises processing each of the HTML page elements, the processing comprising: grouping the child nodes by parent node into a group of child nodes, detecting patterns in the group of child nodes in response to the grouping, reducing the group of child nodes to text strings in response to the detecting, and storing the text strings as text values in the parent nodes. The method further comprises generating a unique identifier (ID) of the HTML page in response to the processing.
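The steps named in the abstract (group, detect, reduce, store, generate) can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not say what counts as a "pattern" or how the ID is derived, so the sketch assumes that a pattern is a run of identical adjacent sibling subtrees, that a run is reduced to a single string marked with `+`, and that the ID is a SHA-256 digest of the root's text value. The names `Node`, `TreeBuilder`, `reduce_node`, and `page_id` are hypothetical.

```python
import hashlib
from html.parser import HTMLParser
from itertools import groupby

# Elements that never have children or a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    """One page element: a tag, its child nodes, and a stored text value."""

    def __init__(self, tag):
        self.tag = tag
        self.children = []
        self.text_value = ""


class TreeBuilder(HTMLParser):
    """Builds a parent/child tree from HTML; character data is ignored."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, if there is one.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break


def reduce_node(node):
    # Group: the children of one parent are handled together.
    child_strings = [reduce_node(child) for child in node.children]
    # Detect (assumption): a pattern is a run of identical adjacent strings.
    # Reduce (assumption): each run collapses to one string marked with "+".
    reduced = []
    for value, run in groupby(child_strings):
        count = sum(1 for _ in run)
        reduced.append(value + "+" if count > 1 else value)
    # Store: the reduced strings become the parent's text value.
    if reduced:
        node.text_value = node.tag + "(" + ",".join(reduced) + ")"
    else:
        node.text_value = node.tag
    return node.text_value


def page_id(html):
    # Generate (assumption): hash the root's text value to get the page ID.
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    digest = hashlib.sha256(reduce_node(builder.root).encode("utf-8"))
    return digest.hexdigest()
```

Under these assumptions, two renderings of the same dynamically generated list that differ only in item count and text (for example, two versus three `<li><a>...</a></li>` rows) reduce to the same string and therefore receive the same ID, while a page with a different element structure receives a different one.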
13 Claims
1. A non-transitory computer system comprising:
a host system in communication with at least one client system over a network;
a page based unique ID generation application for execution on the host system, the page based unique ID generation application including logic for implementing a method comprising:
receiving a hypertext markup language (HTML) page at a computer;
identifying HTML page elements in response to the receiving, the HTML page elements comprising parent nodes, the parent nodes comprising child nodes;
processing each of the HTML page elements, the processing comprising:
grouping the child nodes by parent node into a group of child nodes;
detecting patterns in the group of child nodes in response to the grouping;
reducing the group of child nodes to text strings in response to the detecting; and
storing the text strings as text values in the parent nodes; and
generating a unique identifier (ID) of the HTML page in response to the processing;
wherein the HTML page is a Web 2.0 page, the Web 2.0 page comprising content, the content being generated dynamically and filtering HTML page elements in response to the identifying, the filtering removing the child nodes and the parent nodes that meet filter criteria, the filter criteria comprising:
extensible markup language path language instructions;
regular expression (regex) instructions; and
a list of HTML nodes.
Dependent claims: 2, 3, 4.
5. A computer program product comprising a non-transitory storage medium storing instructions, the computer program product implementing a method, the method comprising:
receiving a hypertext markup language (HTML) page at a computer;
identifying HTML page elements in response to the receiving, the HTML page elements comprising parent nodes, the parent nodes comprising child nodes;
processing each of the HTML page elements, the processing comprising:
grouping the child nodes by parent node into a group of child nodes;
detecting patterns in the group of child nodes in response to the grouping;
reducing the group of child nodes to text strings in response to the detecting; and
storing the text strings as text values in the parent nodes; and
generating a unique identifier (ID) of the HTML page in response to the processing;
wherein the HTML page is a Web 2.0 page, the Web 2.0 page comprising content, the content being generated dynamically and filtering HTML page elements in response to the identifying, the filtering removing the child nodes and the parent nodes that meet filter criteria, the filter criteria comprising:
extensible markup language path language instructions;
regular expression (regex) instructions; and
a list of HTML nodes.
Dependent claims: 6, 7, 8.
9. An apparatus comprising:
web indexing application logic communicating with a computer processor to receive a hypertext markup language (HTML) page at a computer and identify HTML page elements, wherein the HTML page elements comprise parent nodes, the parent nodes comprising child nodes, and process each of the HTML page elements, wherein the computer processor is configured for:
grouping the child nodes by parent node into a group of child nodes;
detecting patterns in the group of child nodes in response to the grouping;
reducing the group of child nodes to text strings in response to the detecting; and
storing the text strings as text values in the parent nodes; and
the web indexing application logic further configured to generate a unique identifier (ID) of the HTML page in response to the processing;
wherein the HTML page is a Web 2.0 page, the Web 2.0 page comprising content, the content being generated dynamically and wherein the web indexing application logic is further configured to filter HTML page elements in response to the identifying, the filtering removing the child nodes and the parent nodes that meet filter criteria, the filter criteria comprising:
extensible markup language path language instructions;
regular expression (regex) instructions; and
a list of HTML nodes.
Dependent claims: 10, 11, 12.
13. A computing system comprising:
at least one processor;
at least one memory;
a network transceiver;
a bus communicatively linking said at least one processor, memory, and network transceiver to each other,
wherein said at least one memory stores executable instructions, which the at least one processor executes,
wherein said network transceiver of the computing system comprising hardware is operable to receive a hypertext markup language (HTML) page from a server over a network,
wherein the computing system is configured to function as a client in a client server arrangement with the server,
wherein the processor executing the instructions is operable to identify HTML page elements in response to the receiving of the HTML page, the HTML page elements comprising parent nodes, the parent nodes comprising child nodes;
wherein the processor executing the instructions is operable to process each of the HTML page elements, the processing comprising:
grouping the child nodes by parent node into a group of child nodes;
detecting patterns in the group of child nodes in response to the grouping;
reducing the group of child nodes to text strings in response to the detecting; and
storing the text strings as text values in the parent nodes; and
wherein the processor executing the instructions is operable to generate a unique identifier (ID) of the HTML page in response to the processing;
wherein the HTML page is a Web 2.0 page, the Web 2.0 page comprising content, the content being generated dynamically and filtering HTML page elements in response to the identifying, the filtering removing the child nodes and the parent nodes that meet filter criteria, the filter criteria comprising:
extensible markup language path language instructions;
regular expression (regex) instructions; and
a list of HTML nodes.
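Each independent claim recites the same three kinds of filter criteria: XML Path Language (XPath) instructions, regular expression instructions, and a list of HTML nodes. The following Python sketch shows one way such a filter could be applied. It is an illustration under stated assumptions, not the claimed implementation. It assumes the page is well-formed XHTML so that the standard library's `xml.etree.ElementTree` can parse it, it relies on the limited XPath subset that module supports, and it assumes the regular expressions are matched against each element's serialized attributes, which the claims do not specify. The function name `filter_elements` and its parameters are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def filter_elements(root, xpaths=(), regexes=(), tag_names=()):
    """Remove every element under `root` that meets any filter criterion.

    xpaths    -- XPath expressions (ElementTree subset) selecting elements
    regexes   -- patterns searched in 'name="value"' attribute text
                 (assumption: the claims do not say what the regex targets)
    tag_names -- a list of element names to drop outright
    The root element itself is never removed.
    """
    patterns = [re.compile(p) for p in regexes]
    tags = set(tag_names)
    doomed = set()  # ids of elements that meet at least one criterion

    for path in xpaths:
        doomed.update(id(elem) for elem in root.findall(path))

    for elem in root.iter():
        attr_text = " ".join(f'{k}="{v}"' for k, v in elem.attrib.items())
        if elem.tag in tags or any(p.search(attr_text) for p in patterns):
            doomed.add(id(elem))

    # Detach matching children from their parents; whole subtrees go with them.
    for parent in list(root.iter()):
        for child in list(parent):
            if id(child) in doomed:
                parent.remove(child)
    return root
```

For example, calling `filter_elements(page, xpaths=[".//ul[@class='nav']"], regexes=[r'id="ad-\d+"'], tag_names=["script"])` on a parsed page would drop navigation lists, elements whose `id` looks like an ad slot, and all `script` elements before any further processing. Removing such volatile regions first is one plausible reason to filter: it keeps content that changes between requests from influencing the generated ID.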
Specification