System and method for exclusion of irrelevant data from a DOM equivalence
First Claim
1. A computer-implemented process for computing excluded data, the computer-implemented process comprising:
identifying a web page of interest to form an identified page;
loading the identified page a first time to form a first load;
responsive to a determination that a delta has not been computed for the identified web page, loading the identified page a second time to form a second load, wherein the second load is based, at least in part, upon the use of a proxy;
determining whether portions of the first load differ from portions of the second load;
responsive to a determination portions of the first load differ from portions of the second load, identifying the portions that differ to form a delta;
storing the delta to form a stored delta;
excluding the stored delta from a document object model associated with the identified page to form a modified document object model;
excluding the stored delta from a document object model comparison process, wherein the document object model comparison process from which the stored delta is excluded is a document object model equivalence function, wherein the excluded stored delta includes one or more page sections ignored by crawlers; and
if the identified page is part of a rich Internet application, adding the identified page to a rich Internet application model.
Abstract
A computer-implemented process, computer program product, and apparatus for computing excluded data. A web page of interest is identified to form an identified page. The identified page is loaded a first time to form a first load and, responsive to a determination that a delta has not been computed for the identified web page, loaded a second time to form a second load. A determination is made whether portions of the first load differ from portions of the second load. Responsive to a determination that portions of the first load differ from portions of the second load, the portions that differ are identified to form a delta. The delta is stored to form a stored delta, and the stored delta is excluded from a document object model associated with the identified page to form a modified document object model.
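The flow summarized above can be illustrated with a minimal sketch. It is one possible reading of the abstract, not the patented implementation. It assumes the two loads are already available as HTML strings (the page loading itself, including any proxy, is not shown), and it assumes a simple positional path such as `html[1]/body[1]/div[1]` as the way to name a portion of the page. The names `PathCollector`, `flatten`, `compute_delta`, `is_excluded`, and `dom_equivalent` are hypothetical and do not come from the source.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag (partial list, enough for a sketch).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class PathCollector(HTMLParser):
    """Flatten HTML into {element path: [attributes, direct text]}."""

    def __init__(self):
        super().__init__()
        self.stack = []     # path segments of the currently open elements
        self.counts = [{}]  # per-depth sibling counters for stable indices
        self.nodes = {}     # path -> [sorted attribute tuple, text]

    def handle_starttag(self, tag, attrs):
        seen = self.counts[-1]
        seen[tag] = seen.get(tag, 0) + 1
        segment = f"{tag}[{seen[tag]}]"
        path = "/".join(self.stack + [segment])
        self.nodes[path] = [tuple(sorted(attrs, key=lambda kv: kv[0])), ""]
        if tag not in VOID_TAGS:
            self.stack.append(segment)
            self.counts.append({})

    def handle_endtag(self, tag):
        # Only pop when the closing tag matches the innermost open element.
        if self.stack and self.stack[-1].startswith(tag + "["):
            self.stack.pop()
            self.counts.pop()

    def handle_data(self, data):
        if self.stack and data.strip():
            self.nodes["/".join(self.stack)][1] += data.strip()


def flatten(html):
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    return {path: tuple(value) for path, value in collector.nodes.items()}


def compute_delta(first_load, second_load):
    """Return the paths whose attributes or text differ between two loads."""
    a, b = flatten(first_load), flatten(second_load)
    return {p for p in a.keys() | b.keys() if a.get(p) != b.get(p)}


def is_excluded(path, stored_delta):
    # A path is excluded if it is in the delta or lies beneath a delta path.
    return any(path == d or path.startswith(d + "/") for d in stored_delta)


def dom_equivalent(html_a, html_b, stored_delta):
    """Compare two pages while ignoring every portion in the stored delta."""
    def keep(nodes):
        return {p: v for p, v in nodes.items()
                if not is_excluded(p, stored_delta)}
    return keep(flatten(html_a)) == keep(flatten(html_b))
```

In this sketch, a region that changes on every load (for example a rotating advertisement) ends up in the delta after the first two loads, so a later comparison treats two loads of the same page as equivalent even though that region differs again. Excluding descendants of a delta path is a design choice made here, not something the abstract states.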
20 Claims
1. A computer-implemented process for computing excluded data, the computer-implemented process comprising:
identifying a web page of interest to form an identified page;
loading the identified page a first time to form a first load;
responsive to a determination that a delta has not been computed for the identified web page, loading the identified page a second time to form a second load, wherein the second load is based, at least in part, upon the use of a proxy;
determining whether portions of the first load differ from portions of the second load;
responsive to a determination portions of the first load differ from portions of the second load, identifying the portions that differ to form a delta;
storing the delta to form a stored delta;
excluding the stored delta from a document object model associated with the identified page to form a modified document object model;
excluding the stored delta from a document object model comparison process, wherein the document object model comparison process from which the stored delta is excluded is a document object model equivalence function, wherein the excluded stored delta includes one or more page sections ignored by crawlers; and
if the identified page is part of a rich Internet application, adding the identified page to a rich Internet application model.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A computer program product residing on a non-transitory computer readable medium having a plurality of instructions stored thereon which, when executed by a processor, cause the processor to perform operations comprising:
identifying a web page of interest to form an identified page;
loading the identified page a first time to form a first load, wherein the second load is based, at least in part, upon the use of a proxy;
responsive to a determination that a delta has not been computed for the identified web page, loading the identified page a second time to form a second load;
determining whether portions of the first load differ from portions of the second load;
responsive to a determination portions of the first load differ from portions of the second load, identifying the portions that differ to form a delta;
storing the delta to form a stored delta;
excluding the delta from a document object model associated with the identified page to form a modified document object model;
excluding the stored delta from a document object model comparison process, wherein the document object model comparison process from which the stored delta is excluded is a document object model equivalence function, wherein the excluded stored delta includes one or more page sections ignored by crawlers; and
if the identified page is part of a rich Internet application, adding the identified page to a rich Internet application model.
Dependent claims: 9, 10, 11, 12, 13, 14
15. An apparatus for computing excluded data, the apparatus comprising:
a communications fabric;
a memory connected to the communications fabric, wherein the memory contains computer executable program code;
a communications unit connected to the communications fabric;
an input/output unit connected to the communications fabric;
a display connected to the communications fabric; and
a processor unit connected to the communications fabric, wherein the processor unit executes the computer executable program code to direct the apparatus to;
identify a web page of interest to form an identified page;
load the identified page a first time to form a first load;
responsive to a determination that a delta has not been computed for the identified web page, load the identified page a second time to form a second load; wherein the second load is based, at least in part, upon the use of a proxy;
determine whether portions of the first load differ from portions of the second load;
responsive to a determination portions of the first load differ from portions of the second load, identify the portions that differ to form a delta;
store the delta to form a stored delta;
exclude the stored delta from a document object model associated with the identified page to form a modified document object model;
exclude the stored delta from a document object model comparison process, wherein the document object model comparison process from which the stored delta is excluded is a document object model equivalence function, wherein the excluded stored delta includes one or more page sections ignored by crawlers; and
if the identified page is part of a rich Internet application, add the identified page to a rich Internet application model.
Dependent claims: 16, 17, 18, 19, 20
Specification