Method And System For Characterising A Web Site By Sampling
Abstract
A method of characterising a web site by sampling, the method comprising, the repeated steps of: accessing a URL; receiving a web page; analysing the URL and received webpage and recording characteristics thereof; identifying links within the received web page; grouping links within the received web-page based on proximity; and selecting one of the selected links for subsequent access based on the grouping. The method can be applied in a web application assessment tool.
15 Claims
1. A method of characterising a web site by sampling, the method comprising, the repeated steps of:
accessing a URL;
receiving a web page;
analysing the URL and received webpage and recording characteristics thereof;
identifying links within the received web page;
grouping links within the received web-page based on proximity; and
selecting one of the selected links for subsequent access based on the grouping.
Dependent claims: 2, 3, 4, 5, 6, 7.
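The repeated steps of claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation. Pages come from a hypothetical in-memory dict (`SITE`, on the placeholder host `example.test`) instead of HTTP requests. "Proximity" is taken to mean the raw character distance between consecutive `<a>` tags in the HTML source, with a hypothetical threshold `max_gap=80`. The recorded characteristics (path depth, query parameter names, page length, link and form counts) are illustrative choices. Selection queues one representative link per proximity group. The names `AnchorParser`, `group_by_proximity` and `sample_site` are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import parse_qs, urljoin, urlparse


class AnchorParser(HTMLParser):
    """Records (character offset, href) for each <a href> tag and counts <form> tags."""

    def __init__(self, html):
        super().__init__()
        # Absolute offset of the start of each source line, used to turn getpos() into an offset.
        self._line_starts = [0] + [i + 1 for i, ch in enumerate(html) if ch == "\n"]
        self.anchors = []
        self.forms = 0
        self.feed(html)
        self.close()

    def handle_starttag(self, tag, attrs):
        line, col = self.getpos()  # position of the start of this tag (line is 1-based)
        offset = self._line_starts[line - 1] + col
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.anchors.append((offset, href))
        elif tag == "form":
            self.forms += 1


def group_by_proximity(anchors, max_gap=80):
    """Start a new group whenever two consecutive anchors are more than max_gap characters apart."""
    groups, last = [], None
    for offset, href in anchors:
        if last is None or offset - last > max_gap:
            groups.append([])
        groups[-1].append(href)
        last = offset
    return groups


def sample_site(fetch, start_url, max_pages=10, max_gap=80):
    """Access a URL, record its characteristics, group its links, and choose what to visit next."""
    host = urlparse(start_url).netloc
    frontier, seen, records = deque([start_url]), {start_url}, []
    while frontier and len(records) < max_pages:
        url = frontier.popleft()              # access a URL
        html = fetch(url)                     # receive a web page (None means not found)
        if html is None:
            continue
        page = AnchorParser(html)             # identify links
        parsed = urlparse(url)
        records.append({                      # analyse the URL and page, record characteristics
            "url": url,
            "path_depth": len([p for p in parsed.path.split("/") if p]),
            "query_params": sorted(parse_qs(parsed.query)),
            "length": len(html),
            "links": len(page.anchors),
            "forms": page.forms,
        })
        for group in group_by_proximity(page.anchors, max_gap):
            # Queue one same-host representative per proximity group instead of every link.
            for href in group:
                target = urljoin(url, href)
                if urlparse(target).netloc == host and target not in seen:
                    seen.add(target)
                    frontier.append(target)
                    break
    return records


# Hypothetical in-memory site standing in for real HTTP requests.
SITE = {
    "http://example.test/": '<nav><a href="/a">A</a> <a href="/b">B</a></nav>'
    + "<p>" + "x" * 200 + "</p>"
    + '<ul><a href="/item?id=1">1</a><a href="/item?id=2">2</a></ul>',
    "http://example.test/a": "<p>about</p><form></form>",
    "http://example.test/item?id=1": "<p>item</p>",
}

records = sample_site(SITE.get, "http://example.test/")
for record in records:
    print(record)
```

With this data, `/b` and `/item?id=2` are never fetched, because each shares a proximity group with a link that was already queued; in this sketch, sampling one link per group is what keeps the number of requests small.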
8. A system for characterising a web site by sampling, the system comprising a crawl function for repeatedly:
accessing a URL;
receiving a web page;
analysing the URL and received webpage and recording characteristics thereof;
identifying links within the received web page;
grouping links within the received web-page based on proximity; and
selecting one of the selected links for subsequent access based on the grouping.
Dependent claims: 9, 10, 11, 12, 13.
14. A web application assessment tool comprising a precrawl function for repeatedly:
accessing a URL within a target website;
receiving a web page;
analysing the URL and received webpage and recording characteristics thereof;
identifying links within the received web page;
grouping links within the received web-page based on the number of characters in the web page content that are not part of tag data between the end of a first tag and the beginning of a second tag;
scoring each link within the received web page so that a stored score variable corresponds, at least in part, to the frequency of occurrence of the link in the received web pages;
selecting one of the selected links for subsequent access based on the grouping and on the stored score variable;
excluding links for selection based on the presence of keywords in data associated with the link, wherein the keywords are arranged in a white list comprising a plurality of words that indicate that the link may be of significance to the overall structure of the site and a black list comprising a plurality of words that indicate the link is likely not relevant to the structure of the site;
a settings function, wherein the recorded characteristics are used to adjust the settings; and
a crawl and attack function for vulnerability scanning the target website using the adjusted settings.
Dependent claim: 15.
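Claim 14 narrows the grouping and adds a per-link score and keyword-based exclusion. A minimal Python sketch of those three pieces follows, under several assumptions the claim does not settle. Links are grouped together when at most a hypothetical `max_gap=40` text characters (not counting leading or trailing whitespace in each text run, and not counting the anchor text itself) separate one link's closing tag from the next link's opening tag. The stored score is the number of received pages on which a link appeared. Selection priority is that score divided by the size of the link's group, so isolated links outrank long lists of similar links. A white-list hit overrides a black-list hit. The keyword lists, the threshold, the priority formula and the names `GapGroupingParser`, `is_excluded` and `PrecrawlSelector` are all hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical keyword lists: the claim says only that a white list and a black list exist.
WHITE_LIST = {"login", "account", "admin", "search", "cart"}
BLACK_LIST = {"logout", "print", "share", "facebook", "twitter"}


class GapGroupingParser(HTMLParser):
    """Groups <a href> links by the number of non-tag characters separating them."""

    def __init__(self, max_gap=40):
        super().__init__()
        self.max_gap = max_gap
        self.groups = []          # each group is a list of (href, anchor text)
        self._text_gap = None     # text characters since the last link closed (None: no link yet)
        self._open_link = None    # [href, text parts] for the <a> currently being parsed

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            if self._text_gap is None or self._text_gap > self.max_gap:
                self.groups.append([])   # too much text since the last link: start a new group
            self._open_link = [href, []]

    def handle_endtag(self, tag):
        if tag == "a" and self._open_link is not None:
            href, parts = self._open_link
            self.groups[-1].append((href, "".join(parts).strip()))
            self._open_link = None
            self._text_gap = 0

    def handle_data(self, data):
        if self._open_link is not None:
            self._open_link[1].append(data)         # anchor text does not count as separation
        elif self._text_gap is not None:
            self._text_gap += len(data.strip())     # assumption: ignore edge whitespace


def is_excluded(href, text):
    """Exclude a link whose URL or anchor text hits the black list, unless it also hits the white list."""
    words = set(re.findall(r"[a-z]+", f"{href} {text}".lower()))
    return bool(words & BLACK_LIST) and not (words & WHITE_LIST)


class PrecrawlSelector:
    """Keeps a per-link score and picks the next link to visit."""

    def __init__(self, max_gap=40):
        self.max_gap = max_gap
        self.scores = Counter()   # stored score: number of received pages containing each link
        self.visited = set()

    def receive(self, html):
        parser = GapGroupingParser(self.max_gap)
        parser.feed(html)
        parser.close()
        # Count each link once per page so the score tracks how often it occurs across pages.
        self.scores.update({href for group in parser.groups for href, _ in group})
        return parser.groups

    def select(self, groups):
        best, best_priority = None, 0.0
        for group in groups:
            for href, text in group:
                if href in self.visited or is_excluded(href, text):
                    continue
                # Assumption: frequent links rank higher, and links in large groups
                # (e.g. long lists of similar items) are down-weighted.
                priority = self.scores[href] / len(group)
                if priority > best_priority:
                    best, best_priority = href, priority
        if best is not None:
            self.visited.add(best)
        return best


home = (
    '<a href="/login">Log in</a> <a href="/logout">Log out</a>'
    + "<p>" + "Welcome to the shop. " * 5 + "</p>"
    + '<a href="/p/1">One</a><a href="/p/2">Two</a><a href="/p/3">Three</a>'
)
selector = PrecrawlSelector()
groups = selector.receive(home)
print([[href for href, _ in group] for group in groups])
print(selector.select(groups), selector.select(groups))
```

On this page `/logout` is excluded by the black list, and `/login` is chosen first because its group of two links gives it a priority of 1/2, against 1/3 for each link in the three-link product list; the second call returns `/p/1`.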
Specification