METHOD FOR TRACKING SYNTACTIC PROPERTIES OF A URL
First Claim
1. A method for tracking syntactic properties of a URL, said method comprising:
using a web crawler to discover a plurality of URLs;
analyzing each of said plurality of URLs to identify one of a plurality of classes to which each of said plurality of URLs belong;
determining for each of said plurality of classes a count of distinct prefixes; and
performing an action based on the value of said count of distinct prefixes.
Abstract
A method of classifying URLs analyzes each URL discovered by a crawler and matches it against a set of words corresponding to each class, such as pornography, archive, obituary, business news, politics, or terrorism. A count of URL prefixes for the matched class is updated, and an action is performed with respect to electronic documents on the computer system based on the count. The action could be blocking the computer system from being crawled, or adjusting the frequency with which the computer system is crawled.
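The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The word sets in CLASS_WORDS, the definition of a prefix as the URL up to its last path separator, and the thresholds in choose_action are all hypothetical choices made for the example; the source does not specify any of them.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical word sets; the abstract names classes but not their words.
CLASS_WORDS = {
    "archive": {"archive", "archives"},
    "obituary": {"obituary", "obituaries"},
    "politics": {"politics", "election"},
}


def classify(url):
    """Return the first class whose word set matches a token of the URL, else None."""
    parts = urlsplit(url.lower())
    tokens = set(re.split(r"[^a-z0-9]+", parts.path + "?" + parts.query))
    for name, words in CLASS_WORDS.items():
        if tokens & words:
            return name
    return None


def prefix_of(url):
    """Assumed prefix definition: scheme, host, and path up to the last '/'."""
    parts = urlsplit(url)
    directory = parts.path.rsplit("/", 1)[0]
    return f"{parts.scheme}://{parts.netloc}{directory}"


def count_distinct_prefixes(urls):
    """Map each matched class to the number of distinct prefixes seen for it."""
    prefixes = defaultdict(set)
    for url in urls:
        name = classify(url)
        if name is not None:
            prefixes[name].add(prefix_of(url))
    return {name: len(seen) for name, seen in prefixes.items()}


def choose_action(count, slow_at=2, block_at=3):
    """Pick an action from a count; both thresholds are illustrative only."""
    if count >= block_at:
        return "block"
    if count >= slow_at:
        return "crawl_less_often"
    return "no_change"


discovered = [
    "http://example.com/archive/2001/a.html",
    "http://example.com/archive/2002/b.html",
    "http://example.com/archive/2002/c.html",
    "http://example.com/news/politics/x.html",
    "http://example.com/about.html",
]
counts = count_distinct_prefixes(discovered)
actions = {name: choose_action(n) for name, n in counts.items()}
```

With these sample URLs, counts is {"archive": 2, "politics": 1}: the two URLs under /archive/2002 share one prefix, and the unmatched about.html page is ignored. A production crawler would likely keep these counts per host and update them incrementally as new URLs are discovered.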
18 Citations
13 Claims
Specification