Method and system for preventing web crawling detection

  • US 7,953,868 B2
  • Filed: 01/31/2007
  • Issued: 05/31/2011
  • Est. Priority Date: 01/31/2007
  • Status: Active Grant
First Claim

1. A computer-implemented method of preventing a detection of web crawling, comprising:

    receiving, by a randomizing HTTP proxy server including a CPU coupled to a web crawling module and included in a computer system, a first request from said web crawling module to scan a target website provided by a web server that attempts to detect web crawling by identifying identical source Internet Protocol (IP) addresses of multiple requests to scan said target website and determining the number of said multiple requests to scan said target website exceeds a predefined threshold level, wherein said multiple requests include said first request and a second request from said web crawling module to scan said target website;

    forwarding, by said randomizing HTTP proxy server, said first request to a first HTTP proxy computing unit of a plurality of HTTP proxy computing units coupled to said randomizing HTTP proxy server via a network;

    said first HTTP proxy computing unit selecting a first router from a plurality of routers based on a first routing table associating a destination IP address of said target website with said first router;

    said first router sending said first request to said web server by utilizing a first instance of a network address translation (NAT) algorithm that associates a first plurality of source IP addresses with corresponding HTTP proxy computing units of said plurality of HTTP proxy computing units, and that further associates a first source IP address of said first plurality of source IP addresses with said first HTTP proxy computing unit;

    randomly selecting, by said randomizing HTTP proxy server, a second HTTP proxy computing unit of said plurality of HTTP proxy computing units, said second HTTP proxy computing unit being different from said first HTTP proxy computing unit;

    receiving, by said randomizing HTTP proxy server, said second request from said web crawling module to scan said target website;

    forwarding, by said randomizing HTTP proxy server, said second request to said second HTTP proxy computing unit;

    said second HTTP proxy computing unit selecting a second router from said plurality of routers based on a second routing table associating said destination IP address of said target website with said second router;

    said second router sending said second request to said web server by utilizing a second instance of said NAT algorithm that associates a second plurality of source IP addresses with said corresponding HTTP proxy computing units, and that further associates a second source IP address of said second plurality of source IP addresses with said second HTTP proxy computing unit, wherein said second source IP address is different from said first source IP address based on said forwarding said first request to said first HTTP proxy computing unit, further based on said first HTTP proxy computing unit selecting said first router from said plurality of routers based on said first routing table associating said destination IP address with said first router and said first router sending said first request to said web server, still further based on said forwarding said second request to said randomly selected second HTTP proxy computing unit, and further yet based on said second HTTP proxy computing unit selecting said second router from said plurality of routers based on said second routing table associating said destination IP address with said second router and said second router sending said second request to said web server; and

    a central processing unit (CPU) of said computer system preventing said web server from detecting said web crawling by presenting said first request and said second request to said web server as originating from different sources based on said first source IP address being different from said second source IP address.
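
The flow recited in claim 1 can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions, not the patented implementation: the class names (WebServer, Router, ProxyUnit, RandomizingProxy), the build_demo helper, the IP addresses (taken from documentation-reserved ranges), and the threshold value are all hypothetical. The NAT step is modelled as a plain dictionary lookup from proxy unit to source IP address, and "different from the first HTTP proxy computing unit" is approximated by never reusing the unit chosen for the previous request.

```python
import random
from collections import Counter


class WebServer:
    """Hypothetical target: flags a source IP whose request count exceeds a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.hits = Counter()

    def handle(self, source_ip):
        self.hits[source_ip] += 1
        # True means the server would treat this source as a crawler.
        return self.hits[source_ip] > self.threshold


class Router:
    """Stands in for a router running one NAT instance (assumed: a dict lookup)."""

    def __init__(self, nat_table):
        self.nat_table = nat_table  # proxy unit id -> source IP address

    def send(self, proxy_id, server):
        return server.handle(self.nat_table[proxy_id])


class ProxyUnit:
    """An HTTP proxy computing unit with its own routing table."""

    def __init__(self, proxy_id, routing_table):
        self.proxy_id = proxy_id
        self.routing_table = routing_table  # destination IP -> Router

    def forward(self, dest_ip, server):
        router = self.routing_table[dest_ip]
        return router.send(self.proxy_id, server)


class RandomizingProxy:
    """Randomly picks a proxy unit, never the one used for the previous request."""

    def __init__(self, units, rng=None):
        if len(units) < 2:
            raise ValueError("need at least two proxy units")
        self.units = units
        self.rng = rng if rng is not None else random.Random()
        self.last = None

    def forward(self, dest_ip, server):
        candidates = [u for u in self.units if u is not self.last]
        unit = self.rng.choice(candidates)
        self.last = unit
        return unit.forward(dest_ip, server)


def build_demo(seed=0):
    """Wire up two proxy units, each routed through a different router."""
    dest = "203.0.113.10"
    router_a = Router({"p1": "198.51.100.1", "p2": "198.51.100.2"})
    router_b = Router({"p1": "198.51.100.3", "p2": "198.51.100.4"})
    units = [
        ProxyUnit("p1", {dest: router_a}),
        ProxyUnit("p2", {dest: router_b}),
    ]
    return dest, RandomizingProxy(units, random.Random(seed))
```

In this sketch, two consecutive calls to forward on the proxy returned by build_demo reach the server from two different source IP addresses, so a per-address counter with a threshold of one request does not trip. Sending the same two requests from a single address would exceed that threshold on the second request.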

  • 3 Assignments