Systems and methods for inferring uniform resource locator (URL) normalization rules

  • US 7,680,785 B2
  • Filed: 03/25/2005
  • Issued: 03/16/2010
  • Est. Priority Date: 03/25/2005
  • Status: Active Grant
First Claim

1. A method for determining a rule applicable to uniform resource locators (URLs) corresponding to a plurality of web resources, comprising:

  • analyzing the content of web resources from at least one web site;

  • grouping web resources by content so that each group comprises all of the web resources from the at least one web site that have substantially identical content, wherein each group of substantially identical web resources is referred to as an equivalence class;

  • analyzing URLs corresponding to all substantially identical web resources in an equivalence class to determine a per equivalence class URL rewrite rule applicable to the URLs;

  • analyzing the per equivalence class URL rewrite rule compared to at least one other per equivalence class URL rewrite rule for at least one different equivalence class to determine a trans-equivalence class URL rewrite rule; and

  • applying the trans-equivalence class URL rewrite rule to additional web resources from the at least one website to predict that different URLs reference substantially identical web resources, thereby avoiding a plurality of references to or downloads of substantially identical web resources.
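
The steps of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation, and it rests on several assumptions that do not come from the claim text: "substantially identical" is approximated by an exact SHA-256 match on the page body, a rewrite rule is reduced to "drop these query parameters", and a per-class rule is promoted to a trans-equivalence class rule once it appears in at least `min_support` classes. The function names, the `min_support` parameter, and the example.com URLs are all hypothetical.

```python
import hashlib
from collections import Counter, defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def group_by_content(pages):
    """Group URLs into equivalence classes keyed by a hash of the page body.

    Assumption: exact hash equality stands in for "substantially identical".
    """
    classes = defaultdict(list)
    for url, body in pages.items():
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        classes[digest].append(url)
    # A class with a single URL carries no evidence about duplicates.
    return [urls for urls in classes.values() if len(urls) > 1]


def per_class_rule(urls):
    """Return query parameters whose value differs (or is missing) within one class."""
    queries = [dict(parse_qsl(urlsplit(u).query)) for u in urls]
    names = set().union(*queries)
    return {n for n in names if len({q.get(n) for q in queries}) > 1}


def trans_class_rule(per_class_rules, min_support=2):
    """Keep parameters flagged as ignorable in at least min_support classes."""
    counts = Counter(name for rule in per_class_rules for name in rule)
    return {name for name, n in counts.items() if n >= min_support}


def normalize(url, ignorable):
    """Apply the rule: drop ignorable query parameters from a URL."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in ignorable]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Hypothetical crawl: the "sid" parameter never changes the content.
pages = {
    "http://example.com/item?id=1&sid=aaa": "Item one",
    "http://example.com/item?id=1&sid=bbb": "Item one",
    "http://example.com/item?id=2&sid=ccc": "Item two",
    "http://example.com/item?id=2&sid=ddd": "Item two",
}

classes = group_by_content(pages)
rule = trans_class_rule([per_class_rule(urls) for urls in classes])

# Two URLs that were never downloaded are predicted to be duplicates.
a = normalize("http://example.com/item?id=3&sid=eee", rule)
b = normalize("http://example.com/item?id=3&sid=fff", rule)
```

In this sketch `a` and `b` normalize to the same string, so a crawler could fetch only one of them. A real system would need a tolerant similarity measure and richer rule forms (path segments, host aliases, parameter ordering), none of which are modeled here.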
