
Method and system for efficient and exhaustive URL categorization

  • US 8,935,390 B2
  • Filed: 12/08/2010
  • Issued: 01/13/2015
  • Est. Priority Date: 12/11/2009
  • Status: Active Grant
First Claim

1. A method for categorizing URLs (Uniform Resource Locators) of web pages accessed by users over an IP (Internet Protocol) based data network, the method comprising:

  • collecting by means of at least one monitoring probe real time data from IP data traffic occurring on the IP based data network;

    extracting from said collected real time data parameters related to a web page, said parameters including an URL of the web page;

    processing said URL with a rule based categorization engine, to associate a matching category to the URL of said web page, the matching category being inferred from a pre-defined list of categories;

    when no matching category is inferred, transferring said URL of said web page to a semantic based categorization engine; and

    processing said transferred URL by the semantic based categorization engine, said processing consisting in;

      extracting textual content from content of said web page associated to said URL,

      performing a semantic analysis of said textual content, and

      associating a matching category to the transferred URL of the web page based on the semantic analysis of the textual content extracted from the web page, the matching category being inferred from a pre-defined list of categories,

    wherein the URLs for which no matching category has been inferred by the rule based categorization engine over a determined period of time are memorized, wherein only the N URLs having the highest occurrence for which no matching category has been inferred by the rule based categorization engine over the determined period of time are transferred to the semantic based categorization engine, and wherein N is a pre-defined number of URLs.
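
The two-stage flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the rule table `RULES`, the substring matching, the function names, and the `semantic_engine` callable (standing in for fetching a page and analysing its text) are all assumptions made for illustration. The sketch shows only the control flow: URLs are first run through the rules, unmatched URLs are counted over one period, and only the N most frequent of them are handed to the semantic stage.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Optional

# Hypothetical rule table (substring -> category); the claim does not
# specify how the rule based engine matches URLs.
RULES: Dict[str, str] = {
    "news.": "News",
    "shop.": "Shopping",
}


def rule_based_category(url: str) -> Optional[str]:
    """Return the first category whose rule matches the URL, else None."""
    for needle, category in RULES.items():
        if needle in url:
            return category
    return None


def categorize_period(
    urls: Iterable[str],
    semantic_engine: Callable[[str], Optional[str]],
    n: int,
) -> Dict[str, str]:
    """Categorize the URLs observed during one period.

    `semantic_engine` is an assumed stand-in for the semantic stage
    (extract page text, analyse it, pick a category).
    """
    categorized: Dict[str, str] = {}
    unmatched: Counter = Counter()  # memorizes URLs the rules could not match

    for url in urls:
        category = rule_based_category(url)
        if category is None:
            unmatched[url] += 1
        else:
            categorized[url] = category

    # Only the N most frequently seen unmatched URLs go to the semantic stage.
    for url, _count in unmatched.most_common(n):
        category = semantic_engine(url)
        if category is not None:
            categorized[url] = category

    return categorized
```

Limiting the semantic stage to the N most frequent unmatched URLs keeps the more expensive content analysis bounded per period, whatever the traffic volume. Here `n` plays the role of the pre-defined number N in the claim.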
