
TECHNIQUES FOR TOKENIZING URLS

  • US 20090083266A1
  • Filed: 11/06/2007
  • Published: 03/26/2009
  • Est. Priority Date: 09/20/2007
  • Status: Abandoned Application
First Claim

1. A method for tokenizing URLs, comprising:

    tokenizing, based upon generic delimiters, URLs of each of a plurality of documents of a website into a plurality of components;

    for each particular component of the plurality of components, locating website-specific delimiters in the particular component;

    calculating delimiter support for each particular website-specific delimiter of the located website-specific delimiters;

    determining whether delimiter support for each particular website-specific delimiter is greater than a specified delimiter support threshold;

    in response to determining that the delimiter support for the particular website-specific delimiter is greater than the specified delimiter support threshold, tokenizing the particular component based upon the particular website-specific delimiter;

    for each particular token of the particular component, calculating token support for the particular token;

    determining whether token support for the particular token is greater than a specified token support threshold; and

    in response to determining that the token support for the particular token is greater than the specified token support threshold, using the particular token to generate a description of the website.
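The claimed steps can be sketched in a few lines of Python. This is a minimal, illustrative sketch under stated assumptions, not the applicant's implementation. The claim does not define these details, so all of them are assumptions: the generic delimiter set (`/?&=#.:`), the candidate website-specific delimiters (`-_,+~;`), "support" as the fraction of the site's URLs in which a delimiter or token occurs, the 0.5 default thresholds, treating an unsplit component as a single token, using the sorted list of high-support tokens as the "description", and every function name (`generic_components`, `support`, `split_on`, `describe_site`), which is hypothetical.

```python
import re
from collections import Counter

# Assumption: generic delimiters are the URL's structural characters.
GENERIC_DELIMITERS = re.compile(r"[/?&=#.:]+")
# Assumption: characters some sites use as separators inside a component.
CANDIDATE_SITE_DELIMITERS = "-_,+~;"


def generic_components(url: str) -> list[str]:
    """Drop the scheme, then split the URL on generic delimiters."""
    rest = url.split("://", 1)[-1]
    return [c for c in GENERIC_DELIMITERS.split(rest) if c]


def support(sets_per_url: list[set[str]]) -> dict[str, float]:
    """Assumed support measure: fraction of URLs whose set contains each item."""
    counts = Counter(item for s in sets_per_url for item in s)
    return {item: k / len(sets_per_url) for item, k in counts.items()}


def split_on(component: str, delimiters: set[str]) -> list[str]:
    """Tokenize one component on the accepted site-specific delimiters.

    With no accepted delimiters, the whole component is one token.
    """
    for d in delimiters:
        component = component.replace(d, " ")
    return component.split()


def describe_site(urls: list[str], delim_threshold: float = 0.5,
                  token_threshold: float = 0.5) -> list[str]:
    """Hypothetical pipeline: return high-support tokens as the site description."""
    if not urls:
        return []
    comps_per_url = [generic_components(u) for u in urls]

    # Delimiter support: share of URLs in which some component contains it.
    delim_sets = [{d for c in comps for d in CANDIDATE_SITE_DELIMITERS if d in c}
                  for comps in comps_per_url]
    accepted = {d for d, s in support(delim_sets).items() if s > delim_threshold}

    # Token support: share of URLs that yield the token after re-tokenizing.
    token_sets = [{t for c in comps for t in split_on(c, accepted)}
                  for comps in comps_per_url]
    return sorted(t for t, s in support(token_sets).items() if s > token_threshold)


if __name__ == "__main__":
    urls = [
        "http://shop.example.com/products/red-shoe-123.html",
        "http://shop.example.com/products/blue-shoe-456.html",
        "http://shop.example.com/products/green-hat-789.html",
        "http://shop.example.com/about.html",
    ]
    print(describe_site(urls))                       # ['com', 'example', 'html', 'products', 'shop']
    print(describe_site(urls, token_threshold=0.4))  # also includes 'shoe'
```

In this example `-` occurs in three of the four URLs (support 0.75), so it passes the delimiter threshold and `red-shoe-123` is split into `red`, `shoe`, and `123`. The token `shoe` then has support 0.5, which passes only the lower token threshold. Host tokens such as `com` pass any threshold here, so a practical system would probably filter them. The claim does not address this.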

  • 3 Assignments