
System and method for characterizing a web page using multiple anchor sets of web pages

  • US 7,912,831 B2
  • Filed: 10/03/2006
  • Issued: 03/22/2011
  • Est. Priority Date: 10/03/2006
  • Status: Expired due to Fees
First Claim

1. A method comprising:

  • accessing, by one or more computing devices, a set of web pages within a context, wherein:

    the set of web pages comprises a first subset of web pages with a first known characterization with respect to the context and a second subset of web pages with unknown characterization with respect to the context; and

    the set of web pages are directly or indirectly linked to each other via one or more hyperlinks;

  • representing, by the one or more computing devices, the set of web pages using a graph comprising a set of nodes and a set of edges, wherein:

    each node represents a web page; and

    each edge links two nodes and represents a hyperlink that links two corresponding web pages represented by the two nodes;

  • generating, by the one or more computing devices, a first probability distribution over the set of nodes of the graph using a first algorithm, wherein the first probability distribution indicates a first measure of closeness, as defined by the first algorithm, among the set of web pages, comprising:

    generating a first initial probability distribution over a first subset of nodes of the graph representing the first subset of web pages; and

    propagating the first initial probability distribution to other nodes of the graph in a direction same as the one or more hyperlinks using the first algorithm;

  • generating, by the one or more computing devices, a second probability distribution over the set of nodes of the graph using the first algorithm, wherein the second probability distribution indicates a second measure of closeness, as defined by the first algorithm, among the set of web pages, comprising:

    generating a second initial probability distribution over a second subset of nodes of the graph representing the second subset of web pages; and

    propagating the second initial probability distribution to other nodes of the graph in a direction opposite to the one or more hyperlinks using the first algorithm;

  • determining, by the one or more computing devices, a characterization with respect to the context for a web page from the second subset of web pages based on the first probability distribution and the second probability distribution; and

  • outputting, by the one or more computing devices, an indication of the characterization of the web page from the second subset of web pages.
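The claimed steps can be illustrated with a small Python sketch. It is not the patent's implementation. The claim does not name its "first algorithm" or say how the two distributions are combined, so this sketch assumes a personalized-PageRank-style random walk with restart (hypothetical damping `alpha = 0.85`, 50 iterations) and a hypothetical score p1 / (p1 + p2) compared against a 0.5 threshold. All function names are illustrative.

```python
from collections import defaultdict

# Hypothetical sketch: the claim does not name its "first algorithm"; here it
# is assumed to be a personalized-PageRank-style random walk with restart.


def build_graph(pages, hyperlinks):
    """Nodes are web pages; each (src, dst) hyperlink becomes a directed edge."""
    nodes = set(pages)
    out_links = defaultdict(list)
    for src, dst in hyperlinks:
        nodes.update((src, dst))
        out_links[src].append(dst)
    return nodes, out_links


def reverse_edges(out_links):
    """Flip every edge so propagation runs opposite to the hyperlinks."""
    in_links = defaultdict(list)
    for src, targets in out_links.items():
        for dst in targets:
            in_links[dst].append(src)
    return in_links


def propagate(nodes, edges, seeds, alpha=0.85, iters=50):
    """Spread a uniform initial distribution over `seeds` along `edges`.

    Each step, a page keeps (1 - alpha) restart mass from the seed set and
    receives alpha times the mass of the pages pointing to it. Mass on pages
    with no outgoing edge is returned to the seeds, so the total stays 1.
    """
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(restart)
    for _ in range(iters):
        nxt = {n: (1.0 - alpha) * restart[n] for n in nodes}
        dangling = 0.0
        for src in nodes:
            targets = edges.get(src, [])
            if targets:
                share = alpha * p[src] / len(targets)
                for dst in targets:
                    nxt[dst] += share
            else:
                dangling += p[src]
        for n in nodes:
            nxt[n] += alpha * dangling * restart[n]
        p = nxt
    return p


def characterize(pages, hyperlinks, known, unknown, threshold=0.5):
    """Score each page of the unknown subset from both distributions.

    The combining rule p1 / (p1 + p2) and the threshold are hypothetical;
    the claim only says the result is "based on" both distributions.
    """
    nodes, out_links = build_graph(pages, hyperlinks)
    p1 = propagate(nodes, out_links, set(known))                   # along links
    p2 = propagate(nodes, reverse_edges(out_links), set(unknown))  # against links
    result = {}
    for page in unknown:
        total = p1[page] + p2[page]
        score = p1[page] / total if total > 0 else 0.0
        result[page] = (score, score >= threshold)
    return result


if __name__ == "__main__":
    pages = ["A", "B", "C", "D", "E"]
    links = [("A", "B"), ("B", "C"), ("C", "A"), ("D", "E"), ("E", "D")]
    scores = characterize(pages, links, known={"A"}, unknown={"C", "E"})
    for page, (score, matches) in sorted(scores.items()):
        print(page, round(score, 3), matches)
```

In this toy graph, page E sits in a component that no path from A reaches, so its first-distribution mass, and therefore its score, is exactly 0.0, while C, on A's cycle, receives a positive score. Reversing the edge list instead of writing a second routine keeps both distributions on the same propagation algorithm, in line with the claim's requirement that both use "the first algorithm".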

  • 9 Assignments