System and method for characterizing a web page using multiple anchor sets of web pages
First Claim
1. A method comprising:
accessing, by one or more computing devices, a set of web pages within a context, wherein:
the set of web pages comprises a first subset of web pages with a first known characterization with respect to the context and a second subset of web pages with unknown characterization with respect to the context; and
the set of web pages are directly or indirectly linked to each other via one or more hyperlinks;
representing, by the one or more computing devices, the set of web pages using a graph comprising a set of nodes and a set of edges, wherein:
each node represents a web page; and
each edge links two nodes and represents a hyperlink that links two corresponding web pages represented by the two nodes;
generating, by the one or more computing devices, a first probability distribution over the set of nodes of the graph using a first algorithm, wherein the first probability distribution indicates a first measure of closeness, as defined by the first algorithm, among the set of web pages, comprising:
generating a first initial probability distribution over a first subset of nodes of the graph representing the first subset of web pages; and
propagating the first initial probability distribution to other nodes of the graph in a direction same as the one or more hyperlinks using the first algorithm;
generating, by the one or more computing devices, a second probability distribution over the set of nodes of the graph using the first algorithm, wherein the second probability distribution indicates a second measure of closeness, as defined by the first algorithm, among the set of web pages, comprising:
generating a second initial probability distribution over a second subset of nodes of the graph representing the second subset of web pages; and
propagating the second initial probability distribution to other nodes of the graph in a direction opposite to the one or more hyperlinks using the first algorithm;
determining, by the one or more computing devices, a characterization with respect to the context for a web page from the second subset of web pages based on the first probability distribution and the second probability distribution; and
outputting, by the one or more computing devices, an indication of the characterization of the web page from the second subset of web pages.
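The graph representation and the two propagation steps recited above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the claim does not name the "first algorithm", so a random walk with restart (in the style of personalized PageRank) is swapped in here, and the function names `build_graph`, `reverse_graph`, and `propagate`, the `restart` value, and the sample pages are all hypothetical.

```python
def build_graph(hyperlinks):
    """Map each page to the list of pages it links to (one node per page)."""
    graph = {}
    for src, dst in hyperlinks:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])  # pages with no outgoing links are still nodes
    return graph


def reverse_graph(graph):
    """Flip every edge so propagation runs opposite to the hyperlinks."""
    rev = {node: [] for node in graph}
    for src, targets in graph.items():
        for dst in targets:
            rev[dst].append(src)
    return rev


def propagate(graph, seeds, restart=0.15, iterations=50):
    """Spread a uniform initial distribution on `seeds` along the edges.

    Assumption: random walk with restart stands in for the unnamed
    "first algorithm"; `restart` and `iterations` are illustrative values.
    """
    seed_nodes = [n for n in graph if n in set(seeds)]
    if not seed_nodes:
        raise ValueError("at least one seed must be a node of the graph")
    initial = {n: 0.0 for n in graph}
    for s in seed_nodes:
        initial[s] = 1.0 / len(seed_nodes)
    scores = dict(initial)
    for _ in range(iterations):
        nxt = {n: restart * initial[n] for n in graph}
        for node, targets in graph.items():
            mass = (1.0 - restart) * scores[node]
            if targets:
                for t in targets:
                    nxt[t] += mass / len(targets)
            else:
                # Dead end: hand the mass back to the seeds so it still sums to 1.
                for s in seed_nodes:
                    nxt[s] += mass / len(seed_nodes)
        scores = nxt
    return scores


# Hypothetical context: page "a" has a known characterization, page "d" does not.
links = [("a", "b"), ("b", "c"), ("c", "d")]
graph = build_graph(links)
first_distribution = propagate(graph, seeds={"a"})                   # along hyperlinks
second_distribution = propagate(reverse_graph(graph), seeds={"d"})   # against hyperlinks
print(first_distribution)
print(second_distribution)
```

Running the same routine on the reversed graph is one simple way to reuse a single algorithm for both directions, which matches the claim's requirement that both distributions be generated "using the first algorithm".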
9 Assignments
0 Petitions
Abstract
A system and method are provided for accessing a set of web pages within a context. The set of web pages may be represented using a graph comprising a set of nodes and a set of edges. First and second probability distributions may be generated over a set of nodes of the graph using a first algorithm to indicate a measure of closeness among the set of web pages. A characterization may be determined with respect to the context for a web page from a second subset of web pages based on the first and second probability distributions. An indication of the characterization of the web page from the second subset of web pages may be outputted.
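The determination step in the abstract can be illustrated with a short sketch. The source does not say how the two distributions are combined, so everything here is an assumption made for illustration: the geometric mean as the combination rule, the `threshold` value, the label strings, and the function name `characterize_pages` are all hypothetical.

```python
from math import sqrt


def characterize_pages(unknown_pages, first_dist, second_dist, label, threshold):
    """Assign `label` to each unknown page whose combined score is high enough.

    Assumption: the combined score is the geometric mean of the page's
    probability in the two distributions; the source leaves this rule open.
    """
    results = {}
    for page in unknown_pages:
        combined = sqrt(first_dist.get(page, 0.0) * second_dist.get(page, 0.0))
        results[page] = label if combined >= threshold else "undetermined"
    return results


# Hypothetical probability distributions over three pages.
first = {"a": 0.5, "b": 0.3, "c": 0.2}
second = {"a": 0.1, "b": 0.6, "c": 0.3}
print(characterize_pages(["b", "c"], first, second, label="trusted", threshold=0.3))
```

A geometric mean rewards pages that score well in both distributions; other rules, such as a ratio or a weighted sum, would fit the abstract's wording equally well.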
32 Citations
18 Claims
1. A method comprising:
accessing, by one or more computing devices, a set of web pages within a context, wherein:
the set of web pages comprises a first subset of web pages with a first known characterization with respect to the context and a second subset of web pages with unknown characterization with respect to the context; and
the set of web pages are directly or indirectly linked to each other via one or more hyperlinks;
representing, by the one or more computing devices, the set of web pages using a graph comprising a set of nodes and a set of edges, wherein:
each node represents a web page; and
each edge links two nodes and represents a hyperlink that links two corresponding web pages represented by the two nodes;
generating, by the one or more computing devices, a first probability distribution over the set of nodes of the graph using a first algorithm, wherein the first probability distribution indicates a first measure of closeness, as defined by the first algorithm, among the set of web pages, comprising:
generating a first initial probability distribution over a first subset of nodes of the graph representing the first subset of web pages; and
propagating the first initial probability distribution to other nodes of the graph in a direction same as the one or more hyperlinks using the first algorithm;
generating, by the one or more computing devices, a second probability distribution over the set of nodes of the graph using the first algorithm, wherein the second probability distribution indicates a second measure of closeness, as defined by the first algorithm, among the set of web pages, comprising:
generating a second initial probability distribution over a second subset of nodes of the graph representing the second subset of web pages; and
propagating the second initial probability distribution to other nodes of the graph in a direction opposite to the one or more hyperlinks using the first algorithm;
determining, by the one or more computing devices, a characterization with respect to the context for a web page from the second subset of web pages based on the first probability distribution and the second probability distribution; and
outputting, by the one or more computing devices, an indication of the characterization of the web page from the second subset of web pages.
Dependent claims: 2, 3, 4
5. A method comprising:
accessing, by one or more computing devices, a set of web pages within a context, wherein:
the set of web pages comprises a first subset of web pages with a first known characterization with respect to the context, a second subset of web pages with a second known characterization with respect to the context, and a third subset of web pages with unknown characterization with respect to the context; and
the set of web pages are directly or indirectly linked to each other via one or more hyperlinks;
representing, by the one or more computing devices, the set of web pages using a graph comprising a set of nodes and a set of edges, wherein:
each node represents a web page; and
each edge links two nodes and represents a hyperlink that links two corresponding web pages represented by the two nodes;
generating, by the one or more computing devices, a first probability distribution over the set of nodes of the graph using a first algorithm, wherein the first probability distribution indicates a first measure of closeness, as defined by the first algorithm, among the set of web pages, comprising:
generating a first initial probability distribution over a second subset of nodes of the graph representing the second subset of web pages; and
propagating the first initial probability distribution to other nodes of the graph in a direction same as the one or more hyperlinks using the first algorithm; and
generating, by the one or more computing devices, a second probability distribution over the set of nodes of the graph using the first algorithm, wherein the second probability distribution indicates a second measure of closeness, as defined by the first algorithm, among the set of web pages, comprising:
generating a second initial probability distribution over a third subset of nodes of the graph representing the third subset of web pages; and
propagating the second initial probability distribution to other nodes of the graph in a direction opposite to the one or more hyperlinks using the first algorithm; and
determining, by the one or more computing devices, a characterization with respect to the context for a web page from the third subset of web pages based on the first probability distribution and the second probability distribution; and
outputting, by the one or more computing devices, an indication of the characterization of the web page from the third subset of web pages.
Dependent claims: 6
7. A system, comprising:
a memory comprising instructions executable by one or more processors; and
one or more processors coupled to the memory and operable to execute the instructions, the one or more processors being operable when executing the instructions to:
access a set of web pages within a context, wherein:
the set of web pages comprises a first subset of web pages with a first known characterization with respect to the context and a second subset of web pages with unknown characterization with respect to the context; and
the set of web pages are directly or indirectly linked to each other via one or more hyperlinks;
represent the set of web pages using a graph comprising a set of nodes and a set of edges, wherein:
each node represents a web page; and
each edge links two nodes and represents a hyperlink that links two corresponding web pages represented by the two nodes;
generate a first probability distribution over the set of nodes of the graph using a first algorithm, wherein the first probability distribution indicates a first measure of closeness, as defined by the first algorithm, among the set of web pages, comprising:
generate a first initial probability distribution over a first subset of nodes of the graph representing the first subset of web pages; and
propagate the first initial probability distribution to other nodes of the graph in a direction same as the one or more hyperlinks using the first algorithm;
generate a second probability distribution over the set of nodes of the graph using the first algorithm, wherein the second probability distribution indicates a second measure of closeness, as defined by the first algorithm, among the set of web pages, comprising:
generate a second initial probability distribution over a second subset of nodes of the graph representing the second subset of web pages; and
propagate the second initial probability distribution to other nodes of the graph in a direction opposite to the one or more hyperlinks using the first algorithm;
determine a characterization with respect to the context for a web page from the second subset of web pages based on the first probability distribution and the second probability distribution; and
output an indication of the characterization of the web page from the second subset of web pages.
Dependent claims: 8, 9, 10
11. A system, comprising:
a memory comprising instructions executable by one or more processors; and
one or more processors coupled to the memory and operable to execute the instructions, the one or more processors being operable when executing the instructions to:
access a set of web pages within a context, wherein:
the set of web pages comprises a first subset of web pages with a first known characterization with respect to the context, a second subset of web pages with a second known characterization with respect to the context, and a third subset of web pages with unknown characterization with respect to the context; and
the set of web pages are directly or indirectly linked to each other via one or more hyperlinks;
represent the set of web pages using a graph comprising a set of nodes and a set of edges, wherein:
each node represents a web page; and
each edge links two nodes and represents a hyperlink that links two corresponding web pages represented by the two nodes;
generate a first probability distribution over the set of nodes of the graph using a first algorithm, wherein the first probability distribution indicates a first measure of closeness, as defined by the first algorithm, among the set of web pages, comprising:
generate a first initial probability distribution over a second subset of nodes of the graph representing the second subset of web pages; and
propagate the first initial probability distribution to other nodes of the graph in a direction same as the one or more hyperlinks using the first algorithm; and
generate a second probability distribution over the set of nodes of the graph using the first algorithm, wherein the second probability distribution indicates a second measure of closeness, as defined by the first algorithm, among the set of web pages, comprising:
generate a second initial probability distribution over a third subset of nodes of the graph representing the third subset of web pages; and
propagate the second initial probability distribution to other nodes of the graph in a direction opposite to the one or more hyperlinks using the first algorithm; and
determine a characterization with respect to the context for a web page from the third subset of web pages based on the first probability distribution and the second probability distribution; and
output an indication of the characterization of the web page from the third subset of web pages.
Dependent claims: 12
13. One or more computer-readable non-transitory storage media embodying software operable when executed by one or more computer systems to:
access a set of web pages within a context, wherein:
the set of web pages comprises a first subset of web pages with a first known characterization with respect to the context and a second subset of web pages with unknown characterization with respect to the context; and
the set of web pages are directly or indirectly linked to each other via one or more hyperlinks;
represent the set of web pages using a graph comprising a set of nodes and a set of edges, wherein:
each node represents a web page; and
each edge links two nodes and represents a hyperlink that links two corresponding web pages represented by the two nodes;
generate a first probability distribution over the set of nodes of the graph using a first algorithm, wherein the first probability distribution indicates a first measure of closeness, as defined by the first algorithm, among the set of web pages, comprising:
generate a first initial probability distribution over a first subset of nodes of the graph representing the first subset of web pages; and
propagate the first initial probability distribution to other nodes of the graph in a direction same as the one or more hyperlinks using the first algorithm;
generate a second probability distribution over the set of nodes of the graph using the first algorithm, wherein the second probability distribution indicates a second measure of closeness, as defined by the first algorithm, among the set of web pages, comprising:
generate a second initial probability distribution over a second subset of nodes of the graph representing the second subset of web pages; and
propagate the second initial probability distribution to other nodes of the graph in a direction opposite to the one or more hyperlinks using the first algorithm;
determine a characterization with respect to the context for a web page from the second subset of web pages based on the first probability distribution and the second probability distribution; and
output an indication of the characterization of the web page from the second subset of web pages.
Dependent claims: 14, 15, 16
17. One or more computer-readable non-transitory storage media embodying software operable when executed by one or more computer systems to:
access a set of web pages within a context, wherein:
the set of web pages comprises a first subset of web pages with a first known characterization with respect to the context, a second subset of web pages with a second known characterization with respect to the context, and a third subset of web pages with unknown characterization with respect to the context; and
the set of web pages are directly or indirectly linked to each other via one or more hyperlinks;
represent the set of web pages using a graph comprising a set of nodes and a set of edges, wherein:
each node represents a web page; and
each edge links two nodes and represents a hyperlink that links two corresponding web pages represented by the two nodes;
generate a first probability distribution over the set of nodes of the graph using a first algorithm, wherein the first probability distribution indicates a first measure of closeness, as defined by the first algorithm, among the set of web pages, comprising:
generate a first initial probability distribution over a second subset of nodes of the graph representing the second subset of web pages; and
propagate the first initial probability distribution to other nodes of the graph in a direction same as the one or more hyperlinks using the first algorithm; and
generate a second probability distribution over the set of nodes of the graph using the first algorithm, wherein the second probability distribution indicates a second measure of closeness, as defined by the first algorithm, among the set of web pages, comprising:
generate a second initial probability distribution over a third subset of nodes of the graph representing the third subset of web pages; and
propagate the second initial probability distribution to other nodes of the graph in a direction opposite to the one or more hyperlinks using the first algorithm; and
determine a characterization with respect to the context for a web page from the third subset of web pages based on the first probability distribution and the second probability distribution; and
output an indication of the characterization of the web page from the third subset of web pages.
Dependent claims: 18
Specification