System and method for ranking hyperlinked documents based on a stochastic backoff processes
First Claim
1. A method for searching for one or more hyperlinked documents that match a search query, comprising:
- generating a directed graph from a crawl through one or more documents, the directed graph having one or more nodes representing one or more documents traversed during the crawl and one or more directed edges wherein each directed edge represents a hyperlink from a first document to a second document;
determining a weight for each document in the directed graph based on a stochastic backoff process that includes;
determining a forward probability 1-p(A) for each possible forward step from a node A that is taken by following a directed edge, wherein the forward step includes proceeding from node A to a node B where there is a directed edge from node A to node B;
determining a backward probability p(A) for each possible backward step from the node A, wherein the backward step is taken by using a browser back button, the backward step including proceeding from the node A to a previously visited node from which a forward step was taken into node A; and
determining a pair probability p(A,B) for a directed graph from the node A to the node B such that a sum of all p(A,B) is 1;
retrieving one or more documents that match a search query; and
generating a ranking of the documents based on the determined weight such that relative ranks of the documents depend on the forward probability 1-p(A), the backward probability p(A), and the pair probability p(A,B) associated with each document.
4 Assignments
0 Petitions
Accused Products
Abstract
A system and method for ranking hyperlinked documents, such as web pages, is provided wherein a stochastic backoff process is used to rank those hyperlinked documents. In more detail, the stochastic process is derived from a random walk through the pages of the web. First, a directed graph may be generated from a crawl wherein the nodes are documents in the crawl and a directed edge from one node A to another node B indicates the presence of a hyperlink from the corresponding document docA to document docB. Using a stochastic backoff process on this graph, a weight between 0 and 1 is assigned to each document so that the documents may be ranked according to the weights.
55 Citations
32 Claims
-
1. A method for searching for one or more hyperlinked documents that match a search query, comprising:
-
generating a directed graph from a crawl through one or more documents, the directed graph having one or more nodes representing one or more documents traversed during the crawl and one or more directed edges wherein each directed edge represents a hyperlink from a first document to a second document;
determining a weight for each document in the directed graph based on a stochastic backoff process that includes;
determining a forward probability 1-p(A) for each possible forward step from a node A that is taken by following a directed edge, wherein the forward step includes proceeding from node A to a node B where there is a directed edge from node A to node B;
determining a backward probability p(A) for each possible backward step from the node A, wherein the backward step is taken by using a browser back button, the backward step including proceeding from the node A to a previously visited node from which a forward step was taken into node A; and
determining a pair probability p(A,B) for a directed graph from the node A to the node B such that a sum of all p(A,B) is 1;
retrieving one or more documents that match a search query; and
generating a ranking of the documents based on the determined weight such that relative ranks of the documents depend on the forward probability 1-p(A), the backward probability p(A), and the pair probability p(A,B) associated with each document. - View Dependent Claims (2, 3, 13, 14, 15, 16, 29, 30)
making each forward and backward probability a constant; and
setting each pair probability to 1/d wherein d is a number of directed edges from node A to another node.
-
-
15. The method of claim 1 further comprising:
-
counting a number of times a specific node is accessed from node A by taking a backward step according to a directed graph; and
tracking a fraction of forward steps in which a node A is followed by a forward step to a node B.
-
-
16. The method of claim 1 further comprising determining the weight of node A based on the values p(A) at various nodes and p(A,B) for various pairs.
-
29. The method of claim 1, wherein determining the backward probability p(A) comprises collecting usage statistics regarding visits to the node A that are followed by a click on the browser back button such that the usage statistics affect the ranking of the documents.
-
30. The method of claim 29, wherein determining the pair probability p(A,B) comprises collecting usage statistics regarding visit to the node A that are followed by a visit to the node B by using a link without using the browser back button.
-
4. A system for searching for one or more hyperlinked documents that match a search query, comprising:
-
a directed graph from a crawl through one or more documents, the directed graph having one or more nodes representing one or more document traversed during the crawl and one or more directed edges wherein each directed edge represent a hyperlink from a first document to a second document;
means for determining a weight of each document in the directed graph based on a stochastic backoff process that includes;
means for determining a forward probability 1-p(A) for each possible forward step from a node A that is taken by following a directed edge, wherein the forward step includes proceeding from node A to a node B where there is a directed edge from node A to node B;
means for determining a backward probability p(A) for each possible backward step from the node A, wherein the backward step is taken by using a browser back button, the backward step including proceeding from the node A to a previously visited node from which a forward step was taken into node A;
mean for retrieving one or more documents that match a search query; and
means for generating a ranking of the documents based on the determined weight such that relative ranks of the document depend on the forward probability 1-p(A), the backward probability p(A), and the pair probability p(A,B) associated with each document. - View Dependent Claims (5, 6, 17, 18, 19, 20, 31, 32)
means for making each forward and backward probability a constant; and
means for setting each pair probability to 1/d wherein d is a number of directed edges from node A to another node.
-
-
19. The system of claim 4 further comprising:
-
means for counting a number of times a specific node is accessed from node A by taking a backward step according to a directed graph; and
means for tracking a fraction of forward steps in which node A is followed by a forward step to a node B.
-
-
20. The system of claim 4 further comprising means for determining the weight of node A based on the values p(A) at various nodes and p(A,B) for various pairs.
-
31. The system of claim 4, wherein the means for determining the backward probability p(A) comprises means for collecting usage statistics regarding visits to the node A that are followed by a click on the browser back button such that the usage statistics affect the ranking of the documents.
-
32. The system of claim 31, wherein determining the pair probability p(A,B) comprises collecting usage statistics regarding visit to the node A that are followed by a visit to the node B by using a link without using the browser back button.
-
7. A method for ranking of one or more hyperlinked documents, comprising:
-
generating a directed graph from a crawl through one or more documents, the directed graph having one or more nodes representing one or more documents traversed during the crawl and one or more directed edges wherein each directed edge represents a hyperlink from a first document to a second document;
determining a weight of each document based on a stochastic process that includes;
determining a backward probability p(A) for each possible backward step from a node A that is taken by following a directed edge, wherein the forward step includes proceeding from node A to a node B where there is a directed edge from node A to node B;
determining a backward probability p(A) for each possible backward step from the node A, wherein the backward step is taken by using a browser back button, the backward step including proceeding from the node A to a previously visited node from which a forward step was taken into node A;
determining a pair probability p(A,B) for a directed graph from the node A to a node B such that a sum of all p(A, B) is 1, and generating a ranking of the documents based on the determined weight such that relative ranks of the documents depend on the forward probability 1-p(A), the backward probability p(A), and the pair probability p(A, B) associated with each document. - View Dependent Claims (8, 9, 21, 22, 23, 24)
making each forward and backward probability a constant; and
setting each pair probability to 1/d wherein d is a number of directed edges from node A to another node.
-
-
23. The method of claim 7 further comprising:
-
counting a number of times a specific node is accessed from node A by taking a backward step according to a directed graph; and
tracking a fraction of forward steps in which node A is followed by a forward step to a node B.
-
-
24. The method of claim 7 further comprising determining the weight of node A based on the values p(A) at various nodes and p(A,B) for various pairs.
-
10. A system for ranking one or more hyperlink documents, comprising:
-
a directed graph generated from a crawl through one or more documents, the directed graph having one or more nodes representing one or more documents traversed during the crawl and one or more directed edges wherein each directed edge representing a hyperlink from a first document to a second document;
means for determining a forward probability 1-p(A) for each possible forward step from a node A that is taken by following a directed edge, wherein the forward step includes proceeding from node A to a node B where there is a directed edge from node A to node B;
mean for determining a backward probability p(A) for each possible backward step from the node A, wherein the backward step is taken by using a browser back button, the backward step including proceeding from the node A to a previously visited node from which a forward step was taken into node A;
means for determining a pair probability p(A, B) for a directed graph from the node A to node B such that a sum of all p(A, B) is 1; and
means for generating a ranking of the documents based on determined weight such that relative ranks of the documents depend on the forward probability 1-p(A), the backward probability p(A), and the pair probability p(A, B) associated with each document. - View Dependent Claims (11, 12, 25, 26, 27, 28)
means for making each forward and backward probability a constant; and
means for setting each pair probability to 1/d wherein d is a number of directed edges from node A to another node.
-
-
27. The system of claim 10 further comprising:
-
means for counting a number of times a specific node is accessed from node A by taking a backward step according to a directed graph; and
means for tracking a fraction of forward steps in which node A is followed by a forward step to a node B.
-
-
28. The system of claim 10 further comprising means for determining the weight of node A based on the values p(A) at various nodes and p(A,B) for various pairs.
Specification