System and method employing random walks for mining web page associations and usage to optimize user-oriented web page refresh and pre-fetch scheduling
First Claim
1. On a computer network having a set of Web pages V and a set of links E between those Web pages represented as an undirected neighborhood graph GN(VN,EN), the computer network further including seed Web pages va and vb in V, a method executable on the computer network for estimating associations between va and vb and other Web pages viε
- V, the method comprising the steps of;
constructing a directed random walk graph by creating a new vi′
in V for each viε
VN, and creating two directed edges e′
2×
k=<
vi′
,vj′
) and e′
2×
k−
1=<
vj′
,vi′
>
in E for each ek=<
vi,vj>
ε
EN wherein both vi and Vj are in VN, computing a penalty value penalty(vi′
) for all vertices vi′
ε
V, and constructing a |V|×
|V| transition matrix T, where T[j,i] represents a transition value for each directed edge in E denoting a likelihood of moving to vertex vi from vertex vj; and
calculating a steady state distribution convergence vector t of T, wherein for each viε
V, t[i] represents the association between the seed Web pages and vi.
2 Assignments
0 Petitions
Accused Products
Abstract
A method for estimating an association between the media objects and the seed Web page accessed by a user. The method is employed in the context of a Web space on a network having Web pages and links between those Web pages modeled as a directed graph. Each Web page comprises a set of media objects and a page author. For each object a size, a user preference and a page author preference are determined. The network has an available pre-fetch bandwidth. The method calculates a weight for each Web object by applying preference rules defined by and user preference and page author preference to the contents of the set of media objects. Next, a random walk graph is generated, and object gains are calculated by finding a steady state distribution of the random walk graph. The object gain represents an association between the object and the seed Web page.
168 Citations
42 Claims
-
1. On a computer network having a set of Web pages V and a set of links E between those Web pages represented as an undirected neighborhood graph GN(VN,EN), the computer network further including seed Web pages va and vb in V, a method executable on the computer network for estimating associations between va and vb and other Web pages viε
- V, the method comprising the steps of;
constructing a directed random walk graph by creating a new vi′
in V for each viε
VN, and creating two directed edges e′
2×
k=<
vi′
,vj′
) and e′
2×
k−
1=<
vj′
,vi′
>
in E for each ek=<
vi,vj>
ε
EN wherein both vi and Vj are in VN,computing a penalty value penalty(vi′
) for all vertices vi′
ε
V, andconstructing a |V|×
|V| transition matrix T, where T[j,i] represents a transition value for each directed edge in E denoting a likelihood of moving to vertex vi from vertex vj; and
calculating a steady state distribution convergence vector t of T, wherein for each viε
V, t[i] represents the association between the seed Web pages and vi.- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
computing sdist(vi′
,va′
) as a shortest distance in GN between vi′ and
the vertex va′
corresponding to va;
computing sdist(vi′
,vb′
) as the shortest distance in GN between vi′ and
the vertex vb′
corresponding to vb; and
computing the penalty value as penalty (υ
i′
)=sdist(vi′
, va′
)+sdist(υ
i′
,υ
b′
).
- V, the method comprising the steps of;
-
3. A method as recited in claim 2, the step of constructing a |V|×
- |V| transition matrix T comprising the steps of;
resetting T[j,i]=0.0 for all vertices υ
i′
ε
V and for all (υ
i′
,υ
j′
)∉
E; and
solving the following set of linear equations for all vertices υ
i′
ε
V,
- |V| transition matrix T comprising the steps of;
-
4. A method as recited in claim 3, the step of calculating a steady state distribution (convergence vector) t of T comprising solving a linear equation (I−
- T)t=0, where I is a unit matrix, and
- T)t=0, where I is a unit matrix, and
-
5. A method as recited in claim 2, the computer network further including a set of seed Web pages |S|≧
- 2 in V, the step of computing a penalty value penalty(vi′
) for all vertices vi′
ε
V comprising computing the penalty value as
- 2 in V, the step of computing a penalty value penalty(vi′
-
6. A method as recited in claim 2, the computer network further including a set of seed Web pages |S|≧
- 2 in V, the step of computing a penalty value penalty(vi′
) for all vertices vi′
ε
V comprising computing the penalty value as
- 2 in V, the step of computing a penalty value penalty(vi′
-
7. A method as recited in claim 6, further including the step of defining a relevant neighborhood of GN(VN,EN) for constructing the random walk graph as a set of vertices, VN=VGU (S,d), that are reachable either from the vertices in S in d edge traversals such that
-
v 1 ∈ V G u ( S , d ) v j ∈ S reachable G u ( v j , v i , d ) .
-
-
8. A method as recited in claim 1, further including the step of defining a relevant neighborhood of GN(VN,EN) for constructing the random walk graph as a set of vertices, VN=VGu(va,vb,d), that are reachable either from va or vb in d edge traversals such that ∀
- viε
VGu(va,vb,d) reachableGu(va,vi,d) ν
reachableGu(vb,vi,d).
- viε
-
9. A method as recited in claim 1, each Web page viε
- V having a known relevance value for a particular topic relevance(v,topic), the method further including the step of adjusting the penalty value penalty(vi′
) for all vertices vi′
ε
V by dividing penalty(vi′
) by relevance(v,topic).
- V having a known relevance value for a particular topic relevance(v,topic), the method further including the step of adjusting the penalty value penalty(vi′
-
10. A method as recited in claim 1, further including the step of pre-fetching Web pages into a memory in decreasing order of t[i].
-
11. On a computer network having a set of Web pages V and a set of links between those Web pages E modeled as a directed graph G(V,E), each Web page viε
- V comprising a pair <
Ov,av>
, where Ov is a set of media objects including a main HTML file and av is a page author, and where each object oε
Ov has a known end-user preference upref(u) for an end-user u and a page author preference apref(av) for a page author av, an end-user u accessing at a seed Web page vc, a method executable on the computer network for estimating an association between the media objects and the seed Web page, the method comprising the steps of;calculating a page preference weight pref(u,v) for each Web page vi by applying preference rules defined by upref(u) and apref(av) to the contents of Ov calculating an object preference weight pref(u,o,v) for each object oε
Ov by applying the preference rules defined by upref(u) and apref(av) to the contents of Ov;
generating a random walk graph Gw having a set of vertices Vw and a set of edges Ew;
calculating a page gain gain(u,v) by finding a steady state distribution convergence vector of the random walk graph; and
calculating an object gain gain(u,o) for each object as wherein the object gain represents an association between the object and the seed Web page. - View Dependent Claims (12, 13, 14, 15, 16, 17, 18, 19, 20, 21)
creating a vertex vi in Vw for each Web page in V;
creating two edges e′
j=<
v′
a,v′
b> and
e″
j=<
v′
b,v′
a>
in Ew for each edge ej=<
va, vb>
in E; and
assigning an edge weight w(e)=s(u, vj) to each edge ej=<
va, vb>
in E.
- V comprising a pair <
-
13. A method as recited in claim 12, wherein s(u,vj) is a known stickiness value for each Web page vjε
- V.
-
14. A method as recited in claim 12, wherein s(u,vj) is assigned a unit value for each edge ej=<
- va,vb>
in E.
- va,vb>
-
15. A method as recited in claim 12, wherein s(u,vj) is assigned a larger value than the unit value for each edge ej=<
- va,vb>
in E that crosses a domain boundary.
- va,vb>
-
16. A method as recited in claim 12, the step of calculating a page gain gain(u,v) by finding a steady state distribution (convergence vector) of the random walk graph comprising the steps of:
-
finding a shortest distance shortest (vc, vi) from vertex vc to all vertices viε
V while taking into account the edge weight using a shortest path algorithm;
for each vertex vε
Vw,calculating a penalty penalty(u,v)=shortest(vc,vi)/(pref(u,v)+1), and calculating a unit probability unit(u,v) by solving calculating
for each edge e=ε
<
vi,vj>
ε
Vw; and
calculating gain(u,v) by finding a steady state distribution (convergence vector) t of T, wherein for each viε
V, t[i] represents the association between the seed Web pages and vi, where T is a matrix of transition values prob(u)(vj|vi).
-
-
17. A method as recited in claim 16, the step of calculating gain(u,v) by finding a steady state distribution (convergence vector) t of T comprising solving a linear equation (I−
- T)t=0, where I is a unit matrix, and Σ
1≦
i≦
|vw | gain(u,v)=1.0.
- T)t=0, where I is a unit matrix, and Σ
-
18. A method as recited in claim 17, wherein each object oε
- Ov has a known size size(o) and, having an available pre-fetch bandwidth Pu and a pre-fetch duration δ
t in which a server may pre-fetch objects into a memory, the method further includes the step of identifying a set of objects Os highly associated with the end-user or seed Web page, the step comprising;defining a cost of each object as cost(o)=size(o); and
identifying a subset Os of Ov such that Σ
oε
Os cost(o)≦
Pu×
δ
t and Σ
oε
Os gain(u,o) is maximized.
- Ov has a known size size(o) and, having an available pre-fetch bandwidth Pu and a pre-fetch duration δ
-
19. A method as recited in claim 18, further including the step of pre-fetching or refreshing objects from Os into the memory.
-
20. A method as recited in claim 18, where each object oε
- Ov has a known expiration time expire(o), the method further including the step of refining the set of objects Os by removing those objects that will expire before their earliest time to view, the step comprising;
finding a shortest path shortest_path(vc,vi) from vertex vc to all vertices viε
V while taking into account the edge weight using a shortest path algorithm;
calculating a measure of an earliest time that a page p may be accessed as earliest(u,p)=shortest_path(vc, vp);
calculating a measure of an earliest time that an object o may be needed as earliest(u,o)=min{earliest(u,p)|pε
pages(o)}; and
eliminating those objects from Ov in which expire(o)<
earliest(u,o) before identifying the set of objects Os.
- Ov has a known expiration time expire(o), the method further including the step of refining the set of objects Os by removing those objects that will expire before their earliest time to view, the step comprising;
-
21. A method as recited in claim 17, wherein each object oε
- Ov has a known size size(o) and, having an available pre-fetch bandwidth Pu and a pre-fetch duration δ
t in which a server may pre-fetch objects into a memory, and given a set of users U or seed Web pages, the method further includes the step of identifying a set of objects Os highly associated with the set of users or seed Web pages, the step comprising;calculating an object gain gain(o) for each object as wherein the object gain represents an association between the object and the set of end-users or seed Web pages; defining a cost of each object as cost(o)=size(o); and
identifying a subset Os of Ov such that is maximized.
- Ov has a known size size(o) and, having an available pre-fetch bandwidth Pu and a pre-fetch duration δ
-
22. On a computer network having a set of Web pages V and a set of links E between those Web pages, a system for estimating associations between seed Web pages va and vb and other Web pages viε
- V, comprising;
memory for storing a location of the seed Web pages va and vb and the other Web pages vi in V; and
a processor programmed for modeling the computer network as an undirected neighborhood graph GN(VN,EN), and programmed for constructing a directed random walk graph by creating a new vi′
in V for each viε
VN, and creating two directed edges e′
2×
k=<
vi′
,vj′
> and
e′
2×
k+1=<
vj′
,vi′
>
in E for each ek=<
vi,vj>
ε
EN wherein both vi and vj are in VN,computing a penalty value penalty(vi′
) for all vertices vi′
ε
V, andconstructing a |V|×
|V| transition matrix T, where T[j,i] represents a transition value for each directed edge in E denoting a likelihood of moving to vertex vi from vertex vj; and
calculating a steady state distribution convergence vector t of T, wherein for each vε
V, t[i] represents the association between the seed Web pages and Vi.- View Dependent Claims (23, 24, 25, 26, 27, 28, 29, 30, 31)
computing sdist(vi′
,va′
) as a shortest distance in GN between vi′ and
the vertex va′
corresponding to va;
computing sdist(vi′
,vb′
) as the shortest distance in GN between vi′ and
the vertex vb′
corresponding to vb; and
computing the penalty value as penalty(υ
i′
)=sdist(vi′
,va′
)+sdist(υ
i′
,υ
b′
).
- V, comprising;
-
24. A system as recited in claim 23, the processor further programmed for constructing a |V|×
- |V| transition matrix T by;
resetting T[j,i]=0.0 for all vertices υ
i′
ε
V and forall (υ
i′
,υ
j′
)∉
E; and
solving the following set of linear equations for all vertices vi′
ε
V,
- |V| transition matrix T by;
-
25. A system as recited in claim 24, the processor further programmed for calculating a steady state distribution (convergence vector) t of T by solving a linear equation (I−
- T)t=0, where I is a unit matrix, and
- T)t=0, where I is a unit matrix, and
-
26. A system as recited in claim 23, the computer network further including a set of seed Web pages |S|≧
- 2 in V, the processor further programmed for computing a penalty value penalty(vi′
) for all vertices
- 2 in V, the processor further programmed for computing a penalty value penalty(vi′
-
27. A system as recited in claim 23, the computer network further including a set of seed Web pages |S|≧
- 2 in V, the processor further programmed for computing a penalty value penalty(vi′
) for all vertices viε
V as
- 2 in V, the processor further programmed for computing a penalty value penalty(vi′
-
28. A system as recited in claim 27, the processor further programmed for defining a relevant neighborhood of GN(VN, EN) for constructing the random walk graph as a set of vertices, VN=VGU(S,d), that are reachable either from the vertices in S in d edge traversals such that
-
v 1 ∈ V G u ( S , d ) v j ∈ S reachable G u ( v j , v i , d ) .
-
-
29. A system as recited in claim 22, the processor further programmed for defining a relevant neighborhood of GN(VN,EN) for constructing the random walk graph as a set of vertices, VN=VGu(va,vb,d), that are reachable either from va or vb in d edge traversals such that ∀
- viε
VGu(va,vb,d)reachableGu(va,vi,d)v reachableGu(vb,vi,d).
- viε
-
30. A system as recited in claim 22, each Web page viε
- V having a known relevance value for a particular topic relevance(v,topic), the processor further programmed for adjusting the penalty value penalty(vi′
) for all vertices vi′
ε
V by dividing penalty(vi′
) by relevance(v,topic).
- V having a known relevance value for a particular topic relevance(v,topic), the processor further programmed for adjusting the penalty value penalty(vi′
-
31. A system as recited in claim 22, the processor further programmed for pre-fetching Web pages into the memory in decreasing order of t[i].
-
32. On a computer network having a set of Web pages V and a set of links between those Web pages E, each Web page viε
- V comprising a pair <
Ovav>
, where Ov is a set of media objects including a main HTML file and av is a page author, a system for estimating an association between the media objects and a seed Web page vc corresponding to a current location of an end-user u, comprising;memory for storing, for each object oε
Ov, a known end-user preference upref(u) for the end-user u, a page author preference apref(av) for the page author av, and a location of the seed Web page vc; and
a processor programmed for modeling the computer network as a directed graph G(V,E), and programmed for calculating a page preference weight pref(u,v) for an end user u for each Web page vi by applying preference rules defined by upref(u) and apref(av) to the contents of Ov, calculating an object preference weight pref(u,o,v) for each object oε
Ov by applying the preference rules defined by upref(u) and apref(av) to the contents of Ov,generating a random walk graph Gw having a set of vertices Vw and a set of edges Ew, calculating a page gain gain(u,v) by finding a steady state distribution convergence vector of the random walk graph, and calculating an object gain gain(u,o) for each object as wherein the object gain represents an association between the object and the end-user or seed Web page. - View Dependent Claims (33, 34, 35, 36, 37, 38, 39, 40, 41, 42)
creating a vertex vi in Vw for each Web page in V;
creating two edges e′
j=<
v′
a,v′
b> and
e″
j=<
v′
b,v′
a>
in Ew for each edge ej=(va, vb) in E; and
assigning an edge weight w(e)=s(u,vj) to each edge ej=<
va,vb>
in E.
- V comprising a pair <
-
34. A system as recited in claim 33, wherein s(u,vj) is a known stickiness value for each Web page vjε
- V.
-
35. A system as recited in claim 33, wherein s(u,vj) is assigned a unit value for each edge ej=<
- va,vb>
in E.
- va,vb>
-
36. A system as recited in claim 33, wherein s(u,vj) is assigned a larger value than the unit value for each edge ej=<
- va, vb>
in E that crosses a domain boundary.
- va, vb>
-
37. A system as recited in claim 33, the processor further programmed for calculating a page gain gain(u,v) by finding a steady state distribution (convergence vector) of the random walk graph by:
-
finding a shortest distance shortest (vc, vi) from vertex vc to all vertices viε
V while taking into account the edge weight using a shortest path algorithm;
for each vertex vε
Vw,calculating a penalty penalty(u,v)=shortest(vc,vi)/(pref(u,v)+1), and calculating a unit probability unit(u,v) by solving calculating
for each edge e=ε
<
vi,vj>
ε
Vw; and
calculating gain(u,v) by finding a steady state distribution (convergence vector) of T, where T is a matrix of transition values prob(u)(vj|vi), and
-
-
38. A system as recited in claim 37, the processor further programmed for calculating gain(u,v) by finding a steady state distribution (convergence vector) t of T by solving a linear equation (I−
- T)t=0, where I is a unit matrix, and
- T)t=0, where I is a unit matrix, and
-
39. A system as recited in claim 38:
-
the memory for storing a known size size(o) for each object oε
Ov; and
the processor having an available pre-fetch bandwidth Pu and a pre-fetch duration δ
t for pre-fetching objects into a memory, and further programmed for identifying a set of objects Os highly associated with the end-user or seed Web page bydefining a cost of each object as cost(o)=size(o), and identifying a subset Os of Ov such that Σ
oε
Os cost(o)≦
Pu×
δ
t and Σ
oε
Os gain(u,o) is maximized.
-
-
40. A system as recited in claim 39, the processor further programmed for pre-fetching or refreshing objects from Os into the memory.
-
41. A system as recited in claim 39:
-
the memory for storing a known expiration time expire(o) for each object oε
Ov; and
the processor further programmed for refining the set of objects Os by removing those objects that will expire before their earliest time to view by finding a shortest path shortest_path(vc,vi) from vertex vc to all vertices viε
V while taking into account the edge weight using a shortest path algorithm,calculating a measure of an earliest time that a page p may be accessed as earliest(u,p)=shortest_path(vc,vp), calculating a measure of an earliest time that an object o may be needed as earliest(u,o)=min{earliest(u,p)|pε
pages(o)}, andeliminating those objects from Ov in which expire(o)<
earliest(u,o) before identifying the set of objects Os.
-
-
42. A system as recited in claim 38:
-
the memory for storing a known size size(o) for each object oε
Ov; and
the processor having an available pre-fetch bandwidth PU and a pre-fetch duration δ
t in which a server may pre-fetch objects into a memory, and further programmed for identifying a set of objects Os highly associated with a set of users U or seed Web pages bycalculating an object gain gain(o) for each object as
wherein the object gain represents an association between the object and the set of end-users or seed Web pages,defining a cost of each object as cost(o)=size(o), and identifying a subset Os of Ov such that
-
Specification