Optimizing computer performance by using data compression principles to minimize a loss function
Abstract
The method of prefetching data into cache to minimize CPU stall time uses a rough predictor to make rough predictions about what cache lines will be needed next by the CPU. The address difference generator uses the rough prediction and the actual cache miss address to determine the address difference. The prefetch engine builds a data structure to represent address differences and weights them according to the accumulated stall time produced by the cache misses given that the corresponding address is not prefetched. This stall time is modeled as a loss function that accumulates, over all cache misses, the per-miss losses charged to recent prefetch actions (the form given in claim 49).
The weights in the data structure change as the prefetch engine learns more information. The prefetch engine's goal is to predict the cache line needed and prefetch it before the CPU requests it.
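The loop the abstract describes can be sketched in Python. Everything here is an illustrative assumption rather than the patent's implementation: the class and method names are invented, the rough predictor is a simple last-stride guess, and the weights are plain counts standing in for the stall-time weights.

```python
from collections import defaultdict

class SketchPrefetcher:
    """Illustrative sketch of the abstract's pipeline: a rough predictor
    guesses the next miss address, and a weight table over prediction
    errors supplies the learned correction used to prefetch early."""

    def __init__(self):
        self.last_miss = None
        self.stride = 0                       # rough predictor: last observed stride
        self.diff_weights = defaultdict(int)  # weight per observed address difference

    def on_cache_miss(self, miss_addr):
        # First address difference: miss address minus the earlier rough prediction.
        rough = self.last_miss + self.stride if self.last_miss is not None else miss_addr
        first_diff = miss_addr - rough
        self.diff_weights[first_diff] += 1    # learn which corrections recur

        # Second address difference: the highest-weight correction seen so far.
        second_diff = max(self.diff_weights, key=self.diff_weights.get)

        # Update the rough predictor, then prefetch prediction + correction.
        if self.last_miss is not None:
            self.stride = miss_addr - self.last_miss
        self.last_miss = miss_addr
        return miss_addr + self.stride + second_diff  # address to prefetch
```

On a regular stride the sketch locks on after two misses: feeding it misses at 0, 8, 16 makes it prefetch 24 before the fourth miss arrives.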
68 Claims
1. A method for prefetching data into prefetch cache based on data compression principles, comprising:
-
receiving a cache miss address;
roughly predicting a first address responsive to at least the cache miss address;
generating a first address difference by taking a difference between the cache miss address and an earlier rough prediction;
generating a second address difference responsive to at least the first address difference; and
adaptively prefetching a second address responsive to the second address difference and to the first address. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24)
building a data structure in a tree form, the tree having a plurality of nodes and a plurality of branches, the plurality of nodes including a root, a plurality of internal nodes, and a plurality of leaves, the root having no incoming branch, each internal node and leaf having one incoming branch, the plurality of branches emanating from the plurality of internal nodes and the root, the leaves having no outgoing branches, each of the internal nodes and leaves having a corresponding weight for its incoming branch, the plurality of branches being representative of corresponding address differences, and the tree having a pointer pointing to a current node in the tree;
receiving the first address difference;
updating the weight of branches on the tree emanating from nodes on a path from the current node up toward the root;
traversing the tree in the direction of the first address difference until reaching a new current node;
assigning the pointer to point to the new current node;
ranking the branches on the new current node according to the weight; and
generating the second address difference responsive to the rank on the branches on the new current node.
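A depth-1 version of the steps above fits in a few lines of Python. The `Node` class and `observe` helper are illustrative stand-ins for the claim's deeper tree, with simple counts for the branch weights: the branch for the previous difference selects the context node whose weight is updated, and the branches at the node for the current difference are ranked to produce the second address difference.

```python
class Node:
    """One tree node: outgoing branches keyed by address difference."""
    def __init__(self):
        self.children = {}  # difference -> child Node
        self.weights = {}   # difference -> weight of that outgoing branch

def observe(root, prev_diff, first_diff):
    """One update/predict step on a depth-1 context tree (a simplified
    stand-in for the claim's tree)."""
    # Update: bump the weight of the branch taken out of the context node.
    ctx = root.children.setdefault(prev_diff, Node())
    ctx.weights[first_diff] = ctx.weights.get(first_diff, 0) + 1
    # Traverse in the direction of the first difference to the new current node.
    new_current = root.children.setdefault(first_diff, Node())
    # Rank the branches on the new current node and emit the top difference.
    if new_current.weights:
        return max(new_current.weights, key=new_current.weights.get)
    return None  # no ranked branch yet
```

On an alternating difference stream (1, 2, 1, 2, ...) the sketch starts predicting the alternation as soon as each context has been seen once.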
-
-
7. The method of claim 6 wherein:
-
ranking the branches on the new current node includes identifying a branch with a highest rank; and
generating the second address difference includes generating the second address difference associated with a branch with the highest rank emanating from the new current node.
-
-
8. The method of claim 6 wherein generating the second address difference includes generating a plurality of address differences associated with branches emanating from the new current node.
-
9. The method of claim 6 wherein:
-
ranking the branches on the new current node includes identifying the branches with highest ranks; and
generating the second address difference includes generating a plurality of address differences associated with branches with the highest ranks emanating from the new current node.
-
-
10. The method of claim 6 wherein:
-
ranking the branches on the new current node includes identifying the branches with highest ranks; and
generating the second address difference includes generating the second address difference associated with a branch not having a highest rank emanating from the new current node when the second address responsive to a different second address difference associated with a branch with the highest rank emanating from the new current node and to the first address is already in the prefetch cache or a central processing unit cache, or was previously requested and on its way to the caches.
-
-
11. The method of claim 6 wherein updating the weight of branches on the tree includes:
-
assigning an incremental numerical index j, j≧0, to the cache miss address xj;
assigning the incremental numerical index j to the current node of the tree;
determining a time tj at which the cache miss address xj occurred; and
incrementing the weight of one branch emanating from each node associated with the indices j−i by an increment amount, where 0≦i<j−m(j), m(j) is a largest index l that satisfies tj−tl≧T, and T is a memory latency for the system architecture in which the method for prefetching data into a prefetch cache is implemented, stopping if the root of the tree is encountered.
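The set of nodes whose branch weights get incremented for miss j can be computed directly from the claim's definition of m(j). The helper below is a hypothetical illustration with made-up miss times; it returns the indices j, j−1, ..., m(j)+1, i.e. the misses that occurred within one memory latency T of miss j.

```python
def update_window(times, j, T):
    """Sketch of claim 11's update set: m(j) is the largest index l with
    times[j] - times[l] >= T, and weights are updated for nodes with
    indices j - i for 0 <= i < j - m(j)."""
    m = max((l for l in range(j + 1) if times[j] - times[l] >= T), default=-1)
    return [j - i for i in range(j - m)]
```

With miss times [0, 5, 9, 12] and T = 10, only index 0 is at least one latency older than miss 3, so m(3) = 0 and the window is nodes 3, 2, 1.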
-
-
12. The method of claim 11 wherein updating the weight of branches on the tree further includes:
-
associating to the current node the first address yj roughly predicted responsive to the first j cache miss addresses x0, x1, . . . , xj−1; and
incrementing the weight of the branch representing an address difference xj−yj−i emanating from each node with index j−i.
-
-
13. The method of claim 12 wherein the increment amount for the node with the index j−i is given by min(T, tj−tj−i−1).
-
14. The method of claim 12 wherein the increment amount for the node with the index j−i is given by min(T, tj−tj−i−1)−tj+tk′(i, j)−1, where k′(i,j) is the smallest index of a node with index greater than j−i such that the address difference corresponding to the branch with the highest rank is xj−yj−i, or j+1 if no such node exists.
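Claim 13's increment has a direct reading as stall time saved: the node with index j−i is credited with the time elapsed since the miss before it, capped at the memory latency T. A sketch with made-up example times:

```python
def increment_amount(times, j, i, T):
    """Claim 13 sketch: the increment for the node with index j - i is
    the elapsed time since the miss before it, capped at the memory
    latency T -- the stall a prefetch issued there could have hidden."""
    return min(T, times[j] - times[j - i - 1])

# With miss times [0, 5, 9, 12] (arbitrary example) and latency T = 10,
# the increments for miss j = 3 at depths i = 0, 1, 2 are 3, 7, and 10.
example = [increment_amount([0, 5, 9, 12], 3, i, 10) for i in range(3)]
```

Claim 14 then reduces this credit by tj−tk′(i,j)−1, discounting stall time that a later, higher-ranked node would already have covered by issuing the same prefetch.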
-
15. The method of claim 11 where the time tj at which the cache miss address xj occurred is measured in central processing unit cycles not including stall times.
-
16. The method of claim 15, wherein k′(i,j) is the smallest index of a node with index greater than j−i such that the address difference corresponding to the branch with the highest rank is xj−yj−i, based on the weights that are available after the cache miss address xj occurs, or j+1 if no such node exists.
-
17. The method of claim 6 wherein ranking the branches on the new current node includes:
-
identifying all branches emanating from the new current node;
identifying all nodes with incoming branches emanating from the new current node;
determining the corresponding weight for each branch emanating from the new current node; and
ranking the branches emanating from the new current node according to their corresponding weights.
-
-
18. The method of claim 17 wherein ranking the branches emanating from the new current node includes ranking the branches in descending order responsive to the weights.
-
19. The method of claim 6 wherein the method includes a reset function to reset the data structure in tree form.
-
20. The method of claim 19 wherein the reset function includes adjusting the weights of all branches in the tree after the reset.
-
21. The method of claim 6 wherein traversing the tree includes moving from the new current node to an alternate new current node when the new current node is a leaf, the alternate new current node being found by traversing from the root parallel to a suffix of a path from the root to the new current node.
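The leaf fallback in claim 21 resembles a suffix restart: rather than stopping at a leaf, the walk resumes at the root along the longest matchable suffix of the path just taken. The `Node` class and helper below are illustrative, not from the patent:

```python
class Node:
    def __init__(self):
        self.children = {}  # address difference -> child Node

def fallback_from_leaf(root, path):
    """Claim 21 sketch: when traversal reaches a leaf, restart at the
    root and follow the longest proper suffix of the root-to-leaf path
    of differences that the tree can match; fall back to the root if
    no suffix matches."""
    for start in range(1, len(path) + 1):
        node, ok = root, True
        for diff in path[start:]:
            if diff not in node.children:
                ok = False
                break
            node = node.children[diff]
        if ok:
            return node
    return root
```

For example, if the leaf was reached along differences [5, 1, 2] and the tree also contains the path 1 → 2 from the root, the walk resumes at the node reached by [1, 2].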
-
22. The method of claim 1 wherein the second address difference is assigned a reliability.
-
23. The method of claim 22 further comprising not adaptively prefetching a second address if the reliability of the second address difference is below a threshold reliability.
-
24. The method of claim 1 further comprising using the prefetch cache as a victim cache to receive data from a central processing unit cache.
-
25. A system for prefetching data using data compression principles, comprising:
-
means for receiving a cache miss address;
means for roughly predicting a first address responsive to at least the cache miss address;
means for generating a first address difference by taking a difference between the cache miss address and an earlier rough prediction;
means for generating a second address difference responsive to at least the first address difference; and
means for adaptively prefetching a second address responsive to the second address difference and to the first address. - View Dependent Claims (26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48)
means for building a data structure in a tree form, the tree having a plurality of nodes and a plurality of branches, the plurality of nodes including a root, a plurality of internal nodes, and a plurality of leaves, the root having no incoming branch, each internal node and leaf having one incoming branch, the plurality of branches emanating from the plurality of internal nodes and the root, the leaves having no outgoing branches, each of the nodes having a corresponding weight for its incoming branch, the plurality of branches being representative of corresponding address differences, and the tree having a pointer pointing to a current node in the tree;
means for receiving the first address difference;
means for updating the weight of branches on the tree emanating from nodes on a path from the current node up toward the root;
means for traversing the tree in the direction of the first address difference until reaching a new current node;
means for assigning the pointer to point to the new current node;
means for ranking the branches on the new current node according to the weight; and
means for generating the second address difference responsive to the rank on the branches on the new current node.
-
-
31. The system of claim 30 wherein:
-
the means for ranking the branches on the new current node includes means for identifying a branch with a highest rank; and
the means for generating the second address difference includes means for generating the second address difference associated with a branch with the highest rank emanating from the new current node.
-
-
32. The system of claim 30 wherein the means for generating the second address difference includes means for generating a plurality of address differences associated with branches emanating from the new current node.
-
33. The system of claim 30 wherein:
-
the means for ranking the branches on the new current node includes means for identifying the branches with highest ranks; and
the means for generating the second address difference includes means for generating a plurality of address differences associated with branches with the highest ranks emanating from the new current node.
-
-
34. The system of claim 30 wherein:
-
the means for ranking the branches on the new current node includes means for identifying the branches with highest ranks; and
the means for generating the second address difference includes means for generating the second address difference associated with a branch not having a highest rank emanating from the new current node when the second address responsive to a different second address difference associated with a branch with the highest rank emanating from the new current node and to the first address is already in the prefetch cache or a central processing unit cache, or was previously requested and on its way to the caches.
-
-
35. The system of claim 30 wherein the means for updating the weight of branches on the tree includes:
-
means for assigning an incremental numerical index j, j≧0, to the cache miss address xj;
means for assigning the incremental numerical index j to the current node of the tree;
means for determining a time tj at which the cache miss address xj occurred; and
means for incrementing the weight of one branch emanating from each node associated with the indices j−i by an increment amount, where 0≦i<j−m(j), m(j) is a largest index l that satisfies tj−tl≧T, and T is a memory latency for the system architecture in which the method for prefetching data into a prefetch cache is implemented, stopping if the root of the tree is encountered.
-
-
36. The system of claim 35 wherein the means for updating the weight of branches on the tree further includes:
-
means for associating to the current node the first address yj roughly predicted responsive to the first j cache miss addresses x0, x1, . . . , xj−1; and
means for incrementing the weight of the branch representing an address difference xj−yj−i emanating from each node with index j−i.
-
-
37. The system of claim 36 wherein the increment amount for the node with the index j−i is given by min(T, tj−tj−i−1).
-
38. The system of claim 36 wherein the increment amount for the node with the index j−i is given by min(T, tj−tj−i−1)−tj+tk′(i,j)−1, where k′(i,j) is the smallest index of a node with index greater than j−i such that the address difference corresponding to the branch with the highest rank is xj−yj−i, or j+1 if no such node exists.
-
39. The system of claim 35 where the time tj at which the cache miss address xj occurred is measured in central processing unit cycles not including stall times.
-
40. The system of claim 39, wherein k′(i,j) is the smallest index of a node with index greater than j−i such that the address difference corresponding to the branch with the highest rank is xj−yj−i, based on the weights that are available after the cache miss address xj occurs, or j+1 if no such node exists.
-
41. The system of claim 35 wherein the second address difference is assigned a reliability.
-
42. The system of claim 41 further comprising means for not adaptively prefetching a second address if the reliability of the second address difference is below a threshold reliability.
-
43. The system of claim 30 wherein the means for ranking the branches on the new current node includes:
-
means for identifying all branches emanating from the new current node;
means for identifying all nodes with incoming branches emanating from the new current node;
means for determining the corresponding weight for each branch emanating from the new current node; and
means for ranking the branches emanating from the new current node according to their corresponding weights.
-
-
44. The system of claim 43 wherein the means for ranking the branches emanating from the new current node includes means for ranking the branches in descending order responsive to the weights.
-
45. The system of claim 30 wherein the system includes a reset function to reset the data structure in tree form.
-
46. The system of claim 45 wherein the reset function includes means for adjusting the weights of all branches in the tree after the reset.
-
47. The system of claim 30 wherein the means for traversing the tree includes means for moving from the new current node to an alternate new current node when the new current node is a leaf, the alternate new current node being found by traversing from the root parallel to a suffix of a path from the root to the new current node.
-
48. The system of claim 25 further comprising means for using the prefetch cache as a victim cache to receive data from a central processing unit cache.
-
49. A method for optimizing computer system performance by observing a sequence of outcomes xj and taking actions bj that sequentially minimize a loss function L, where xj is a jth outcome that has occurred, bj is a jth action taken and depending on the j outcomes x0, x1, . . . , xj−1, the possible actions being in one-to-one correspondence with the possible outcomes, the loss function L being an accumulation of losses with respect to individual actions in accordance with L = Σ_{j=0}^{n−1} Σ_{i=0}^{sl(j)−1} Ci(bj−i, xj), where sl(j) is the number of past actions contributing to the loss caused by xj, sl(j)≦j, each action bj−i contributing a loss Ci(bj−i, xj), and n is the total number of outcomes, the method comprising:
receiving the outcome xj; and
taking an action bj+1 responsive to the j+1 outcomes x0, x1, . . . , xj designed to minimize the loss function L. - View Dependent Claims (50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68)
roughly predicting an outcome yj responsive to previous outcomes x0, x1, . . . , xj−1;
generating a difference between the outcome xj and the roughly predicted outcome yj;
constructing a data structure in a tree form, the tree having a plurality of nodes and a plurality of branches, the plurality of nodes including a root, a plurality of internal nodes, and a plurality of leaves, the root having no incoming branch, each internal node and leaf having one incoming branch, the plurality of branches emanating from the plurality of internal nodes and the root, the leaves having no outgoing branches, each of the nodes having a corresponding weight for its incoming branch, the plurality of branches being representative of a difference between an outcome and a roughly predicted outcome, and the tree having a pointer pointing to a current node in the tree;
receiving the outcome xj with the pointer pointing to the current node;
updating the weight of branches on the tree emanating from nodes on a path from the current node up toward the root;
traversing the tree in the direction of the difference xj−yj until reaching a new current node;
assigning the pointer to point to the new current node;
ranking the branches on the new current node responsive to the weight; and
taking the action bj+1 responsive to the rank on the branches emanating from the new current node.
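Collecting the definitions in claim 49, the accumulated loss can be written in summation form consistent with claims 52 through 54:

```latex
L \;=\; \sum_{j=0}^{n-1}\;\sum_{i=0}^{sl(j)-1} C_i\!\left(b_{j-i},\, x_j\right)
```

Each outcome xj charges a loss to its sl(j) most recent actions bj, bj−1, . . . , bj−sl(j)+1; in the prefetching instance of claim 54, sl(j) = j−m(j) and the per-action loss is min(T, tj−tj−i−1).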
-
51. The method of claim 50 wherein
-
Ci(bj−i, xj) = { 0 if bj−i = b* for some b*, a function of xj; Ci(t1, t2, . . . , tj) otherwise }, where Ci represents a loss function, b* is a unique action relative to an outcome xj and t0, t1, . . . , tj represent side information.
-
-
52. The method of claim 51 wherein updating branches on the tree includes:
-
associating an index j to the current node; and
incrementing the weight of one branch emanating from each node associated with indices j−
i by an increment amount, where 0≦
i<
sl(j), stopping if the root of the tree is encountered.
-
-
53. The method of claim 52 wherein incrementing the weight of one branch emanating from each node having the indices j−i by an increment amount includes incrementing the weight of the branch associated with a difference zj−yj by an amount Ci(t0, t1, t2, . . . , tj), where zj is the outcome corresponding to the action b* in the one-to-one correspondence between actions and outcomes.
-
54. The method of claim 53 wherein the computer system performance to be optimized is for prefetching of data into a prefetch cache, where the increment amount Ci(t0, t1, t2, . . . , tj) is given by min(T, tj−tj−i−1), T is a memory latency for the computer system, the side information tl, 0≦l≦j, is a time not including stall times at which an outcome xl occurs in central processing unit cycles, sl(j) is given by j−m(j), and m(j) is a maximum index l that satisfies tj−tl≧T.
-
55. The method of claim 51 wherein ranking the branches on the new current node includes:
-
identifying all branches emanating from the new current node;
identifying all nodes with incoming branches emanating from the new current node;
determining the corresponding weight for each branch emanating from the new current node; and
ranking the branches emanating from the new current node according to their corresponding weights.
-
-
56. The method of claim 55 wherein ranking the branches emanating from the new current node includes ranking the branches in descending order responsive to the weights.
-
57. The method of claim 50 wherein the method includes a reset function to reset the data structure in tree form.
-
58. The method of claim 57 wherein the reset function includes adjusting the weights of all branches in the tree after the reset.
-
59. The method of claim 50 wherein traversing the tree includes:
-
identifying the branch emanating from the current node corresponding to the difference between the outcome xj and the roughly predicted outcome yj; and
moving along the identified branch to the new current node.
-
-
60. The method of claim 50 wherein traversing the tree includes moving from the new current node to an alternate new current node when the new current node is a leaf, the alternate new current node being found by traversing from the root parallel to a suffix of a path from the root to the new current node.
-
61. The method of claim 50 wherein:
-
the method further includes ranking the branches on the new current node; and
taking an action responsive to the branches includes taking an action responsive to the rank of the branches emanating from the new current node.
-
-
62. The method of claim 61 wherein taking an action responsive to the rank on the branches includes taking an action associated with a branch emanating from the new current node with a highest rank.
-
63. The method of claim 62 wherein taking an action associated with a branch emanating from the new current node with the highest rank includes:
-
roughly predicting a first outcome yj+1 responsive to previous outcomes x0, x1, . . . , xj;
generating a second outcome by adding to the first outcome yj+1 a difference corresponding to a branch emanating from the new current node with a highest rank; and
taking an action corresponding to the second outcome in the one-to-one correspondence between actions and outcomes.
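The composition in claim 63 — add the highest-ranked difference at the new current node to the rough prediction yj+1, then take the action paired with that outcome — is a one-liner in spirit. A sketch with illustrative names:

```python
def take_action(rough_next, branch_weights):
    """Claim 63 sketch (hypothetical helper): the chosen outcome is the
    rough prediction for step j+1 plus the difference on the
    highest-weighted branch out of the new current node; the action taken
    is the one paired with that outcome."""
    best_diff = max(branch_weights, key=branch_weights.get)
    return rough_next + best_diff
```

Claim 66's plural form is the same computation repeated for each of the top-ranked branches rather than only the single best one.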
-
-
64. The method of claim 61 wherein taking an action responsive to the rank on the branches further includes taking a plurality of actions associated with branches emanating from the new current node.
-
65. The method of claim 61 wherein taking an action responsive to the rank on the branches further includes taking a plurality of actions associated with branches emanating from the new current node with highest ranks.
-
66. The method of claim 65 wherein taking a plurality of actions associated with branches emanating from the new current node with highest ranks includes:
-
roughly predicting a first outcome yj+1 responsive to previous outcomes x0, x1, . . . , xj;
generating multiple outcomes by adding to the first outcome yj+1 a difference corresponding to branches emanating from the new current node with highest ranks; and
taking actions corresponding to the multiple outcomes in the one-to-one correspondence between actions and outcomes.
-
-
67. The method of claim 49 wherein the branches emanating from the new current node are each assigned a reliability.
-
68. The method of claim 67 further comprising not taking the action bj+1 if the reliability of each branch emanating from the new current node is below a threshold reliability.
Specification