Optimizing computer performance by using data compression principles to minimize a loss function
Abstract
The method of prefetching data into cache to minimize CPU stall time uses a rough predictor to make rough predictions about what cache lines will be needed next by the CPU. The address difference generator uses the rough prediction and the actual cache miss address to determine the address difference. The prefetch engine builds a data structure to represent address differences and weights them according to the accumulated stall time produced by the cache misses given that the corresponding address is not prefetched. This stall time is modeled as a loss function that accumulates, over all cache misses, the per-miss losses charged to recent prefetch actions (the form given in claim 49).
The weights in the data structure change as the prefetch engine learns more information. The prefetch engine's goal is to predict the cache line needed and prefetch it before the CPU requests it.
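The loop the abstract describes can be sketched in Python. Everything here is an illustrative assumption rather than the patent's implementation: the class and method names are invented, the rough predictor is a simple last-stride guess, and the weights are plain counts standing in for the stall-time weights.

```python
from collections import defaultdict

class SketchPrefetcher:
    """Illustrative sketch of the abstract's pipeline: a rough predictor
    guesses the next miss address, and a weight table over prediction
    errors supplies the learned correction used to prefetch early."""

    def __init__(self):
        self.last_miss = None
        self.stride = 0                       # rough predictor: last observed stride
        self.diff_weights = defaultdict(int)  # weight per observed address difference

    def on_cache_miss(self, miss_addr):
        # First address difference: miss address minus the earlier rough prediction.
        rough = self.last_miss + self.stride if self.last_miss is not None else miss_addr
        first_diff = miss_addr - rough
        self.diff_weights[first_diff] += 1    # learn which corrections recur

        # Second address difference: the highest-weight correction seen so far.
        second_diff = max(self.diff_weights, key=self.diff_weights.get)

        # Update the rough predictor, then prefetch prediction + correction.
        if self.last_miss is not None:
            self.stride = miss_addr - self.last_miss
        self.last_miss = miss_addr
        return miss_addr + self.stride + second_diff  # address to prefetch
```

On a regular stride the sketch locks on after two misses: feeding it misses at 0, 8, 16 makes it prefetch 24 before the fourth miss arrives.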
68 Claims
1. A method for prefetching data into prefetch cache based on data compression principles, comprising:
-
receiving a cache miss address;
roughly predicting a first address responsive to at least the cache miss address;
generating a first address difference by taking a difference between the cache miss address and an earlier rough prediction;
generating a second address difference responsive to at least the first address difference; and
adaptively prefetching a second address responsive to the second address difference and to the first address. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24)
building a data structure in a tree form, the tree having a plurality of nodes and a plurality of branches, the plurality of nodes including a root, a plurality of internal nodes, and a plurality of leaves, the root having no incoming branch, each internal node and leaf having one incoming branch, the plurality of branches emanating from the plurality of internal nodes and the root, the leaves having no outgoing branches, each of the internal nodes and leaves having a corresponding weight for its incoming branch, the plurality of branches being representative of corresponding address differences, and the tree having a pointer pointing to a current node in the tree;
receiving the first address difference;
updating the weight of branches on the tree emanating from nodes on a path from the current node up toward the root;
traversing the tree in the direction of the first address difference until reaching a new current node;
assigning the pointer to point to the new current node;
ranking the branches on the new current node according to the weight; and
generating the second address difference responsive to the rank on the branches on the new current node.
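A depth-1 version of the steps above fits in a few lines of Python. The `Node` class and `observe` helper are illustrative stand-ins for the claim's deeper tree, with simple counts for the branch weights: the branch for the previous difference selects the context node whose weight is updated, and the branches at the node for the current difference are ranked to produce the second address difference.

```python
class Node:
    """One tree node: outgoing branches keyed by address difference."""
    def __init__(self):
        self.children = {}  # difference -> child Node
        self.weights = {}   # difference -> weight of that outgoing branch

def observe(root, prev_diff, first_diff):
    """One update/predict step on a depth-1 context tree (a simplified
    stand-in for the claim's tree)."""
    # Update: bump the weight of the branch taken out of the context node.
    ctx = root.children.setdefault(prev_diff, Node())
    ctx.weights[first_diff] = ctx.weights.get(first_diff, 0) + 1
    # Traverse in the direction of the first difference to the new current node.
    new_current = root.children.setdefault(first_diff, Node())
    # Rank the branches on the new current node and emit the top difference.
    if new_current.weights:
        return max(new_current.weights, key=new_current.weights.get)
    return None  # no ranked branch yet
```

On an alternating difference stream (1, 2, 1, 2, ...) the sketch starts predicting the alternation as soon as each context has been seen once.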
-
-
7. The method of claim 6 wherein:
-
ranking the branches on the new current node includes identifying a branch with a highest rank; and
generating the second address difference includes generating the second address difference associated with a branch with the highest rank emanating from the new current node.
-
-
8. The method of claim 6 wherein generating the second address difference includes generating a plurality of address differences associated with branches emanating from the new current node.
-
9. The method of claim 6 wherein:
-
ranking the branches on the new current node includes identifying the branches with highest ranks; and
generating the second address difference includes generating a plurality of address differences associated with branches with the highest ranks emanating from the new current node.
-
-
10. The method of claim 6 wherein:
-
ranking the branches on the new current node includes identifying the branches with highest ranks; and
generating the second address difference includes generating the second address difference associated with a branch not having a highest rank emanating from the new current node when the second address responsive to a different second address difference associated with a branch with the highest rank emanating from the new current node and to the first address is already in the prefetch cache or a central processing unit cache, or was previously requested and on its way to the caches.
-
-
11. The method of claim 6 wherein updating the weight of branches on the tree includes:
-
assigning an incremental numerical index j, j≧0, to the cache miss address xj;
assigning the incremental numerical index j to the current node of the tree;
determining a time tj at which the cache miss address xj occurred; and
incrementing the weight of one branch emanating from each node associated with the indices j−i by an increment amount, where 0≦i<j−m(j), m(j) is a largest index l that satisfies tj−tl≧T, and T is a memory latency for the system architecture in which the method for prefetching data into a prefetch cache is implemented, stopping if the root of the tree is encountered.
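The set of nodes whose branch weights get incremented for miss j can be computed directly from the claim's definition of m(j). The helper below is a hypothetical illustration with made-up miss times; it returns the indices j, j−1, ..., m(j)+1, i.e. the misses that occurred within one memory latency T of miss j.

```python
def update_window(times, j, T):
    """Sketch of claim 11's update set: m(j) is the largest index l with
    times[j] - times[l] >= T, and weights are updated for nodes with
    indices j - i for 0 <= i < j - m(j)."""
    m = max((l for l in range(j + 1) if times[j] - times[l] >= T), default=-1)
    return [j - i for i in range(j - m)]
```

With miss times [0, 5, 9, 12] and T = 10, only index 0 is at least one latency older than miss 3, so m(3) = 0 and the window is nodes 3, 2, 1.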
-
-
12. The method of claim 11 wherein updating the weight of branches on the tree further includes:
-
associating to the current node the first address yj roughly predicted responsive to the first j cache miss addresses x0, x1, . . . , xj−1; and
incrementing the weight of the branch representing an address difference xj−yj−i emanating from each node with index j−i.
-
-
13. The method of claim 12 wherein the increment amount for the node with the index j−i is given by min(T, tj−tj−i−1).
-
14. The method of claim 12 wherein the increment amount for the node with the index j−i is given by min(T, tj−tj−i−1)−tj+tk′(i, j)−1, where k′(i,j) is the smallest index of a node with index greater than j−i such that the address difference corresponding to the branch with the highest rank is xj−yj−i, or j+1 if no such node exists.
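Claim 13's increment has a direct reading as stall time saved: the node with index j−i is credited with the time elapsed since the miss before it, capped at the memory latency T. A sketch with made-up example times:

```python
def increment_amount(times, j, i, T):
    """Claim 13 sketch: the increment for the node with index j - i is
    the elapsed time since the miss before it, capped at the memory
    latency T -- the stall a prefetch issued there could have hidden."""
    return min(T, times[j] - times[j - i - 1])

# With miss times [0, 5, 9, 12] (arbitrary example) and latency T = 10,
# the increments for miss j = 3 at depths i = 0, 1, 2 are 3, 7, and 10.
example = [increment_amount([0, 5, 9, 12], 3, i, 10) for i in range(3)]
```

Claim 14 then reduces this credit by tj−tk′(i,j)−1, discounting stall time that a later, higher-ranked node would already have covered by issuing the same prefetch.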
-
15. The method of claim 11 where the time tj at which the cache miss address xj occurred is measured in central processing unit cycles not including stall times.
-
16. The method of claim 15, wherein k′(i,j) is the smallest index of a node with index greater than j−i such that the address difference corresponding to the branch with the highest rank is xj−yj−i, based on the weights that are available after the cache miss address xj occurs, or j+1 if no such node exists.
-
17. The method of claim 6 wherein ranking the branches on the new current node includes:
-
identifying all branches emanating from the new current node;
identifying all nodes with incoming branches emanating from the new current node;
determining the corresponding weight for each branch emanating from the new current node; and
ranking the branches emanating from the new current node according to their corresponding weights.
-
-
18. The method of claim 17 wherein ranking the branches emanating from the new current node includes ranking the branches in descending order responsive to the weights.
-
19. The method of claim 6 wherein the method includes a reset function to reset the data structure in tree form.
-
20. The method of claim 19 wherein the reset function includes adjusting the weights of all branches in the tree after the reset.
-
21. The method of claim 6 wherein traversing the tree includes moving from the new current node to an alternate new current node when the new current node is a leaf, the alternate new current node being found by traversing from the root parallel to a suffix of a path from the root to the new current node.
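The leaf fallback in claim 21 resembles a suffix restart: rather than stopping at a leaf, the walk resumes at the root along the longest matchable suffix of the path just taken. The `Node` class and helper below are illustrative, not from the patent:

```python
class Node:
    def __init__(self):
        self.children = {}  # address difference -> child Node

def fallback_from_leaf(root, path):
    """Claim 21 sketch: when traversal reaches a leaf, restart at the
    root and follow the longest proper suffix of the root-to-leaf path
    of differences that the tree can match; fall back to the root if
    no suffix matches."""
    for start in range(1, len(path) + 1):
        node, ok = root, True
        for diff in path[start:]:
            if diff not in node.children:
                ok = False
                break
            node = node.children[diff]
        if ok:
            return node
    return root
```

For example, if the leaf was reached along differences [5, 1, 2] and the tree also contains the path 1 → 2 from the root, the walk resumes at the node reached by [1, 2].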
-
22. The method of claim 1 wherein the second address difference is assigned a reliability.
-
23. The method of claim 22 further comprising not adaptively prefetching a second address if the reliability of the second address difference is below a threshold reliability.
-
24. The method of claim 1 further comprising using the prefetch cache as a victim cache to receive data from a central processing unit cache.
-
25. A system for prefetching data using data compression principles, comprising:
-
means for receiving a cache miss address;
means for roughly predicting a first address responsive to at least the cache miss address;
means for generating a first address difference by taking a difference between the cache miss address and an earlier rough prediction;
means for generating a second address difference responsive to at least the first address difference; and
means for adaptively prefetching a second address responsive to the second address difference and to the first address. - View Dependent Claims (26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48)
means for building a data structure in a tree form, the tree having a plurality of nodes and a plurality of branches, the plurality of nodes including a root, a plurality of internal nodes, and a plurality of leaves, the root having no incoming branch, each internal node and leaf having one incoming branch, the plurality of branches emanating from the plurality of internal nodes and the root, the leaves having no outgoing branches, each of the nodes having a corresponding weight for its incoming branch, the plurality of branches being representative of corresponding address differences, and the tree having a pointer pointing to a current node in the tree;
means for receiving the first address difference;
means for updating the weight of branches on the tree emanating from nodes on a path from the current node up toward the root;
means for traversing the tree in the direction of the first address difference until reaching a new current node;
means for assigning the pointer to point to the new current node;
means for ranking the branches on the new current node according to the weight; and
means for generating the second address difference responsive to the rank on the branches on the new current node.
-
-
31. The system of claim 30 wherein:
-
the means for ranking the branches on the new current node includes means for identifying a branch with a highest rank; and
the means for generating the second address difference includes means for generating the second address difference associated with a branch with the highest rank emanating from the new current node.
-
-
32. The system of claim 30 wherein the means for generating the second address difference includes means for generating a plurality of address differences associated with branches emanating from the new current node.
-
33. The system of claim 30 wherein:
-
the means for ranking the branches on the new current node includes means for identifying the branches with highest ranks; and
the means for generating the second address difference includes means for generating a plurality of address differences associated with branches with the highest ranks emanating from the new current node.
-
-
34. The system of claim 30 wherein:
-
the means for ranking the branches on the new current node includes means for identifying the branches with highest ranks; and
the means for generating the second address difference includes means for generating the second address difference associated with a branch not having a highest rank emanating from the new current node when the second address responsive to a different second address difference associated with a branch with the highest rank emanating from the new current node and to the first address is already in the prefetch cache or a central processing unit cache, or was previously requested and on its way to the caches.
-
-
35. The system of claim 30 wherein the means for updating the weight of branches on the tree includes:
-
means for assigning an incremental numerical index j, j≧0, to the cache miss address xj;
means for assigning the incremental numerical index j to the current node of the tree;
means for determining a time tj at which the cache miss address xj occurred; and
means for incrementing the weight of one branch emanating from each node associated with the indices j−i by an increment amount, where 0≦i<j−m(j), m(j) is a largest index l that satisfies tj−tl≧T, and T is a memory latency for the system architecture in which the method for prefetching data into a prefetch cache is implemented, stopping if the root of the tree is encountered.
-
-
36. The system of claim 35 wherein the means for updating the weight of branches on the tree further includes:
-
means for associating to the current node the first address yj roughly predicted responsive to the first j cache miss addresses x0, x1, . . . , xj−1; and
means for incrementing the weight of the branch representing an address difference xj−yj−i emanating from each node with index j−i.
-
-
37. The system of claim 36 wherein the increment amount for the node with the index j−i is given by min(T, tj−tj−i−1).
-
38. The system of claim 36 wherein the increment amount for the node with the index j−i is given by min(T, tj−tj−i−1)−tj+tk′(i,j)−1, where k′(i,j) is the smallest index of a node with index greater than j−i such that the address difference corresponding to the branch with the highest rank is xj−yj−i, or j+1 if no such node exists.
-
39. The system of claim 35 where the time tj at which the cache miss address xj occurred is measured in central processing unit cycles not including stall times.
-
40. The system of claim 39, wherein k′(i,j) is the smallest index of a node with index greater than j−i such that the address difference corresponding to the branch with the highest rank is xj−yj−i, based on the weights that are available after the cache miss address xj occurs, or j+1 if no such node exists.
-
41. The system of claim 35 wherein the second address difference is assigned a reliability.
-
42. The system of claim 41 further comprising means for not adaptively prefetching a second address if the reliability of the second address difference is below a threshold reliability.
-
43. The system of claim 30 wherein the means for ranking the branches on the new current node includes:
-
means for identifying all branches emanating from the new current node;
means for identifying all nodes with incoming branches emanating from the new current node;
means for determining the corresponding weight for each branch emanating from the new current node; and
means for ranking the branches emanating from the new current node according to their corresponding weights.
-
-
44. The system of claim 43 wherein the means for ranking the branches emanating from the new current node includes means for ranking the branches in descending order responsive to the weights.
-
45. The system of claim 30 wherein the system includes a reset function to reset the data structure in tree form.
-
46. The system of claim 45 wherein the reset function includes means for adjusting the weights of all branches in the tree after the reset.
-
47. The system of claim 30 wherein the means for traversing the tree includes means for moving from the new current node to an alternate new current node when the new current node is a leaf, the alternate new current node being found by traversing from the root parallel to a suffix of a path from the root to the new current node.
-
48. The system of claim 25 further comprising means for using the prefetch cache as a victim cache to receive data from a central processing unit cache.
-
49. A method for optimizing computer system performance by observing a sequence of outcomes xj and taking actions bj that sequentially minimize a loss function L, where xj is a jth outcome that has occurred, bj is a jth action taken and depending on the j outcomes x0, x1, . . . , xj−1, the possible actions being in one-to-one correspondence with the possible outcomes, the loss function L being an accumulation of losses with respect to individual actions in accordance with L = Σ_{j=0}^{n−1} Σ_{i=0}^{sl(j)−1} Ci(bj−i, xj), where sl(j) is the number of past actions contributing to the loss caused by xj, sl(j)≦j, each action bj−i contributing a loss Ci(bj−i, xj), and n is the total number of outcomes, the method comprising:
receiving the outcome xj; and
taking an action bj+1 responsive to the j+1 outcomes x0, x1, . . . , xj designed to minimize the loss function L. - View Dependent Claims (50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68)
roughly predicting an outcome yj responsive to previous outcomes x0, x1, . . . , xj−1;
generating a difference between the outcome xj and the roughly predicted outcome yj;
constructing a data structure in a tree form, the tree having a plurality of nodes and a plurality of branches, the plurality of nodes including a root, a plurality of internal nodes, and a plurality of leaves, the root having no incoming branch, each internal node and leaf having one incoming branch, the plurality of branches emanating from the plurality of internal nodes and the root, the leaves having no outgoing branches, each of the nodes having a corresponding weight for its incoming branch, the plurality of branches being representative of a difference between an outcome and a roughly predicted outcome, and the tree having a pointer pointing to a current node in the tree;
receiving the outcome xj with the pointer pointing to the current node;
updating the weight of branches on the tree emanating from nodes on a path from the current node up toward the root;
traversing the tree in the direction of the difference xj−yj until reaching a new current node;
assigning the pointer to point to the new current node;
ranking the branches on the new current node responsive to the weight; and
taking the action bj+1 responsive to the rank on the branches emanating from the new current node.
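Collecting the definitions in claim 49, the accumulated loss can be written in summation form consistent with claims 52 through 54:

```latex
L \;=\; \sum_{j=0}^{n-1}\;\sum_{i=0}^{sl(j)-1} C_i\!\left(b_{j-i},\, x_j\right)
```

Each outcome xj charges a loss to its sl(j) most recent actions bj, bj−1, . . . , bj−sl(j)+1; in the prefetching instance of claim 54, sl(j) = j−m(j) and the per-action loss is min(T, tj−tj−i−1).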
-
51. The method of claim 50 wherein
-
Ci(bj−i, xj) = { 0 if bj−i = b* for some b*, a function of xj; Ci(t1, t2, . . . , tj) otherwise }, where Ci represents a loss function, b* is a unique action relative to an outcome xj and t0, t1, . . . , tj represent side information.
-
-
52. The method of claim 51 wherein updating branches on the tree includes:
-
associating an index j to the current node; and
incrementing the weight of one branch emanating from each node associated with indices j−
i by an increment amount, where 0≦
i<
sl(j), stopping if the root of the tree is encountered.
-
-
53. The method of claim 52 wherein incrementing the weight of one branch emanating from each node having the indices j−i by an increment amount includes incrementing the weight of the branch associated with a difference zj−yj by an amount Ci(t0, t1, t2, . . . , tj), where zj is the outcome corresponding to the action b* in the one-to-one correspondence between actions and outcomes.
-
54. The method of claim 53 wherein the computer system performance to be optimized is for prefetching of data into a prefetch cache, where the increment amount Ci(t0, t1, t2, . . . , tj) is given by min(T, tj−tj−i−1), T is a memory latency for the computer system, the side information tl, 0≦l≦j, is a time not including stall times at which an outcome xl occurs in central processing unit cycles, sl(j) is given by j−m(j), and m(j) is a maximum index l that satisfies tj−tl≧T.
-
55. The method of claim 51 wherein ranking the branches on the new current node includes:
-
identifying all branches emanating from the new current node;
identifying all nodes with incoming branches emanating from the new current node;
determining the corresponding weight for each branch emanating from the new current node; and
ranking the branches emanating from the new current node according to their corresponding weights.
-
-
56. The method of claim 55 wherein ranking the branches emanating from the new current node includes ranking the branches in descending order responsive to the weights.
-
57. The method of claim 50 wherein the method includes a reset function to reset the data structure in tree form.
-
58. The method of claim 57 wherein the reset function includes adjusting the weights of all branches in the tree after the reset.
-
59. The method of claim 50 wherein traversing the tree includes:
-
identifying the branch emanating from the current node corresponding to the difference between the outcome xj and the roughly predicted outcome yj; and
moving along the identified branch to the new current node.
-
-
60. The method of claim 50 wherein traversing the tree includes moving from the new current node to an alternate new current node when the new current node is a leaf, the alternate new current node being found by traversing from the root parallel to a suffix of a path from the root to the new current node.
-
61. The method of claim 50 wherein:
-
the method further includes ranking the branches on the new current node; and
taking an action responsive to the branches includes taking an action responsive to the rank of the branches emanating from the new current node.
-
-
62. The method of claim 61 wherein taking an action responsive to the rank on the branches includes taking an action associated with a branch emanating from the new current node with a highest rank.
-
63. The method of claim 62 wherein taking an action associated with a branch emanating from the new current node with the highest rank includes:
-
roughly predicting a first outcome yj+1 responsive to previous outcomes x0, x1, . . . , xj;
generating a second outcome by adding to the first outcome yj+1 a difference corresponding to a branch emanating from the new current node with a highest rank; and
taking an action corresponding to the second outcome in the one-to-one correspondence between actions and outcomes.
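The composition in claim 63 — add the highest-ranked difference at the new current node to the rough prediction yj+1, then take the action paired with that outcome — is a one-liner in spirit. A sketch with illustrative names:

```python
def take_action(rough_next, branch_weights):
    """Claim 63 sketch (hypothetical helper): the chosen outcome is the
    rough prediction for step j+1 plus the difference on the
    highest-weighted branch out of the new current node; the action taken
    is the one paired with that outcome."""
    best_diff = max(branch_weights, key=branch_weights.get)
    return rough_next + best_diff
```

Claim 66's plural form is the same computation repeated for each of the top-ranked branches rather than only the single best one.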
-
-
64. The method of claim 61 wherein taking an action responsive to the rank on the branches further includes taking a plurality of actions associated with branches emanating from the new current node.
-
65. The method of claim 61 wherein taking an action responsive to the rank on the branches further includes taking a plurality of actions associated with branches emanating from the new current node with highest ranks.
-
66. The method of claim 65 wherein taking a plurality of actions associated with branches emanating from the new current node with highest ranks includes:
-
roughly predicting a first outcome yj+1 responsive to previous outcomes x0, x1, . . . , xj;
generating multiple outcomes by adding to the first outcome yj+1 a difference corresponding to branches emanating from the new current node with highest ranks; and
taking actions corresponding to the multiple outcomes in the one-to-one correspondence between actions and outcomes.
-
-
67. The method of claim 49 wherein the branches emanating from the new current node are each assigned a reliability.
-
68. The method of claim 67 further comprising not taking the action bj+1 if the reliability of each branch emanating from the new current node is below a threshold reliability.
Specification