Insertion of prefetch instructions into computer program code
First Claim
1. A computerized system that, at compile-time, converts a first set of computer program instructions of a relatively higher level program instruction language into a second set of computer program instructions of a relatively lower level program instruction language, the system comprising:
- means for making a determination whether to insert a proposed memory prefetch instruction into a location in the second set of computer program instructions;
means for deciding whether insertion of the proposed prefetch instruction at the location is more likely than not to cause an undesired cache memory conflict if the pro-posed prefetch instruction were to be inserted; and
means, responsive to the deciding means, for one of inserting and not inserting the proposed prefetch instruction.
4 Assignments
0 Petitions
Accused Products
Abstract
A technique is provided for inserting memory prefetch instructions only at appropriate locations in program code. The instructions are inserted into the program code such that, when the code is executed, the speed and efficiency of execution of the code may be improved, cache conflicts arising from execution of the prefetch instruction may be substantially eliminated, and the number of simultaneously-executing memory prefetch operations may be limited to prevent stalling and/or overtaxing of the processor executing the code.
120 Citations
63 Claims
-
1. A computerized system that, at compile-time, converts a first set of computer program instructions of a relatively higher level program instruction language into a second set of computer program instructions of a relatively lower level program instruction language, the system comprising:
-
means for making a determination whether to insert a proposed memory prefetch instruction into a location in the second set of computer program instructions;
means for deciding whether insertion of the proposed prefetch instruction at the location is more likely than not to cause an undesired cache memory conflict if the pro-posed prefetch instruction were to be inserted; and
means, responsive to the deciding means, for one of inserting and not inserting the proposed prefetch instruction. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
the process also performs a program loop unrolling operation.
-
-
3. A system according to claim 1, wherein:
the determination is also made based upon whether the insertion of the prefetch instruction at the location is likely to permit an undesirably large number of memory operations to be contemporaneously executing.
-
4. A system according to claim 3, wherein:
the undesirably large number is based upon a maximum number of memory operations that can be contemporaneously executed by a processor.
-
5. A system according to claim 3, wherein:
a prefetch distance is used to calculate a total demand for memory bandwidth which is compared to available memory resources when determining whether insertion of a prefetch instruction would permit an undesirably large number of memory operations to be contemporaneously executing.
-
6. A system according to claim 1, wherein:
the relatively lower level program instruction language is a machine-independent language.
-
7. A system according to claim 5, further comprising:
a machine code generation process that converts the second set of program instructions into program code that is executable by a target processor.
-
8. A system according to claim 1, wherein the determination is based upon:
analyzing the references for cache conflicts based on one of an order of relative offsets and a sort of the references according to offsets of the references modulo cache set size.
-
9. A system according to claim 1, wherein:
said determination of whether insertion of a proposed memory prefetch instruction is more likely than not to cause an undesired cache memory conflict is based on calculation of a prefetch distance and the logical layout of available cache memory.
-
10. A system according to claim 9, wherein:
said logical layout of available cache memory includes set-associativity, number of cache lines and cache line size.
-
11. A computerized system that, at compile-time, inserts at least one memory prefetch instruction at a location in a set of computer program instructions, the system comprising:
-
a process resident in said system that makes a determination whether to insert the at least one prefetch instruction at the location based at least upon whether insertion of the prefetch instruction at the location is likely to permit an undesirably large number of memory operations to be contemporaneously executing. - View Dependent Claims (12, 13, 14, 15)
the process also performs a program loop unrolling operation.
-
-
13. A system according to claim 11, wherein:
the undesirably large number is based upon a maximum number of memory operations that can be contemporaneously executed by a processor.
-
14. A system according to claim 11, further comprising:
a machine code generation process that converts the set of program instructions into program code that is executable by a target processor.
-
15. A system according to claim 11, wherein:
a prefetch distance is used to calculate a total demand for memory bandwidth which is compared to available memory resources when determining whether insertion of a prefetch instruction would permit an undesirably large number of memory operations to be contemporaneously executing.
-
16. A computerized method for converting, at compile-time, a first set of computer program instructions of a relatively higher level program instruction language into a second set of computer program instructions of a relatively lower level program instruction language, the method comprising:
-
determining whether to insert a memory prefetch instruction into a location in the second set of computer program instructions, based at least upon whether insertion of the prefetch instruction at the location is likely to cause an undesired cache memory conflict if the prefetch instruction were to be executed. - View Dependent Claims (17, 18, 19, 20, 21, 22, 23, 24)
performing a program loop unrolling operation.
-
-
18. A method according to claim 16, wherein:
determination of whether to insert the prefetch instruction at the location is also made based upon whether the insertion of the prefetch instruction at the location is likely to permit an undesirable large number of memory operations to be contemporaneously executing.
-
19. A method according to claim 18, wherein:
the undesirably large number is based upon a maximum number of memory operations that can be contemporaneously executed by a processor.
-
20. A method according to claim 18, wherein:
a prefetch distance is used to calculate a total demand for memory bandwidth which is compared to available memory resources when determining whether insertion of a prefetch instruction would permit an undesirably large number of memory operations to be contemporaneously executing.
-
21. A method according to claim 16, wherein:
the relatively lower level program instruction language is a machine-independent language.
-
22. A method according to claim 21, further comprising:
converting the second set of program instructions into program code that is executable by a target processor.
-
23. A method according to claim 16, wherein:
said determination of whether insertion of a memory prefetch instruction is likely to cause an undesired cache memory conflict is based on calculation of a prefetch distance and the logical layout of available cache memory.
-
24. A method according to claim 23, wherein:
said logical layout of available cache memory includes set-associativity, number of cache lines and cache line size.
-
25. A computerized method for inserting at compile-time at least one memory prefetch instruction at a location in a set of computer program instructions, the method comprising:
-
determining whether to insert the at least one prefetch instruction at the location based at least upon whether insertion of the prefetch instruction is likely to cause an undesired cache memory conflict if the prefetch instruction were to be executed. - View Dependent Claims (26, 27, 28, 29, 30, 31)
the determination is also made based upon whether the insertion of the prefetch instruction at the location is likely to permit an undesirably large number of memory operations to be contemporaneously executing.
-
-
27. A method according to claim 26, wherein:
the undesirably large number is based upon a maximum number of memory operations that can be contemporaneously executed by a processor.
-
28. A method according to claim 26, wherein:
a prefetch distance is used to calculate a total demand for memory bandwidth which is compared to available memory resources when determining whether insertion of a prefetch instruction would permit an undesirably large number of memory operations to be contemporaneously executing.
-
29. A method according to claim 25, further comprising:
converting the second set of program instructions into program code that is executable by a target processor.
-
30. A method according to claim 25, wherein:
said determination of whether insertion of a memory prefetch instruction is likely to cause an undesired cache memory conflict is based on calculation of a prefetch distance and the logical layout of available cache memory.
-
31. A method according to claim 30, wherein:
said logical layout of available cache memory includes set-associativity, number of cache lines and cache line size.
-
32. A computerized method for inserting at compile-time at least one memory prefetch instruction at a location in a set of computer program instructions, the method comprising:
-
determining whether to insert the at least one prefetch instruction at the location based at least upon whether insertion of the prefetch instruction at the location is likely to permit an undesirably large number of memory operations to be contemporaneously executing. - View Dependent Claims (33, 34, 35)
the undesirably large number is based upon a maximum number of memory operations that can be contemporaneously executed by a processor.
-
-
34. A method according to claim 32, further comprising:
- converting the second set of program instructions into program code that is executable by a target processor.
-
35. A method according to claim 32, wherein:
a prefetch distance is used to calculate a total demand for memory bandwidth which is compared to available memory resources when determining whether insertion of a prefetch instruction would permit an undesirably large number of memory operations to be contemporaneously executing.
-
36. Computer-readable memory comprising a first set of computer program instructions that when executed at compile-time converts a second set of computer program instructions of a relatively higher level program instruction language into a third set of computer program instructions of a relatively lower level program instruction language, the first set of computer program instructions comprising instructions that when executed:
-
makes a determination whether to insert a memory prefetch instruction into a location in the third set of computer program instructions, based at least upon whether insertion of the prefetch instruction at the location is likely to cause an undesired cache memory conflict if the prefetch instruction were to be executed. - View Dependent Claims (37, 38, 39, 40, 41, 42, 43, 44)
the first set of instructions when executed also performs a program loop unrolling operation.
-
-
38. Computer-readable memory according to claim 36, wherein:
the determination is also made based upon whether the insertion of the prefetch instruction at the location is likely to permit an undesirably large number of memory operations to be contemporaneously executing.
-
39. Computer-readable memory according to claim 38, wherein:
the undesirably large number is based upon a maximum number of memory operations that can be contemporaneously executed by a processor.
-
40. Computer-readable memory according to claim 38, wherein:
a prefetch distance is used to calculate a total demand for memory bandwidth which is compared to available memory resources when determining whether insertion of a prefetch instruction would permit an undesirably large number of memory operations to be contemporaneously executing.
-
41. Computer-readable memory according to claim 36, wherein:
the relatively lower level program instruction language is a machine-independent language.
-
42. Computer-readable memory according to claim 41, further comprising:
machine code generation instructions that when executed convert the third set of program instructions into program code that is executable by a target processor.
-
43. Computer-readable memory according to claim 36, wherein:
said determination of whether insertion of a memory prefetch instruction is likely to cause an undesired cache memory conflict is based on calculation of a prefetch distance and the logical layout of available cache memory.
-
44. Computer-readable memory according to claim 43, wherein:
said logical layout of available cache memory includes set-associativity, number of cache lines and cache line size.
-
45. Computer-readable memory comprising a first set of computer program instructions that when executed at compile-time inserts at least one memory prefetch instruction at a location in a second set of computer program instructions, the first set of instructions comprising instructions that when executed:
-
makes a determination whether to insert the at least one prefetch instruction at the location based at least upon whether insertion of the prefetch instruction is likely to cause an undesired cache memory conflict if the prefetch instruction were to be executed. - View Dependent Claims (46, 47, 48, 49, 50, 51, 52)
the first set of instructions when executed also performs a program loop unrolling operation.
-
-
47. Computer-readable memory according to claim 45, wherein:
the first set of instructions when executed also makes the determination based upon whether the insertion of the prefetch instruction at the location is likely to permit an undesirably large number of memory operations to be contemporaneously executing.
-
48. Computer-readable memory according to claim 47, wherein:
the undesirably large number is based upon a maximum number of memory operations that can be contemporaneously executed by a processor.
-
49. Computer-readable memory according to claim 47, wherein:
a prefetch distance is used to calculate a total demand for memory bandwidth which is compared to available memory resources when determining whether insertion of a prefetch instruction would permit an undesirably large number of memory operations to be contemporaneously executing.
-
50. Computer-readable memory according to claim 45, further comprising:
machine code generation instructions that when executed convert the second set of program instructions into program code that is executable by a target processor.
-
51. Computer-readable memory according to claim 45, wherein:
said determination of whether insertion of a memory prefetch instruction is likely to cause an undesired cache memory conflict is based on calculation of a prefetch distance and the logical layout of available cache memory.
-
52. Computer-readable memory according to claim 51, wherein:
said logical layout of available cache memory includes set-associativity, number of cache lines and cache line size.
-
53. Computer-readable memory comprising a first set of computer program instructions that when executed at compile-time inserts at least one memory prefetch instruction at a location in a second set of computer program instructions, the first set of instructions comprising instructions that when executed:
-
makes a determination whether to insert the at least one prefetch instruction at the location based at least upon whether insertion of the prefetch instruction at the location is likely to permit an undesirably large number of memory operations to be contemporaneously executing. - View Dependent Claims (54, 55, 56, 57)
the first set of instructions when executed also performs a program loop unrolling operation.
-
-
55. Computer-readable memory according to claim 53, wherein:
the undesirably large number is based upon a maximum number of memory operations that can be contemporaneously executed by a processor.
-
56. Computer-readable memory according to claim 53, further comprising:
machine code generation instructions that when executed convert the second set of program instructions into program code that is executable by a target processor.
-
57. Computer-readable memory according to claim 53, wherein:
a prefetch distance is used to calculate a total demand for memory bandwidth which is compared to available memory resources when determining whether insertion of a prefetch instruction would permit an undesirably large number of memory operations to be contemporaneously executing.
-
58. A computerized system that converts a first set of computer program instructions of a relatively higher level program instruction language into a second set of computer program instructions of a relatively lower level program instruction language, the system comprising:
a process resident in said system that performs a cache memory reuse analysis that includes sorting array memory references in at least one of the sets of instructions based upon relative offsets of said references, said sorting being carried out using a B-tree in which each of said references is inserted.
-
59. A computerized system that inserts at least one memory prefetch instruction at a location in a set of computer program instructions, the system comprising:
a process resident in said system that makes a determination whether to insert the at least one prefetch instruction at the location based at least upon a prefetch distance, in terms of cache memory lines, associated with the location.
-
60. A computerized method for converting a first set of computer program instructions of a relatively higher level program instruction language into a second set of computer program instructions of a relatively lower level program instruction language, the method comprising:
performing a cache memory reuse analysis that includes sorting array memory references in at least one of the sets of instructions based upon relative offsets of said references, said sorting being carried out using a B-tree in which each of said references is inserted.
-
61. A computerized method for inserting at least one memory prefetch instruction at a location in a set of computer program instructions, the method comprising:
determining whether to insert the at least one prefetch instruction at the location based at least upon a prefetch distance in terms of cache memory lines, associated with the location.
-
62. Computer-readable memory comprising first program instructions that when executed convert one set of computer program instructions of a relatively higher level program instruction language in to another set of computer program instructions of a relatively lower level program instruction language, the first program instructions when executed:
performing a cache memory reuse analysis that includes sorting array memory references in at least one of the one and another sets of instructions based upon relative offsets of said references, said sorting being carried out using a B-tree in which each of said references is inserted.
-
63. Computer-readable memory comprising first program instructions that when executed inserts at least one memory prefetch instruction at a location in a set of computer program instructions, the first program instructions when executed:
determining whether to insert the at least one prefetch instruction at the location based at least upon a prefetch distance in terms of cache memory lines, associated with the location.
Specification