AUTOMATIC LOOP VECTORIZATION USING HARDWARE TRANSACTIONAL MEMORY
Abstract
Technologies for automatic loop vectorization include a computing device with an optimizing compiler. During an optimization pass, the compiler identifies a loop and generates a transactional code segment containing a vectorized implementation of the loop body, which includes one or more vector memory read instructions capable of generating an exception. The compiler also generates a non-transactional fallback code segment, executed in response to an exception generated within the transactional code segment, that contains a scalar implementation of the loop body. The compiler may detect whether the loop contains a memory read that depends on a condition that may be updated in a previous iteration, or whether the loop contains a potential data dependence between two iterations. The compiler may generate a dynamic check for an actual data dependence and an explicit transactional abort instruction to be executed when an actual data dependence exists. Other embodiments are described and claimed.
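The dynamic dependence check and explicit abort described in the abstract can be illustrated with a minimal Python sketch. It is a software emulation under stated assumptions, not the patented implementation: a Python exception (the hypothetical `TxAbort`) stands in for a hardware transactional abort, a list snapshot stands in for the hardware's rollback of speculative writes, and the vector length `VL = 4` is arbitrary. The example loop `a[idx[i]] += val[i]` has a potential data dependence between iterations through the index array `idx`; the check aborts only when two lanes of one vector iteration actually target the same element. On real hardware the abort might be issued with an instruction such as Intel RTM's `_xabort`, but that mapping is an assumption.

```python
"""Software emulation of a transactional vectorized loop with a dynamic
dependence check (hypothetical sketch; no real HTM is used).

Scalar loop being vectorized (potential cross-iteration dependence via idx):
    for i in range(n):
        a[idx[i]] += val[i]
"""

VL = 4  # assumed vector length (lanes per vector iteration)


class TxAbort(Exception):
    """Hypothetical stand-in for an explicit hardware transactional abort."""


def scalar_loop(a, idx, val, start, n):
    # Scalar implementation of the loop body, used by the fallback path
    # and for the remainder iterations.
    for i in range(start, n):
        a[idx[i]] += val[i]


def vector_loop_in_tx(a, idx, val, n):
    # Transactional segment: each vector iteration covers VL scalar iterations.
    i = 0
    while i + VL <= n:
        lanes = idx[i:i + VL]
        # Dynamic dependence check: if two lanes hit the same element of `a`,
        # the gather/add/scatter below would lose an update.
        if len(set(lanes)) < VL:
            raise TxAbort("explicit abort: actual data dependence")
        gathered = [a[j] for j in lanes]                      # vector gather
        summed = [g + v for g, v in zip(gathered, val[i:i + VL])]
        for j, s in zip(lanes, summed):                       # vector scatter
            a[j] = s
        i += VL
    # Fewer than VL iterations remain: finish them with scalar code.
    scalar_loop(a, idx, val, i, n)


def run_loop(a, idx, val):
    n = len(idx)
    snapshot = list(a)  # emulates the hardware's ability to discard writes
    try:
        vector_loop_in_tx(a, idx, val, n)  # reaching here means "commit"
        return "vector"
    except TxAbort:
        a[:] = snapshot  # roll back speculative writes
        scalar_loop(a, idx, val, 0, n)  # non-transactional fallback
        return "fallback"
```

Rolling back to the snapshot and rerunning the whole loop in scalar form mirrors the all-or-nothing behavior of a single transaction; a finer-grained variant could wrap each block of vector iterations in its own transaction so that an abort discards less work.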
20 Claims
1. A computing device for loop vectorization, the computing device comprising:
an analysis module to detect a loop of a source program, the loop to define one or more scalar iterations and have a loop body for execution during each scalar iteration; and
a vectorization module to:
generate a transactional code segment, wherein to generate the transactional code segment comprises to generate a vectorized implementation of the loop body within the transactional code segment as a function of the loop of the source program, the vectorized implementation to define one or more vector iterations and to include a vector memory read instruction capable of generation of an exception; and
generate a non-transactional fallback code segment associated with the transactional code segment, the non-transactional fallback code to be executed in response to generation of an exception within the transactional code segment and comprising a scalar implementation of the loop body.
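The structure recited in claim 1 (a transactional vectorized loop body whose vector read can raise an exception, backed by a non-transactional scalar fallback) can be emulated in Python as a minimal sketch. The example loop scans a zero-terminated buffer, so each read of `buf[i]` depends on the exit condition evaluated in the previous iteration. Assumptions, all hypothetical: `VL = 4`; `vector_load` raises the invented `MemoryFault` whenever any lane falls outside the buffer (real hardware faults only when a lane touches an unmapped page, so this check is conservative); and the Python exception stands in for the hardware abort that transfers control to the fallback.

```python
"""Software emulation of a transactional vectorized loop whose vector read
may fault, with a non-transactional scalar fallback (hypothetical sketch).

Scalar loop being vectorized (each read of buf[i] depends on the exit
condition evaluated in the previous iteration):
    i = 0
    while buf[i] != 0:
        i += 1
"""

VL = 4  # assumed vector length


class MemoryFault(Exception):
    """Hypothetical stand-in for a fault raised by a vector load."""


def vector_load(buf, i):
    # A vector load reads all VL lanes, even lanes past the terminator,
    # so it can fault where the scalar loop never would.
    if i + VL > len(buf):
        raise MemoryFault(f"lanes {i}..{i + VL - 1} exceed buffer of {len(buf)}")
    return buf[i:i + VL]


def scalar_fallback(buf, i):
    # Non-transactional fallback: scalar loop body starting from index i.
    while buf[i] != 0:
        i += 1
    return i


def transactional_vector(buf):
    # Transactional segment: one vector iteration covers VL scalar iterations.
    i = 0
    while True:
        lanes = vector_load(buf, i)  # may raise MemoryFault, aborting the segment
        if 0 in lanes:
            return i + lanes.index(0)
        i += VL


def find_terminator(buf):
    try:
        return transactional_vector(buf), "vector"
    except MemoryFault:
        # The abort discards all transactional state, so the fallback
        # restarts from the state at transaction begin (i = 0).
        return scalar_fallback(buf, 0), "fallback"
```

Both paths return the same index; the vector path simply finishes in fewer iterations when no lane faults, for example `find_terminator([5, 6, 7, 8, 9, 0, 1, 2])` takes the vector path, while `find_terminator([5, 6, 7, 8, 9, 0])` faults on its second vector read and falls back to scalar code.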
9. A method for loop vectorization, the method comprising:
detecting, by a computing device, a loop of a source program, the loop defining one or more scalar iterations and having a loop body for execution during each scalar iteration;
generating, by the computing device, a transactional code segment, wherein generating the transactional code segment comprises generating a vectorized implementation of the loop body within the transactional code segment as a function of the loop of the source program, the vectorized implementation defining one or more vector iterations and including a vector memory read instruction that is capable of generating an exception; and
generating, by the computing device, a non-transactional fallback code segment associated with the transactional code segment, the non-transactional fallback code to be executed in response to generation of an exception within the transactional code segment and comprising a scalar implementation of the loop body.
15. One or more computer-readable storage media comprising a plurality of instructions that in response to being executed cause a computing device to:
detect a loop of a source program, the loop defining one or more scalar iterations and having a loop body for execution during each scalar iteration;
generate a transactional code segment, wherein to generate the transactional code segment comprises to generate a vectorized implementation of the loop body within the transactional code segment as a function of the loop of the source program, the vectorized implementation to define one or more vector iterations and to include a vector memory read instruction that is capable of generating an exception; and
generate a non-transactional fallback code segment associated with the transactional code segment, the non-transactional fallback code to be executed in response to generation of an exception within the transactional code segment and comprising a scalar implementation of the loop body.
Specification