AUTOMATIC LOOP VECTORIZATION USING HARDWARE TRANSACTIONAL MEMORY
Abstract
Technologies for automatic loop vectorization include a computing device with an optimizing compiler. During an optimization pass, the compiler identifies a loop and generates a transactional code segment that contains a vectorized implementation of the loop body, with one or more vector memory read instructions capable of generating an exception. The compiler also generates a non-transactional fallback code segment, containing a scalar implementation of the loop body, that is executed in response to an exception generated within the transactional code segment. The compiler may detect whether the loop contains a memory read dependent on a condition that may be updated in a previous iteration or whether the loop contains a potential data dependence between two iterations. The compiler may generate a dynamic check for an actual data dependence and an explicit transactional abort instruction to be executed when an actual data dependence exists. Other embodiments are described and claimed.
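The control flow described in the abstract can be illustrated with a small software simulation. This is only a sketch under stated assumptions, not the patented implementation: real hardware transactional memory buffers and discards speculative writes in hardware, whereas here a list snapshot stands in for that behavior. The names VECTOR_WIDTH, TransactionAbort, MemoryFault, vector_load, scalar_loop, vector_loop and run_loop are all hypothetical, and the example loop (store each element minus one until a zero terminator is read) was chosen for illustration only.

```python
VECTOR_WIDTH = 4  # assumed lane count; real hardware widths vary


class TransactionAbort(Exception):
    """Models an explicit transactional abort instruction (hypothetical name)."""


class MemoryFault(Exception):
    """Models the exception a vector read raises on an unmapped address."""


def vector_load(mem, addr):
    # A vector read touches VECTOR_WIDTH elements at once, so it can fault
    # on addresses the scalar loop would never have reached.
    if addr + VECTOR_WIDTH > len(mem):
        raise MemoryFault(addr)
    return mem[addr:addr + VECTOR_WIDTH]


def scalar_loop(mem, src, dst):
    # Scalar implementation of the loop body: store each element minus one
    # until a zero terminator is read.
    i = 0
    while mem[src + i] != 0:
        mem[dst + i] = mem[src + i] - 1
        i += 1


def vector_loop(mem, src, dst):
    # Dynamic check for an actual dependence in this particular loop: a write
    # from one lane would be read by a later lane of the same vector iteration.
    if 0 < dst - src < VECTOR_WIDTH:
        raise TransactionAbort("data dependence")
    i = 0
    while True:
        lanes = vector_load(mem, src + i)
        # Only lanes before the first zero correspond to scalar iterations.
        active = lanes.index(0) if 0 in lanes else VECTOR_WIDTH
        for j in range(active):
            mem[dst + i + j] = lanes[j] - 1
        if active < VECTOR_WIDTH:
            return
        i += VECTOR_WIDTH


def run_loop(mem, src, dst):
    snapshot = list(mem)  # stands in for hardware buffering of speculative writes
    try:
        vector_loop(mem, src, dst)  # "transactional code segment"
        return "vector"
    except (TransactionAbort, MemoryFault):
        mem[:] = snapshot  # an abort discards every speculative write
        scalar_loop(mem, src, dst)  # "non-transactional fallback code segment"
        return "scalar fallback"
```

In this sketch, a source region whose zero terminator sits near the end of memory makes the second vector read run past the end; the simulated transaction is rolled back and the scalar loop produces the result. A destination that starts fewer than VECTOR_WIDTH elements after the source triggers the explicit abort before any vector work is done.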
40 Citations
20 Claims
1. A computing device for loop vectorization, the computing device comprising:
an analysis module to detect a loop of a source program, the loop to define one or more scalar iterations and have a loop body for execution during each scalar iteration; and
a vectorization module to:
generate a transactional code segment, wherein to generate the transactional code segment comprises to generate a vectorized implementation of the loop body within the transactional code segment as a function of the loop of the source program, the vectorized implementation to define one or more vector iterations and to include a vector memory read instruction capable of generation of an exception; and
generate a non-transactional fallback code segment associated with the transactional code segment, the non-transactional fallback code to be executed in response to generation of an exception within the transactional code segment and comprising a scalar implementation of the loop body.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A method for loop vectorization, the method comprising:
detecting, by a computing device, a loop of a source program, the loop defining one or more scalar iterations and having a loop body for execution during each scalar iteration;
generating, by the computing device, a transactional code segment, wherein generating the transactional code segment comprises generating a vectorized implementation of the loop body within the transactional code segment as a function of the loop of the source program, the vectorized implementation defining one or more vector iterations and including a vector memory read instruction that is capable of generating an exception; and
generating, by the computing device, a non-transactional fallback code segment associated with the transactional code segment, the non-transactional fallback code to be executed in response to generation of an exception within the transactional code segment and comprising a scalar implementation of the loop body.
Dependent claims: 10, 11, 12, 13, 14
15. One or more computer-readable storage media comprising a plurality of instructions that in response to being executed cause a computing device to:
detect a loop of a source program, the loop defining one or more scalar iterations and having a loop body for execution during each scalar iteration;
generate a transactional code segment, wherein to generate the transactional code segment comprises to generate a vectorized implementation of the loop body within the transactional code segment as a function of the loop of the source program, the vectorized implementation to define one or more vector iterations and including a vector memory read instruction that is capable of generating an exception; and
generate a non-transactional fallback code segment associated with the transactional code segment, the non-transactional fallback code to be executed in response to generation of an exception within the transactional code segment and comprising a scalar implementation of the loop body.
Dependent claims: 16, 17, 18, 19, 20
Specification