Motivation
Vectorization of loops with multiple exits and loops with data dependent exits (dd-exit) in particular has always been an attractive but challenging task. The last attempt to improve the situation was taken by Philip several years ago. Here is a corresponding discussion on llvm-dev list [llvm-dev] Vectorizing multiple exit loops and Philip’s notes on github https://github.com/preames/public-notes/blob/master/multiple-exit-vectorization.rst. The goal of this write-up is to kick-off a discussion which should help finally define our vision and achieve progress in this critical but difficult task.
Background
Credit: Major part of this section is taken from Philip Reames public notes.
Currently, the loop vectorizer handles single exit loops (where the exit is from the latch) and some forms of multi-exit loops. There are two kinds of exits from the loop:
- A regular/computable early exits where there is no dependence on values loaded from memory. For example, an exit condition like
if (i > M) break;
. Typically, such conditions should either have already been removed by early optimizations (like fusing with the latch termination condition above while(i < N)
) or handled by the vectorizer by hoisting them out of the loop. These are out of the focus.
- A data dependent early exit where the condition involves loading from memory and performing some check. For example,
if (a[i] == X) break
. Here, the most important legality constraint we need to prove irrespective of how we choose to vectorize, is that in the vectorized form, we do not load from unmapped memory.
In addition, the loop vectorizer has support for internal predication. That is, the body of the loop can contain instructions which execute conditionally.
There are three key differences that need to be considered to support dd-exit loop vectorization.
- In addition to explicit conditions, instruction execution becomes dependent on all preceding (at runtime) conditions guarding dd-exits. That is, during “normal” loop vectorization, instructions that are not explicitly guarded will be unconditionally executed for each lane in the vector code. While in the d-exit case, the instructions should not be executed for already exited lanes. That means execution becomes dependent on all conditions guarding dd-exits. Moreover, that effectively means it should be safe to evaluate all such conditions upfront even though some of them may not be evaluated at all in the scalar execution. This introduces additional legality constrains we should consider.
- The way how live outs are calculated. In the first case, live out value is simply extracted from the last lane of the final vector iteration. While in the second case, it should be extracted from the last lane active at the definition(s) of these live outs.
- Cost modelling should take possible early loop termination into account
Proposed approach
The dd-exit loop vectorization can be handled using different approaches. We discuss two most promising approaches (as we view it) in details here https://github.com/annamthomas/work-notes/blob/main/MultiDataDependentExitVectorization.rst. Besides, we provide an experimental implementation of the “early exit” approach as a standalone pass ⚙ D155948 [LV][WIP] dd-exit loops vectorization which serves as proof of concept.
Conclusion
We talk about two different approaches to handle dd-exit loop vectorization and go over how to handle major aspects of legality, functionality and cost model. Each approach has different motivations such as the approach works best when predication is cheap. There are couple of open questions such as:
- how we would identify loops where this sort of vectorization is not profitable (both approaches are affected by this, but the penalty touches different aspects)
- General speculatability is a hard problem. Different languages provide some notion of array length which can be used to generate a check added within the loop.
4 Likes
@fhahn might be interested.