Recent results on the SSA form open promising directions for the design of new register allocation heuristics for JIT compilation. In particular, heuristics based on tree scans with two decoupled phases, one for spilling, one for splitting/coloring/coalescing, seem good candidates for designing memory-friendly, fast, and competitive register allocators. Another class of register allocators, well-suited for JIT compilation, are those based on linear scans. Most of them perform coalescing poorly but also do live-range splitting (mostly on control-flow edges) to avoid spilling. This leads to a large amount of register-to-register copies inside basic blocks but also, implicitly, on critical edges, i.e., edges that flow from a block with several successors to a block with several predecessors.
This paper presents a new back-end technique, for register-allocated codes, that we call parallel copy motion, which amounts to move a group of parallel copy instructions, from a program point to another. In contrast with a classical scheduler that must preserve data dependencies, our copy motion also permutes register assignments so that a copy can ``traverse'' all instructions of a basic block, except those with conflicting register constraints. Moreover, with an adequate management of compensation code (namely, the reverse of the parallel copy), parallel copies can also be moved across edges. % This technique has several potential applications. First, parallel copies can be placed either where the scheduling has some empty slots (for multiple-issues architectures), or where fewer copies are necessary because some variables are dead at this point. Thus, one application is the elimination of copies or the reduction of the cost of copies by a better placement. The compensation technique generalizes the motion of parallel copies to a global scope by allowing motion across basic block boundaries. Another application is to move copies out of critical edges, when it is beneficial compared to leaving the copies there and splitting the edge. A related use case is to extend the scope of some optimizations to control flow graphs with non splittable edges (abnormal edges) as used in some compiler representations for specific architectural constraints, region boundaries, or exception handling code.
Experiments with the SPECint benchmarks suite and our own benchmark suite show that we can now apply broadly an SSA-based register allocator: all procedures, even with abnormal edges, can be treated, which means that, after copy motion, no moves remain on abnormal edges. Simple strategies for moving copies from edges and locally inside basic block show significant average improvements ($3\%$ for both SPECint and our benchmark suite), with no degradation. It let us believe that the approach is promising, and not only for improving coalescing in fast register allocators.