Optimization on Cedric Bail

How the Compiler Knows Your Load Is Contiguous

Wed, 15 Apr 2026 10:07:00 -0700

The single most important question the SPMD backend asks is: “is this memory access contiguous?” The answer determines whether your loop runs at vector speed or crawls through gather/scatter. This article is about the compiler pass that answers that question, and why it was worth more than every other optimization we built combined.

16 Bytes That Saved a Thousand Branches

Wed, 15 Apr 2026 10:06:00 -0700

The cheapest optimization in our entire SPMD proof of concept cost 16 bytes of memory and eliminated an entire class of branch-heavy fallback code.

Loop Peeling: Where Most of the Speed Comes From

Wed, 15 Apr 2026 10:03:00 -0700

If you took every optimization in our SPMD-for-Go proof of concept and ranked them by benchmark impact, loop peeling would be at the top. Not pattern detection. Not contiguous access analysis. Not the decomposed index path. Peeling. It is the structural foundation that everything else is built on, and the reason our hot loops run at one memory operation per store instead of three.