<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Compiler on Cedric Bail</title><link>http://bluebugs.github.io/tags/compiler/</link><description>Recent content in Compiler on Cedric Bail</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Wed, 15 Apr 2026 10:07:00 -0700</lastBuildDate><atom:link href="http://bluebugs.github.io/tags/compiler/index.xml" rel="self" type="application/rss+xml"/><item><title>How the Compiler Knows Your Load Is Contiguous</title><link>http://bluebugs.github.io/blogs/spmd-contiguous-analysis/</link><pubDate>Wed, 15 Apr 2026 10:07:00 -0700</pubDate><guid>http://bluebugs.github.io/blogs/spmd-contiguous-analysis/</guid><description>&lt;p>The single most important question the SPMD backend asks is: &lt;strong>&amp;ldquo;is this memory access contiguous?&amp;rdquo;&lt;/strong> The answer determines whether your loop runs at vector speed or crawls through gather/scatter. This article is about the compiler pass that answers that question, and why it was worth more than every other optimization we built combined.&lt;/p></description></item><item><title>Byte Iteration at 32 Lanes: The Decomposed Index Path</title><link>http://bluebugs.github.io/blogs/spmd-decomposed-index/</link><pubDate>Wed, 15 Apr 2026 10:05:00 -0700</pubDate><guid>http://bluebugs.github.io/blogs/spmd-decomposed-index/</guid><description>&lt;p>When we set out to make &lt;code>for i, b := range byteSlice&lt;/code> fast on AVX2, the first thing that went wrong was the index vector. This article explains what happened, the technique we used to fix it, and the chain of bugs the fix resolved along the way.&lt;/p></description></item><item><title>Pattern Matching Outperformed Hand-Written SIMD</title><link>http://bluebugs.github.io/blogs/spmd-pattern-matching/</link><pubDate>Wed, 15 Apr 2026 10:04:00 -0700</pubDate><guid>http://bluebugs.github.io/blogs/spmd-pattern-matching/</guid><description>&lt;p>Our base64 decoder was implemented twice. Version 1 used explicit cross-lane operations &amp;mdash; shuffles, rotations, compact stores. It peaked at roughly 2x scalar performance. Version 2 used four plain &lt;code>go for&lt;/code> loops with no cross-lane operations at all. It hit approximately 17 GB/s on AVX2 &amp;mdash; about 77% of simdutf C++ and 9x faster than Go&amp;rsquo;s &lt;code>encoding/base64&lt;/code>. The simpler code outperformed the clever code by a wide margin.&lt;/p></description></item><item><title>Loop Peeling: Where Most of the Speed Comes From</title><link>http://bluebugs.github.io/blogs/spmd-loop-peeling/</link><pubDate>Wed, 15 Apr 2026 10:03:00 -0700</pubDate><guid>http://bluebugs.github.io/blogs/spmd-loop-peeling/</guid><description>&lt;p>If you took every optimization in our SPMD-for-Go proof of concept and ranked them by benchmark impact, loop peeling would be at the top. Not pattern detection. Not contiguous access analysis. Not the decomposed index path. Peeling. It is the structural foundation that everything else is built on, and the reason our hot loops run at one memory operation per store instead of three.&lt;/p></description></item><item><title>How SPMD Lives in the Compiler: Lessons from Building It</title><link>http://bluebugs.github.io/blogs/spmd-compiler-internals/</link><pubDate>Wed, 15 Apr 2026 10:02:00 -0700</pubDate><guid>http://bluebugs.github.io/blogs/spmd-compiler-internals/</guid><description>&lt;p>We added a way to express data parallelism in idiomatic Go. Earlier discussions around this space often stalled on a simple question: how would it actually work in the compiler? A working proof of concept that compiles &lt;code>go for&lt;/code> loops to WASM SIMD128, x86 SSE, and x86 AVX2, with end-to-end tests passing and a base64 decoder reaching ~77% of simdutf C++ throughput, is a better answer than another round of speculation. The goal here is to make the implementation strategy concrete. Along the way we learned one lesson the hard way: &lt;strong>SPMD is a compiler feature that has to live at the heart of the SSA form.&lt;/strong> Everything else follows from that.&lt;/p>
&lt;p>This article is for compiler engineers. If you want to see the benchmarks and the short version, read &lt;a href="../spmd-results/">the overview&lt;/a>. If you want to write SPMD Go code, the &lt;a href="../writing-spmd-go/">practical guide&lt;/a> is for you. Here, we talk about what we built inside the compiler, what we got wrong, and what we would do differently.&lt;/p></description></item></channel></rss>