<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Optimization on Cedric Bail</title><link>http://bluebugs.github.io/tags/optimization/</link><description>Recent content in Optimization on Cedric Bail</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Wed, 15 Apr 2026 10:07:00 -0700</lastBuildDate><atom:link href="http://bluebugs.github.io/tags/optimization/index.xml" rel="self" type="application/rss+xml"/><item><title>How the Compiler Knows Your Load Is Contiguous</title><link>http://bluebugs.github.io/blogs/spmd-contiguous-analysis/</link><pubDate>Wed, 15 Apr 2026 10:07:00 -0700</pubDate><guid>http://bluebugs.github.io/blogs/spmd-contiguous-analysis/</guid><description>&lt;p>The single most important question the SPMD backend asks is: &lt;strong>&amp;ldquo;is this memory access contiguous?&amp;rdquo;&lt;/strong> The answer determines whether your loop runs at vector speed or crawls through gather/scatter. 
This article is about the compiler pass that answers that question, and why it was worth more than every other optimization we built combined.&lt;/p></description></item><item><title>16 Bytes That Saved a Thousand Branches</title><link>http://bluebugs.github.io/blogs/spmd-wasm-guard-zone/</link><pubDate>Wed, 15 Apr 2026 10:06:00 -0700</pubDate><guid>http://bluebugs.github.io/blogs/spmd-wasm-guard-zone/</guid><description>&lt;p>The cheapest optimization in our SPMD proof of concept cost 16 bytes of memory and eliminated an entire class of branch-heavy fallback code.&lt;/p></description></item><item><title>Loop Peeling: Where Most of the Speed Comes From</title><link>http://bluebugs.github.io/blogs/spmd-loop-peeling/</link><pubDate>Wed, 15 Apr 2026 10:03:00 -0700</pubDate><guid>http://bluebugs.github.io/blogs/spmd-loop-peeling/</guid><description>&lt;p>If you took all the optimizations in our SPMD-for-Go proof of concept and ranked them by benchmark impact, loop peeling would be at the top. Not pattern detection. Not contiguous access analysis. Not the decomposed index path. Peeling. It is the structural foundation that everything else is built on, and the reason our hot loops run at one memory operation per store instead of three.&lt;/p></description></item></channel></rss>