<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>X86 on Cedric Bail</title><link>http://bluebugs.github.io/tags/x86/</link><description>Recent content in X86 on Cedric Bail</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Wed, 15 Apr 2026 10:05:00 -0700</lastBuildDate><atom:link href="http://bluebugs.github.io/tags/x86/index.xml" rel="self" type="application/rss+xml"/><item><title>Byte Iteration at 32 Lanes: The Decomposed Index Path</title><link>http://bluebugs.github.io/blogs/spmd-decomposed-index/</link><pubDate>Wed, 15 Apr 2026 10:05:00 -0700</pubDate><guid>http://bluebugs.github.io/blogs/spmd-decomposed-index/</guid><description>&lt;p>When we set out to make &lt;code>for i, b := range byteSlice&lt;/code> fast on AVX2, the first thing that went wrong was the index vector. This article explains what happened, the technique we used to fix it, and the chain of bugs the fix resolved along the way.&lt;/p></description></item></channel></rss>