<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>AVX2 on Cedric Bail</title><link>http://bluebugs.github.io/tags/avx2/</link><description>Recent content in AVX2 on Cedric Bail</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Sun, 10 May 2026 10:00:00 -0700</lastBuildDate><atom:link href="http://bluebugs.github.io/tags/avx2/index.xml" rel="self" type="application/rss+xml"/><item><title>Why a Reduction Loop Tells the Story: SPMD vs Per-Op SIMD Intrinsics</title><link>http://bluebugs.github.io/blogs/spmd-vs-intrinsics-reduction/</link><pubDate>Sun, 10 May 2026 10:00:00 -0700</pubDate><guid>http://bluebugs.github.io/blogs/spmd-vs-intrinsics-reduction/</guid><description>&lt;p>We have a small surprise from our SPMD proof of concept. On three identical AVX2 reductions over &lt;code>[]int32&lt;/code> &amp;ndash; sum, min, contains &amp;ndash; our SPMD-compiled code is 1.8x to 2.6x faster than the same algorithms written against &lt;a href="https://github.com/samber/lo">&lt;code>samber/lo/exp/simd&lt;/code>&lt;/a>, the experimental Go library built on Go&amp;rsquo;s new &lt;code>simd&lt;/code> intrinsics package. Both run AVX2 8-wide. Both issue roughly the same number of vector ops in the body. The runtime gap is not about ISA choice. It is about what each compiler can see when it codegens the loop, and that turns out to be a structural property of how the intrinsic API is shaped &amp;ndash; not a missed optimization in &lt;code>go&lt;/code>.&lt;/p></description></item><item><title>Byte Iteration at 32 Lanes: The Decomposed Index Path</title><link>http://bluebugs.github.io/blogs/spmd-decomposed-index/</link><pubDate>Wed, 15 Apr 2026 10:05:00 -0700</pubDate><guid>http://bluebugs.github.io/blogs/spmd-decomposed-index/</guid><description>&lt;p>When we set out to make &lt;code>for i, b := range byteSlice&lt;/code> fast on AVX2, the first thing that went wrong was the index vector. This article explains what happened, the technique we used to fix it, and the chain of bugs the fix resolved along the way.&lt;/p></description></item></channel></rss>