<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Benchmarks on Cedric Bail</title><link>http://bluebugs.github.io/tags/benchmarks/</link><description>Recent content in Benchmarks on Cedric Bail</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Sun, 10 May 2026 10:00:00 -0700</lastBuildDate><atom:link href="http://bluebugs.github.io/tags/benchmarks/index.xml" rel="self" type="application/rss+xml"/><item><title>Why a Reduction Loop Tells the Story: SPMD vs Per-Op SIMD Intrinsics</title><link>http://bluebugs.github.io/blogs/spmd-vs-intrinsics-reduction/</link><pubDate>Sun, 10 May 2026 10:00:00 -0700</pubDate><guid>http://bluebugs.github.io/blogs/spmd-vs-intrinsics-reduction/</guid><description>&lt;p>We have a small surprise from our SPMD proof of concept. On three identical AVX2 reductions over &lt;code>[]int32&lt;/code> &amp;ndash; sum, min, contains &amp;ndash; our SPMD-compiled code is 1.8x to 2.6x faster than the same algorithms written against &lt;a href="https://github.com/samber/lo">&lt;code>samber/lo/exp/simd&lt;/code>&lt;/a>, the experimental Go library built on Go&amp;rsquo;s new &lt;code>simd&lt;/code> intrinsics package. Both run AVX2 8-wide. Both issue roughly the same number of vector ops in the body. The runtime gap is not about ISA choice. It is about what each compiler can see when it codegens the loop, and that turns out to be a structural property of how the intrinsic API is shaped &amp;ndash; not a missed optimization in &lt;code>go&lt;/code>.&lt;/p></description></item><item><title>SPMD for Go: What If Your Loops Were Just Faster?</title><link>http://bluebugs.github.io/blogs/spmd-results/</link><pubDate>Wed, 15 Apr 2026 10:00:00 -0700</pubDate><guid>http://bluebugs.github.io/blogs/spmd-results/</guid><description>&lt;p>We wrote a base64 decoder in about 40 lines of Go. 
It runs at roughly 17 GB/s on AVX2 &amp;ndash; about 9x faster than &lt;code>encoding/base64&lt;/code> and reaching about 77% of the throughput of the best C++ SIMD library (&lt;a href="https://github.com/simdutf/simdutf">simdutf&lt;/a>). No assembly. No intrinsics. No &lt;code>unsafe&lt;/code>. Just Go with a new loop keyword.&lt;/p>
&lt;p>This is a proof of concept, not a proposal text or an upstream implementation plan. The point is narrower: show that loop-level data parallelism can fit Go&amp;rsquo;s style, compile to real SIMD on multiple targets, and deliver meaningful wins on real workloads. Below are two live demos running real WebAssembly code in your browser.&lt;/p></description></item></channel></rss>