Cedric Bail

SPMD

Below you will find pages that utilize the taxonomy term “SPMD”

May 10, 2026 SPMD

Why a Reduction Loop Tells the Story: SPMD vs Per-Op SIMD Intrinsics

A side-by-side disassembly of the same AVX2 reduction reveals a structural advantage of whole-loop vectorization over per-operation intrinsics

Apr 15, 2026 SPMD

We Built Cross-Lane SIMD Primitives. None of Them Helped.

The most important negative result from our SPMD-for-Go proof of concept: explicit shuffles and rotations lost to compiler pattern detection on idiomatic Go

Apr 15, 2026 SPMD

How the Compiler Knows Your Load Is Contiguous

The most important backend optimization in SPMD: recognizing contiguous memory access through ChangeType and BinOp chains

Apr 15, 2026 SPMD

16 Bytes That Saved a Thousand Branches

The cheapest optimization in our SPMD proof of concept: a WASM linear memory guard zone for safe vector overreads

Apr 15, 2026 SPMD

Byte Iteration at 32 Lanes: The Decomposed Index Path

How to iterate a []byte on AVX2 without drowning in index-register pressure

Apr 15, 2026 SPMD

Pattern Matching Outperformed Hand-Written SIMD

How compiler pattern detection on idiomatic Go outperformed explicit cross-lane SIMD builtins in our SPMD proof of concept

Apr 15, 2026 SPMD

Loop Peeling: Where Most of the Speed Comes From

How SSA-level loop peeling enables the all-ones mask fast path that delivers ~2x of SPMD benchmark wins

Apr 15, 2026 SPMD

How SPMD Lives in the Compiler: Lessons from Building It

The mask-stack detour, predicated SSA, and why SPMD has to live at the heart of the compiler

Apr 15, 2026 SPMD

Writing SPMD Go: A Practical Guide

How to think about uniform vs varying, write `go for` loops, use reductions, and avoid the common pitfalls

Apr 15, 2026 SPMD

SPMD for Go: What If Your Loops Were Just Faster?

A proof of concept for language-level data parallelism in Go, with live WASM demos and real benchmark results

Jul 13, 2025

Base64 Decoder - Complete Example

Full SPMD base64 decoder with cross-lane communication

Jul 13, 2025

IPv4 Parser - Complete Example

Full SPMD IPv4 address parser implementation

Jul 13, 2025

Mandelbrot Set - SPMD Version

SIMD-accelerated Mandelbrot computation using `go for` loops

Jul 13, 2025 SPMD

Putting It All Together

Fast IPv4 Parsing with SPMD Go

Jul 12, 2025 SPMD

Cross-Lane Communication: When Lanes Need to Talk

Understanding why and how SPMD programs coordinate data between execution lanes through base64 decoding

Jun 21, 2025 SPMD

What if? Practical parallel data.

Using a hypothetical `go for` construct to implement a variety of string operations

Jun 19, 2025 SPMD

Data Parallelism: simpler solution for Golang?

Warning: Historical note. This post predates the actual TinyGo SPMD compiler. It is a thought experiment from when the design space was still open. The …

© Cedric Bail 2026