Cedric Bail

Tag: AVX2

May 10, 2026 SPMD

What a Reduction Loop Reveals About SPMD vs Per-Op Intrinsics

A side-by-side disassembly of the same AVX2 reduction reveals a structural advantage of whole-loop vectorization over per-operation intrinsics

Apr 15, 2026 SPMD

Byte Iteration at 32 Lanes: The Decomposed Index Path

How to iterate a []byte on AVX2 without drowning in index-register pressure

Apr 15, 2026 SPMD

What If Your Loops Were Just Faster?

A proof of concept for language-level data parallelism in Go, with live WASM demos and real benchmark results

Apr 15, 2026 SPMD

We Built Cross-Lane SIMD Primitives. None of Them Helped.

The most important negative result from our SPMD-for-Go proof of concept: explicit shuffles and rotations lost to compiler pattern detection on idiomatic Go

Apr 15, 2026 SPMD

Pattern Matching Outperformed Hand-Written SIMD

How compiler pattern detection on idiomatic Go outperformed explicit cross-lane SIMD builtins in our SPMD proof of concept

Jul 13, 2025

Base64 Decoder - Complete Example

Full SPMD base64 decoder with cross-lane communication

Jul 13, 2025

IPv4 Parser - Complete Example

Full SPMD IPv4 address parser implementation

Jul 13, 2025

Mandelbrot Set - SPMD Version

SIMD-accelerated mandelbrot computation using go for loops

Jul 13, 2025 SPMD

Putting It All Together

Fast IPv4 Parsing with SPMD Go

Jul 12, 2025 SPMD

Cross-Lane Communication: When Lanes Need to Talk

Understanding why and how SPMD programs coordinate data between execution lanes through base64 decoding

Jun 21, 2025 SPMD

What if? Practical parallel data.

Using a hypothetical `go for` construct to implement a variety of string operations

Apr 15, 2026 SPMD

How the Compiler Knows Your Load Is Contiguous

The most important backend optimization in SPMD: recognizing contiguous memory access through ChangeType and BinOp chains

Apr 15, 2026 SPMD

16 Bytes That Saved a Thousand Branches

The cheapest optimization in our SPMD proof of concept: a WASM linear memory guard zone for safe vector overreads

Apr 15, 2026 SPMD

Loop Peeling: Where Most of the Speed Comes From

How SSA-level loop peeling enables the all-ones mask fast path that delivers ~2x of SPMD benchmark wins

Apr 15, 2026 SPMD

How SPMD Lives in the Compiler: Lessons from Building It

The mask-stack detour, predicated SSA, and why SPMD has to live at the heart of the compiler

Apr 15, 2026 SPMD

Writing SPMD Go: A Practical Guide

How to think about uniform vs varying, write go for loops, use reductions, and avoid the common pitfalls

Jun 19, 2025 SPMD

Data Parallelism: simpler solution for Golang?

Warning Historical note. This post predates the actual TinyGo SPMD compiler. It’s a thought experiment from when the design space was still open. The …

Jul 13, 2025

Mandelbrot Set - Serial Version

Scalar mandelbrot computation compiled to browser WASM

Nov 13, 2024

Layoffs in Tech: Impacts on Teams and Technical Debt

The tech sector, after a decade of remarkable growth, has faced significant layoffs. These events affect everyone, not just those directly impacted, but also the …

Nov 9, 2024

Tests Debt

Avoiding tech debt in your tests!

Oct 29, 2024

The SuperH family

In-depth SuperH instruction set
