Tag: AVX2
Why a Reduction Loop Tells the Story: SPMD vs Per-Op SIMD Intrinsics
A side-by-side disassembly of the same AVX2 reduction reveals a structural advantage of whole-loop vectorization over per-operation intrinsics
Byte Iteration at 32 Lanes: The Decomposed Index Path
How to iterate a []byte on AVX2 without drowning in index-register pressure
Tag: Benchmarks
Why a Reduction Loop Tells the Story: SPMD vs Per-Op SIMD Intrinsics
A side-by-side disassembly of the same AVX2 reduction reveals a structural advantage of whole-loop vectorization over per-operation intrinsics
SPMD for Go: What If Your Loops Were Just Faster?
A proof of concept for language-level data parallelism in Go, with live WASM demos and real benchmark results
Tag: Intrinsics
Why a Reduction Loop Tells the Story: SPMD vs Per-Op SIMD Intrinsics
A side-by-side disassembly of the same AVX2 reduction reveals a structural advantage of whole-loop vectorization over per-operation intrinsics
Tag: SIMD
Why a Reduction Loop Tells the Story: SPMD vs Per-Op SIMD Intrinsics
A side-by-side disassembly of the same AVX2 reduction reveals a structural advantage of whole-loop vectorization over per-operation intrinsics
We Built Cross-Lane SIMD Primitives. None of Them Helped.
The most important negative result from our SPMD-for-Go proof of concept: explicit shuffles and rotations lost to compiler pattern detection on idiomatic Go
Pattern Matching Outperformed Hand-Written SIMD
How compiler pattern detection on idiomatic Go outperformed explicit cross-lane SIMD builtins in our SPMD proof of concept
SPMD for Go: What If Your Loops Were Just Faster?
A proof of concept for language-level data parallelism in Go, with live WASM demos and real benchmark results
Cross-Lane Communication: When Lanes Need to Talk
Understanding why and how SPMD programs coordinate data between execution lanes through base64 decoding
What if? Practical parallel data.
Using a hypothetical `go for` construct to implement a variety of string operations
Tag: SPMD
Why a Reduction Loop Tells the Story: SPMD vs Per-Op SIMD Intrinsics
A side-by-side disassembly of the same AVX2 reduction reveals a structural advantage of whole-loop vectorization over per-operation intrinsics
We Built Cross-Lane SIMD Primitives. None of Them Helped.
The most important negative result from our SPMD-for-Go proof of concept: explicit shuffles and rotations lost to compiler pattern detection on idiomatic Go
How the Compiler Knows Your Load Is Contiguous
The most important backend optimization in SPMD: recognizing contiguous memory access through ChangeType and BinOp chains
16 Bytes That Saved a Thousand Branches
The cheapest optimization in our SPMD proof of concept: a WASM linear memory guard zone for safe vector overreads
Byte Iteration at 32 Lanes: The Decomposed Index Path
How to iterate a []byte on AVX2 without drowning in index-register pressure
Pattern Matching Outperformed Hand-Written SIMD
How compiler pattern detection on idiomatic Go outperformed explicit cross-lane SIMD builtins in our SPMD proof of concept
Loop Peeling: Where Most of the Speed Comes From
How SSA-level loop peeling enables the all-ones mask fast path that delivers ~2x of SPMD benchmark wins
How SPMD Lives in the Compiler: Lessons from Building It
The mask-stack detour, predicated SSA, and why SPMD has to live at the heart of the compiler
Writing SPMD Go: A Practical Guide
How to think about uniform vs varying, write go for loops, use reductions, and avoid the common pitfalls
SPMD for Go: What If Your Loops Were Just Faster?
A proof of concept for language-level data parallelism in Go, with live WASM demos and real benchmark results
Cross-Lane Communication: When Lanes Need to Talk
Understanding why and how SPMD programs coordinate data between execution lanes through base64 decoding
What if? Practical parallel data.
Using a hypothetical `go for` construct to implement a variety of string operations
Data Parallelism: simpler solution for Golang?
Warning: Historical note. This post predates the actual TinyGo SPMD compiler. It is a thought experiment from when the design space was still open. The …
Tag: Design
We Built Cross-Lane SIMD Primitives. None of Them Helped.
The most important negative result from our SPMD-for-Go proof of concept: explicit shuffles and rotations lost to compiler pattern detection on idiomatic Go
Tag: Compiler
How the Compiler Knows Your Load Is Contiguous
The most important backend optimization in SPMD: recognizing contiguous memory access through ChangeType and BinOp chains
Byte Iteration at 32 Lanes: The Decomposed Index Path
How to iterate a []byte on AVX2 without drowning in index-register pressure
Pattern Matching Outperformed Hand-Written SIMD
How compiler pattern detection on idiomatic Go outperformed explicit cross-lane SIMD builtins in our SPMD proof of concept
Loop Peeling: Where Most of the Speed Comes From
How SSA-level loop peeling enables the all-ones mask fast path that delivers ~2x of SPMD benchmark wins
How SPMD Lives in the Compiler: Lessons from Building It
The mask-stack detour, predicated SSA, and why SPMD has to live at the heart of the compiler
Tag: Optimization
How the Compiler Knows Your Load Is Contiguous
The most important backend optimization in SPMD: recognizing contiguous memory access through ChangeType and BinOp chains
16 Bytes That Saved a Thousand Branches
The cheapest optimization in our SPMD proof of concept: a WASM linear memory guard zone for safe vector overreads
Loop Peeling: Where Most of the Speed Comes From
How SSA-level loop peeling enables the all-ones mask fast path that delivers ~2x of SPMD benchmark wins
Tag: WASM
16 Bytes That Saved a Thousand Branches
The cheapest optimization in our SPMD proof of concept: a WASM linear memory guard zone for safe vector overreads
Tag: X86
Byte Iteration at 32 Lanes: The Decomposed Index Path
How to iterate a []byte on AVX2 without drowning in index-register pressure
Tag: Pattern-Detection
Pattern Matching Outperformed Hand-Written SIMD
How compiler pattern detection on idiomatic Go outperformed explicit cross-lane SIMD builtins in our SPMD proof of concept
Tag: SSA
Loop Peeling: Where Most of the Speed Comes From
How SSA-level loop peeling enables the all-ones mask fast path that delivers ~2x of SPMD benchmark wins
How SPMD Lives in the Compiler: Lessons from Building It
The mask-stack detour, predicated SSA, and why SPMD has to live at the heart of the compiler
Tag: LLVM
How SPMD Lives in the Compiler: Lessons from Building It
The mask-stack detour, predicated SSA, and why SPMD has to live at the heart of the compiler
Tag: Golang
Writing SPMD Go: A Practical Guide
How to think about uniform vs varying, write go for loops, use reductions, and avoid the common pitfalls
SPMD for Go: What If Your Loops Were Just Faster?
A proof of concept for language-level data parallelism in Go, with live WASM demos and real benchmark results
What if? Practical parallel data.
Using a hypothetical `go for` construct to implement a variety of string operations
Data Parallelism: simpler solution for Golang?
Warning: Historical note. This post predates the actual TinyGo SPMD compiler. It is a thought experiment from when the design space was still open. The …
Tag: Tutorial
Writing SPMD Go: A Practical Guide
How to think about uniform vs varying, write go for loops, use reductions, and avoid the common pitfalls
Tag: Performance
SPMD for Go: What If Your Loops Were Just Faster?
A proof of concept for language-level data parallelism in Go, with live WASM demos and real benchmark results
Tag: Base64
Cross-Lane Communication: When Lanes Need to Talk
Understanding why and how SPMD programs coordinate data between execution lanes through base64 decoding
Tag: Mandelbrot
Tag: Networking
Tag: Data-Parallelism
Data Parallelism: simpler solution for Golang?
Warning: Historical note. This post predates the actual TinyGo SPMD compiler. It is a thought experiment from when the design space was still open. The …
Tag: Engineering
Layoffs in Tech: Impacts on Teams and Technical Debt
The tech sector, after a decade of remarkable growth, has faced significant layoffs. These events affect everyone, not just those directly impacted, but also the …
Tag: Teams
Layoffs in Tech: Impacts on Teams and Technical Debt
The tech sector, after a decade of remarkable growth, has faced significant layoffs. These events affect everyone, not just those directly impacted, but also the …
Tag: Tech-Debt
Layoffs in Tech: Impacts on Teams and Technical Debt
The tech sector, after a decade of remarkable growth, has faced significant layoffs. These events affect everyone, not just those directly impacted, but also the …