Welcome to my website

Recent Blog

Why a Reduction Loop Tells the Story: SPMD vs Per-Op SIMD Intrinsics


How the Compiler Knows Your Load Is Contiguous

More

16 Bytes That Saved a Thousand Branches

Byte Iteration at 32 Lanes: The Decomposed Index Path

Pattern Matching Outperformed Hand-Written SIMD

Loop Peeling: Where Most of the Speed Comes From

All Blog