When compilers surprise you

Written by me, proof-read by an LLM.
Details at end.

Every now and then a compiler will surprise me with a really smart trick. When I first saw this optimisation, I could hardly believe it. I was looking at loop optimisation and wrote something like this simple function that sums all the numbers up to a given value:
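    // A sketch of the kind of function meant (sumToN is an invented name):
    // add up the integers from 0 to n-1 with a straightforward loop.
    int sumToN(int n) {
        int total = 0;
        for (int i = 0; i < n; ++i)
            total += i;
        return total;
    }

Compile that with optimisations on and watch the loop vanish: Clang, for instance, computes the result in closed form - in effect Gauss’s n * (n - 1) / 2 - with no loop at all.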

Filed under: Coding AoCO2025
Posted at 06:00:00 CST on 24th December 2025.

Switching it up a bit

Written by me, proof-read by an LLM.
Details at end.

The conventional wisdom is that switch statements compile to jump tables. And they do - when the compiler can’t find something cleverer to do instead.

Let’s start with a really simple example:
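    // An illustrative switch (invented example) mapping a dense range of
    // values to results - classic jump-table territory, on the face of it.
    int classify(int x) {
        switch (x) {
            case 0: return 10;
            case 1: return 20;
            case 2: return 30;
            case 3: return 40;
            default: return -1;
        }
    }

Dense, consecutive cases like these are exactly what jump tables were made for - but notice the returned values also follow a simple pattern, which is just the sort of thing a compiler can spot and exploit.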

Filed under: Coding AoCO2025
Posted at 06:00:00 CST on 23rd December 2025.

Clever memory tricks

Written by me, proof-read by an LLM.
Details at end.

After exploring SIMD vectorisation over the last couple of days, let’s shift gears to look at another class of compiler cleverness: memory access patterns. String comparisons seem straightforward enough - check the length, compare the bytes, done. But watch what Clang does when comparing against compile-time constants, and you’ll see some rather clever tricks involving overlapping memory reads and bitwise operations. What looks like it should be a call to memcmp becomes a handful of inline instructions that exploit the fact that the comparison value is known at compile time.
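Written out by hand, the trick looks something like this (a sketch with an invented 10-byte constant; the real transformation happens inside the compiler, not in your source):

    #include <cstdint>
    #include <cstring>

    // Hand-written equivalent of the overlapping-read trick. Two 8-byte
    // loads that overlap in the middle cover all 10 bytes of "OPENSESAME",
    // so two XORs and an OR stand in for a byte-by-byte memcmp.
    bool isMagic(const char* p, std::size_t len) {
        if (len != 10) return false;
        std::uint64_t head, tail, kHead, kTail;
        std::memcpy(&head, p, 8);                  // input bytes 0..7
        std::memcpy(&tail, p + 2, 8);              // input bytes 2..9
        std::memcpy(&kHead, "OPENSESAME", 8);      // constant bytes 0..7
        std::memcpy(&kTail, "OPENSESAME" + 2, 8);  // constant bytes 2..9
        return ((head ^ kHead) | (tail ^ kTail)) == 0;
    }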

Filed under: Coding AoCO2025
Posted at 06:00:00 CST on 22nd December 2025.

When SIMD Fails: Floating Point Associativity

Written by me, proof-read by an LLM.
Details at end.

Yesterday we saw SIMD work beautifully with integers. But floating point has a surprise in store. Let’s try summing an array:
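    #include <cstddef>

    // A sketch of the loop meant here: summing an array of floats.
    float sum(const float* data, std::size_t count) {
        float total = 0.0f;
        for (std::size_t i = 0; i < count; ++i)
            total += data[i];
        return total;
    }

Floating point addition isn’t associative - (a + b) + c can differ from a + (b + c) - so the compiler mustn’t reorder these additions into parallel SIMD lanes unless we explicitly allow it (with -ffast-math, for example).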

Filed under: Coding AoCO2025
Posted at 06:00:00 CST on 21st December 2025.

SIMD City: Auto-vectorisation

Written by me, proof-read by an LLM.
Details at end.

It’s time to look at one of the most sophisticated optimisations compilers can do: auto-vectorisation. Most “big data” style problems boil down to “do this maths to huge arrays”, and the limiting factor isn’t the maths itself but keeping the CPU fed with instructions and data.

To help with this problem, CPU designers came up with SIMD: “Single Instruction, Multiple Data”. One instruction tells the CPU what to do with a whole chunk of data - 2, 4, 8 or 16 integers or floating point values, say, each processed independently. Initially the only way to use this capability was to write assembly language directly, but luckily for us, compilers are now able to help.
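The canonical shape the compiler looks for is an independent, element-wise loop - something like this invented example:

    #include <cstddef>

    // Element-wise addition: every iteration is independent, so the
    // compiler is free to process several elements per instruction.
    void addArrays(int* dst, const int* a, const int* b, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            dst[i] = a[i] + b[i];
    }

With optimisations enabled, GCC and Clang will turn loops like this into SIMD instructions handling four or eight ints at a time.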

Filed under: Coding AoCO2025
Posted at 06:00:00 CST on 20th December 2025.

Chasing your tail

Written by me, proof-read by an LLM.
Details at end.

Inlining is fantastic, as we’ve seen recently. There’s one place it surely can’t help, though: recursion! If we call our own function, we can’t very well inline ourselves…

Let’s see what the compiler does with the classic recursive “greatest common divisor” routine - can it really avoid calling itself?
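    // Euclid's algorithm, written recursively. The recursive call is the
    // very last thing the function does - a tail call.
    unsigned gcd(unsigned a, unsigned b) {
        if (b == 0)
            return a;
        return gcd(b, a % b);
    }

Because the call is in tail position, the compiler can reuse the current stack frame and turn the recursion into a plain loop - no call instruction in sight.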

Filed under: Coding AoCO2025
Posted at 06:00:00 CST on 19th December 2025.

Partial inlining

Written by me, proof-read by an LLM.
Details at end.

We’ve learned how important inlining is to optimisation, but also that it might sometimes cause code bloat. Inlining doesn’t have to be all-or-nothing!

Let’s look at a simple function that has a fast path and a slow path, and then see how the compiler handles it.
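Something like this invented example:

    #include <cstdio>

    // A hypothetical fast-path/slow-path shape. The in-range check is a
    // couple of instructions; the logging slow path is bulky and rare.
    int clampAndLog(int value) {
        if (value >= 0 && value <= 100)  // fast path: almost always taken
            return value;
        // slow path: rarely taken, too big to be worth copying everywhere
        std::fprintf(stderr, "clamping out-of-range value %d\n", value);
        return value < 0 ? 0 : 100;
    }

With partial inlining, the compiler can split the function in two: callers get the cheap check inlined directly, and only the rare out-of-range case pays for a call into the outlined slow half.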

Filed under: Coding AoCO2025
Posted at 06:00:00 CST on 18th December 2025.

Inlining - the ultimate optimisation

Written by me, proof-read by an LLM.
Details at end.

Sixteen days in, and I’ve been dancing around what many consider the fundamental compiler optimisation: inlining. Not because it’s complicated - quite the opposite! - but because inlining is less interesting for what it does (copy-paste code), and more interesting for what it enables.

Initially, inlining was all about avoiding the expense of the call itself, but nowadays inlining enables many other optimisations to shine.
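A tiny invented example of what that means in practice:

    // Once square() is inlined into nine(), the compiler sees 3 * 3 and
    // can constant-fold the whole thing to "return 9" - no call, no
    // multiply at runtime.
    static int square(int x) { return x * x; }

    int nine() { return square(3); }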

Filed under: Coding AoCO2025
Posted at 06:00:00 CST on 17th December 2025.

About Matt Godbolt

Matt Godbolt is a C++ developer living in Chicago. He works for Hudson River Trading on super fun but secret things. He is one half of the Two's Complement podcast. Follow him on Mastodon or Bluesky.