SIMD City: Auto-vectorisation

Written by me, proof-read by an LLM.
Details at end.

It’s time to look at one of the most sophisticated optimisations compilers can do: auto-vectorisation. Most “big data” style problems boil down to “do this maths to huge arrays”, and the limiting factor isn’t the maths itself, but the feeding of instructions to the CPU, along with the data it needs.

To help with this problem, CPU designers came up with SIMD: “Single Instruction, Multiple Data”. One instruction tells the CPU what to do with a whole chunk of data: 2, 4, 8, 16 or more integers or floating-point values, each treated individually. Initially the only way to use this capability was to write assembly language directly, but luckily for us, compilers are now able to help.

Filed under: Coding AoCO2025
Posted at 06:00:00 CST on 20th December 2025.

Chasing your tail


Inlining is fantastic, as we’ve seen recently. There’s one place it surely can’t help though: recursion! If a function calls itself, what is there to inline…

Let’s see what the compiler does with the classic recursive “greatest common divisor” routine - surely it can’t avoid calling itself?

Filed under: Coding AoCO2025
Posted at 06:00:00 CST on 19th December 2025.

Partial inlining


We’ve learned how important inlining is to optimisation, but also that it might sometimes cause code bloat. Inlining doesn’t have to be all-or-nothing!

Let’s look at a simple function with a fast path and a slow path, and then see how the compiler handles it.

Filed under: Coding AoCO2025
Posted at 06:00:00 CST on 18th December 2025.

Inlining - the ultimate optimisation


Sixteen days in, and I’ve been dancing around what many consider the fundamental compiler optimisation: inlining. Not because it’s complicated - quite the opposite! - but because inlining is less interesting for what it does (copy-paste code), and more interesting for what it enables.

Initially inlining was all about avoiding the expense of the call itself, but nowadays inlining enables many other optimisations to shine.

Filed under: Coding AoCO2025
Posted at 06:00:00 CST on 17th December 2025.

Calling all arguments


Today we’re looking at calling conventions - which aren’t purely optimisation related but are important to understand. The calling convention is part of the ABI (Application Binary Interface), and varies from architecture to architecture and even OS to OS. Today I’ll concentrate on the System V ABI for x86 on Linux, as (to me) it’s the most sane ABI.

Before we go on: I’d be remiss if I didn’t point out that I can never remember which register holds what, and for years I had a Post-it note on my monitor with a hand-written crib sheet of the ABI. While on holiday I had an idea: why not put the ABI on a mug? I created these ABI mugs and you can get your own - and support Compiler Explorer - at the Compiler Explorer shop.

Filed under: Coding AoCO2025
Posted at 06:00:00 CST on 16th December 2025.

Aliasing


Yesterday we ended on a bit of a downer: aliasing stopped optimisations dead in their tracks. I know this is supposed to be the Advent of Compiler Optimisations, not the Advent of Compiler Giving Up! Knowing why your compiler can’t optimise is just as important as knowing all the clever tricks it can pull off.

Let’s take a simple example of a counter class. It accumulates integers into a member variable total. I’ve used C++ templates to show two versions of the code: one that accumulates in an int and one that accumulates in a long.

Filed under: Coding AoCO2025
Posted at 06:00:00 CST on 15th December 2025.

When LICM fails us


Yesterday’s LICM post ended with the compiler pulling invariants like size() and get_range out of our loop - clean assembly, great performance. Job done, right?

Not quite. Let’s see how that optimisation can disappear.

Filed under: Coding AoCO2025
Posted at 06:00:00 CST on 14th December 2025.

Loop-Invariant Code Motion


Look back at our simple loop example - there’s an optimisation I completely glossed over. Let me show you what I mean:

Filed under: Coding AoCO2025
Posted at 06:00:00 CST on 13th December 2025.

About Matt Godbolt

Matt Godbolt is a C++ developer living in Chicago. He works for Hudson River Trading on super fun but secret things. He is one half of the Two's Complement podcast. Follow him on Mastodon or Bluesky.