Going loopy

Written by me, proof-read by an LLM.
Details at end.

Which loop style is “best”? This very question led to the creation of Compiler Explorer! In 2011 I was arguing with my team about whether we could switch all our loops from ordinal or iterator-style to the “new” range-for1. I wrote a small shell script to iteratively show the compiler output as I edited code in vim, and the seed of Compiler Explorer was born.

C and C++ have many ways to phrase loops: for(), while(), do..while(), range-for (for (x : container)), STL algorithms like std::for_each, and now range transformations! Let’s see if loop style actually matters for performance.

Filed under: Coding AoCO2025
Posted at 06:00:00 CST on 8th December 2025.

Multiplying our way out of division

Written by me, proof-read by an LLM.
Details at end.

I occasionally give presentations to undergraduates, and one of my favourites is taking the students on a journey of optimising a “binary to decimal” routine1. There are a number of tricks, which I won’t go in to here, but the opening question I have is “how do you even turn a number into its ASCII representation?”

If you’ve never stopped to think about it, take a moment now to do so, it can be a fun problem.

Filed under: Coding AoCO2025
Posted at 06:00:00 CST on 7th December 2025.

Division

Written by me, proof-read by an LLM.
Details at end.

Broadly computers have the same kind of trouble with integer arithmetic that we do, addition and subtraction are quick, multiplication takes a bit longer, and then division is where things start to get really slow. On an average x86, addition is single-cycle (and many can happen at once), multiplication is three or so cycles, and then division can be up to a hundred!1 And usually only a single divide can happen at a time, too.

It’s worth avoiding divides then, and compilers know this too. Let’s start with the easy cases: you probably know that shifting right effectively divides by powers of two. You’ve likely seen return x >> 9; to divide the int x by 512, right? Let’s see what the compiler does if we use the more obvious return x / 512;:

Filed under: Coding AoCO2025
Posted at 06:00:00 CST on 6th December 2025.

ARM's barrel shifter tricks

Written by me, proof-read by an LLM.
Details at end.

Yesterday we talked about how the compiler handles multiplication with a constant on x86. x86 has some architectural quirks (like lea) that give the compiler quite some latitude for clever optimisations. But do other processors have similar fun quirks?

Today we’re going to look at what code gets generated for the ARM processor. Let’s see how our examples come out:

Filed under: Coding AoCO2025
Posted at 06:00:00 CST on 5th December 2025.

Multiplying with a constant

Written by me, proof-read by an LLM.
Details at end.

So far we’ve covered addition, and subtraction mostly follows suit. There’s an obvious next step: multiplication. Specifically, let’s try multiplying by constants on x861. We’ll try several constants: 2, 3, 4, 16, 25 and 522.

Before you look at the code below, make your predictions for what instructions the compiler will pick, then see if you were right or not. Let’s start with x86:

Filed under: Coding AoCO2025
Posted at 06:00:00 CST on 4th December 2025.

You can't fool the optimiser

Written by me, proof-read by an LLM.
Details at end.

Sometimes you’ll step through code in a debugger and find a complex-looking loop… that executes as a single instruction. The compiler saw through the obfuscation and generated the obvious code anyway.

Consider this assortment of highly questionable unsigned addition routines1 - for variety, here compiled for ARM (unlike yesterday’s addition example).

Filed under: Coding AoCO2025
Posted at 06:00:00 CST on 3rd December 2025.

Addressing the adding situation

Written by me, proof-read by an LLM.
Details at end.

Yesterday we saw how compilers zero registers efficiently. Today let’s look at something a tiny bit less trivial (though not by much): adding two integers. What do you think a simple x86 function to add two ints1 would look like? An add, right? Let’s take a look!

Filed under: Coding AoCO2025
Posted at 06:00:00 CST on 2nd December 2025.

Why xor eax, eax?

Written by me, proof-read by an LLM.
Details at end.

In one of my talks on assembly, I show a list of the 20 most executed instructions on an average x86 Linux desktop. All the usual culprits are there, mov, add, lea, sub, jmp, call and so on, but the surprise interloper is xor - “eXclusive OR”. In my 6502 hacking days, the presence of an exclusive OR was a sure-fire indicator you’d either found the encryption part of the code, or some kind of sprite routine. It’s surprising then, that a Linux machine just minding its own business, would be executing so many.

That is, until you remember that compilers love to emit a xor when setting a register to zero:

Filed under: Coding AoCO2025
Posted at 06:00:00 CST on 1st December 2025.

About Matt Godbolt

Matt Godbolt is a C++ developer living in Chicago. He works for Hudson River Trading on super fun but secret things. He is one half of the Two's Complement podcast. Follow him on Mastodon or Bluesky.