Written by me, proof-read by an LLM.
Details at end.
So far we’ve covered addition, and subtraction mostly follows suit. There’s an obvious next step: multiplication. Specifically, let’s try multiplying by constants on x86. We’ll try several constants: 2, 3, 4, 16, 25 and 522.
Before you look at the code below, make your predictions for what instructions the compiler will pick, then see if you were right or not. Let’s start with x86:
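As a minimal sketch of the kind of functions you could feed to the compiler, here is one multiply per constant from the list above. The function names are hypothetical, and the two shift-and-add versions are hand-written illustrations of rewrites a compiler may choose, not a claim about what any particular compiler emits; that depends on the compiler, flags and target.

```cpp
#include <cassert>

// One function per constant from the text (names are hypothetical).
int times_2(int x)   { return x * 2; }
int times_3(int x)   { return x * 3; }
int times_4(int x)   { return x * 4; }
int times_16(int x)  { return x * 16; }
int times_25(int x)  { return x * 25; }
int times_522(int x) { return x * 522; }

// Hand-written shift-and-add equivalents, using unsigned so that
// wraparound is well defined.
// 3x = 2x + x
unsigned times_3_shift_add(unsigned x) { return (x << 1) + x; }

// 25x = 5 * (5x), where 5x = 4x + x
unsigned times_25_shift_add(unsigned x) {
    unsigned x5 = (x << 2) + x;  // 5x
    return (x5 << 2) + x5;       // 20x + 5x = 25x
}

// Checks that the shift-and-add forms agree with plain multiplication.
void check_shift_add_versions() {
    for (unsigned x = 0; x < 1000; ++x) {
        assert(times_3_shift_add(x) == x * 3);
        assert(times_25_shift_add(x) == x * 25);
    }
}
```

Compiling each `times_N` function with optimisation on and comparing the output against your predictions is the exercise; the shift-and-add helpers are only there to show that such rewrites are arithmetically equivalent.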
Sometimes you’ll step through code in a debugger and find a complex-looking loop… that executes as a single instruction. The compiler saw through the obfuscation and generated the obvious code anyway.
Consider this assortment of highly questionable unsigned addition routines, compiled here for ARM for variety (unlike yesterday’s addition example).
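To give a flavour of what “highly questionable” might mean, here is a minimal sketch of two roundabout ways to add unsigned integers. These are hypothetical examples, not necessarily the routines from the post, and whether an optimiser collapses them into a single add depends on the compiler, its version and the flags used.

```cpp
#include <cassert>

// Adds y to x one step at a time: increment x while counting y down to zero.
// Unsigned arithmetic wraps, so the result matches x + y for all inputs.
unsigned add_by_counting(unsigned x, unsigned y) {
    while (y != 0) {
        ++x;
        --y;
    }
    return x;
}

// The same idea written recursively: move one unit from y to x per call.
// Only sensible for small y unless the compiler removes the recursion.
unsigned add_recursive(unsigned x, unsigned y) {
    return y == 0 ? x : add_recursive(x + 1, y - 1);
}

// Checks both routines against plain addition for a few small inputs.
void check_questionable_adds() {
    for (unsigned x = 0; x < 50; ++x) {
        for (unsigned y = 0; y < 50; ++y) {
            assert(add_by_counting(x, y) == x + y);
            assert(add_recursive(x, y) == x + y);
        }
    }
}
```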
Yesterday we saw how compilers zero registers efficiently. Today let’s look at something a tiny bit less trivial (though not by much): adding two integers. What do you think a simple x86 function to add two ints would look like? An add, right? Let’s take a look!
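For reference, the source side of the question is about as small as a function gets. This is a sketch with a hypothetical name; make your guess about the generated instruction before compiling it.

```cpp
// A plain two-argument integer add (the name is hypothetical).
int add(int x, int y) {
    return x + y;
}
```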
In one of my talks on assembly, I show a list of the 20 most executed instructions on an average x86 Linux desktop. All the usual culprits are there: mov, add, lea, sub, jmp, call and so on, but the surprise interloper is xor, “eXclusive OR”. In my 6502 hacking days, the presence of an exclusive OR was a sure-fire indicator you’d either found the encryption part of the code, or some kind of sprite routine. It’s surprising, then, that a Linux machine just minding its own business would be executing so many.
That is, until you remember that compilers love to emit a xor when setting a register to zero:
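A minimal sketch of source that exercises this, with hypothetical function names. On x86-64 with optimisation enabled, a function like `zero` typically compiles to `xor eax, eax` followed by `ret`, though the exact output depends on the compiler and flags.

```cpp
// Returns a constant zero; the return register has to be cleared somehow.
int zero() {
    return 0;
}

// XORing any value with itself clears every bit, which is why
// "xor reg, reg" leaves zero in the register.
unsigned xor_with_self(unsigned x) {
    return x ^ x;
}
```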
Today I’m announcing a project that’s been in the making for around a year. As my time off draws to a close, I’ve been working on an “Advent of” type project, to be released one a day from the 1st of December until the 25th.
This December will be the Advent of Compiler Optimisations: I’ll release one blog post and video each day, each detailing a fun and interesting C or C++ optimisation that your compiler can do. I’ll go into the details of when it applies, how to interpret the assembly, and perhaps as importantly, when it doesn’t apply.
I’ll be covering some very low-level, architecture-specific tricks as well as larger, more high-level optimisations. While I mostly cover x86-64, I do touch on 64-bit and 32-bit ARM as well.
Matt Godbolt is a C++ developer living in Chicago. He works for Hudson River Trading on super fun but secret things. He is one half of the Two's Complement podcast. Follow him on Mastodon or Bluesky.