Branch Target Buffer, part 2

Continuing on from my previous ramblings on the branch target buffer, I thought I’d do a quick follow-up with a little more investigation.

The next thing I looked in to was how many bits of the address are used for the tag. My approach for this was as follows: set N=2 and use very large D to place two different branches in the same set. Ordinarily we’d expect no resteers at all: the BTB is four-way so our two branches fit with room to spare.

However, if only a subset of the address is used as the tag, then if the branch addresses differ only in bits not used in the tag, then we should expect resteers. This is because the BTB erroneously thinks the two branches are the same. The mistake is found and corrected at the decoder, but a resteer is caused.

Resteers by branch count and alignment
Resteers by branch count and alignment. Click for full details. Alignment is log scale.

Here it’s pretty clear that no bits between 221 and 228 are used in the tag: we have pretty much a 100% resteer rate. Below 212 (given the results from the last investigation) we aren’t hitting the same set each time, so wouldn’t expect any resteers anyway.

The unusual results at 28 and 29 may be another manifestation of the “hashing” of address theorised last time too.

Running with a few more branches (but with D values not varying quite so high), we get:

Resteers by branch count and alignment
Resteers by branch count and alignment. Click for full details. Alignment is log scale.

Broadly the same results. Again a curious low number of resteers at 216 for 4 branches. More evidence for address hashing, I suppose.

Worth noting is that all these studies have been done with logical memory addresses, and only within a “sensible” amount of RAM – not taxing the higher bits of the 64-bit address space. The initial branch is aligned to a 4MB boundary. There could be yet physical memory issues to consider: I’d expect the BPU to work purely in logical addreeses though.

Next time I’ll run the tests on other CPUs and see what we get. I’d love to try and confirm the set address hashing too, and to fathom out the replacement algorithm. Ideas welcomed! I’ve been somewhat distracted by further posts on the Mechanical Sympathy mailing list with extra pointers to follow up on modern branch prediction, so I’m a little behind on where I’d like to be!

Filed under: Coding Microarchitecture
Posted at 00:10:00 GMT on 20th February 2016.

About Matt Godbolt

Matt Godbolt is a C++ developer working in Chicago for Aquatic. Follow him on Mastodon.