Branch Target Buffer, part 2

Continuing on from my previous ramblings on the branch target buffer, I thought I’d do a quick follow-up with a little more investigation.

The next thing I looked in to was how many bits of the address are used for the tag. My approach for this was as follows: set N=2 and use very large D to place two different branches in the same set. Ordinarily we’d expect no resteers at all: the BTB is four-way so our two branches fit with room to spare.

However, if only a subset of the address is used as the tag, then if the branch addresses differ only in bits not used in the tag, then we should expect resteers. This is because the BTB erroneously thinks the two branches are the same. The mistake is found and corrected at the decoder, but a resteer is caused.

Resteers by branch count and alignment. Click for full details. Alignment is log scale.

Here it’s pretty clear that no bits between 2²¹ and 2²⁸ are used in the tag: we have pretty much a 100% resteer rate. Below 2¹² (given the results from the last investigation) we aren’t hitting the same set each time, so wouldn’t expect any resteers anyway.

The unusual results at 2⁸ and 2⁹ may be another manifestation of the “hashing” of address theorised last time too.

Running with a few more branches (but with D values not varying quite so high), we get:

Resteers by branch count and alignment. Click for full details. Alignment is log scale.

Broadly the same results. Again a curious low number of resteers at 2¹⁶ for 4 branches. More evidence for address hashing, I suppose.

Worth noting is that all these studies have been done with logical memory addresses, and only within a “sensible” amount of RAM – not taxing the higher bits of the 64-bit address space. The initial branch is aligned to a 4MB boundary. There could be yet physical memory issues to consider: I’d expect the BPU to work purely in logical addreeses though.

Next time I’ll run the tests on other CPUs and see what we get. I’d love to try and confirm the set address hashing too, and to fathom out the replacement algorithm. Ideas welcomed! I’ve been somewhat distracted by further posts on the Mechanical Sympathy mailing list with extra pointers to follow up on modern branch prediction, so I’m a little behind on where I’d like to be!

Branch Target Buffer, part 2

About Matt Godbolt