Inside the Ivy Bridge and Haswell BTB

After last time’s analysis of the Arrandale BTB, I thought I should take a look at more contemporary CPUs. At work I have access to Haswell and Ivy Bridge machines. Before I got too far into interpretation, I spent a while making it as easy as possible to run tests remotely and graph the results. The code has improved a little in this regard. For completeness, this article was written with the code at SHA hash ab8cbd1d.

The Ivy Bridge I tested was an E5 2667v2 and the Haswell was an E5 2697v3.

Total size

First up let’s try and see how many branches we can fit in the BTB:

Filed under: Coding Microarchitecture
Posted at 03:00:00 GMT on 23rd February 2016.

Branch Target Buffer, part 2

Continuing on from my previous ramblings on the branch target buffer, I thought I’d do a quick follow-up with a little more investigation.

The next thing I looked into was how many bits of the address are used for the tag. My approach was as follows: set N=2 and use a very large D to place two different branches in the same set. Ordinarily we’d expect no resteers at all: the BTB is four-way so our two branches fit with room to spare.

However, if only a subset of the address bits is used as the tag, then two branches whose addresses differ only in bits outside the tag should cause resteers. This is because the BTB erroneously thinks the two branches are the same. The mistake is found and corrected at the decoder, but at the cost of a resteer.
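
As a rough illustration of that aliasing argument, here is a toy model in Python. The field widths (`offset_bits`, `index_bits`, `tag_bits`) are made-up values chosen for the example, not measurements of any real chip, and the helper names are hypothetical.

```python
def btb_set_and_tag(addr, offset_bits=4, index_bits=9, tag_bits=9):
    """Split a branch address into (set index, partial tag) for a toy BTB.

    All three widths are hypothetical illustration values.
    """
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = (addr >> (offset_bits + index_bits)) & ((1 << tag_bits) - 1)
    return index, tag


def aliases(a, b, **widths):
    """True if two distinct branch addresses look identical to the toy BTB."""
    return a != b and btb_set_and_tag(a, **widths) == btb_set_and_tag(b, **widths)


base = 0x1000000
near = base ^ (1 << 20)  # differs in a bit that falls inside the tag field
far = base ^ (1 << 22)   # differs only in a bit above the 22 bits the BTB sees

print(aliases(base, near))  # same set, different tag: no confusion
print(aliases(base, far))   # same set, same tag: the BTB confuses the two
```

In this model, only the second pair would be expected to produce resteers, which is the signature the experiment looks for as D grows.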

Filed under: Coding Microarchitecture
Posted at 00:10:00 GMT on 20th February 2016.

The BTB in contemporary Intel chips

I’m in the middle of an investigation of the branch predictor on modern Intel chips. Read the previous article to get some background, and the first part for an overview of branch prediction.

This time I’m digging into the branch target buffer (BTB) on my Arrandale laptop (Core i5 M 520, model 37 stepping 5).

The branch target buffer hints to the front-end that a branch is coming, before the instructions have even been fetched and decoded. It caches the destination and some information about the branch – whether it’s conditional, for example. It’s thought to be a cache-like structure, and there are hints that it is multi-level, like the memory caches. I wanted to find out how big the BTB is and how it is organized.
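
To make the “cache-like structure” idea concrete, here is a minimal sketch of a set-associative BTB in Python. The geometry (128 sets, 4 ways, 16-byte granularity), the LRU replacement policy and the names `ToyBTB` and `steady_state_misses` are all assumptions for illustration; they are not the measured organisation of any real chip.

```python
from collections import OrderedDict


class ToyBTB:
    """Set-associative table with LRU replacement (hypothetical geometry)."""

    def __init__(self, sets=128, ways=4, offset_bits=4):
        self.sets, self.ways, self.offset_bits = sets, ways, offset_bits
        self.table = [OrderedDict() for _ in range(sets)]

    def lookup(self, addr):
        """Return True on a hit; on a miss, install the branch, evicting the LRU entry."""
        line = addr >> self.offset_bits
        entries = self.table[line % self.sets]
        tag = line // self.sets
        if tag in entries:
            entries.move_to_end(tag)      # mark as most recently used
            return True
        if len(entries) >= self.ways:
            entries.popitem(last=False)   # evict the least recently used entry
        entries[tag] = True
        return False


def steady_state_misses(n_branches, spacing, passes=10, **geometry):
    """Count misses over repeated passes through n_branches spaced `spacing` bytes apart."""
    btb = ToyBTB(**geometry)
    addrs = [i * spacing for i in range(n_branches)]
    for a in addrs:                       # warm-up pass, not counted
        btb.lookup(a)
    return sum(not btb.lookup(a) for _ in range(passes) for a in addrs)


same_set = 128 * 16  # spacing that maps every branch to set 0 in this geometry
print(steady_state_misses(4, same_set))   # fits in the four ways
print(steady_state_misses(5, same_set))   # one branch too many for the set
```

Sweeping the number of branches and their spacing and watching where the misses begin is the general shape of experiment that lets the total size and associativity be inferred from the outside.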

Filed under: Coding Microarchitecture
Posted at 04:00:00 GMT on 19th February 2016.

jsbeeb top 20 images

A quick one, this. Last night, when I ought to have been doing more micro-architecture research, I was instead reading the Wikipedia article on Beebdroid.

I noticed it had the top 20 played games…and I thought I’d do the same for jsbeeb.

So for the period 1st January 2015 to 16th February 2016, the top 20 disc images loaded from the “Stairway to Hell” archive are:

Image                         Number of loads   Percentage
10OfTheBestGames              374               4.00%
Elite                         318               3.40%
2002                          192               2.06%
Airwolf                       128               1.37%
3DPool                        123               1.32%
Exile                         117               1.25%
Elite-v1.0_B                  111               1.19%
3DBombAlley                   101               1.08%
Elite-MasterAndTubeEnhanced   99                1.06%
Arcadians                     84                0.90%
ChuckieEgg                    84                0.90%
180Darts                      82                0.88%
Repton                        81                0.87%
3DGrandPrix                   78                0.83%
Frak                          72                0.77%
10OfTheBestEducation_B        72                0.77%
3DConvoy                      68                0.73%
Pacman                        67                0.72%
CastleQuest                   65                0.70%
1984                          61                0.65%

Filed under: Emulation
Posted at 15:24:00 GMT on 17th February 2016.

Branch prediction - part two

I’m in the middle of an investigation of the branch predictor on newer Intel chips. Read the previous article to get some background.

Where I left off I’d just decided to look into static prediction myself. I’ve previously used Agner Fog’s tools to do these kinds of investigations, and so that was my first stop.

Agner’s tools install a kernel module to give user-mode access to the hardware performance monitoring counters inside the Intel chips. Each CPU has four counters that can be used to count one of a number of internal CPU events. Agner’s tools then run micro-benchmarks while counting the various internal things going on inside the processor. This seems perfect for us!

Filed under: Coding Microarchitecture
Posted at 14:45:00 GMT on 9th February 2016.

Static branch prediction on newer Intel processors

Over the last week or so I’ve been investigating the static branch prediction on modern Intel processors. A thread on the excellent Mechanical Sympathy mailing list got me thinking about it: a claim was made that static prediction is still used on Intel processors, whereas my understanding from Agner Fog’s excellent resources is that newer Intel processors no longer use it.

This has led to quite an odyssey of understanding, which I’m still embroiled in; so forgive the length of this post and the fact that it’s the first in a series…

So what’s branch prediction? And what’s static prediction?

Filed under: Coding Microarchitecture
Posted at 04:45:00 GMT on 8th February 2016.

BBC emulation talk - video available

Last year I gave a presentation at work on my favourite open source project, jsbeeb – “Emulating a BBC Micro in Javascript”.

I’ve been given permission to release the video; so here it is, warts and all:

The slides are available, and of course you can play with the emulator on the jsbeeb site.

Filed under: Emulation
Posted at 13:30:00 BST on 17th July 2015.

BBC emulation slides

Last year I gave a presentation at work on my favourite open source project, jsbeeb – “Emulating a BBC Micro in Javascript”.

The slides are now available online. Hopefully they make enough sense by themselves to be interesting. Please note that you can go both left/right and up/down: some slides have more information if you go down.

The slides were made using reveal with a bit of custom work to get the 6502 assembly syntax highlighted.

Update: I’ve just been given permission to release the video.

Filed under: Emulation
Posted at 13:30:00 BST on 14th July 2015.

About Matt Godbolt

Matt Godbolt is a developer working on cool secret stuff for DRW, a fabulous trading company.