Matt Godbolt’s blog

Self-indulgent postings

GCC Explorer's top 10 compilers

I’ve had a post-it note on my computer for the over six months to upgrade the GCC Explorer cluster to Ubuntu 14.04 from its current 12.04 setup. One issue I have is that newer Clang versions only have pre-canned binaries for 14.04, and rather than have to build my own, I’d prefer to use those binaries.

That being said; the reverse is true for older compilers like GCC 4.4.

I wondered if I could drop support for GCC 4.4, and in short, I don’t think I can! I turned to the very minimal Google Analytics I have for GCC Explorer, and the top ten compilers (in nearly 2 million compiles) are:

  1. clang++ 3.0.6 (the default when you visit the site) – 37%
  2. g++ 4.8 – 15%
  3. g++ 4.9 – 13%
  4. clang++ 3.4.1 – 6%
  5. g++ 4.9 concepts branch – 4%
  6. clang++ 3.3 – 4%
  7. g++ 4.7 – 4%
  8. Intel’s compiler 13.0.1 – 3%
  9. ARM g++ 4.6 – 2%
  10. g++- 4.4 – 2%

So, venerable old g++ 4.4 is still getting 2% of the hits to the site (40,000 compiles).

I suppose I’ll have to bite the bullet and get builds of gcc4.4 for Ubuntu 14.04. I also really ought to update the Intel compiler too!

Filed under: Coding

Posted at 19:00:00 GMT on 16th December 2014.

Debugging BBC Master demos with jsbeeb

Over the last few weeks I’ve really been concentrating on shoring up the emulation quality of jsbeeb, mainly by adding test cases for all the undefined opcodes. Thankfully, there are some processor test suites out there and I’ve been able to get them running in jsbeeb as part of the continuous build. It now takes about 40 minutes to run all the tests, but I’m pretty darned sure jsbeeb has an accurate NMOS 6502 emulation.

Most recently I’ve been taking a glance over the BBC Master emulation, both in terms of hardware and the slightly different CMOS 65SC12 chip it used.

With a little help from my friends on the stardot forums I was able to get some tests run on real Masters and compare the output of the few instructions I couldn’t find info for. Most notably the unusual behaviour of the 6502 when in Binary Coded Decimal mode has mostly been fixed in the 65SC12.

To test the emulation quality, Rich Talbot-Watkins suggested I try out a Master-only demo that Tom Walker (of b-em fame) had written1. I fired it up and was both very impressed at the quality of the demo, and pleased that it worked first time!

The Master demo, by Tom Walker
Tom’s impressive Master demo

Then I restarted it and the second time it ran, the sound went all crazy.


A bit of experimentation and it seemed about 30% of the time the sound would go nuts about a minute into the demo, squeaking and missing half the notes.

At first I suspected the sound chip emulation. The emulation was taken from my earlier Sega Master System emulator which happened to use the same chip, and I had recently updated the emulation there to fix up some other problems that Rich had noticed. I’d taken the information from the SMS Power emulation site, and it was pretty accurate, but maybe it didn’t match the chip in the Beeb.

It’s definitely the case that Tom’s b-em emulator and jsbeeb differ on how they latch the register values — maybe that was it? I ended up writing a “b-em” style sound system and running it in parallel with the jsbeeb one and comparing their outputs. They agreed 100%, even when the squeaks were there.

Scratch that as a source of error, then. It was a bit suspect anyway as the problem is intermittent and (somewhat) non-deterministic2.

While instrumenting the sound code I was able to notice a pattern: at the point the squeaks started, two of the sound channels would be set to $020 values. This was a handy automatic diagnostic of the problem — it seems there’s no normal time when the music ought to be this squeaky.

The sound code in the demo is based off of Rich’s “Blurp” music player. As such he had a pretty good understanding of how it worked, and he was able to determine what inputs to the sound code would make it choose to play a sound of frequency $020, reading from a particular part of the frequency lookup table.

Armed with this information I added a breakpoint on reading that frequency table location. That allowed me to see when it was being read, but there wasn’t any obvious reason why: I needed to somehow track to the root cause.

Writing an emulator gives you amazing flexibility in debugging — I added a circular buffer of the last 256 values of all the registers after executing each instruction. That meant when the breakpoint tripped I had a history of how the code and the values in the A, X, Y and flags registers had evolved.

Poring over the history I noticed something odd in the code compared to the version in Rich’s original: there was one comparison with zero, which was followed by a branch-if-less-than — a branch that would never be taken.

PC   opcode      A  X  Y   ; my comment
8248 CPX #$00   20 02 00
824a BNE $8291  20 02 00
8291 LDX #$00   20 00 00
8293 CMP #$c0   20 00 00   ; >= c0?
8295 BCC $829b  20 00 00   ; no...
829b CMP #$00   20 00 00   ; >= 0 (er?)
829d BCC $82a3  20 00 00   ; no... carry is set

This is deeply suspicious — this wasn’t in Rich’s Blurp code and didn’t make any sense. On a hunch I reloaded until I got a “working” version where the sound didn’t squeak and lo and behold: The code at location $829b was comparing with #$60. Something was overwriting the code!

Excitedly I breakpointed on memory writes to the $60 byte at location $829a and was rewarded with the culprit: a pretty screen-clearing routine that shutter-style cleared the screen at the point in the demo where the sound went wrong!

Hooray! Now I knew how the corruption was happening. Next I needed to work out why it was intermittent…

The clear routine writes zero bytes in a shutter-style pattern to the screen memory. Nothing too unusual there. The screen memory ends at $7fff in the Master memory map, and then in the bank $8000-#bfff there’s a selectable 16KB bank which can be ROM or one of four banks of RAM. The sound playing routines live in one of the RAM banks at location $8000.

To apparently save some cycles somewhere, the screen routine doesn’t explicitly check for overrunning the memory at $8000: it writes past the end in some cases. 100% reproducible disaster, you might think. But no: Tom’s smarter than that and every time he’s about to write to the screen, he swaps the RAM out of $8000 and instead pages in some ROM — which of course makes the writes harmless. Here’s a snippet of the code:

LDA #$0f
STA $fe30     ; page in ROM
SEI           ; disable IRQs
LDA #$00
STA ($72),Y   ; store zero
CLI           ; re-enable IRQs
LDA $72
ADC #$08      ; move to the next char
STA $72
BNE loop      ; and keep clearing

The interesting part is where interrupts are disabled: recall I said the sound playing routine is in one of the RAM banks that gets paged in at $8000? Well, the sound player is driven by interrupts. The main interrupt handler (which lives down in unpaged memory) pages in the relevant RAM bank when it realises it needs to handle sound (in response to a timer firing). Critically, when it returns it doesn’t restore the page that was previously there.

To prevent this being a problem, Tom sensibly disables interrupts while he does his write. However he doesn’t disable interrupts between paging the ROM in and writing so there’s a tiny window of opportunity where, should an IRQ happen, the sound handling bank is left in just before the zero byte is written.

The window is tiny - there’s only 6 cycles between the point of no return for the store, and the SEI taking effect. Additionally, it’s only if the clearing routine is just past the end of the screen when the IRQ happens that it causes an issue.

Now we understand how, where and why. One last nagging thing to take care of — why is it intermittent? Further instrumentation gives the answer: the timer values that are used to set the interrupt frequency aren’t being deterministically set. Instead their values are loaded in the interrupt routine itself. That means the “start” time of the interrupts depends on how many cycles have passed since the last hard reset. I was able to use this to find the 6-cycle window: by externally varying the timer values at the point at which the first instruction of the demo executes I could find the values that caused it to go wrong.

b-em and other emulators don’t seem to see this issue — I believe as they don’t model the interrupts as precisely in all cases.

The only remaining mystery is why nobody has ever seen the issue on a real BBC Master. The marvellous Ed Spittles has run the demo a number of times on a real Master and never seen the issue. This leaves me feeling that perhaps I’m missing something, but it could just be an emulation artifact (maybe in how long the disc drive takes to load?) that makes the few timer values where it’s a problem more likely.

Any ideas on this gratefully accepted! For now, I’m happy enough with my explanation and can hopefully get on with some of the other things that need attending to.

Big thanks to Rich for putting up with my enormous nocturnal email rambles as I picked my way through all this in addition to giving fantastic advice on what the problem might be. Again, also thanks to Tom Walker for both the demo and for his excellent emulator which was the inspiration for jsbeeb.

  1. Sadly I don’t know the license state of the demo so I haven’t been able to share it here. 

  2. From an autoboot (with a deterministic start time) it would always fail. Subsequent runs would fail about 30% of the time. 

Filed under: Emulation

Posted at 03:50:00 GMT on 31st October 2014.

Elite's crazy tokenised string routine

The other day some discussions between friends led to someone posting a faked up Elite status screen with a silly rating instead of “Harmless”. Given what I’ve been spending all my time on recently I thought it would be more amusing to post a playable version with the “Harmless” replaced by patching the Elite code.

I naively thought this would be easy to achieve: I added a quick routine in jsbeeb to search for a string in memory and looked for Harmless. No joy. I tried a case-insensitive version. Likewise, no luck. This wasn’t going to be as simple as I thought.

I next put a conditional breakpoint on the OS write character routine OSWRCH, conditional on the character being the H of the Harmless I was hoping to replace.

In jsbeeb (at the time of writing) breakpoints are a little awkward to use: Assign a function to processor.debugInstruction which will be called at every instruction with the program counter and should return true to stop the program, or false to carry on. OSWRCH is at $ffee so bringing up the Javascript console and typing:

processor.debugInstruction = function(pc) { 
    return pc === 0xffee && processor.a == 72; 

A quick run and…no joy. Of course, Elite doesn’t use the OS routines to print text — it has its own.

At this point I remembered that Ian Bell, co-creator of Elite, had posted up some old disc images of the source code of the tape version of Elite. It’s not quite the same as this disc image I’m using, but it should be a good starting point. For ease of reading without firing up an emulator, he’s also posted a ZIP of the pertinent files as text.

Opening up the source gives an amazing insight into how the game was made: The source is a terrific mess. It’s split over several BASIC files which assemble the code by daisy-chaining each other. The code has very few comments and whitespace; crams many 6502 instructions per line; and uses many tricks like skipping over the next few instructions in branches via PC-relative mechanisms like BNE P%+4. It’s pretty hard-going stuff.

Luckily one of the few comments is \PRINT and so this is the first clue: the print routine is at the obfuscatedly-named label TT26. There’s no easy way to tie labels in the tape source to addresses in the disc version. I opted to search for the opcodes either side of the label, and struck gold: TT26 lives at address $1d56. I moved my Javascript breakpoint there and was able to catch the letter ‘H’ being drawn. Lucky Elite uses ASCII or otherwise I wouldn’t have been able to catch it here.

Here I stepped out (keypress ‘o’ in the latest jsbeeb debugger) and found the calling routine, and traced its origin to $31ac. Searching in the source I found this corresponds to label TT27, which sadly has no handy comments. This routine is the “print a string character” routine, but its input is definitely not ASCII.

The short version

The code EORs its input with 35, then interprets the character as a token: some tokens are ASCII, some are colour and case-changing codes, some refer to parts of the player’s state and some refer to one of a list of two-letter expansions, making the strings in general shorter. The string “Harmless” is actually all caps (the title casing is done by the code) and is encoded as 6 bytes: H [AR] M [LE] S S where the AR and LE are two-letter expansions. The string itself is stored at $0769. Capital letters are left unchanged, so to replace the string with, say, “Monkey” we can patch the code to put 6e4c4d48465a (“MONKEY” EORed with 35) in the relevant place. This can be achieved with a URL parameter in jsbeeb, adding &patch=@31a6,0769:6e4c4d48465a to the autoboot URL to yield a handy bootable patched URL.

The “@” sign in the patch URL sets a breakpoint at that location, and then applies the patch (in addr:bytestopatch) when the breakpoint is hit. That way, Elite loads up first, then the first time the print routine is called the patch is made.

The long version

Let’s dig into that print routine a bit and see what it’s doing. Here I’m going to use the label names from the source. To follow along at home, fire up Elite in jsbeeb, hit the Home key and then go to $31ac by typing into the disassembly address box, or open up ELITEB.TXT from the source ZIP.

The “print message” routine directly follows the print character routine, and I’ve annotated the whole lot. Some of the auxiliary routines are just above .TT27, but to keep things a little shorter I’ve left those out, and just explained them in the text (e.g. the csh routine).

.TT27       ; print char routine
TAX         ; save A into X, and set flags
BEQ csh     ; if zero, print "Cash:" and then
            ; the player's cash
BMI TT43    ; if negative (bit 7 set) go to TT43
            ; which treats the bottom 7 bits as an index
            ; into a token table
DEX         ; decrement and set flags...
BEQ tal     ; if zero, A was 1; go to tal
            ; which prints "tally" (a number)
BEQ ypl     ; if A was 2; go to to ypl
            ; which prints the current planet name
BNE P%+5
JMP cpl     ; if A was 3; go to cpl
            ; which prints the selected planet name
BEQ cmn     ; if A was 4; go to cmn
            ; which prints the player name (commander?)
BEQ fwl     ; if A was 5; go to fwl
            ; which just prints the player's cash
BNE P%+7
LDA #128    ; if A was 6, store 128 in QQ17
STA QQ17    ; which is a case-switch flag
RTS         ; and return
DEX         ; (this comes from P%+7 above)
BNE P%+5
STX QQ17    ; if A was 8, store zero ...
RTS         ; in case-switch flag and return
BEQ crlf    ; if A was 9, tab over and put a colon
            ; (maybe)
CMP #&60    ; NB we are comparing with the original 'A' again
BCS ex      ; If it's lower-case (or above), head to ex
            ; which treats it as a message index to print
CMP #14     ; if less than 14..
BCC P%+6    ; treat as a capitcal
CMP #32     ; if less than 32
BCC qw      ; ...go to 'qw' which treats it as message number
            ; A + 114
; So now we have a capital letter.
LDX QQ17    ; otherwise check the case-switch flag
BEQ TT74    ; zero? leave as-is and just print
BMI TT41    ; negative? head to TT41
BIT QQ17    ; bit 6 set?
BVS TT46    ; ...go to TT46 (which prints the character
            ; as-is, then clears bit 6 of case-switch)
; This prints the char in the A register in lowercase:
CMP #65     ; less than 'A' ?
BCC TT44    ; ...then print as is
CMP #&5B    ; more than 'Z' ?
BCS TT44    ; ...just print
ADC #32     ; otherwise add 32 to make it lower-case
.TT44       ; TT44 just jumps to TT26 (the character
JMP TT26    ; print routine).
BVS TT45    ; if bit 6 of case-switch set, go to TT45
CMP #65     ; if less than 65 ('A')
BCC TT74    ; just print, else..
ORA #64
STA QQ17    ; set bit '6' in case-switch
BNE TT44    ; and print as-is - i.e. this character
            ; will be printed but the next will be
            ; down-cased
ADC #114    ; add 114 to A
BNE ex      ; and print it as a canned message

LDA #21
STA XC      ; store 21 in the x position
BNE TT73    ; and go to TT73 (which prints a colon)

CPX #FF     ; if case-switch was $ff
BEQ TT48    ; ...skip entirely
CMP #65     ; if it's 'A' or above
BCS TT42    ; ...print in lowercase
PHA         ; save character to print
AND #191    ; clear bit 6 of case-switch
STA QQ17    
PLA         ; restore character
JMP TT26    ; and print as-is

CMP #160    ; is it >160?
BCS TT47    ; print as message number (A-160)
; Else we're going to print it as a one or
; two-character token.
AND #127    ; clear top bit to leave token number
ASL A       ; double it to get offset into table
TAY         ; put in Y register
LDA QQ16,Y  ; read first char
JSR TT27    ; print it
LDA QQ16+1,Y; read second char
CMP #63     ; if it's 63
BEQ TT48    ; ...skip (not that I've noticed any tokens that
            ; take advantage of this)
JMP TT27    ; else print second char and exit

; We got here if A>160 which means print canned message A-160
SBC #160        ; A-=160
; This is the main entry routine that prints canned message A
TAX             ; save A in X
; point zero-page location (V,V+1) to QQ18
; which is the message table.
LDA #(QQ18 MOD 256)
LDA #(QQ18 DIV 256)
LDY #0          ; init loop counter
BEQ TT50        ; Skip if this is the zeroth message

; TT51 to TT59 skip to the end of a zero-terminated string
LDA (V),Y       ; get current byte
BEQ TT49        ; if we found the end, exit loop
INY             ; otherwise move to next char
BNE TT51        ; if we didn't overflow, reloop
INC V+1         ; handle overflow
BNE TT51        ; and reloop
INY             ; skip the zero-terminator
BNE TT59        ; handle overflow
INC V+1         
DEX             ; decrement the message index
BNE TT51        ; and reloop if we haven't skipped to target

; Now we're at the main loop where (V),Y point at the message
; to print.
; Printing a char can recursively call us, so stack (V+1) 
; and Y (which is enough to preserve our state).
PHA             ; preserve "Y"
PHA             ; preserve (V+1)
LDA (V),Y       ; Read a string byte
EOR #35         ; "Decrypt" it
JSR TT27        ; call the print char routine
PLA             ; restore V+1
PLA             ; restore Y
INY             ; move to next char
BNE P%+4        ; overflowed?
INC V+1         ; handle overflow
LDA (V),Y       ; hit the end of the string?
BNE TT50        ; if not, go around again
RTS             ; we're finished

The token table (at QQ18) maps the following indices to these two-byte strings:

00  AL
01  LE
02  XE
03  GE
04  ZA
05  CE
06  BI
07  SO
08  US
09  ES
0a  AR
0b  MA
0c  IN
0d  DI
0e  RE
0f  A?
10  ER
11  AT
12  EN
13  BE
14  RA
15  LA
16  VE
17  TI
18  ED
19  OR
1a  QU
1b  AN
1c  TE
1d  IS
1e  RI
1f  ON

So the HARMLESS string was encoded as:

48  'H'
8a  reference to 'AR'
4d  'M'
81  reference to 'LE'
53  'S'
53  'S'

Then EORed with 35 ($23) and zero-terminated to yield 6b a9 6e a2 70 70 00

To make your own simple strings you can use this snippet of python:

def encode(msg):
    return "".join("{:02x}".format(ord(x)^35) for x in msg)

msg = raw_input("Text to encode: ")
print encode(msg)

Filed under: Coding

Posted at 20:00:00 BST on 12th June 2014.

jsbeeb Part Four - IRQs and timers

This is the fourth post in my series on emulating a BBC Micro in Javascript. I’d recommend reading the previous part on 6502 internal timings before reading this post. It’s also handy to have read read the first part which covers general stuff, and the second part which focuses on the video hardware.

Again thanks to my good chum Rich Talbot-Watkins who helped demystify what goes on in the Beeb.

May I interrupt you a moment?

Like most CPUs, the 6502 has a physical pin (the IRQ line) which causes an interrupt when brought low: the current instruction is completed and then instead of proceeding to the next instruction, the CPU enters interrupt mode (which disables further interrupts) and jumps to an operating system handler. This handler then deals with the interrupt — a timer firing, a keyboard event, disc activity and so on. The handler is responsible for:

Of course in reality nothing is that simple. On the 6502 the simple pipelining has an interesting side effect: the IRQ pin is only checked on the penultimate cycle of each instruction. This is the point at which the next instruction’s address is determined and fetched, ready for execution on the next cycle.

The penultimate cycle in question is the last “logical” cycle of the 6502’s state machine, not necessarily the penultimate physical cycle — in the case of cycle stretching on the final cycle of the instruction, those stretched cycles don’t cause the 6502 to recheck the IRQ line.

If the IRQ line is brought back high mid-instruction when the processor had already noticed it being low, the interrupt is still taken — even though there’s no evidence left for the interrupt handler to determine why it was called.

Not emulating these strange quirks leads to subtle differences in when interrupts are taken, which causes the legendary Kevin Edwards’ protection systems to fail to decode.

Given all this complexity, it’s interesting to think how experiment to try and work out how Kevin sussed all this out. Rich had a guess and in this fascinating thread Kevin confirms how it was done.

Running out of time(rs)

While we’re talking cycle correctness we should mention the hardware timers. There are two timers in each of two 6522 Versatile Interface Adapters (VIA). Each timer can generate an interrupt when it overruns, and automatically reloads from a programmable start value.

From our cycle correctness point of view the interesting thing about the timers is the exact time that the timer fires. They tick at 1MHz, and generate their interrupt a half-cycle after they underflow below $0000 (back to $ffff). At this point they either stop (a “one-shot” timer), or reload their initial value and start again (a “free-running” timer).

When used in the latter mode the reload happens on the half-cycle after the interrupt: When a value N is written to the timer’s latched start value, the first interrupt will happen N + 1.5 1MHz cycles later (i.e. 2N + 3 CPU cycles); and subsequent interrupts happen at N + 2 1MHz cycle (2N + 4 CPU cycle) intervals. (Recall that the CPU clock runs at 2MHz.)

If we configured a free-running timer with a counter of 2, and assuming we could acknowledge the IRQ almost straight away1, it would look something like:

Some games rely on the exact timing of the timers to change video settings mid-frame. For example, the driving game Revs sets the timer to fire every 312.5 * 64 - 2 cycles — which is to say exactly the number of 1MHz ticks there are in an interlaced PAL TV frame, minus two to account for the later time the timer fires. Most games use the vertical blanking interrupt generated by the video circuitry at the top of a frame to set a one-shot timer to fire after the appropriate time, but Revs just leaves a free-running timer going. If the emulation wasn’t spot on, then over time the place where the video settings change would slowly creep up the screen.

Acknowledging the timer

Reading the low 8 bits of a timer has an extra meaning: it acknowledges any interrupts from that timer. This is what an interrupt handler would do: once it had ascertained that it was a timer that caused the IRQ (by checking the VIA status register) it would read from that timer to stop it holding the CPU IRQ pin low.

By now you’ll realise there’s going to be a catch: if the CPU tries to acknowledge the timer firing on the same cycle that it fired, then the acknowledgement gets lost2. An interrupt will still be generated on the next cycle. Again, Kevin Edwards’ protection systems take advantage of this unusual behaviour.

Of course, on a real system all the components run in parallel. In an emulator they’re emulated serially. Empirically, we determined that to get things to work right, at each tick we do things in this order:

  1. Check to see if any IRQs are pending (if this is a penultimate CPU cycle).
  2. Generate any IRQs due to happen this cycle.
  3. Let any CPU memory accesses happen (which may of course clear IRQs etc).

Also there are a few extra + 1s in the code when counters are being loaded from or written to account for the actual point within a cycle the CPU would actually read or write the timer.

Most of the relevant code for the interrupts in in the via.js source file. The interrupt generation and counter handling is in the polltime function and it looks a little like:

// "cycles" is the number of 2MHz cycles that
// have occurred since the last time we checked.
count -= cycles;
justHit = false;
// Have we fired? -3 here accounts for the
// initial late fire time.
if (count <= -3) {
  // "justHit" is our note that we
  // should ignore any acknowledgment on
  // this cycle.
  if (count === -3) justHit = true;
// Reload the timer, accounting for the
// extra 4 2MHz ticks of a reload.
while (count < -3) 
  count += reload + 4;

Getting the emulation cycle-perfect has been a very satisfying achievement. Thanks are due to Ed Spittles for gathering real-world measurements, and Rich Talbot-Watkins for poring over technical documents and making spreadsheets of possible internal timing behaviour. It’s all been worth it as we finally cracked Kevin’s preotection system.

As an amusing side-note, when Rich and I first tried cracking the Alien 8 protection system (one of Kevin’s earlier accomplishments) some 20 years ago we started down this road. We tried building a 6502 emulator — in BBC Basic — to run the decryption code. We never got it working, and now I have a much fuller appreciation of why!

  1. The mark “x” on the diagram is where the IRQ is acknowledged. 

  2. This seems to be a 6522 glitch, though it’s not documented anywhere. 

Filed under: Coding Emulation

Posted at 14:40:00 BST on 4th June 2014.