Benchmarking Retro Computers (mostly Commodore) with marginal methods

Just the BASICs, please

Our friend in retro @NoelsRetroLab has a simple BASIC benchmarking tool that's fun to use for comparing the execution speed of different BASIC systems.

We're counting jiffies; a jiffy is roughly (sometimes very roughly) 1/60th of a second.

We're using three variations of this code on the Commodore BASIC computers. This will vary a little if you try it on something else. And PLEASE try this on whatever you have. It's for "science"!!!

Standard Version

5 TI$="000000"
10 FOR I=1TO10
20 S=0
30 FOR J=1TO1000
40 S=S+J
50 NEXT
60 PRINT".";
70 NEXT I
80 PRINT S
100 PRINT TI

Screen Blank Version

5 TI$="000000"
7 POKE53265,PEEK(53265)AND239
10 FOR I=1TO10
20 S=0
30 FOR J=1TO1000
40 S=S+J
50 NEXT
60 PRINT".";
70 NEXT I
75 POKE53265,27
80 PRINT S
100 PRINT TI

We've discussed this before: the VIC-II on the Commodore 64 and 128 (Plus/4?) can, and does, stop the CPU fairly often to "play catch up." This is referred to as a "bad line," and it can really impact performance. Blanking the screen eliminates the problem. We don't need the display in this code since we're not printing anything useful to the user until it's done. The results clearly show the performance gain from doing it.
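The two POKE values in the listing are plain bit arithmetic on the VIC-II control register at 53265 ($D011). Here is a rough sketch of that arithmetic in Python. The function names are mine, purely for illustration. Bit 4 is the display-enable bit, and 27 is the value line 75 writes back to turn the screen on again.

```python
# Illustrative only: mimics the arithmetic BASIC does in lines 7 and 75.
DISPLAY_ENABLE = 0b00010000  # bit 4 of the control register at 53265 ($D011)


def blank_screen(value):
    """Same as PEEK(53265) AND 239: clear bit 4, leave the other bits alone."""
    return value & 239


def screen_is_on(value):
    """True when the display-enable bit is set."""
    return bool(value & DISPLAY_ENABLE)


restored = 27                      # the value line 75 pokes back (%00011011)
blanked = blank_screen(restored)   # what line 7 would write if it read 27

print(restored, screen_is_on(restored))
print(blanked, screen_is_on(blanked))
```

The mask 239 is just 255 with bit 4 removed, so every other setting in the register survives the blanking.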

128 FAST Version

This puts the 128 into FAST mode, which runs the CPU at its full 2 MHz speed.

5 TI$="000000":FAST
10 FOR I=1TO10
20 S=0
30 FOR J=1TO1000
40 S=S+J
50 NEXT
60 PRINT".";
70 NEXT I
75 SLOW
80 PRINT S
100 PRINT TI

A bunch of machines

I was super happy that several people on the internetses saw my Twitter posts about benchmarking and tried this code on their machines. Thanks specifically to NoelsRetroLab (for the idea in the first place), 8-Bit Hero (PET 4032, 128 PAL), and DevZine (Atari / Turbo Basic XL / PAL).

Montage

There are A LOT of screenshots, and I didn't even include all of them. Here they are in an unorganized gallery just for fun.

Execution Times

System                               Jiffies   Seconds (approx.)
64 NTSC                              2384      39.73
64 NTSC Screen Blanked               2234      37.23
64 PAL                               2637      43.95
64 PAL Screen Blanked                2493      41.55
VIC 20 NTSC                          2237      37.28
VIC 20 PAL                           1891      31.52
128 NTSC FAST mode (blanks screen)   1629      27.15
128 NTSC                             3444      57.4
128 NTSC Screen Blanked              3218      53.63
128 PAL FAST mode (blanks screen)    1688      28.13
128 PAL Screen Blanked               3318      55.3
Atari / Turbo Basic XL / PAL         N/A       22.18
Plus/4 NTSC                          3566      59.43
Plus/4 PAL                           2690      44.83
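The Seconds column is just jiffies divided by 60. If you want to check the numbers or add your own results, here is a small Python sketch. The helper names are made up for this post, and it assumes the nominal 60 jiffies per second.

```python
JIFFIES_PER_SECOND = 60  # nominal; in practice this is sometimes only roughly true


def jiffies_to_seconds(jiffies):
    """Convert a TI reading to approximate seconds."""
    return jiffies / JIFFIES_PER_SECOND


def percent_faster(before, after):
    """How much shorter `after` is than `before`, as a percentage."""
    return (before - after) / before * 100


# A few rows copied from the table above.
results = {
    "64 NTSC": 2384,
    "64 NTSC Screen Blanked": 2234,
    "128 NTSC": 3444,
    "128 NTSC FAST mode": 1629,
}

for name, jiffies in results.items():
    print(f"{name}: {jiffies_to_seconds(jiffies):.2f} s")

gain = percent_faster(results["64 NTSC"], results["64 NTSC Screen Blanked"])
print(f"Blanking the screen on the 64 NTSC saves about {gain:.1f}%")
```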

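We expected the PAL machines to be a little slower because their clocks run slower to get the timing right for the different video signal. That's what the 64 and 128 numbers show. Nothing out of the ordinary, it's just how it is. (The VIC 20 and Plus/4 went the other way: their PAL results came in faster than NTSC.)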

BUT...

Commodore 128 BASIC, why are you so slow?

I've been digging around quite a bit to get to the bottom of the Commodore 128's slowness. If anyone knows Bil Herd and wants to ask him, let me know what he says.

I got help from Robin on the technical details. AGAIN, thanks Robin.

We're pretty sure that this comes down to the very complicated banking the 128 does and the way it's organized. For example, BASIC text is in bank 0 and variables are in bank 1. So it's bank switching A LOT.

Even the BASIC ROM itself is split into two chips.

They got their money's worth out of adding that MMU to the 128, but it comes at quite a price in speed.

Clock speeds

These days, the clock speed matters less than it used to. But for a simple comparison:

System            Clock Speed
64 NTSC           1.0227 MHz
64 PAL            0.9852 MHz
VIC 20 NTSC       1.0227 MHz
VIC 20 PAL        1.1084 MHz
C 128 NTSC        2.046 MHz FAST / 1.0227 MHz
C 128 PAL         1.97 MHz FAST / 0.9852 MHz
PET 2001          1 MHz
Atari 130XE PAL   1.77 MHz
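One rough way to put the clock speeds and the benchmark times together is to multiply them, which gives the nominal number of clock cycles each machine spent on the same loop. This is a back-of-the-envelope sketch only. It uses the unblanked times from the results table and ignores the cycles the video chip steals, so treat the output as a ballpark.

```python
# (seconds from the results table, nominal clock in MHz from the clock table)
runs = {
    "64 NTSC": (39.73, 1.0227),
    "64 PAL": (43.95, 0.9852),
    "VIC 20 NTSC": (37.28, 1.0227),
    "VIC 20 PAL": (31.52, 1.1084),
    "128 NTSC": (57.4, 1.0227),
    "128 NTSC FAST mode": (27.15, 2.046),
}


def megacycles(seconds, mhz):
    """Nominal clock cycles elapsed, in millions (seconds * MHz)."""
    return seconds * mhz


for name, (seconds, mhz) in runs.items():
    print(f"{name}: about {megacycles(seconds, mhz):.1f} million cycles")
```

By this measure the 128 burns noticeably more cycles than the 64 on the same ten lines of BASIC, even in FAST mode, which is exactly the slowness we were digging into above.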

Should we keep going down the rabbit hole?

Yes. Yes we should.

Expanding the Wargames post 24 bit "cracks"

We spent a short time doing 24 bit counters in the Wargames post. We had 16,777,216 possible combinations, and this was more challenging for the Commodore, as you might expect. It took between 2.5 and 4.5 hours to get all 10.

32 bit "codes" to crack

Why stop there? After all 32 > 24, so 32 must be BETTER than 24!?!

32 bits is 4,294,967,296 combinations. That's a lot to ask of a 1 MHz CPU.

For fun, I decided to do a few calculations on an NTSC Commodore 64 counting to 256^3 (or 2^24, or 16,777,216) and 256^4 (or 2^32, or 4,294,967,296). At around 0.43 MIPS, I knew this would take a while. Instead of running a 40-year-old computer at 100% for a couple of days, this job fell to my Ultimate 64. I don't feel bad about beating it up, it's really what I bought it for.

I did this in a simple machine language loop that does NOTHING except count and then display what it counted to and the number of jiffies when done.

It took... a while.

Counting to                 Jiffies               Time            Screen
16,777,216 ($FFFFFF)        19824 ($4D70)         330 seconds     On
16,777,216 ($FFFFFF)        18576 ($4890)         309.6 seconds   Blanked
4,294,967,296 ($FFFFFFFF)   5,091,744 ($4DB1A0)   23.57 hours     On
4,294,967,296 ($FFFFFFFF)   4,771,440 ($48C370)   22.09 hours     Blanked
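To get a feel for those numbers, here is a quick Python sketch that turns the screen-on rows into counts per second and nominal cycles per count. The names are mine. It assumes 60 jiffies per second and that the Ultimate 64 keeps the same 1.0227 MHz timing as a real NTSC 64, which is my assumption rather than something I measured.

```python
JIFFIES_PER_SECOND = 60
CLOCK_HZ = 1_022_700  # nominal NTSC 64 clock; assumed to apply to the Ultimate 64


def counts_per_second(count, jiffies):
    """Average counting rate for a run that took `jiffies` jiffies."""
    return count / (jiffies / JIFFIES_PER_SECOND)


rate_24 = counts_per_second(2**24, 19_824)      # 24 bit run, screen on
rate_32 = counts_per_second(2**32, 5_091_744)   # 32 bit run, screen on

print(f"24 bit run: {rate_24:,.0f} counts per second")
print(f"32 bit run: {rate_32:,.0f} counts per second")
print(f"Nominal cycles per count: {CLOCK_HZ / rate_24:.1f}")

# The 32 bit job is 256 times the 24 bit job, so scale the jiffies to compare.
predicted = 19_824 * 256
print(f"Predicted {predicted:,} jiffies vs measured 5,091,744")
```

Both runs work out to roughly 50,000 counts per second, so the 32 bit time is about what you'd get by multiplying the 24 bit time by 256.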

How fast were computers in 1982?

Computers were historically benchmarked with millions of instructions per second (MIPS). 1 MIPS is 1 million instructions per second.

To compare a few computers from the same time period (approximate speeds, there are a LOT of factors):

  • 6502/10 @ 1 MHz in the Commodore 64: 0.43 MIPS
  • Intel 8086 @ 5 MHz: 0.33 MIPS
  • Intel 286 @ 12 MHz: 1.28 MIPS
  • IBM System/370 158-3 Mainframe @ 8.696 MHz: 0.73 MIPS

"Modern" languages on modern hardware

It's hard for me to wrap my mind around just how large 4 Billion+ is, and even harder to imagine how much goes on inside a modern computer. It's incredible that it works if you think about it.

So how long does this counting to 4 Billion take on a 2020 MacBook Pro? Let's find out.

Python

Python is not even CLOSE to the fastest language out there, but how long does it take Python to count to 4,294,967,296 (256^4 or 2^32) on a 2020 MacBook Pro (2 GHz i5)?

This simple Python program took over 6 MINUTES to count up to 4 Billion without printing anything to the screen. 4 billion is a BIG number.

FYI, it only took 1.37 seconds to count to 16,777,215

That's quite a jump from 16 million to 4.3 billion.
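The program itself isn't reproduced in this post, so the following is only a guess at the kind of loop involved: a minimal sketch that counts up one step at a time and times itself. The function names are made up, and your timings will differ from mine.

```python
import time


def count_to(limit):
    """Count from 0 up to `limit` one step at a time and return the final value."""
    n = 0
    while n < limit:
        n += 1
    return n


def timed_count(limit):
    """Return (final value, elapsed seconds) for counting to `limit`."""
    start = time.perf_counter()
    result = count_to(limit)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # 2**24 is the quick run; swap in 2**32 for the multi-minute one.
    result, elapsed = timed_count(2**24)
    print(f"counted to {result:,} in {elapsed:.2f} s")
```

Nothing is printed inside the loop, so this measures the counting itself, not terminal output.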

Rust

Just for funzies I did 256^4 (or 2^32) in Rust.

It took 3.36 seconds (including compiling it).

Lesson? If you need speed, Rust is more betterer. (and no I don't want to hear about C, sheesh).

Cryptocurrency miners

ASIC built miner. BTW, they are REALLY loud and generate a ton of heat

Cryptocurrency mining is all the rage these days and purpose built machines are commercially available to do hashes for Crypto work. They do only ONE thing, and that's generate hashes. Nothing else.

They are INSANELY FAST.

A CHEAP off the shelf Crypto Miner might do 20 GH/s, or 20 billion hashes in one second. Based on my unscientific benchmarks, a Commodore 64 would take the better part of a WEEK just to count that high. NOT to do the hashes, just to COUNT to what a miner can do in 1 second.
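Here is the arithmetic behind that claim as a short Python sketch. It reuses the screen-on 32 bit result from the counting table and the 20 billion figure above, and assumes 60 jiffies per second. The variable names are just for illustration.

```python
HASHES_PER_SECOND = 20_000_000_000  # the 20 billion per second figure from the text
C64_COUNT = 2**32                   # how far the measured run counted
C64_JIFFIES = 5_091_744             # screen-on result from the counting table
JIFFIES_PER_SECOND = 60

# Average counting rate of the 64 over the whole 32 bit run.
c64_rate = C64_COUNT / (C64_JIFFIES / JIFFIES_PER_SECOND)

# Time for the 64 to count as high as the miner hashes in one second.
seconds = HASHES_PER_SECOND / c64_rate
days = seconds / 86_400

print(f"About {days:.1f} days of counting on the 64")
```

That works out to about four and a half days of nonstop counting.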

Food for thought.

More/Updates

If you want to include your machine results in this post, please let me know (@mrdoornbos) or mike at imapenguin dot com. I'm happy to make edits/additions to this post at any time. Doesn't have to be in stone. The more machines we include the better. Even Spectrums ;-)