Making and Breaking Ciphers with Commodore 64 Part 9 - Finding Smallish Primes

By Michael Doornbos

April 26, 2021 - 7 minutes read - 1440 words

Recap

So far we’ve

Disclaimer: I’m sure I’ll get an angry email from a mathematician about this, but we’re going for understanding here. No one expects to be looking for large primes on a Commodore 64. Calm down.

A primer on primes

A prime number is a natural number that has exactly two distinct natural number divisors: the number 1 and itself.

In cryptography

Primes are used in all of the modern secure ciphers I can think of. But why?

Short short version: Primes are useful for cryptography because two large primes are computationally “easy” to multiply together, but extremely difficult to determine the factors when you want to go the other direction.

Our method

There are quite a few methods for determining primes. Eratosthenes and Sundaram are good examples.

Many methods use pre storage of an array to mark results on. We’re limited to 64k of RAM unless we use an REU, and even the largest of those is 16MB (limited by the number of banks which is stored in a single byte so 256*64k=16MB). We’ll need to do a little more work and just loop through values and store them once we find them unless we want to limit the total number we can look for. We’ll lose some speed but gain the ability to go as high as we want assuming we had enough time to wait for a 1Mhz processor to do this work.

Since we’re in learning all the things mode, let’s just try and work it out on our own and see where we end up.

Let’s find the primes between 0 and 224.

What do we know?

We know that 2 is the only even prime, so let’s just count it and process from 3 and up.
We can skip all even numbers other than 2.
We can look through all divisors from 3 to the number we’re testing for and check if there is a remainder. If we find none that divide evenly, our number is a prime.
In step 3 we can save a lot of these checks by reducing them to the square root. Anything greater than the root would be a multiple of an integer we already checked.

This is easy to understand and it’s obviously pretty computationally labor intensive. Let’s do it in BASIC so it’ll be super slow!

BASIC

We’re in Simon’s BASIC again. I actually really like it and we don’t need much RAM for any of these problems so far, so the missing 8K it takes up isn’t an issue. Plus it was written by a teenager, David Simons which just makes me love it more. Someone suggested I do a mod function and load it into the USR vector instead of using Simon’s BASIC for these.

You CAN do

R=A-INT(A/B)*B

This looks really messy to me. You’d have to GOSUB this after assigning A and B every time. And GOSUB is pretty slow. But if you need to be pure BASIC 2.0, then that’s one option.

Implement our labor intensive, er of love program

Using the 4 steps above, we get something like:

100 C=0:N=0
110 FOR N=1TO10000
200 IF N < 1 THEN PRINT N;:GOTO 300
210 IF N = 2 THEN PRINT N;:C=C+1:GOTO300
220 IF N > 2 AND MOD(N,2)=0 THEN GOTO 300
230 P=INT(SQR(N))
240 FOR I= 3 TO P STEP 2
250 IF MOD(N,I)=0 THEN GOTO 300
260 NEXT I
270 C=C+1:PRINT N 
300 NEXT N
400 PRINT:PRINT"PRIMES: ";:PRINT C

Let’s break it down in case it’s not clear.

Line 210 - 2 is the only even prime. Count it.

Line 220 - Skip all even numbers greater than 2

Line 230 - We are iterating over all of the possible odd factors but we can skip anything greater than the square root of N since those would be multiples of what we’ve already checked.

It’s slow, but sorta satisfying isn’t it?

SIMON’S BASIC Limits

As a funny result of working on the benchmark table later down the post, I found out that the mod function in SIMON’S BASIC is 16 bits. It can only do modulus to 65535. I suppose I should have expected that.

Assembly Version, still overly labor intensive but faster

We already know from this silly benchmarking post that these machines take about a day to COUNT to 2^32 so we’re going to stick to 24 bits here to keep this from taking TOO long.

We need a Square Root routine

There is an excellent and easy to read 24 Bit square-root routine by Verz on Codebase64. Back to “let’s not reinvent the wheel” so I’m using this one with credit to the author.

Let’s do this

We’re going to try very hard to make this as close to the BASIC version as possible. I think the end result is pretty close and easy to understand. At least as easy to understand as Assembly code can be.

findprimes
         .block
loop  lda n
         sta dividend
         lda n+1
         sta dividend+1
         lda n+2
         sta dividend+2
         lda #2
         sta divisor
         jsr divide24
         lda remainder
         bne cont
         jmp notprime ;divides by 2

cont
         lda n
         sta square
         lda n+1
         sta square+1
         lda n+2
         sta square+2
         jsr sqrt24
         lda p
         clc
         adc #1
         sta p
         bcc aheadp
         lda p+1
         adc #0
         sta p+1
aheadp
         lda #2
         sta i
         lda #0
         sta i+1
inner
         lda n
         sta dividend
         lda n+1
         sta dividend+1
         lda n+2
         sta dividend+2
         lda i+1
         lda i
         sta divisor
         lda i+1
         sta divisor+1
         lda i+2
         sta divisor+2
         jsr divide24
         lda remainder
         beq notprime

         clc
         lda i
         adc #1
         sta i
         bcc aheadc
         lda i+1
         adc #0
         sta i+1

aheadc
         lda i+1
         cmp p+1
         bne again
         lda i
         cmp p
         bcs doneinner

again
         jmp inner

doneinner

prime
         lda n+2
         sta dvalue+2
         lda n+1
         sta dvalue+1
         lda n
         sta dvalue
         jsr printdec
         lda #32
         jsr $ffd2

notprime
         clc
         lda n
         adc #1
         sta n
         bcc ahead1
         lda n+1
         adc #0
         sta n+1
         bcc ahead1
         lda n+2
         adc #0
         sta n+2
ahead1
         ;lda n+2
         ;cmp #$ff
         ;bne againc
         lda n+1
         cmp #$ff
         bne againc
         lda n
         cmp #$ff
         bcs done
againc
         jmp loop
done
         ; how long did this take?
         lda $a2
         sta dvalue
         lda $a1
         sta dvalue+1
         lda $a0
         sta dvalue+2
         jsr printdec ;prints value as decimal
         rts
         .bend

i        .byte 0,0,0
n       .byte 0,0,0
p = sqrt

Optimizations

This isn’t the fastest way to do this and there are several areas that could be improved pretty simply.

We’ll do A LOT of optimization when we go through my SHA-256 implementation, but I’ve kept it readable here as much as possible since there are probably 100+ algorithms to do this, many of which are dramatically more efficient for a computer.

I’d love to chat about these if you’re so inclined. mike at imapenguin dot com is a good place to start.

Benchmarking

And now we get to the benchmarking part of the show. I’ll stick to NTSC here since this is all raw math and it will be slightly faster.

I also did exactly the same functions in Python 3.9 on a Macbook just to give some context to an average (but very popular) programming language on a mid 2021 laptop for speed comparisons. Not an Apples to Apples by any stretch, but interesting anyway.

PLATFORM	FROM 1 TO:	TIME
Simon’s BASIC	256	17 seconds
Assembly	256	3 seconds
Simon’s BASIC	65536	9819 (2.7 hours)
Assembly	65536	3733 seconds ( 1 hourish)
Assembly	16777216	2 weeks
Python	16777216	109.74 seconds
Python	4294967296	407516 seconds (4 days 17 hours)

Our method works okayish on the Commodore up to numbers of about a million, but after that having to loop through so many iterations becomes pretty inefficient. Later in the series we’ll explore a few methods to make this faster using some pretty clever shortcuts.

Python

Here’s the python version to 65535, happens so fast you can’t really see much.

All together

Let’s compare them from slowest to fastest

Extra Credit

Make simple improvements to the Commodore versions to get some speed out of it
Add a counter for the number of primes like in the BASIC version

This method really is inefficient

I let the python program run to 2^32 and it took a while…

Python to 2^32 took just short of 5 days