Making and breaking Ciphers on the Commodore 64 Part 8 - RC4

Making and breaking Ciphers on the Commodore 64 Part 8 - RC4

Recap

So far in this 8 part series on doing weird things with Ciphers on old computers we've:

The warning

"In September, 1994 someone posted source code to the Cyberpunks mailing list-anonymously. It quickly spread to the Usenet newsgroup sci.crypt, and via the Internet to ftp sites around the world. Readers with legal copies of RC4 confirmed compatibility. RDA Data Security, Inc. tried to put the genie back into the bottle, claiming that it was still a trade secret even through it was public, it was too late. It has since been discussed and dissected on Usenet, distributed at conferences, and taught in cryptography courses." - Applied Cryptography, Bruce Schneier, 1997

We’re going to turn to the above book for this one. If there was ever a Crypto 101 bible, Applied Cryptography by the mighty Bruce has to be it.

We’re going to tackle RC4 today that was in widespread use for quite a long time.

RC4 was written by Ron Rivest in 1987 for RSA Security and is both simple in design and very fast.

It was the basis for encryption in:

  • WEP
  • WPA
  • BitTorrent

and is/was optional in many others including

  • Secure shell
  • Remote Desktop Protocol
  • Kerberos
  • TLC/SSL

Warning: RC4 was once a premier cipher, used in millions of systems, but has been broken for quite some time. Use this for learning, not actual cryptography.

How it works

The concept is simple enough (for an encryption protocol):

This is word for word from Applied Cryptography p 397-398 with very minor edits to make it play nice on a web page:

It has a 8 * 8 S-box: s0,s1,...,S255The entries are a permutation of the numbers 0 through 255, and the permutation is a function of the variable-length key. It has two counters, i and j, initialized to zero.

To generate a random byte, do the following:

i = (i + 1) mod 256
j = (j + Si) mod 256 
swap Si and Sj 
t=(Si +Sj)mod256 
K = St

The byte K is XORed with the plaintext to produce ciphertext or XORed with the ciphertext to produce plaintext. Encryption is fast—about 10 times faster than DES.

Initializing the S-box is also easy. First, fill it linearly so:S0=0,S1=1,...,S255=255   Then fill another 256-byte array with the key, repeating the key as necessary to fill the entire array K0,K1,...,K255 Set the index j to zero. Then:

for i = 0 to 255: 
j=(j+Si +Ki)mod256
swap Si and Sj

The plan

We're going to simply encrypt the phrase "ARE YOU KEEPING UP?" with the key "SECRET".

Note that I'm using uppercase here, this will make the already complicated Assembly code easier since the Commodore is going to store these codes anyway.  We want the output in hexadecimal. We've tackled printing hex in previous parts of this series, but it will present a challenge in the BASIC version in a few minutes.

Python

This will  be easier to visualize with some Python code. The pseudo-code SEEMS complicated, but it's really not. Python has a way of making concepts like this easy to understand. It's no mystery it has become one of the most popular programming languages and stayed at the top for decades.

#!/usr/bin/env python

def rc4encrypt(key, data):
    # First initialize the S-Box
    j = 0
    S = list(range(256))
    for i in list(range(256)):
        j = (j + S[i] + ord(key[i % len(key)])) % 256
        S[i], S[j] = S[j], S[i] #swap places
    i = 0
    j = 0
    for char in data:
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i] # swap places
        print(hex(ord(char) ^ S[(S[i] + S[j]) % 256]), end =" ")

if __name__ == '__main__':
    rc4encrypt('SECRET','ARE YOU KEEPING UP?')

So we create the S-Box and then iterate over it, swapping positions before the XOR for each of the characters in the message. If you're not familiar ord(char) just returns the ASCII number of the character we're currently working in. That ord will byte me later as we'll see.

This outputs the encrypted message:

0xdd 0x3c 0x29 0x5f 0x56 0x6f 0x76 0xe8 0xa9 0x23 0xf9 0xe3 0x28 0x4c 0xa2 0x27 0xf9 0x39 0x3e

Wasn't too difficult right?

BASIC

Just for fun, I implemented this in Commodore C64 BASIC. I don't have an easy way to copy the code into a post without retyping it all, so you'll just have to live with video.

The BASIC version works fine but it's very very slow. It takes around 10 seconds to actually do the S-Box generation and then encrypt.

Since we want to match the Assembly version below as much as possible, the outputs need to be in hexadecimal.  BASIC 2.0 does not natively support printing in hex, so there's a GOSUB at the end to convert and print them. We're using Simons BASIC here again for the XOR and MOD functions, but even it doesn't have a decimal to hex conversion. Our GOSUB at the end adds a TON of processing time to this. It DOES look cool though, and it works so we'll just go with it.

Assembly version

Note: This works... but don’t go using it as a model for elegant code. We could probably spend a few weeks refining it if you wanted to publish it for widespread use commercially or something like that. That's not the point here.

Understanding the cipher is the core concept and working code that's “easy” to follow is the goal.

Setup

We'll be using the divide24 routine from Part 7 of this series to get our modulus function. Get it from there if you're not familiar.

We're also using a couple zero page labels, zp1 and zp2. These could certainly be better named, but they are temporary. Call them what you want. There are MANY parts of this that could be optimized where I did it "the long way" to keep the readability in tact. I've tried to keep everything as close to the naming conventions Mr Shneier uses in Applied Cryptography.

We'll need some places to hold things:

;divide24
remainder .byte 0,0,0
dividend .byte 0,0,0
divisor  .byte 0,0,0
answer   = dividend


message  .null "are you keeping up?"
messagelen .byte 0
cipher   .null "                      "

key      .null "secret"
keylen   .byte 0

s        .byte 0,0,0,0,0,0,0,0
         .byte 0,0,0,0,0,0,0,0
         .byte 0,0,0,0,0,0,0,0
         .byte 0,0,0,0,0,0,0,0
         .byte 0,0,0,0,0,0,0,0
         .byte 0,0,0,0,0,0,0,0
         .byte 0,0,0,0,0,0,0,0
         .byte 0,0,0,0,0,0,0,0
         .byte 0,0,0,0,0,0,0,0
         .byte 0,0,0,0,0,0,0,0
         .byte 0,0,0,0,0,0,0,0
         .byte 0,0,0,0,0,0,0,0
         .byte 0,0,0,0,0,0,0,0
         .byte 0,0,0,0,0,0,0,0
         .byte 0,0,0,0,0,0,0,0
         .byte 0,0,0,0,0,0,0,0
         .byte 0,0,0,0,0,0,0,0
         .byte 0,0,0,0,0,0,0,0
         .byte 0,0,0,0,0,0,0,0
         .byte 0,0,0,0,0,0,0,0
         .byte 0,0,0,0,0,0,0,0
         .byte 0,0,0,0,0,0,0,0
         .byte 0,0,0,0,0,0,0,0
         .byte 0,0,0,0,0,0,0,0
         .byte 0,0,0,0,0,0,0,0
         .byte 0,0,0,0,0,0,0,0
         .byte 0,0,0,0,0,0,0,0
         .byte 0,0,0,0,0,0,0,0
         .byte 0,0,0,0,0,0,0,0
         .byte 0,0,0,0,0,0,0,0
         .byte 0,0,0,0,0,0,0,0
         .byte 0,0,0,0,0,0,0,0

i        .byte 0
j        .byte 0
count    .byte 0

And then we've got a little routine to clear the memory where the division will take place:

cleardiv
         lda #0
         sta dividend
         sta dividend+1
         sta dividend+2
         sta divisor
         sta divisor+1
         sta divisor+2
         rts

Swapping

RC4 makes use of a position swap between S[i] and S[j] twice. It's pretty straightforward:

swap
         ldx i
         ldy j
         lda s,x
         sta zp2
         lda s,y
         sta s,x
         lda zp2
         sta s,y

S-Box

Filling the S-Box is first.

dosbox
;it's already initialized with zeros, but we'll do it again 
;to match the book
         .block
         ldx #$00
filloop  txa
         sta s,x
         inx
         bne filloop

         lda #$00
         sta i
sboxloop jsr cleardiv
         lda i
         sta dividend
         lda keylen
         sta divisor
         jsr divide24
         lda remainder
         tay
         lda key,y
         sta zp2
         clc
         ldx i
         lda s,x
         adc zp2
         sta zp2
         adc #0
         sta zp2+1
         lda j
         clc
         adc zp2
         sta dividend
         adc zp2+1
         sta dividend+1
         lda #$00
         sta divisor
         lda #$01
         sta divisor+1
         jsr divide24
         lda remainder
         sta j
swap
         ldx i
         ldy j
         lda s,x
         sta zp2
         lda s,y
         sta s,x
         lda zp2
         sta s,y
skip
         inc i
         ldx i
         cpx #0
         bne sboxloop
done

         rts
         .bend

You have got to be kidding me

As I was writing this version, I got really really stuck. For days. I started to question if I should even bother to make an Assembly version of these things at all. The BASIC version took 30 minutes after all and it does work on a vintage computer.

Why am I even here? What's the meaning of it all?!? Okay, pull yourself together and stop being a baby Michael.

If you code long enough you’ll run into what I did.

My python version above uses ord() to get the ASCII representation of the character during the last real line of work.  As I was writing this Assembly version, I must have transposed ord to ora in my mind.  I banged my head on the rest of the week and started to doubt my suitability in doing this series.

On a Saturday morning I went through it character by character, explaining it to the cat (my rubber duck) and then finally saw it. OR is quite different from XOR when you're looking for accurate results.

Anyway, try and avoid this sort of mistake. And WHEN you make one like it, remember that there is a reason you’re getting the result you’re getting. It’s not magic. Mostly. Or is it?

Encrypt

Now for the encryption loop and position swap.

doencrypt
         .block
         ldx #0
         stx i
         stx j
         stx count

loop     jsr cleardiv
         clc
         lda i
         adc #1
         sta dividend
         adc #0
         sta dividend+1
         lda #$00
         sta divisor
         lda #$01
         sta divisor+1
         jsr divide24
         lda remainder
         sta i
         
doj      jsr cleardiv
         ldx i
         clc
         lda j
         adc s,x
         sta dividend
         adc #0
         sta dividend+1
         lda #$00
         sta divisor
         lda #$01
         sta divisor+1
         jsr divide24
         lda remainder
         sta j
  
swap
         ldx i
         ldy j
         lda s,x
         sta zp2
         lda s,y
         sta s,x
         lda zp2
         sta s,y
   
doxor
         jsr cleardiv
         ldx i
         ldy j
         clc
         lda s,x
         adc s,y
         tax
         lda s,x
         sta zp2
         ldx count
         lda message,x
         eor zp2
         sta dividend
         lda #$00
         sta divisor
         lda #$01
         sta divisor+1
         jsr divide24
         lda remainder
         ldx count
         sta cipher,x
skip
         inc count
         lda count
         cmp messagelen
         
;this is actually more than 128 away to bne back to the loop
;at the top, so we'll test for the end and do a jmp if
;we're not there

         beq done
         jmp loop

done
         rts
         .bend

Benchmarking for fun and profit

Reiterating once again that NO EFFORT has been made to optimize the code for speed. So this is just for funzies:

I made a 128 bit key, which was a realistic key size when RC4 was in widespread use and we can do:

PlatformEncrypt Bits Per Second
BASIC102
Assembly1330

Not great, but not terrible. It'll be interesting to compare to the more complicated ciphers on this platform in a completely non scientific manner.

Extra Credit

  • Optimize parts of the code and benchmark. How much speed can you squeeze out of the 6510/6502 to do this sort of work?
  • Implement this on a different 8 Bit machine

All together now