# Making and breaking Ciphers on the Commodore 64 Part 8 - RC4

## Recap

So far in this 8 part series on doing weird things with Ciphers on old computers we've:

# The warning

"In September, 1994 someone posted source code to the Cyberpunks mailing list-anonymously. It quickly spread to the Usenet newsgroup sci.crypt, and via the Internet to ftp sites around the world. Readers with legal copies of RC4 confirmed compatibility. RDA Data Security, Inc. tried to put the genie back into the bottle, claiming that it was still a trade secret even through it was public, it was too late. It has since been discussed and dissected on Usenet, distributed at conferences, and taught in cryptography courses." - Applied Cryptography, Bruce Schneier, 1997

We’re going to turn to the above book for this one. If there was ever a Crypto 101 bible, Applied Cryptography by the mighty Bruce has to be it.

We’re going to tackle RC4 today that was in widespread use for quite a long time.

RC4 was written by Ron Rivest in 1987 for RSA Security and is both simple in design and very fast.

It was the basis for encryption in:

• WEP
• WPA
• BitTorrent

and is/was optional in many others including

• Secure shell
• Remote Desktop Protocol
• Kerberos
• TLC/SSL

Warning: RC4 was once a premier cipher, used in millions of systems, but has been broken for quite some time. Use this for learning, not actual cryptography.

# How it works

The concept is simple enough (for an encryption protocol):

This is word for word from Applied Cryptography p 397-398 with very minor edits to make it play nice on a web page:

It has a 8 * 8 S-box: `s0,s1,...,S255`The entries are a permutation of the numbers 0 through 255, and the permutation is a function of the variable-length key. It has two counters, i and j, initialized to zero.

To generate a random byte, do the following:

``````i = (i + 1) mod 256
j = (j + Si) mod 256
swap Si and Sj
t=(Si +Sj)mod256
K = St
``````

The byte K is XORed with the plaintext to produce ciphertext or XORed with the ciphertext to produce plaintext. Encryption is fast—about 10 times faster than DES.

Initializing the S-box is also easy. First, fill it linearly so:`S0=0,S1=1,...,S255=255`   Then fill another 256-byte array with the key, repeating the key as necessary to fill the entire array `K0,K1,...,K255` Set the index j to zero. Then:

``````for i = 0 to 255:
j=(j+Si +Ki)mod256
swap Si and Sj
``````

## The plan

We're going to simply encrypt the phrase "ARE YOU KEEPING UP?" with the key "SECRET".

Note that I'm using uppercase here, this will make the already complicated Assembly code easier since the Commodore is going to store these codes anyway.  We want the output in hexadecimal. We've tackled printing hex in previous parts of this series, but it will present a challenge in the BASIC version in a few minutes.

# Python

This will  be easier to visualize with some Python code. The pseudo-code SEEMS complicated, but it's really not. Python has a way of making concepts like this easy to understand. It's no mystery it has become one of the most popular programming languages and stayed at the top for decades.

``````#!/usr/bin/env python

def rc4encrypt(key, data):
# First initialize the S-Box
j = 0
S = list(range(256))
for i in list(range(256)):
j = (j + S[i] + ord(key[i % len(key)])) % 256
S[i], S[j] = S[j], S[i] #swap places
i = 0
j = 0
for char in data:
i = (i + 1) % 256
j = (j + S[i]) % 256
S[i], S[j] = S[j], S[i] # swap places
print(hex(ord(char) ^ S[(S[i] + S[j]) % 256]), end =" ")

if __name__ == '__main__':
rc4encrypt('SECRET','ARE YOU KEEPING UP?')``````

So we create the S-Box and then iterate over it, swapping positions before the XOR for each of the characters in the message. If you're not familiar `ord(char)` just returns the ASCII number of the character we're currently working in. That `ord` will byte me later as we'll see.

This outputs the encrypted message:

`0xdd 0x3c 0x29 0x5f 0x56 0x6f 0x76 0xe8 0xa9 0x23 0xf9 0xe3 0x28 0x4c 0xa2 0x27 0xf9 0x39 0x3e`

Wasn't too difficult right?

# BASIC

Just for fun, I implemented this in Commodore C64 BASIC. I don't have an easy way to copy the code into a post without retyping it all, so you'll just have to live with video.

The BASIC version works fine but it's very very slow. It takes around 10 seconds to actually do the S-Box generation and then encrypt.

Since we want to match the Assembly version below as much as possible, the outputs need to be in hexadecimal.  BASIC 2.0 does not natively support printing in hex, so there's a GOSUB at the end to convert and print them. We're using Simons BASIC here again for the XOR and MOD functions, but even it doesn't have a decimal to hex conversion. Our GOSUB at the end adds a TON of processing time to this. It DOES look cool though, and it works so we'll just go with it.

# Assembly version

Note: This works... but don’t go using it as a model for elegant code. We could probably spend a few weeks refining it if you wanted to publish it for widespread use commercially or something like that. That's not the point here.

Understanding the cipher is the core concept and working code that's “easy” to follow is the goal.

## Setup

We'll be using the `divide24` routine from Part 7 of this series to get our modulus function. Get it from there if you're not familiar.

We're also using a couple zero page labels, zp1 and zp2. These could certainly be better named, but they are temporary. Call them what you want. There are MANY parts of this that could be optimized where I did it "the long way" to keep the readability in tact. I've tried to keep everything as close to the naming conventions Mr Shneier uses in Applied Cryptography.

We'll need some places to hold things:

``````;divide24
remainder .byte 0,0,0
dividend .byte 0,0,0
divisor  .byte 0,0,0

message  .null "are you keeping up?"
messagelen .byte 0
cipher   .null "                      "

key      .null "secret"
keylen   .byte 0

s        .byte 0,0,0,0,0,0,0,0
.byte 0,0,0,0,0,0,0,0
.byte 0,0,0,0,0,0,0,0
.byte 0,0,0,0,0,0,0,0
.byte 0,0,0,0,0,0,0,0
.byte 0,0,0,0,0,0,0,0
.byte 0,0,0,0,0,0,0,0
.byte 0,0,0,0,0,0,0,0
.byte 0,0,0,0,0,0,0,0
.byte 0,0,0,0,0,0,0,0
.byte 0,0,0,0,0,0,0,0
.byte 0,0,0,0,0,0,0,0
.byte 0,0,0,0,0,0,0,0
.byte 0,0,0,0,0,0,0,0
.byte 0,0,0,0,0,0,0,0
.byte 0,0,0,0,0,0,0,0
.byte 0,0,0,0,0,0,0,0
.byte 0,0,0,0,0,0,0,0
.byte 0,0,0,0,0,0,0,0
.byte 0,0,0,0,0,0,0,0
.byte 0,0,0,0,0,0,0,0
.byte 0,0,0,0,0,0,0,0
.byte 0,0,0,0,0,0,0,0
.byte 0,0,0,0,0,0,0,0
.byte 0,0,0,0,0,0,0,0
.byte 0,0,0,0,0,0,0,0
.byte 0,0,0,0,0,0,0,0
.byte 0,0,0,0,0,0,0,0
.byte 0,0,0,0,0,0,0,0
.byte 0,0,0,0,0,0,0,0
.byte 0,0,0,0,0,0,0,0
.byte 0,0,0,0,0,0,0,0

i        .byte 0
j        .byte 0
count    .byte 0
``````

And then we've got a little routine to clear the memory where the division will take place:

``````cleardiv
lda #0
sta dividend
sta dividend+1
sta dividend+2
sta divisor
sta divisor+1
sta divisor+2
rts``````

## Swapping

RC4 makes use of a position swap between `S[i] and S[j]` twice. It's pretty straightforward:

``````swap
ldx i
ldy j
lda s,x
sta zp2
lda s,y
sta s,x
lda zp2
sta s,y``````

## S-Box

Filling the S-Box is first.

``````dosbox
;it's already initialized with zeros, but we'll do it again
;to match the book
.block
ldx #\$00
filloop  txa
sta s,x
inx
bne filloop

lda #\$00
sta i
sboxloop jsr cleardiv
lda i
sta dividend
lda keylen
sta divisor
jsr divide24
lda remainder
tay
lda key,y
sta zp2
clc
ldx i
lda s,x
sta zp2
sta zp2+1
lda j
clc
sta dividend
sta dividend+1
lda #\$00
sta divisor
lda #\$01
sta divisor+1
jsr divide24
lda remainder
sta j
swap
ldx i
ldy j
lda s,x
sta zp2
lda s,y
sta s,x
lda zp2
sta s,y
skip
inc i
ldx i
cpx #0
bne sboxloop
done

rts
.bend``````

## You have got to be kidding me

As I was writing this version, I got really really stuck. For days. I started to question if I should even bother to make an Assembly version of these things at all. The BASIC version took 30 minutes after all and it does work on a vintage computer.

Why am I even here? What's the meaning of it all?!? Okay, pull yourself together and stop being a baby Michael.

If you code long enough you’ll run into what I did.

My python version above uses `ord()` to get the ASCII representation of the character during the last real line of work.  As I was writing this Assembly version, I must have transposed `ord to ora` in my mind.  I banged my head on the rest of the week and started to doubt my suitability in doing this series.

On a Saturday morning I went through it character by character, explaining it to the cat (my rubber duck) and then finally saw it. `OR` is quite different from `XOR` when you're looking for accurate results.

Anyway, try and avoid this sort of mistake. And WHEN you make one like it, remember that there is a reason you’re getting the result you’re getting. It’s not magic. Mostly. Or is it?

## Encrypt

Now for the encryption loop and position swap.

``````doencrypt
.block
ldx #0
stx i
stx j
stx count

loop     jsr cleardiv
clc
lda i
sta dividend
sta dividend+1
lda #\$00
sta divisor
lda #\$01
sta divisor+1
jsr divide24
lda remainder
sta i

doj      jsr cleardiv
ldx i
clc
lda j
sta dividend
sta dividend+1
lda #\$00
sta divisor
lda #\$01
sta divisor+1
jsr divide24
lda remainder
sta j

swap
ldx i
ldy j
lda s,x
sta zp2
lda s,y
sta s,x
lda zp2
sta s,y

doxor
jsr cleardiv
ldx i
ldy j
clc
lda s,x
tax
lda s,x
sta zp2
ldx count
lda message,x
eor zp2
sta dividend
lda #\$00
sta divisor
lda #\$01
sta divisor+1
jsr divide24
lda remainder
ldx count
sta cipher,x
skip
inc count
lda count
cmp messagelen

;this is actually more than 128 away to bne back to the loop
;at the top, so we'll test for the end and do a jmp if
;we're not there

beq done
jmp loop

done
rts
.bend``````

## Benchmarking for fun and profit

Reiterating once again that NO EFFORT has been made to optimize the code for speed. So this is just for funzies:

I made a 128 bit key, which was a realistic key size when RC4 was in widespread use and we can do:

PlatformEncrypt Bits Per Second
BASIC102
Assembly1330

Not great, but not terrible. It'll be interesting to compare to the more complicated ciphers on this platform in a completely non scientific manner.

# Extra Credit

• Optimize parts of the code and benchmark. How much speed can you squeeze out of the 6510/6502 to do this sort of work?
• Implement this on a different 8 Bit machine