Computing Pi from Bouncing Blocks on the C64

By Michael Doornbos

March 14, 2026 - 14 minutes read - 2776 words

Happy Pi Day. Over the years I’ve computed pi on the C64 with Gregory-Leibniz series and Monte Carlo dart throwing, and even written a Pilish validator. This year I wanted to try something different. Something physical.

Place a small block near a wall. Slide a heavy block toward it. The small block bounces between the wall and the heavy block, back and forth, until the heavy block is moving away too fast to catch. Count every collision, wall bounces included.

If the heavy block is 100 times the mass of the small one, you get 31 collisions. Make it 10,000 times heavier: 314 collisions. A million times heavier: 3,141.

Those are the digits of pi.

This isn’t a numerical trick or an approximation algorithm. It’s a physical system where pi falls out of the conservation laws. Grant Sanderson (3Blue1Brown) made a video explaining why, and it’s one of those results that feels like it shouldn’t be true until you run the numbers yourself.

So let’s run them. On a Commodore 64. You expected this on this site, right?

The Setup

We need to idealize the physics a bit to make this work. Picture a flat surface stretching off to the right forever, with a solid wall on the left. Two blocks sit on this surface. The small block (mass $m_1$) is near the wall, sitting still. The heavy block (mass $m_2$) is to the right, sliding left toward the small one.

The rules:

No friction. The blocks slide freely and never slow down on their own. Without friction, the only thing that changes a block’s velocity is a collision.
Perfectly elastic collisions. No energy is lost in any bounce. A real-world collision turns some kinetic energy into heat and sound. Here, every joule of kinetic energy before a collision is still kinetic energy after.
The wall has infinite mass. When the small block hits the wall, it bounces back at the same speed. The wall doesn’t move or absorb any energy.
The blocks are points. No rotation, no deformation, no weird edge hits. They only move left and right along one dimension.

None of this happens in real life, but it’s the kind of frictionless-surface, perfectly-elastic-collision setup you’d see in a physics textbook. The result depends on these idealizations. If you added even a tiny bit of friction, the count would change.

With those rules in place, each collision follows from conservation of momentum and energy:

$$v_1' = \frac{(m_1 - m_2)\,v_1 + 2\,m_2\,v_2}{m_1 + m_2}$$$$v_2' = \frac{(m_2 - m_1)\,v_2 + 2\,m_1\,v_1}{m_1 + m_2}$$

When the small block hits the wall, its velocity just reverses sign. That’s it. Two formulas and a sign flip.

The simulation loop is:

Apply the elastic collision formulas (block-block collision)
If the small block is heading left (toward the wall), reverse its velocity (wall bounce)
If both blocks are moving right and the heavy block is at least as fast, stop. They’ll never meet again.
Otherwise, go back to step 1.

Each event in steps 1 and 2 counts as a collision.

The Counting Program

We don’t need positions, sprites, or animation to see pi appear. We just need velocities and a counter.

10 REM *** PI FROM BOUNCING BLOCKS ***
20 REM *** BY MICHAEL DOORNBOS 2026 ***
30 REM *** IMAPENGUIN.COM ***
40 PRINT CHR$(147)
50 PRINT "PI FROM BOUNCING BLOCKS"
60 PRINT
70 PRINT "ENTER MASS OF LARGE BLOCK"
80 PRINT "(SMALL BLOCK = 1)"
90 PRINT "TRY 100 OR 10000"
100 INPUT "MASS";M2
110 M1=1:V1=0:V2=-1:C=0
120 T1=((M1-M2)*V1+2*M2*V2)/(M1+M2)
130 T2=((M2-M1)*V2+2*M1*V1)/(M1+M2)
140 V1=T1:V2=T2:C=C+1
150 IF V1<0 THEN V1=-V1:C=C+1
160 IF V2>=V1 AND V2>0 THEN 180
170 GOTO 120
180 PRINT
190 PRINT "COLLISIONS:";C

Lines 120-130 are the elastic collision formulas, translated directly from the math. Line 150 handles wall bounces. Line 160 checks whether the blocks are separating for good.

Run it with a mass of 100 and you get 31. Run it with 10,000 and you get 314. It finishes almost instantly because there’s no animation, just arithmetic.

The C64’s floating-point math handles mass ratios up to about 100,000,000 (giving 31,415 collisions) before precision starts to drift. That’s already five digits of pi from a 19-line BASIC program.

Why Does This Work?

The short answer: phase space turns this into a ball bouncing inside a circle.

Plot the system’s state as a point in 2D space, with $\sqrt{m_1}\,v_1$ on one axis and $\sqrt{m_2}\,v_2$ on the other. Conservation of energy says the point stays on a circle. Each collision reflects the point off a line. Wall bounces reflect off one line, block collisions off another. The angle between those lines depends on the mass ratio.

When $m_2 / m_1 = 100^n$, the angle between the reflection lines is $\arctan(1/10^n)$. The number of times a ball can bounce between two lines meeting at angle $\theta$ before escaping is $\lfloor\pi / \theta\rfloor$. For small $\theta$, $\arctan(x) \approx x$, so the count approaches $\pi / (1/10^n) = \pi \times 10^n$. That’s the digits of pi.

Grant Sanderson’s explainer walks through this with great animations. The phase space argument comes from Gregory Galperin’s paper “Playing Pool with Pi.”

Adding Sprites

The counting program proves the math, but it’s more fun to watch blocks bounce. Here’s a version with two sprites: a small yellow block and a larger green one, with a wall on the left edge.

10 REM *** PI BLOCKS - VISUAL ***
20 REM *** BY MICHAEL DOORNBOS 2026 ***
30 REM *** IMAPENGUIN.COM ***
40 PRINT CHR$(147)
50 POKE 53280,0:POKE 53281,0
60 PRINT CHR$(5);"     PI FROM BOUNCING BLOCKS"
70 PRINT
80 REM SPRITE DATA AT $3000
90 SD=12288
100 REM SMALL BLOCK: 6 WIDE, 18 TALL
110 FOR I=SD TO SD+62:POKE I,0:NEXT
120 FOR R=0 TO 17:POKE SD+R*3,252:NEXT
130 REM BIG BLOCK: 18 WIDE, 21 TALL
140 FOR I=SD+64 TO SD+126:POKE I,0:NEXT
150 FOR R=0 TO 20
160 POKE SD+64+R*3,255:POKE SD+65+R*3,255
170 POKE SD+66+R*3,192
180 NEXT R
190 REM CONFIGURE SPRITES
200 POKE 2040,SD/64:POKE 2041,SD/64+1
210 POKE 53269,3
220 POKE 53287,7:POKE 53288,13
230 REM DRAW WALL
240 FOR I=0 TO 24
250 POKE 1024+I*40,160:POKE 55296+I*40,10
260 NEXT I
270 REM PHYSICS SETUP
280 M1=1:M2=100
290 V1=0:V2=-1:C=0
300 P1=10:P2=28
310 SC=8:OX=32
320 Y=200:POKE 53249,Y:POKE 53251,Y
330 GOSUB 700
340 REM MAIN LOOP
350 IF V1<=V2 THEN 570
360 T=(P2-P1-.75)/(V1-V2)
370 GOSUB 800
380 REM ELASTIC COLLISION
390 T1=((M1-M2)*V1+2*M2*V2)/(M1+M2)
400 T2=((M2-M1)*V2+2*M1*V1)/(M1+M2)
410 V1=T1:V2=T2:C=C+1
420 GOSUB 700
430 REM WALL BOUNCE?
440 IF V1>=0 THEN 530
450 T=-P1/V1
460 GOSUB 800
470 P1=0:V1=-V1:C=C+1
480 GOSUB 700
530 IF V2>=V1 AND V2>0 THEN 570
540 GOTO 350
570 REM DONE
580 PRINT:PRINT "     COLLISIONS:";C
590 PRINT "     THAT'S PI!"
600 REM DRIFT OFF SCREEN TO THE RIGHT
610 FOR F=1 TO 40
620 P1=P1+V1*.3:P2=P2+V2*.3
630 GOSUB 700
640 NEXT F
650 POKE 53269,0
660 END
700 REM UPDATE SPRITES
710 SX=OX+P1*SC:BX=OX+P2*SC
720 MB=PEEK(53264) AND 252
730 IF SX>255 THEN MB=MB OR 1:SX=SX-256
740 IF BX>255 THEN MB=MB OR 2:BX=BX-256
750 POKE 53264,MB
760 POKE 53248,SX:POKE 53250,BX
770 H=INT(C/10):L=C-H*10
780 POKE 1024+40+35,H+48
790 POKE 1024+40+36,L+48
795 RETURN
800 REM ANIMATE MOVEMENT
810 DT=T/5
820 FOR F=1 TO 5
830 P1=P1+V1*DT:P2=P2+V2*DT
840 GOSUB 700
850 NEXT F
860 RETURN

You might expect this to use the C64’s hardware sprite collision detection ($d01e), but it doesn’t. Sprite collision registers tell you that two sprites are overlapping right now. That’s fine for a game where you check every frame, but here the velocities can get large enough that the small block would jump past the wall or through the big block between frames. We’d miss collisions.

Instead, the simulation is event-driven. It computes exactly when the next collision happens (line 360 for block-block, line 450 for wall), then the animation subroutine at line 800 smoothly moves both sprites to that point in 5 steps. After each event, the velocities update and we look for the next one. No collision is ever missed because we calculate them all analytically.

The small block starts yellow and stationary. The green block slides in from the right. You can watch the small block ping-pong faster and faster between the wall and the heavy block, while the heavy block barely notices. After 31 collisions, everything separates and the count reads 31.

Assembly Version

The BASIC programs work, but they’re slow. Assembly gives us a reason to use a part of the C64 ROM that most assembly programmers never touch: the floating-point math routines.

The BASIC ROM has a full floating-point engine. Two accumulators (FAC and ARG), multiply, divide, add, and compare. We can call these from assembly just like any other ROM routine. Here are the entry points we need:

Routine	Address	What it does
`movfm`	`$bba2`	Load FAC from 5-byte float at address A/Y
`movmf`	`$bbd4`	Store FAC to 5-byte float at address X/Y
`fmultt`	`$ba28`	FAC = float at A/Y * FAC
`faddt`	`$b867`	FAC = float at A/Y + FAC
`fdivt`	`$bb0f`	FAC = float at A/Y / FAC
`sign`	`$bc2b`	Return sign of FAC in A ($ff=neg, $01=pos)

One gotcha: faddt lives at $b867, not $b86a as some references claim. The $b86a address lands in the middle of the routine (a bne instruction after the internal setup), so calling it directly gives wrong results. I verified this by dumping the actual ROM bytes. At $b867 is a jsr $ba8c that loads the memory operand into ARG and sets up the sign comparison before the addition.

Center-of-Mass Frame

The direct elastic collision formulas work fine at mass ratio 100:1. But at 1,000,000:1, the numerator involves subtracting two large, nearly equal terms (-999999*v1 + 2000000*v2), and the C64’s 32-bit mantissa loses too many digits in the cancellation. The BASIC counting program has the same problem.

The fix is to reformulate using the center-of-mass velocity:

$$v_{cm} = \frac{v_1 + m \cdot v_2}{m + 1}$$$$v_1' = 2 \cdot v_{cm} - v_1 \qquad v_2' = 2 \cdot v_{cm} - v_2$$

This computes the same physics but avoids the catastrophic cancellation. The addition $v_1 + m \cdot v_2$ doesn’t cancel because one term dominates. The subtractions $2 \cdot v_{cm} - v_1$ and $2 \cdot v_{cm} - v_2$ are the intended physics, not an artifact of the formula.

With this approach, mass ratio 1,000,000:1 gives exactly 3,141 collisions. The collision counter updates live on screen so you can watch the digits of pi emerge.

; ==============================
; pi from bouncing blocks
; commodore 64
; turbo macro pro / tmpx
;
; by michael doornbos 2026
; mike@imapenguin.com
;
; mass ratio 1000000:1 -> 3141 collisions
; uses basic rom float routines
; center-of-mass frame avoids
; precision loss
; ==============================

* = $c000

; rom routines
chrout = $ffd2
linprt = $bdcd
movfm  = $bba2
movmf  = $bbd4
fmultt = $ba28
faddt  = $b867
fdivt  = $bb0f
sign   = $bc2b
plot   = $fff0

; zero page
ptr    = $fb

; --- entry ---
         lda #$93
         jsr chrout
         lda #5
         jsr chrout

         ; print title
         ldx #<s_title
         ldy #>s_title
         jsr print
         lda #13
         jsr chrout
         ldx #<s_mass
         ldy #>s_mass
         jsr print
         lda #13
         jsr chrout
         lda #13
         jsr chrout

         ; print label for live counter
         ldx #<s_count
         ldy #>s_count
         jsr print

         ; init v1 = 0
         ldx #4
cpv1     lda fp_zero,x
         sta v1,x
         dex
         bpl cpv1

         ; init v2 = -1
         ldx #4
cpv2     lda fp_neg1,x
         sta v2,x
         dex
         bpl cpv2

         ; zero counter
         lda #0
         sta count
         sta count+1

         ; zero jiffy clock
         sei
         sta $a0
         sta $a1
         sta $a2
         cli

; --- main loop ---
; uses center-of-mass frame:
;   vcm = (v1 + m*v2) / (m+1)
;   v1' = 2*vcm - v1
;   v2' = 2*vcm - v2
loop
         ; fac = v2 * m
         lda #<v2
         ldy #>v2
         jsr movfm
         lda #<fp_m
         ldy #>fp_m
         jsr fmultt
         ; fac = m*v2 + v1
         lda #<v1
         ldy #>v1
         jsr faddt
         ; save numerator
         ldx #<temp
         ldy #>temp
         jsr movmf

         ; fac = (m+1)
         lda #<fp_m1
         ldy #>fp_m1
         jsr movfm
         ; fac = numerator / (m+1) = vcm
         lda #<temp
         ldy #>temp
         jsr fdivt
         ; fac = 2 * vcm
         lda #<fp_2
         ldy #>fp_2
         jsr fmultt
         ; save 2*vcm
         ldx #<temp
         ldy #>temp
         jsr movmf

         ; v1' = 2*vcm - v1
         lda #<v1
         ldy #>v1
         jsr movfm
         lda $66
         eor #$ff
         sta $66
         ; fac = -v1 + 2*vcm
         lda #<temp
         ldy #>temp
         jsr faddt
         ldx #<nv1
         ldy #>nv1
         jsr movmf

         ; v2' = 2*vcm - v2
         lda #<v2
         ldy #>v2
         jsr movfm
         lda $66
         eor #$ff
         sta $66
         ; fac = -v2 + 2*vcm
         lda #<temp
         ldy #>temp
         jsr faddt
         ldx #<v2
         ldy #>v2
         jsr movmf

         ; copy nv1 to v1
         ldx #4
cpnv1    lda nv1,x
         sta v1,x
         dex
         bpl cpnv1

         ; count++ (block-block collision)
         inc count
         bne wallchk
         inc count+1

wallchk
         ; wall bounce: if v1 < 0, negate
         lda v1
         beq show
         lda v1+1
         bpl show
         ; negate v1 (flip sign bit)
         eor #$80
         sta v1+1
         ; count++
         inc count
         bne show
         inc count+1

show
         ; live counter display
         clc
         ldx #3
         ldy #16
         jsr plot
         lda count+1
         ldx count
         jsr linprt

         ; done when v2 > 0 and v2 >= v1

         ; v2 > 0?
         lda v2
         bne notzero
         jmp loop
notzero  lda v2+1
         bpl v2pos
         jmp loop

v2pos
         ; v2 >= v1?
         ; fac = v1
         lda #<v1
         ldy #>v1
         jsr movfm
         ; negate fac
         lda $66
         eor #$ff
         sta $66
         ; fac = v2 + (-v1) = v2 - v1
         lda #<v2
         ldy #>v2
         jsr faddt
         ; check sign
         jsr sign
         bpl done
         jmp loop
done

; --- done ---
         ; read jiffy clock
         sei
         lda $a2
         sta jiffies
         lda $a1
         sta jiffies+1
         cli

         ; print time
         lda #13
         jsr chrout
         ldx #<s_time
         ldy #>s_time
         jsr print
         lda jiffies+1
         ldx jiffies
         jsr linprt
         ldx #<s_jif
         ldy #>s_jif
         jsr print

         rts

; --- print subroutine ---
; x=lo, y=hi, null-terminated petscii
print    stx ptr
         sty ptr+1
         ldy #0
ploop    lda (ptr),y
         beq pdone
         jsr chrout
         iny
         bne ploop
pdone    rts

; --- strings ---
s_title  .null "  pi from bouncing blocks"
s_mass   .null "  mass ratio 1000000:1"
s_count  .null "  collisions: "
s_time   .null "  time: "
s_jif    .null " jiffies"

; --- float constants ---
fp_zero  .byte $00,$00,$00,$00,$00
fp_neg1  .byte $81,$80,$00,$00,$00
fp_m     .byte $94,$74,$24,$00,$00
fp_m1    .byte $94,$74,$24,$10,$00
fp_2     .byte $82,$00,$00,$00,$00

; --- variables ---
v1       .byte 0,0,0,0,0
v2       .byte 0,0,0,0,0
nv1      .byte 0,0,0,0,0
temp     .byte 0,0,0,0,0
count    .byte 0,0
jiffies  .byte 0,0

The wall bounce at wallchk doesn’t need the ROM at all. Since the sign lives in bit 7 of the second byte of the stored float, a single eor #$80 flips positive to negative or vice versa. That’s a 2-cycle operation versus the hundreds of cycles BASIC would spend parsing V1=-V1.

The show section uses plot ($fff0) to position the cursor and linprt ($bdcd) to print the current collision count each iteration. You can watch the counter climb from 1 to 3,141 in about 29 seconds (1,751 jiffies). The mass ratio must be a power of 100 for the count to give digits of pi: 100 gives 31, 10,000 gives 314, 1,000,000 gives 3,141.

Run it with sys 49152.

Sanity Check

If you want to verify the C64 got it right without waiting for 3,141 iterations of ROM floating-point math, here’s the same simulation in Python. It uses Decimal for arbitrary precision and tests four mass ratios at once.

#!/usr/bin/env python
"""
Pi from bouncing blocks
by michael doornbos 2026
mike@imapenguin.com

Slide a heavy block into a light one near a wall.
Count every collision. The total is the digits of pi.

Mass ratio must be a power of 100 for clean pi digits:
  100:1      -> 31     collisions
  10000:1    -> 314    collisions
  1000000:1  -> 3141   collisions
"""

from decimal import Decimal, getcontext

getcontext().prec = 50

ratios = [100, 10000, 1000000, 100000000,
          10000000000, 1000000000000,
          100000000000000, 10000000000000000]

for m2_val in ratios:
    m1, m2 = Decimal(1), Decimal(m2_val)
    v1, v2 = Decimal(0), Decimal(-1)
    count = 0

    while True:
        # center-of-mass frame
        vcm = (v1 + m2 * v2) / (m1 + m2)
        v1 = 2 * vcm - v1
        v2 = 2 * vcm - v2
        count += 1

        # wall bounce
        if v1 < 0:
            v1 = -v1
            count += 1

        # done when blocks separate for good
        if v2 >= v1 and v2 > 0:
            break

    print(f"Mass ratio {m2_val:>12}:1  ->  {count} collisions")

Output:

Mass ratio              100:1  ->  31 collisions
Mass ratio            10000:1  ->  314 collisions
Mass ratio          1000000:1  ->  3141 collisions
Mass ratio        100000000:1  ->  31415 collisions
Mass ratio      10000000000:1  ->  314159 collisions
Mass ratio    1000000000000:1  ->  3141592 collisions
Mass ratio  100000000000000:1  ->  31415926 collisions
Mass ratio 10000000000000000:1 ->  314159265 collisions

The C64 assembly version with mass ratio 1,000,000:1 gives 3,141. Same answer, about 29 seconds on real hardware.

Happy Pi Day.