RISC-V Snake

Summary

This was my deep dive into low-level systems work at the core of a Computer Architecture course. The capstone was a complete Snake implementation in RISC-V assembly running on RARS MMIO primitives, with keyboard and timer interrupts sharing one trap path, event-driven game-state updates, and rendering optimized to redraw only what changed each tick. It was a practical way to build interrupt handling, MMIO I/O, and state management directly in assembly. There are also a few honorable-mention projects in the full write-up!

Stack

What it's built with.

Architecture

RISC-V ISA
CSR Programming
MMIO

Systems

Trap / Interrupt Handlers
Kernel-Stack Discipline
Event-Driven Game Loop
Network Byte Order

Background

How it works.

What's RISC-V, and why does this matter?

RISC-V is an open instruction set architecture: the language a CPU speaks. Where Intel's x86 and Arm's ARM architectures are proprietary and licensed, RISC-V is an open, royalty-free standard designed in public. Anyone can build a chip that implements it without paying a license fee for the ISA. That's why it has quietly become the architecture behind a growing number of embedded devices, research processors, and, increasingly, mainstream silicon: Western Digital ships RISC-V cores in hard drives, NVIDIA uses it inside GPUs, and the EU is pouring money into RISC-V supercomputing as a sovereignty play.

Working in RISC-V assembly means writing code at the layer where the CPU actually lives. There's no language runtime, no garbage collector, no standard library. You manage memory by hand, you talk to hardware by writing to specific memory addresses, and you handle interrupts by registering your own handler that the CPU can jump to between any two instructions of your program. Every abstraction you'd take for granted in Python or JavaScript (function calls, the call stack, error handling) has to be built up from primitives.

The projects on this page are exercises in that. Snake is the headline because it's a complete, playable game running on those primitives end-to-end. The roadmap below is the path that got there — each piece isolates one concept (protocol-level byte handling, interrupt discipline, hash tables in raw memory, atomic locks) before the game stitches them all together. The point isn't the games or the toy programs themselves; it's that you can read every line and know exactly what the CPU is doing, which is the foundation everything else in software engineering is built on.

Memory map of the playfield

The display is a character grid. Row 0 is a 21-character top wall (a row of `#`s). Rows 1 through 9 are the playable area — a `#` on each side, 19 free cells per row in the middle. Row 10 is the bottom wall, a mirror of the top. Row 12 is the HUD: `Score: NN Time: NNN`. Row 13 is reserved for the end-of-game banner.

That gives 9 × 19 = 171 playable cells. The snake can never legally occupy a wall, the HUD, or the banner row. All the bounds checks downstream collapse into a single `is the new head inside rows 1..9 and cols 1..19?` test — no per-edge special cases, no off-by-one arithmetic at the corners.
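As a rough illustration of that single test, here is a small Python model of the playfield geometry. The helper name `in_playfield` and the constants are hypothetical; the real check lives in the assembly source and may be structured differently.

```python
# Playfield geometry as described above: rows 1..9 and cols 1..19 are playable.
MIN_ROW, MAX_ROW = 1, 9
MIN_COL, MAX_COL = 1, 19


def in_playfield(row: int, col: int) -> bool:
    """Return True when (row, col) is a legal snake cell (hypothetical helper)."""
    return MIN_ROW <= row <= MAX_ROW and MIN_COL <= col <= MAX_COL


# Count legal cells over the whole 14-row, 21-column screen region.
playable = sum(in_playfield(r, c) for r in range(14) for c in range(21))
```

Counting the cells that pass the test over the whole grid reproduces the 9 × 19 figure, and the walls, HUD row, and banner row all fail it without any special casing.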

Snake representation

Two parallel arrays in `.data`: `snakeRows: .space 1024` and `snakeCols: .space 1024` (256 ints each), plus a `snakeLen: .word 0` counter. Cells are stored in compact insertion order. Index `0` is the tail (oldest cell), index `len-1` is the head (newest). The whole live snake is the prefix `[0..len)` of those arrays — everything past `len` is garbage that gets overwritten on growth.

Two operations run on every tick. **Move (no growth):** erase the cell at index 0 from the display, shift the array left by one (so index 1 becomes index 0, etc.), and append the new head position at index `len-1`. `snakeLen` is unchanged. **Grow (ate an apple):** append the new head at index `len`, then increment `snakeLen`. No erase, no shift.

This compact model keeps every collision check O(len) — a linear walk over the live prefix — and every draw update O(1). Only three cells repaint per move: the tail is erased, the old head gets repainted as a body cell `*`, and the new head is painted `@`. That's what makes the simulator's MMIO redraws survivable at gameplay speed; a full-screen repaint per tick would be unplayable on RARS.

The PRNG

A linear congruential generator stored in `XiVar`: `X_i = (1103515245 · X_{i-1} + 12345) mod 2^31-1`. The `random` routine updates `XiVar` in place and returns the new value in `a0`.
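A minimal Python model of the recurrence, under two assumptions that are not confirmed by the text: the modulus is read literally as 2^31 − 1 (if the assembly masks with `0x7FFFFFFF` instead, the modulus would be 2^31), and the seed value of 1 is purely hypothetical. Python integers do not overflow, so this models the mathematical recurrence rather than any 32-bit wraparound in the assembly.

```python
A = 1103515245
C = 12345
M = 2**31 - 1  # assumption: "mod 2^31-1" read literally; swap in 2**31 if the source masks

xi_var = 1  # hypothetical seed, standing in for the initial value of XiVar


def random():
    """Advance the generator in place and return the new state."""
    global xi_var
    xi_var = (A * xi_var + C) % M
    return xi_var


first = random()
second = random()
```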

Apple placement calls `random` twice, once for a row and once for a column, reduces the first value mod 9 and the second mod 19, then adds 1 to each to land inside the playable rectangle (rows 1..9, cols 1..19). If the candidate cell is already occupied by the snake, it's rejected and the routine retries. This is rejection sampling rather than a constructive shuffle: the open-cell count starts at 170 and only shrinks, but the expected retry count stays low until the snake fills most of the board.
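The retry loop might look like the following Python sketch. The function name `place_apple`, the injected `next_random` callable, and the set-based occupancy test are all illustrative assumptions; the assembly walks the snake arrays instead of consulting a set.

```python
from typing import Callable, Set, Tuple


def place_apple(next_random: Callable[[], int],
                occupied: Set[Tuple[int, int]]) -> Tuple[int, int]:
    """Rejection-sample a free cell in rows 1..9, cols 1..19.

    Assumes at least one free cell exists; otherwise this loops forever,
    so a full board has to be handled before calling it.
    """
    while True:
        row = next_random() % 9 + 1
        col = next_random() % 19 + 1
        if (row, col) not in occupied:
            return (row, col)


# Scripted "random" values: the first candidate (1, 1) is occupied and is
# rejected; the second candidate (2, 2) is free and is accepted.
scripted = iter([0, 0, 10, 20])
apple = place_apple(lambda: next(scripted), occupied={(1, 1)})
```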

Interrupts: keyboard and timer

Two MMIO interrupt sources drive the entire run loop. Both are enabled at startup and disabled cleanly on game-over.

**Keyboard interrupt (UEIE, ucause = 8).** Enabled by writing `0x2` to `KEYBOARD_CONTROL` (`0xFFFF0000`). Fires whenever a character is available at `KEYBOARD_DATA` (`0xFFFF0004`). The handler reads the ASCII code and dispatches: `w`/`a`/`s`/`d` maps to a direction (0=up, 1=right, 2=down, 3=left) which gets written into `pendingDir`. A 180° reversal check rejects the input if `(currentDir + 2) mod 4 == newDir`, so the player can't instantly fold the snake onto itself. `q` sets `gameOver = 1`. The new direction lives in `pendingDir` rather than `direction` because keypresses can fire many times between snake moves; the commit `pendingDir → direction` only happens at the start of each `snakeStep`, so the snake changes direction at most once per cell and the last valid keypress before the tick wins. That rules out double-tap glitches.
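The buffering and the reversal check can be sketched in Python as below. The key-to-direction pairing (`w` up, `d` right, `s` down, `a` left) is the conventional one and is assumed here; the dictionary-based state and the function names are hypothetical stand-ins for the `.data` words and assembly routines.

```python
UP, RIGHT, DOWN, LEFT = 0, 1, 2, 3
KEY_TO_DIR = {"w": UP, "d": RIGHT, "s": DOWN, "a": LEFT}  # assumed pairing

state = {"direction": RIGHT, "pendingDir": RIGHT, "gameOver": 0}


def on_key(ch, state):
    """Model of the keyboard branch of the trap handler."""
    if ch == "q":
        state["gameOver"] = 1
        return
    new_dir = KEY_TO_DIR.get(ch)
    if new_dir is None:
        return  # ignore unrelated keys
    if (state["direction"] + 2) % 4 == new_dir:
        return  # 180-degree reversal against the committed direction
    state["pendingDir"] = new_dir


def commit_direction(state):
    """Runs at the start of each snake step: pendingDir becomes direction."""
    state["direction"] = state["pendingDir"]


on_key("a", state)  # rejected: LEFT reverses RIGHT
on_key("w", state)  # accepted: pendingDir = UP
on_key("s", state)  # accepted: overwrites the buffer, pendingDir = DOWN
commit_direction(state)
```

Three keypresses arrive between ticks, but the snake only turns once, to the last accepted direction.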

**Timer interrupt (UTIE, ucause = 4).** Enabled by setting `TIMECMP = TIME + tickPeriodMs`. Each interrupt schedules the next one (`scheduleNextInterrupt`), then runs two pieces of work: `accumulateSeconds` adds `tickPeriodMs` to `msSinceSecond` and, every time it crosses 1000, decrements `timeLeft` and redraws the HUD (hitting 0 sets `gameOver`). Then, if still alive, `snakeStep` advances the snake one cell.
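A Python sketch of the timer branch, under stated assumptions: the leftover milliseconds are carried over by subtracting 1000 rather than resetting to zero (the text does not say which), the HUD redraw and the `TIMECMP` write are only marked by comments, and `snake_step` is passed in as a stand-in callback.

```python
def accumulate_seconds(state, tick_period_ms):
    """Add one tick of elapsed time; count down once per full second."""
    state["msSinceSecond"] += tick_period_ms
    while state["msSinceSecond"] >= 1000:
        state["msSinceSecond"] -= 1000  # assumption: carry the remainder
        state["timeLeft"] -= 1
        # the HUD redraw would happen here
        if state["timeLeft"] <= 0:
            state["gameOver"] = 1
            break


def on_timer(state, tick_period_ms, snake_step):
    """Model of the timer branch of the trap handler."""
    # scheduleNextInterrupt: TIMECMP = TIME + tickPeriodMs (not modelled)
    accumulate_seconds(state, tick_period_ms)
    if not state["gameOver"]:
        snake_step(state)


state = {"msSinceSecond": 0, "timeLeft": 2, "gameOver": 0}
steps = []
for _ in range(8):
    on_timer(state, 250, lambda s: steps.append(s["timeLeft"]))
```

With a 250 ms tick and two seconds on the clock, the snake steps on the first seven ticks; the eighth tick drives `timeLeft` to 0, sets `gameOver`, and skips the step.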

Trap entry and exit

`utvec` points at `handler`, a single entry point for both interrupt sources. `uscratch` holds the address of a 256-byte save area (`iTrapData`) sitting in `.data`. The handler follows the standard convention:

1. `csrrw t0, uscratch, t0` — atomically swap the user's `t0` with the save-area base. This is the trick that lets you steal a register without losing any state. After this instruction, `t0` holds the save-area address and `uscratch` holds the user's original `t0`.

2. Spill all caller-saved registers into the save area at known offsets: `t1` through `t6`, `a0` through `a7`, `ra`, etc.

3. `csrrs t1, uscratch, zero` — read the user's original `t0` back out of `uscratch` (without disturbing it). Store it at offset 0 of the save area.

4. `csrrw zero, uscratch, t0` — restore `uscratch` to the save-area base, so re-entry into the handler still works.

5. Read `ucause` to identify the source. The high bit set means async interrupt; the low bits are the cause code (4 = timer, 8 = external/keyboard). Branch to the appropriate handler. Anything else falls through to `handlerTerminate`, which prints the cause and exits via the exit `ecall` (service 10 in `a7`) rather than wedging.

6. After the per-source handler returns, reverse the spill: reload every register from the save area, then swap `uscratch` and `t0` back the same way going out. Final instruction: `uret`, which jumps to the address saved in `uepc` and returns control to whatever the user code was doing pre-trap.

The reentrancy-disabled and reentrancy-permitted boundaries around the swap are annotated explicitly in the source, because that's exactly where bugs live. A reentrant interrupt that fires between steps 1 and 4 would clobber `uscratch` mid-swap and the handler could no longer find its save area on the next entry — recovery is not possible from there. The convention is to keep that window as short as possible.
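To see why the swap loses no state, steps 1 through 5 can be traced with a toy register file in Python. This is only a model: the save-area address, the offset used for `t1`, and the register values are hypothetical, and `csrrs` with `zero` as its source is modelled as an OR with 0, which leaves the CSR value unchanged.

```python
SAVE_AREA = 0x10010000  # hypothetical address of iTrapData

regs = {"zero": 0, "t0": 0xAAAA, "t1": 0xBBBB}  # user values at trap time
csr = {"uscratch": SAVE_AREA}
mem = {}


def csrrw(rd, name, rs):
    """rd = CSR; CSR = rs (the old rs value is read before rd is written)."""
    old = csr[name]
    csr[name] = regs[rs]
    regs[rd] = old
    regs["zero"] = 0  # writes to x0 are discarded


def csrrs(rd, name, rs):
    """rd = CSR; CSR |= rs (with rs = zero this is a plain read)."""
    old = csr[name]
    csr[name] = old | regs[rs]
    regs[rd] = old
    regs["zero"] = 0


def decode_ucause(ucause):
    """Split ucause into (is_interrupt, cause_code) for a 32-bit CSR."""
    return bool(ucause >> 31), ucause & 0x7FFFFFFF


csrrw("t0", "uscratch", "t0")        # step 1: t0 = save area, uscratch = user t0
mem[regs["t0"] + 4] = regs["t1"]     # step 2: spill t1 (offset 4 is illustrative)
csrrs("t1", "uscratch", "zero")      # step 3: t1 = user t0
mem[regs["t0"] + 0] = regs["t1"]     #         store it at offset 0
csrrw("zero", "uscratch", "t0")      # step 4: uscratch = save area again
source = decode_ucause(0x80000004)   # step 5: an example timer interrupt cause
```

At the end of the trace the save area holds both user values, `uscratch` points at the save area again, and the only window in which it did not is the span between the two `csrrw` calls.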

snakeStep — the heart of the game

On every timer tick (after the HUD/countdown work), `snakeStep` runs:

1. Commit the buffered input: `direction ← pendingDir`.

2. Compute the new head position from the current head + direction.

3. If the new head is outside the wall rectangle (rows 1..9, cols 1..19) → set `gameOver`, return.

4. If the new head equals the apple position → mark this tick as `growing`.

5. Body collision check. If `growing`, scan all snake cells `[0..len)`, since the tail stays in place when the snake grows. If not, scan cells `[1..len)`: the tail at index 0 is leaving this tick anyway, so colliding with it would be a false positive. If the new head matches any scanned cell → set `gameOver`, return.

6a. If not growing: erase the tail cell from the display; shift the array left by one; repaint what was the old head as a body cell `*`; write the new head into index `len-1`; paint the new head `@`.

6b. If growing: repaint the old head as `*`; bump `snakeLen`; write the new head into the new index `len-1`; paint the new head `@`; increment `runPoints`; add `tickBonus` to `timeLeft`; place a new apple; redraw the HUD.

Step 5 is the subtle one: whether the tail counts as an obstacle depends on whether the snake grows this tick. Doing the cheap bounds check (step 3) before the linear collision walk (step 5) means a wall hit ends the game after a couple of compares, without touching the arrays. Every tick that survives the wall check pays the O(len) walk, which is bounded by the 171 playable cells.
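The tail-exclusion rule in step 5 is easy to get wrong, so here is a small Python model of just that scan. The function name `hits_body` and the list-based arrays are illustrative; the assembly does the same walk over `snakeRows` and `snakeCols`.

```python
def hits_body(rows, cols, length, head_row, head_col, growing):
    """Linear scan of the live prefix for the proposed head cell.

    When the snake is not growing, index 0 (the tail) is skipped because
    that cell is vacated on this same tick.
    """
    start = 0 if growing else 1
    for i in range(start, length):
        if rows[i] == head_row and cols[i] == head_col:
            return True
    return False


# A four-cell snake curled into a 2x2 square: tail (1,1), then (1,2), (2,2),
# with the head at (2,1). Moving up puts the new head on the tail cell.
rows = [1, 1, 2, 2]
cols = [1, 2, 2, 1]

chases_tail = hits_body(rows, cols, 4, 1, 1, growing=False)
tail_if_growing = hits_body(rows, cols, 4, 1, 1, growing=True)
hits_middle = hits_body(rows, cols, 4, 2, 2, growing=False)
```

Chasing the tail is legal on a normal move, the same cell counts as a hit when the scan includes index 0, and a body cell in the middle is a collision either way.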

Display I/O

`printChar(char, row, col)` does the standard RARS two-step: wait for `DISPLAY_CONTROL[0]` ready, write `(col << 20) | (row << 8) | bell` to `DISPLAY_DATA` to position the cursor (the bell character is the cursor-control sentinel), wait for ready again, then write the actual character to `DISPLAY_DATA`. Higher-level routines — `printStr`, `printMultipleSameChars`, `printAllWalls`, `clearScreen`, `drawHud`, `printInt` — all build on `printChar`.
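The cursor-positioning word can be checked with a few lines of Python. This sketch only packs the values; the ready-bit polling and the actual stores to `DISPLAY_DATA` are not modelled, and the helper names are hypothetical.

```python
BELL = 7  # ASCII bell, used here as the cursor-control sentinel


def cursor_word(row, col):
    """Pack a cursor move: column in the high bits, row above the low byte."""
    return (col << 20) | (row << 8) | BELL


def print_char_writes(ch, row, col):
    """Return the two values written to DISPLAY_DATA, in order."""
    return [cursor_word(row, col), ord(ch)]


head_writes = print_char_writes("@", 5, 10)
```

For a head drawn at row 5, column 10, the first write is `0xA00507` (column 10, row 5, bell) and the second is 64, the ASCII code for `@`.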

The per-character `waitForDisplayReady` busy-wait is the dominant cost during full-screen redraws (title screen, walls, game-over banner). RARS can run the simulator at very different speeds, and at the slowest settings each `printChar` takes hundreds of milliseconds — which is why the game's design pushes hard for incremental redraws (3 cells per tick) instead of full-frame refreshes. Once the initial draw of walls + snake + apple is done, ongoing gameplay is fast.

Game-over flow

Once any code path sets `gameOver = 1`, the busy-wait `gameLoop` in `snakeGame` exits. Interrupts get disabled (clear `ustatus.UIE`) so no late-arriving timer or keyboard event can fire mid-cleanup. The game-over routine prints either `GAME OVER!` or `YOU WIN!` on row 13, then polls the keyboard until the user presses `q`, and exits via the exit `ecall` (service 10).

The interrupts-off → poll-for-q → ecall sequence is deliberately the inverse of game startup. Same registers in the same order, same MMIO addresses, mirrored — so the program's entry and exit paths look like a matched pair. That kind of symmetry is the closest thing assembly gets to RAII.

Roadmap

Where it's headed.

Other RISC-V work

Sibling artifacts in the same world as Snake. Each one isolates a different systems primitive — protocol-level byte handling, hash tables in raw memory, computational kernels over strided data, atomic synchronization, stack-based evaluation.

01
IPv4 Packet Validator
Done
~350 lines. One's-complement checksum with carry-fold and explicit endian handling; forward path decrements TTL and recomputes in place.
02
String Interning Hash Table
Done
~1,080 lines. Hash table with collision chains for string deduplication; equality reduces to a pointer compare.
03
2D Gaussian Blur Kernel
Done
~170-line inner kernel. Nested loops over strided pixel grids with manual register-pressure management.
04
Atomic Spinlocks (lr.w / sc.w)
Done
Histogram counters guarded by `lr.w`/`sc.w` retry loops — the RISC-V atomic primitive in isolation.
05
RPN Calculator
Done
Stack-based reverse-Polish evaluator with manual decrement-on-push stack discipline.

Highlights

The things I'm proudest of.

▹Built a complete event-driven game loop in raw RISC-V assembly with keyboard and timer interrupts routed through one trap path.
▹Implemented incremental redraw logic so each tick updates only changed cells instead of repainting the whole board.
▹Wrote collision detection, movement state transitions, apple spawning, and score/timer handling directly over MMIO primitives.
▹Used explicit register and stack discipline in trap handling, including safe context save/restore and clean interrupt return.