Project Setup

Keith Gangarahwe

@keith-gang

The “I Don’t Have Hardware Yet” Problem

So this is a post on how I’m starting the setup of my ARM7TDMI interpreter.

For now, we don’t have the actual hardware. That is a problem. So, instead of waiting around, we are going to try and make a basic interpreter in Zig that runs on standard machines. We are effectively hallucinating a CPU in software until the real silicon arrives.

Step 1: Generating Test Binaries

To test an interpreter, you need code to interpret. I didn’t want to mess around with complex toolchains, so I set up this: zig-arm7tdmi_build.zig.

This repo is basically a Zig project set up to write inline assembly in Zig and use the Zig build system to generate a flat ARM .bin file. No ELF overhead, just raw instructions.

There is a catch, though. Because we are doing this weird freestanding build, we hit this: pub const panic = @compileError("panic not allowed in freestanding build");

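This means there is no runtime panic handler, so Debug-mode safety checks (and the panic messages they would produce) are not available on the target. It is a tradeoff we accept to get truly flat binaries.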

Build Requirements

Due to the freestanding nature of the code and the absence of a panic handler, you must build with the ReleaseSmall optimization level.

Why ReleaseSmall?

The file src/main.zig explicitly forbids panic generation. In standard Debug builds, Zig inserts safety checks (e.g., for overflow) that require a panic handler to function. Using ReleaseSmall eliminates these checks, allowing the code to compile without a runtime panic implementation. This ensures the resulting binary contains only the assembly instructions we intended, without any extra bloat.
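To make the Debug-versus-ReleaseSmall difference concrete, here is a small host-side sketch. It is not part of the repo, and safetyChecksEnabled is a name made up for illustration. It reports whether the current optimization mode compiles runtime safety checks in by default, which is exactly the condition under which Zig needs a panic handler to call.

```zig
const std = @import("std");
const builtin = @import("builtin");

/// Hypothetical helper: true when the current build mode inserts runtime
/// safety checks by default (and therefore needs a panic handler).
pub fn safetyChecksEnabled() bool {
    return switch (builtin.mode) {
        .Debug, .ReleaseSafe => true,
        .ReleaseFast, .ReleaseSmall => false,
    };
}

pub fn main() void {
    std.debug.print("mode: {s}, safety checks: {}\n", .{
        @tagName(builtin.mode),
        safetyChecksEnabled(),
    });
}
```

Running it with and without -O ReleaseSmall shows the flag flip, which is why the freestanding project only builds cleanly in a mode where the checks are off.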

The build produces the following artifacts:

  • Flat Binary: zig-out/bin/main.bin (The final artifact for our interpreter).
  • ELF Executable: zig-out/bin/dbg (An intermediate artifact with debug symbols).

However, we have options to run this in QEMU to verify what we are generating:

Step    Command                                      Description
Run     zig build run -Doptimize=ReleaseSmall        Runs the binary in QEMU (versatilepb, arm926).
Debug   zig build run-dbg -Doptimize=ReleaseSmall    Runs QEMU paused (-S) with a GDB stub on port 1234 (-s).
WSL     zig build run-wsl -Doptimize=ReleaseSmall    Launches the Windows version of QEMU from a WSL environment.

Here is what the main.zig looks like (yes, it’s just inline assembly):

pub const panic = @compileError("panic not allowed in freestanding build");

export fn _start() callconv(.naked) noreturn {
    asm volatile (
        \\ ldr sp, =0x4000
        \\ mov r0, #10
        \\ mov r1, #40
        \\ mov r4, #66
        \\ add r0, r1, r4
    );
    while (true) {}
}

Deconstructing the Binary Generator: main.zig

The code above is our entire program. It’s a bare-metal application that does nothing more than load a few values into registers. Let’s break it down:

  • export fn _start(): This defines our entry point. We name it _start which is the conventional starting point for programs without a standard library or runtime. export makes it visible to the linker.
  • callconv(.naked): This is the crucial part. It tells the compiler not to add any standard function setup or cleanup code (no function prologue or epilogue). We get full control over the registers and the stack, which is exactly what we need for bare-metal programming.
  • noreturn: We promise the compiler this function will never return. In a freestanding environment, there’s no OS to return to, so we end in an infinite loop (while (true) {}).
  • asm volatile (...): This is Zig’s inline assembly. volatile ensures the compiler doesn’t reorder or optimize away our raw instructions. Inside, we perform basic ARM operations: initialize the stack pointer (ldr sp, =0x4000) and then perform some simple arithmetic.
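It helps to know what these instructions look like once they land in the flat binary, because those 32-bit words are what the interpreter will have to decode. The sketch below is a host-side illustration, not code from the repo: encodeMovImm and encodeAddReg are hypothetical helpers that build the ARM encodings for the mov and add lines above, assuming the "always" condition code, no immediate rotation, and no shift on the register operand.

```zig
const std = @import("std");

/// Hypothetical helper: encode `mov rd, #imm8` (condition AL, no rotation).
fn encodeMovImm(rd: u4, imm8: u8) u32 {
    return 0xE3A00000 | (@as(u32, rd) << 12) | imm8;
}

/// Hypothetical helper: encode `add rd, rn, rm` (condition AL, no shift).
fn encodeAddReg(rd: u4, rn: u4, rm: u4) u32 {
    return 0xE0800000 | (@as(u32, rn) << 16) | (@as(u32, rd) << 12) | rm;
}

pub fn main() void {
    std.debug.print("mov r0, #10    -> 0x{X:0>8}\n", .{encodeMovImm(0, 10)}); // 0xE3A0000A
    std.debug.print("mov r1, #40    -> 0x{X:0>8}\n", .{encodeMovImm(1, 40)}); // 0xE3A01028
    std.debug.print("mov r4, #66    -> 0x{X:0>8}\n", .{encodeMovImm(4, 66)}); // 0xE3A04042
    std.debug.print("add r0, r1, r4 -> 0x{X:0>8}\n", .{encodeAddReg(0, 1, 4)}); // 0xE0810004
}
```

Note that the add here takes a register as its second operand, so bit 25 (the immediate flag) is clear. That detail matters later when the decoder only understands immediate operands.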

Why Inline Assembly and _start?

You might wonder why we’re writing assembly inside a Zig file instead of using traditional .s assembly files.

  1. Simplicity & Integration: Everything stays in one file. This simplifies the project structure and the build process. We don’t need to tell the build system how to find and assemble separate files.
  2. Control: The combination of _start and callconv(.naked) gives us the low-level control of a pure assembly entry point, but from the comfort of our Zig source. It’s the best of both worlds.
  3. Unified Toolchain: We let the Zig compiler handle everything. It parses the Zig code and the inline assembly in one pass, streamlining the process from source code to machine code.

In short, this approach is a modern, integrated way to achieve what once required separate assembly files and more complex build configurations.

And the build.zig that makes the magic happen:

const std = @import("std");

pub fn build(b: *std.Build) void {
    const target = b.resolveTargetQuery(.{
        .cpu_arch = .arm,
        .cpu_model = .{ .explicit = &std.Target.arm.cpu.arm7tdmi },
        .os_tag = .freestanding,
        .abi = .eabi,
    });

    const optimize = b.standardOptimizeOption(.{});

    const root_module = b.createModule(.{
        .root_source_file = b.path("src/main.zig"),
        .target = target,
        .optimize = optimize,
    });

    // -------------------------------------------------
    // 1. Build ELF executable (WITH debug symbols)
    // -------------------------------------------------
    const exe = b.addExecutable(.{
        .name = "dbg",
        .root_module = root_module,
    });

    exe.root_module.strip = false; // Keep symbols for GDB
    exe.pie = false;
    exe.entry = .{ .symbol_name = "_start" };
    exe.link_gc_sections = true;
    exe.linker_script = b.path("linker.ld");

    // Install ELF (creates zig-out/bin automatically)
    b.installArtifact(exe);

    // -------------------------------------------------
    // 2. Convert ELF → flat binary
    // -------------------------------------------------

    // Ensure output directory exists
    const mkdir = b.addSystemCommand(&.{
        "mkdir",
        "-p",
        "zig-out/bin",
    });
    mkdir.step.dependOn(&exe.step);

    const objcopy = b.addSystemCommand(&.{
        "zig",
        "objcopy",
        "-O",
        "binary",
    });

    objcopy.addArtifactArg(exe);
    objcopy.addArg("zig-out/bin/main.bin");

    objcopy.step.dependOn(&mkdir.step);

    // -------------------------------------------------
    // 3. QEMU Normal Run (flat binary)
    // -------------------------------------------------
    const run_qemu = b.addSystemCommand(&.{
        "qemu-system-arm",
        "-M", "versatilepb", // The board
        "-cpu", "arm926", // The CPU model
        //"-nographic", // use current terminal for I/O
        "-serial", "stdio", // Redirect serial to terminal
        "-semihosting", // enable semihosting for print support
        "-kernel",
        "zig-out/bin/main.bin",
    });

    run_qemu.step.dependOn(&objcopy.step);

    // -------------------------------------------------
    // 4. QEMU Debug Mode (freeze + GDB)
    // -------------------------------------------------
    const run_qemu_dbg = b.addSystemCommand(&.{
        "qemu-system-arm",
        "-M", "versatilepb", // The board
        "-cpu", "arm926", // The CPU model
        //"-nographic", // use current terminal for I/O
        "-S", // Freeze CPU at startup
        "-s", // Start GDB server on port 1234
        "-serial", "stdio", // Redirect serial to terminal
        "-semihosting", // enable semihosting for print support
        "-kernel",
        "zig-out/bin/main.bin",
    });

    run_qemu_dbg.step.dependOn(&objcopy.step);

    // -------------------------------------------------
    // 5. Windows QEMU (from WSL)
    // -------------------------------------------------
    const run_qemu_win = b.addSystemCommand(&.{
        "/mnt/c/msys64/ucrt64/bin/qemu-system-arm.exe",
        "-M", "versatilepb", // The Board
        "-cpu", "arm926", // The CPU model
        //"-nographic", // use current terminal for I/O
        "-serial", "stdio", // Redirect serial to terminal
        "-semihosting", // enable semihosting for print support
        "-kernel",
        "zig-out/bin/main.bin",
    });

    run_qemu_win.step.dependOn(&objcopy.step);

    // -------------------------------------------------
    // 6. Expose build steps
    // -------------------------------------------------
    b.step("elf", "Build ELF with debug symbols")
        .dependOn(&exe.step);

    b.step("bin", "Build flat ARM binary")
        .dependOn(&objcopy.step);

    b.step("run", "Run in QEMU (Linux/WSL)")
        .dependOn(&run_qemu.step);

    b.step("run-dbg", "Run in QEMU Debug mode (freeze)")
        .dependOn(&run_qemu_dbg.step);

    b.step("run-wsl", "Run Windows QEMU from WSL")
        .dependOn(&run_qemu_win.step);
}

Deconstructing the Build Magic: build.zig

This build script is the heart of our binary generation process. It automates a multi-step pipeline:

  1. Targeting ARM Freestanding: We first tell Zig that our target is arm, specifically arm7tdmi, and that we’re building for a .freestanding environment. This is the key to stripping away all operating system dependencies.

  2. Generating ELF with Debug Symbols: The first step (exe) builds a standard ELF executable named dbg. The line exe.root_module.strip = false; is critical: it tells the linker to keep the debugging symbols. This allows us to use a debugger like GDB to inspect our code, even though the final binary won’t have these symbols.

  3. Creating the Flat Binary: We then use the zig objcopy command. This tool takes the dbg ELF file as input and applies the -O binary flag. This flag is the magic wand: it strips away all the ELF headers, metadata, and section information, leaving only the raw, executable machine code. The output is our final main.bin, a pure, flat binary file.

  4. Setting up QEMU: The script defines several commands to run our binary in the QEMU emulator.

    • zig build run -Doptimize=ReleaseSmall: Runs the code immediately.
    • zig build run-dbg -Doptimize=ReleaseSmall: This is for debugging. It uses -S to freeze the CPU at startup and -s to open a GDB server (on port 1234), allowing you to connect a debugger and step through the code from the very first instruction.
    • The -kernel zig-out/bin/main.bin argument tells QEMU to load our flat binary file directly into memory and execute it, just as if it were a real piece of hardware booting up.
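As an aside, the mkdir and zig objcopy system commands could probably be replaced by the build system's own objcopy step. The following is a sketch only, assuming your Zig version provides addObjCopy on the compile step and addInstallBinFile on the builder (both exist in recent releases, but the build API changes between versions). It reuses the same target, file names, and linker script as the script above.

```zig
const std = @import("std");

pub fn build(b: *std.Build) void {
    const target = b.resolveTargetQuery(.{
        .cpu_arch = .arm,
        .cpu_model = .{ .explicit = &std.Target.arm.cpu.arm7tdmi },
        .os_tag = .freestanding,
        .abi = .eabi,
    });
    const optimize = b.standardOptimizeOption(.{});

    const exe = b.addExecutable(.{
        .name = "dbg",
        .root_module = b.createModule(.{
            .root_source_file = b.path("src/main.zig"),
            .target = target,
            .optimize = optimize,
        }),
    });
    exe.entry = .{ .symbol_name = "_start" };
    exe.linker_script = b.path("linker.ld");
    b.installArtifact(exe);

    // Assumption: addObjCopy is available in the Zig version in use.
    // It converts the ELF to a raw binary inside the build graph.
    const bin = exe.addObjCopy(.{ .format = .bin });

    // Installs the result as zig-out/bin/main.bin, creating the directory as needed.
    const install_bin = b.addInstallBinFile(bin.getOutput(), "main.bin");

    b.step("bin", "Build flat ARM binary").dependOn(&install_bin.step);
}
```

The design difference is that the output path becomes part of the build graph instead of a hard-coded string, so the separate "ensure the directory exists" step is no longer needed.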

Step 2: The Naive Interpreter

Now that we have binary blobs, we need something to run them. Enter the basic interpreter: arm7tdmi-interpreter-zig.

In this project, I built a simple pipeline to load the ARM bin file and try to extract the commands. The design is intentionally “naive” to start with, focusing on a clean separation of data and logic.

The Cpu struct is a prime example of this. It’s pure data—just an array of registers and the status register. It has no methods. This is a Data-Oriented Design approach that keeps the state clean and separate from the execution logic.

The interpreter itself has two modes, which reveal the learning process. The run function operates on a pre-decoded array of Instruction structs. This is a “perfect world” simulation where the binary has been fully analyzed and structured beforehand. The loop function is more realistic, attempting to read raw 32-bit words from a memory slice one by one, decode them, and execute them in a tight loop, much closer to how a real CPU operates.

Here is the code. It looks nice, doesn’t it?

const std = @import("std");

/// ============================================
/// CPU STATE (PURE DATA — NO BEHAVIOR)
/// ============================================
pub const Cpu = struct {
    /// 16 General Purpose Registers (R0–R15)
    /// R15 is Program Counter in real ARM,
    /// but in this simulation we track PC separately.
    regs: [16]u32 = [_]u32{0} ** 16,

    /// Current Program Status Register
    cpsr: u32 = 0x000000D3,
};

/// ============================================
/// Simulated Memory
/// ============================================
pub const Memory = struct {
    data: []u8,
};

/// ============================================
/// DECODED INSTRUCTION FORMAT
/// ============================================
pub const Instruction = union(enum) {
    mov: struct { rd: u8, imm: u32 },
    add: struct { rd: u8, rn: u8, imm: u32 },
    sub: struct { rd: u8, rn: u8, imm: u32 },
    bl: struct { target: usize },
};

/// ============================================
/// DECODE STAGE
/// Converts raw 32-bit ARM word into structured Instruction
/// ============================================
fn decode(word: u32) !Instruction {
    const type_bits = (word >> 26) & 0b11;

    // Data Processing instructions (bits 27–26 must be 00)
    if (type_bits == 0) {
        const opcode = (word >> 21) & 0xF;
        const rd: u8 = @intCast((word >> 12) & 0xF);
        const rn: u8 = @intCast((word >> 16) & 0xF);
        const i_bit = (word >> 25) & 1;

        // Immediate operand: an 8-bit value rotated right by twice
        // the 4-bit rotate field in bits 11–8.
        const imm = std.math.rotr(u32, word & 0xFF, ((word >> 8) & 0xF) * 2);

        // MOV (opcode 13)
        if (opcode == 13 and i_bit == 1) {
            return .{ .mov = .{ .rd = rd, .imm = imm } };
        }

        // ADD (opcode 4)
        if (opcode == 4 and i_bit == 1) {
            return .{ .add = .{ .rd = rd, .rn = rn, .imm = imm } };
        }

        // SUB (opcode 2)
        if (opcode == 2 and i_bit == 1) {
            return .{ .sub = .{ .rd = rd, .rn = rn, .imm = imm } };
        }
    }

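    // Branch with Link (BL)
    // Bits 27–25 = 101 and bit 24 (L) = 1; a plain B (L = 0) is not handled yet
    if (((word >> 24) & 0b1111) == 0b1011) {
        const imm24 = word & 0x00FFFFFF;

        // Sign-extend the 24-bit immediate and scale it to a byte offset
        const signed = @as(i32, @bitCast(imm24 << 8)) >> 6;

        // `target` is unsigned, so reject backward branches instead of
        // letting @intCast trip a safety check on a negative value.
        if (signed < 0) return error.UnsupportedInstruction;

        // Stored as a word offset (byte offset / 4). Note that on real
        // hardware this offset is relative to the branch, not absolute.
        return .{ .bl = .{ .target = @intCast(@divTrunc(signed, 4)) } };
    }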

    return error.UnsupportedInstruction;
}

/// Decode entire program once (DOD style)
fn decodeProgram(
    raw: []const u32,
    allocator: std.mem.Allocator,
) ![]Instruction {
    var decoded = try allocator.alloc(Instruction, raw.len);

    for (raw, 0..) |word, i| {
        decoded[i] = try decode(word);
    }

    return decoded;
}

/// ============================================
/// DEBUG PRINT HELPERS
/// ============================================
fn printInstruction(inst: Instruction) void {
    switch (inst) {
        .mov => |i| std.debug.print("MOV R{d}, #{d}", .{ i.rd, i.imm }),
        .add => |i| std.debug.print("ADD R{d}, R{d}, #{d}", .{ i.rd, i.rn, i.imm }),
        .sub => |i| std.debug.print("SUB R{d}, R{d}, #{d}", .{ i.rd, i.rn, i.imm }),
        .bl => |i| std.debug.print("BL {d}", .{i.target}),
    }
}

fn dumpRegisters(cpu: *const Cpu) void {
    for (cpu.regs, 0..) |r, i| {
        std.debug.print("R{d: <2}: {d: <10}  ", .{ i, r });

        if ((i + 1) % 4 == 0)
            std.debug.print("\n", .{});
    }
}

/// ============================================
/// Helper: read a little-endian u32 from simulated memory
/// ============================================
fn readU32(memory: []u8, addr: u32) u32 {
    return @as(u32, memory[addr]) |
        (@as(u32, memory[addr + 1]) << 8) |
        (@as(u32, memory[addr + 2]) << 16) |
        (@as(u32, memory[addr + 3]) << 24);
}

/// ============================================
/// EXECUTION STAGE (HOT LOOP)
/// ============================================
fn run(cpu: *Cpu, program: []Instruction) void {
    var pc: u32 = 0; // index into the pre-decoded program, not a byte address
    var cycle: usize = 0;

    while (pc < program.len) {
        const inst = program[pc];

        std.debug.print(
            "\n==============================\n",
            .{},
        );
        std.debug.print("Cycle: {d}\n", .{cycle});
        std.debug.print("PC: {d}\n", .{pc});
        std.debug.print("Instruction: ", .{});
        printInstruction(inst);
        std.debug.print("\n\n", .{});

        // Execute instruction
        switch (inst) {
            .mov => |i| {
                cpu.regs[i.rd] = i.imm;
                pc += 1;
            },
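            .add => |i| {
                // ARM arithmetic wraps on overflow, so use wrapping add
                cpu.regs[i.rd] = cpu.regs[i.rn] +% i.imm;
                pc += 1;
            },
            .sub => |i| {
                // Wrapping subtract, for the same reason
                cpu.regs[i.rd] = cpu.regs[i.rn] -% i.imm;
                pc += 1;
            },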
            .bl => |i| {
                // Save return address in R14 (Link Register)
                cpu.regs[14] = @intCast(pc + 1);

                // Branch
                pc = @intCast(i.target);
            },
        }

        dumpRegisters(cpu);

        cycle += 1;
    }
}

fn loop(memory: []u8) void {
    var cpu = Cpu{};
    var pc: u32 = 0x10; // entry point
    var cycle: usize = 0;

    while (cycle < 10) { // temporary limit
        const word = readU32(memory, pc);

        std.debug.print(
            "\n====================\nCycle: {d}\nPC: 0x{x}\nRaw: 0x{x}\n",
            .{ cycle, pc, word },
        );

        const inst = decode(word) catch {
            std.debug.print("Unsupported instruction\n", .{});
            break;
        };

        // Execute (we will expand this next)
        switch (inst) {
            .mov => |i| {
                cpu.regs[i.rd] = i.imm;
                pc += 4;
            },
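            .add => |i| {
                // ARM arithmetic wraps on overflow, so use wrapping add
                cpu.regs[i.rd] = cpu.regs[i.rn] +% i.imm;
                pc += 4;
            },
            .sub => |i| {
                // Wrapping subtract, for the same reason
                cpu.regs[i.rd] = cpu.regs[i.rn] -% i.imm;
                pc += 4;
            },
            else => {
                std.debug.print("Instruction not implemented yet\n", .{});
                // Skip it so the loop does not spin on the same word
                pc += 4;
            },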
        }

        dumpRegisters(&cpu);

        cycle += 1;
    }
}

/// ============================================
/// MAIN — SIMULATION ENTRY POINT
/// ============================================
pub fn main() !void {
    // var cpu = Cpu{};

    // Simulated ARM machine code
    //const raw_program = [_]u32{
    //    0xE3A0000A, // MOV R0, #10
    //    0xE3A0100B, // MOV R1, #11
    //    0xE2802005, // ADD R2, R0, #5
    //    0xE2403002, // SUB R3, R0, #2
    //};

    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit(); // report leaks on exit
    const allocator = gpa.allocator();

    //const decoded = try decodeProgram(&raw_program, allocator);
    //defer allocator.free(decoded);

    //run(&cpu, decoded);
    const memory_size = 128 * 1024;
    const memory = try allocator.alloc(u8, memory_size);
    defer allocator.free(memory);
    @memset(memory, 0);

    // open main.bin
    const file_path = "bin/main.bin"; // The arm binary formed by our project, copied here
    const file = try std.fs.cwd().openFile(file_path, .{});
    defer file.close();

    const file_size = try file.readAll(memory);

    std.debug.print("Loaded {d} bytes into memory.\n", .{file_size});

    loop(memory);
}

The Reality Check

Here is where my naivety comes back to bite me.

In the code above, you can see two implementations. One is a nice, clean CPU simulation (run function) that takes a pre-cooked array of u32 hex codes. It works perfectly because I hand-picked instructions that I had already implemented (MOV, ADD, SUB).

Then there is the loop function. This is the work-in-progress pipeline attempting to load an actual ARM7TDMI binary file generated by my build system.

Obviously, it breaks.

Why? Because I haven’t worked on the part of the pipeline that handles anything beyond a basic mov, add, or sub with an immediate operand. Real binaries, even simple ones, contain instructions like ldr (Load Register) to move data around. Our own main.zig starts with ldr sp, =0x4000, and its add r0, r1, r4 uses a register operand rather than an immediate. My interpreter looks at an instruction like that, realizes it has no idea what it is, and bails out with “Unsupported instruction”.

I was so focused on the “pure data” CPU struct that I forgot that a CPU actually has to fetch memory, and memory access is complicated. The run function assumed a perfect world where instructions just exist. The loop function is facing the cold, hard reality of binary files where you have to handle everything, or you handle nothing.
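One small piece of that memory story can be sketched already: a word fetch that refuses to read past the end of the buffer. The snippet below is an illustration rather than the project's code. readWord and MemoryError are hypothetical names, and rejecting unaligned addresses outright is a simplification chosen for the sketch. It uses std.mem.readInt with the .little endian tag, which assumes a reasonably recent Zig standard library.

```zig
const std = @import("std");

const MemoryError = error{ OutOfBounds, Unaligned };

/// Hypothetical helper: read a little-endian 32-bit word, returning an
/// error instead of indexing past the end of the slice.
fn readWord(memory: []const u8, addr: u32) MemoryError!u32 {
    // Simplification: this sketch rejects unaligned word addresses.
    if (addr % 4 != 0) return error.Unaligned;
    if (memory.len < 4 or addr > memory.len - 4) return error.OutOfBounds;
    return std.mem.readInt(u32, memory[addr..][0..4], .little);
}

pub fn main() !void {
    // The bytes of `mov r0, #10` as they appear in a little-endian binary
    const memory = [_]u8{ 0x0A, 0x00, 0xA0, 0xE3 };
    const word = try readWord(&memory, 0);
    std.debug.print("0x{X:0>8}\n", .{word}); // prints 0xE3A0000A
}
```

Returning an error lets the interpreter loop decide what an out-of-range fetch means, instead of crashing the host program.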

So, we are in a bit of a crunch. The next step is to actually implement the memory pipeline and handle these load/store instructions so we can run something more complex than 10 + 5.
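As a starting point for that next step, here is a rough sketch of decoding the immediate-offset form of ldr and str. It is separate from the interpreter above: SingleTransfer, decodeSingleTransfer, and bit are hypothetical names, and register-offset forms, execution, and the "PC reads as current instruction + 8" rule are all left out.

```zig
const std = @import("std");

/// Hypothetical decoded form of an ARM single data transfer (LDR/STR)
/// with a 12-bit immediate offset.
const SingleTransfer = struct {
    load: bool, // L bit: true = LDR, false = STR
    byte: bool, // B bit: true = byte access, false = word access
    pre_index: bool, // P bit: apply the offset before the access
    add_offset: bool, // U bit: true = add the offset, false = subtract
    write_back: bool, // W bit: write the computed address back to rn
    rn: u4, // base register
    rd: u4, // source/destination register
    offset: u12,
};

fn bit(word: u32, comptime n: u5) bool {
    return ((word >> n) & 1) == 1;
}

/// Returns null when the word is not an immediate-offset LDR/STR.
fn decodeSingleTransfer(word: u32) ?SingleTransfer {
    // Bits 27–26 must be 01; bit 25 clear selects the immediate-offset form.
    if (((word >> 26) & 0b11) != 0b01) return null;
    if (bit(word, 25)) return null;

    return SingleTransfer{
        .load = bit(word, 20),
        .byte = bit(word, 22),
        .pre_index = bit(word, 24),
        .add_offset = bit(word, 23),
        .write_back = bit(word, 21),
        .rn = @truncate(word >> 16),
        .rd = @truncate(word >> 12),
        .offset = @truncate(word),
    };
}

pub fn main() void {
    // 0xE59FD000 encodes `ldr sp, [pc, #0]` (sp is r13, pc is r15)
    if (decodeSingleTransfer(0xE59FD000)) |t| {
        std.debug.print("{s} r{d}, [r{d}, #{d}]\n", .{
            if (t.load) "ldr" else "str", t.rd, t.rn, t.offset,
        });
    }
}
```

Keeping the decoded form as plain data fits the existing Instruction union, so a variant like this could slot in next to mov, add, and sub once the execute side knows how to touch memory.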