Adding a guest-architecture frontend¶

A new guest ISA is a new frontend, not a new core. The arch-neutral IR (jit.md), the IR interpreter, and the LLVM JIT are shared; an architecture supplies only the code that turns its guest bytes into the arch-neutral ir::IrBlock plus a handful of coarse services. This page is the contributor procedure, with the committed SPARC V8 implementation (src/arch/sparc/) as the worked reference.

The boundary is two interfaces in src/ir/include/lince/ir/architecture.hpp:

Interface	What it supplies	Call frequency
`ir::IArchFrontend`	`translate_block(state, bus, pc, mode) → IrBlock`	once per untranslated `(pc, mode)`
`ir::IArchitecture`	`state_size`, `reset_state`, `mode_ctx_of`, `pc_of`, `frontend`, `take_exception`, `name`	once per block / quantum / event

Neither is on the per-instruction path. The only per-instruction cost is the IR op interpreter (or, when JIT-compiled, native code) — both arch-neutral.

flowchart LR
    subgraph New["src/arch/<arch>/ (you write this)"]
        FE["ArchFrontend : IArchFrontend<br/>translate_block"]
        AR["ArchArchitecture : IArchitecture<br/>state_size / mode_ctx_of /<br/>pc_of / take_exception"]
        LO["arch_layout.hpp<br/>GuestState offsets + reg math"]
    end
    subgraph Shared["unchanged"]
        IR["lince_ir<br/>IrBlock / interpreter / BlockCache"]
        JIT["lince_jit<br/>lower_block / TieredJit"]
        RT["lince_runtime<br/>run_ir_quantum"]
    end
    FE -->|emits IrBlock| IR
    AR -->|state_size, take_exception| RT
    LO -.->|offsets| FE
    IR --> JIT
    RT -->|holds IArchitecture*| AR

What a frontend must not do¶

These are the invariants that keep the IR shareable. Breaking one re-couples the ISA to the engine and defeats the seam.

No register names in the IR. Guest registers are byte offsets into the GuestState blob, touched only via LdState/StState. SPARC register windows and ARM register banking are the frontend's choice of offset, invisible to the IR (decision 49).
No flags register. Condition codes are explicit guest-state writes; the frontend computes each bit into its offset (SPARC: flags_add/flags_sub/flags_logical, sparc_frontend.cpp:84). There is no shared NZVC/NZCV concept (decision 52).
No endianness in the bus. LdGuest/StGuest carry {size, endianness}; the swap happens in the op (interpreter, guest_memory.hpp) or lowered code (JIT). A big-endian SPARC and a little-endian ARM frontend emit the same ops with a different endian field (decision 51).
No mode dynamism inside a block. Any instruction that changes the mode context (window rotation, processor-mode switch, trap entry, Thumb toggle) is a block terminator, so within a block the mode is constant and every mode-dependent offset is resolved at translate time (decision 50).
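To make the "no flags register" rule concrete, here is a minimal sketch of the arithmetic a SPARC frontend has to express for an ADDcc. It is plain C++ rather than IR, and sketch_flags_add is a hypothetical name, not the project's flags_add. The bit positions (N=23, Z=22, V=21, C=20) are the SPARC V8 PSR.icc field; in the real frontend the same expressions would be emitted as IR ops and stored with emit_st_state.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration: computes the SPARC V8 integer
// condition codes for r = a + b and returns them at the PSR.icc positions.
constexpr std::uint32_t sketch_flags_add(std::uint32_t a, std::uint32_t b) {
    const std::uint32_t r = a + b;                            // wraps mod 2^32
    const std::uint32_t n = r >> 31;                          // result is negative
    const std::uint32_t z = (r == 0) ? 1u : 0u;               // result is zero
    const std::uint32_t v = ((~(a ^ b)) & (a ^ r)) >> 31;     // signed overflow
    const std::uint32_t c = ((a & b) | ((a | b) & ~r)) >> 31; // carry out of bit 31
    return (n << 23) | (z << 22) | (v << 21) | (c << 20);
}
```

Another ISA would compute its own bits into its own offsets; nothing in this function is shared with the engine.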

The IR a frontend emits¶

src/ir/include/lince/ir/ir.hpp defines everything a frontend produces. An IrBlock is a vector of IrInst plus one structured IrExit. Build it with the member helpers — never hand-fill an IrInst:

Helper	Emits	Notes
`emit_const(imm) → Temp`	`Const`	block-local value
`emit_ld_state(off, size) → Temp`	`LdState`	read guest reg (host order)
`emit_st_state(off, size, src)`	`StState`	write guest reg
`emit_ld_guest(addr, size, endian) → Temp`	`LdGuest`	guest memory load
`emit_st_guest(addr, val, size, endian)`	`StGuest`	guest memory store
`emit_binary(Op, a, b) → Temp`	ALU/Cmp	`Add Sub And Or Xor Shl Shr Sar Mul UMulHi SMulHi UDiv SDiv CmpEq CmpLtU CmpLtS`
`emit_ternary(Op, a, b, c) → Temp`	`UDiv64`/`SDiv64`	three-operand divide (high|low / divisor)
`emit_select(cond, t, f) → Temp`	`Select`	`cond ? t : f`
`emit_trap_if(cond, code)`	`TrapIf`	mid-block conditional trap at `cur_pc`
`set_cur_pc(pc)`	—	stamp the PC onto subsequent trapping ops

Temp values are block-local SSA-free temporaries; they die at the block boundary. Cross-instruction state flows through the GuestState blob, never through temps. See jit.md § The IR data model for the full op semantics.
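A rough sketch of how the helpers compose for one instruction, using stand-in types. SketchBlock and SketchInst are hypothetical and only mimic the shape of the helpers in the table; the real IrBlock in ir.hpp may differ in fields and signatures. The example translates add %g1, %g2, %g3, assuming 4-byte globals at offsets 4, 8, and 12.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in types for illustration only; not the real ir.hpp definitions.
enum class Op { Const, LdState, StState, Add };
using Temp = std::uint32_t;

struct SketchInst {
    Op op;
    Temp dst;            // result temp (unused for StState)
    Temp a, b;           // source temps
    std::uint32_t imm;   // constant or GuestState byte offset
    std::uint8_t size;   // access size in bytes
};

struct SketchBlock {
    std::vector<SketchInst> insts;
    Temp next_temp = 0;

    Temp emit_ld_state(std::uint32_t off, std::uint8_t size) {
        const Temp t = next_temp++;
        insts.push_back({Op::LdState, t, 0, 0, off, size});
        return t;
    }
    void emit_st_state(std::uint32_t off, std::uint8_t size, Temp src) {
        insts.push_back({Op::StState, 0, src, 0, off, size});
    }
    Temp emit_binary(Op op, Temp a, Temp b) {
        const Temp t = next_temp++;
        insts.push_back({op, t, a, b, 0, 0});
        return t;
    }
};

// "add %g1, %g2, %g3": two state loads, one ALU op, one state store.
inline SketchBlock translate_add_g1_g2_g3() {
    SketchBlock blk;
    const Temp a = blk.emit_ld_state(4, 4);
    const Temp b = blk.emit_ld_state(8, 4);
    const Temp r = blk.emit_binary(Op::Add, a, b);
    blk.emit_st_state(12, 4, r);   // the result leaves the block via GuestState
    return blk;
}
```

The temps a, b, and r exist only inside this block; the next instruction that needs %g3 reloads it from the blob.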

Set the block's identity and terminator:

block.entry_pc — the guest PC the block starts at.
block.mode_ctx — the mode context (see Step 3); together with entry_pc it keys the block cache.
block.insn_count — guest instructions covered (the run loop bills sim-time from it).
block.mode_change_kind — ir::ModeChangeKind (ir.hpp:112): None (default), StaticDelta (a translate-time-constant mode shift; the region compiler may chain across it under exit_mode_ctx), or Dynamic (an unknown post-change mode — a region terminator).
block.exit_mode_ctx — the mode the static successor runs under; set it alongside StaticDelta.
block.exit — an IrExit:

`ExitKind`	Fields	Meaning
`FallThrough` / `StaticBranch`	`static_target` (+ `is_call`)	continue at a fixed PC
`CondBranch`	`cond`, `static_target`, `fallthrough_target`	`cond ? target : fallthrough`
`IndirectBranch`	`dyn_target` (Temp)	computed target
`Exception`	`exit_code`	deliver an architectural trap
`PowerDown`	—	core halted pending interrupt

A block whose entry has no translatable instruction returns insn_count == 0; the run loop falls back to the per-instruction reference path for that PC. The SPARC idiom for this is bail_at(pc): a FallThrough to the current PC with no ops, which the run loop's insn_count == 0 check turns into a single fallback step.
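The bail path can be sketched with stand-in types as follows. SketchBlock, SketchExit, and needs_reference_step are hypothetical names for illustration; only the idea (an empty FallThrough block at the current PC, detected by insn_count == 0) comes from the text above.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in types; the real ExitKind / IrExit / IrBlock are in ir.hpp.
enum class ExitKind { FallThrough, StaticBranch, CondBranch,
                      IndirectBranch, Exception, PowerDown };

struct SketchExit {
    ExitKind kind = ExitKind::FallThrough;
    std::uint32_t static_target = 0;
};

struct SketchBlock {
    std::uint32_t entry_pc = 0;
    std::uint32_t insn_count = 0;   // guest instructions covered
    SketchExit exit;
};

// An empty block that falls through to its own PC: "nothing translated here".
inline SketchBlock bail_at(std::uint32_t pc) {
    SketchBlock blk;
    blk.entry_pc = pc;
    blk.exit.kind = ExitKind::FallThrough;
    blk.exit.static_target = pc;
    return blk;
}

// Run-loop side of the contract: zero covered instructions means
// "take one step on the per-instruction reference path instead".
inline bool needs_reference_step(const SketchBlock& blk) {
    return blk.insn_count == 0;
}
```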

Delay-slot traps (delay-slot ISAs only)

If the ISA has a branch delay slot and the slot can fault (load/store/divide), a fault there must save the branch's resolved nPC, not trap_pc + 4. Record it on the block via delay_trap_pc / delay_trap_npc / delay_trap_dynamic (ir.hpp:156) — static for an always-taken target, dynamic when the slot has already stored the resolved nPC into the nPC slot. SPARC does this for CALL, BA, JMPL, and true conditional Bicc with a trapping delay slot (sparc_frontend.cpp:689). A delay-trap block must stay the region entry; build_jit_region refuses to bury one mid-region.

Step 1 — Choose the GuestState layout¶

GuestState (src/ir/include/lince/ir/guest_state.hpp) is an opaque byte region; the architecture owns its layout. Define the offsets in a header. The SPARC layout (core::layout in src/core/include/lince/core/cpu_state.hpp:40, re-exported through src/arch/sparc/include/lince/arch/sparc/sparc_layout.hpp) is the model:

[0]    8 globals %g0..%g7                                    GlobalsBase
[32]   NumWindows*16 windowed slots (window_slot() does the math) WindowedBase
[544]  Y, PSR, WIM, TBR, PC, nPC, ASR17                      SpecialBase
StateSize = 572

reg_offset(cwp, r) (cpu_state.hpp:74) resolves a register reference to a byte offset using the block's cwp — which is constant within a block (decision 50), so this is a translate-time computation, not a runtime indexed access. An ARM frontend places r0..r15, CPSR, and the banked registers for the current mode at fixed offsets the same way.
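As an illustration of translate-time register resolution, the sketch below reproduces the documented base offsets and shows one conventional window mapping. It assumes NumWindows = 8 and 4-byte slots (which is what 32 + 8*16*4 = 544 implies) and assumes that the ins of window cwp-1 alias the outs of window cwp; the project's window_slot() may order slots differently.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch_layout {
constexpr std::uint32_t NumWindows   = 8;   // assumption, see lead-in
constexpr std::uint32_t GlobalsBase  = 0;
constexpr std::uint32_t WindowedBase = 32;
constexpr std::uint32_t SpecialBase  = WindowedBase + NumWindows * 16 * 4;
constexpr std::uint32_t StateSize    = SpecialBase + 7 * 4;  // 7 special registers

// r in 0..31, cwp in 0..NumWindows-1. r0..r7 are globals; r8..r31 index a
// circular file of NumWindows*16 slots so adjacent windows overlap by 8.
constexpr std::uint32_t reg_offset(std::uint32_t cwp, std::uint32_t r) {
    return r < 8
        ? GlobalsBase + r * 4
        : WindowedBase + ((cwp * 16 + (r - 8)) % (NumWindows * 16)) * 4;
}

static_assert(SpecialBase == 544, "matches the documented SpecialBase");
static_assert(StateSize == 572, "matches the documented StateSize");
}  // namespace sketch_layout
```

Because cwp is fixed for the whole block, every call to reg_offset folds to a constant before any IR is emitted.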

Constraints:

Power-of-two access sizes (1/2/4/8) keep both the interpreter and the JIT lowering simple. Avoid arbitrary (offset, size) reads.
Keep mode_ctx small (it multiplies block-cache entries — see Step 3).
The blob holds architectural integer state only. FP registers, cache-control registers, and step-loop micro-state (annul / pending-mode-write / error / power-down) are not in the blob — the IR never touches them.

A typed view is optional sugar

SPARC ships SparcView (sparc_layout.hpp:37), an ergonomic accessor used by tests and the lockstep harness to read architectural registers by name. The frontend itself emits raw LdState/StState by offset; the view is not on the translate path.

Step 2 — Implement `IArchitecture`¶

A concrete IArchitecture (model: sparc_arch.hpp + sparc_arch.cpp):

class ArmArchitecture final : public ir::IArchitecture {
public:  // without this the overrides default to private on the concrete type
    std::string_view name() const override        { return "arm"; }
    std::size_t      state_size() const override   { return layout::StateSize; }
    void             reset_state(ir::GuestState&) const override;       // power-on values
    ir::ModeCtx      mode_ctx_of(const ir::GuestState&) const override; // Step 3
    std::uint32_t    pc_of(const ir::GuestState&) const override;       // read PC offset
    ir::IArchFrontend& frontend() override         { return frontend_; }
    void             take_exception(ir::GuestState&, std::uint32_t code) override;
private:
    ArmFrontend frontend_;  // owned translator, handed out by frontend()
};

reset_state writes the power-on register values into the blob (SparcArchitecture::reset_state, sparc_arch.cpp:30: S=1, ET=0, CWP=0, impl/ver, pc=0, npc=4).
pc_of reads the architectural PC from its blob offset (sparc_arch.cpp:49).
take_exception is the arch-specific trap delivery: compute the vector, save the return state, switch mode. The IR only names the reason via ExitKind::Exception / exit_code; this method does the delivery. For SPARC, SparcArchitecture::take_exception (sparc_arch.cpp:53) performs the SPARC V8 §7.4 trap entry on the blob — set TBR.tt, rotate CWP, set PS/S/ET, save PC/nPC into the new window's %l1/%l2, and jump to the handler (honouring single-vector trapping via ASR17.SVT).
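A minimal sketch of reset_state and pc_of over a raw byte blob. The offsets assume the seven special registers are 4 bytes each in the documented order starting at 544 (so PSR at 548, PC at 560, nPC at 564); Blob, rd32, and wr32 are hypothetical helpers, and the impl/ver fields are left at zero here. The PSR bit positions (S = bit 7, ET = bit 5, CWP = bits 4:0) are from SPARC V8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

namespace sketch {
constexpr std::size_t PsrOff = 548, PcOff = 560, NpcOff = 564, StateSize = 572;
constexpr std::uint32_t PsrS = 1u << 7, PsrEt = 1u << 5;

using Blob = std::vector<std::uint8_t>;   // stand-in for ir::GuestState bytes

// Host-order 32-bit accessors; memcpy avoids alignment and aliasing issues.
inline std::uint32_t rd32(const Blob& gs, std::size_t off) {
    std::uint32_t v = 0;
    std::memcpy(&v, gs.data() + off, sizeof v);
    return v;
}
inline void wr32(Blob& gs, std::size_t off, std::uint32_t v) {
    std::memcpy(gs.data() + off, &v, sizeof v);
}

// Power-on values: supervisor mode, traps disabled, window 0, pc=0, npc=4.
inline void reset_state(Blob& gs) {
    gs.assign(StateSize, std::uint8_t{0});
    wr32(gs, PsrOff, PsrS);   // S=1, ET=0, CWP=0
    wr32(gs, PcOff, 0);
    wr32(gs, NpcOff, 4);
}

inline std::uint32_t pc_of(const Blob& gs) { return rd32(gs, PcOff); }
}  // namespace sketch
```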

Step 3 — Define the mode context¶

mode_ctx_of extracts the small arch value that, with the entry PC, keys the block cache. It must capture everything the frontend uses to statically resolve offsets or decode:

Arch	`mode_ctx` packs
SPARC V8 (today)	`CWP` (window pointer); S/PS/EF join as the privileged/FP paths land
ARMv7-M (suggested)	handler/thread mode + data-endianness (E) bit; the Thumb (T) bit is always 1 on M-profile, so it adds no entries

SPARC's mode_ctx_of (sparc_arch.cpp:43) returns ModeCtx{PSR & CwpMask} — only CWP affects register-offset resolution today. Because mode-changing instructions are block terminators (decision 50), mode_ctx is constant for a block's lifetime. Keep it within a byte or two: ModeCtx is part of the cache key, so a wide mode multiplies cache entries.
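The effect of the mode context on the cache can be sketched with a toy cache keyed by (entry_pc, mode_ctx). BlockCacheSketch and mode_ctx_from_psr are hypothetical; the real BlockCache is not shown in this page. The mask 0x1f is the SPARC V8 PSR.CWP field (bits 4:0).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

namespace sketch {
constexpr std::uint32_t CwpMask = 0x1f;   // PSR.CWP, bits 4:0

inline std::uint32_t mode_ctx_from_psr(std::uint32_t psr) { return psr & CwpMask; }

// Toy cache: the value is just a block id standing in for a translated block.
struct BlockCacheSketch {
    std::map<std::pair<std::uint32_t, std::uint32_t>, int> blocks;
    int next_id = 0;

    int lookup_or_translate(std::uint32_t pc, std::uint32_t mode) {
        const auto key = std::make_pair(pc, mode);
        const auto it = blocks.find(key);
        if (it != blocks.end()) return it->second;   // hit: reuse the block
        const int id = next_id++;                    // miss: "translate" once
        blocks.emplace(key, id);
        return id;
    }
};
}  // namespace sketch
```

The same PC under two window pointers costs two entries, while PSR bits outside the mask (ET, for example) do not split the cache. That is the reason to keep the packed value narrow.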

Step 4 — Implement `translate_block`¶

Decode from pc until the first terminator and return the IrBlock. The SPARC frontend (sparc_frontend.cpp:624) is the template. Its structure:

Read the mode fields needed for offset resolution (SPARC: cwp = mode & CwpMask, sparc_frontend.cpp:627).
Loop decoding instructions (SPARC reuses core::decode, capped at MaxInsns = 256, stamping each PC with set_cur_pc):
Fetch fault → an ExitKind::Exception block (instruction-access trap, sparc_frontend.cpp:643).
Straight-line op → emit its IR (translate_simple, sparc_frontend.cpp:478): ALU as emit_binary, loads/stores as emit_ld_guest/emit_st_guest with the arch's endianness, condition codes as explicit emit_st_state. ++insn_count, advance cur.
Trap-capable op → set_cur_pc(pc) then emit_trap_if(cond, tt) so a mid-block fault reports the exact PC (alignment, window over/underflow, divide-by-zero — sparc_frontend.cpp:302, :809, :386).
Control transfer → set block.exit and return, including the delay slot in this block if the ISA has one. SPARC handles Branch / Call / Jmpl each with its delay-slot semantics (sparc_frontend.cpp:650–793), translating branch + delay-slot as the trailing edge. CALL sets is_call = true so the region builder pulls in the return block.
Mode-changing op → emit its effect, set mode_change_kind and exit_mode_ctx, set the exit, return. SPARC SAVE/RESTORE shift CWP by a translate-time-constant delta → StaticDelta with exit_mode_ctx = ModeCtx{new_cwp} (sparc_frontend.cpp:799–833); the trap probe runs before any state change.
Anything not yet handled → bail_at(cur) and return so the run loop uses the per-instruction reference path (partial-but-correct). can_translate (sparc_frontend.cpp:191) gates the translatable set; SPARC currently bails on FP, atomics, alternate-space access, RETT/Ticc, the cc-setting multiply-step/divide forms, LDD/STD, and most special-register access.
Set entry_pc, mode_ctx, insn_count (SPARC sets these incrementally as it goes).
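The overall shape of the loop above, reduced to a toy ISA with no delay slots or traps. Kind, Block, and this translate_block signature are hypothetical simplifications; the real entry point takes (state, bus, pc, mode) and emits IR where the comment indicates. Ending the block in front of an unhandled instruction is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

namespace sketch {
enum class Kind { Simple, Branch, Unhandled };   // toy decode result
enum class ExitKind { FallThrough, StaticBranch };

struct Block {
    std::uint32_t entry_pc = 0, mode_ctx = 0, insn_count = 0;
    ExitKind exit_kind = ExitKind::FallThrough;
    std::uint32_t static_target = 0;
};

constexpr std::uint32_t MaxInsns = 256;

// code[i] stands in for the decoded instruction at pc = 4*i; every Branch
// jumps to branch_target.
inline Block translate_block(const std::vector<Kind>& code, std::uint32_t pc,
                             std::uint32_t mode, std::uint32_t branch_target) {
    Block blk;
    blk.entry_pc = pc;
    blk.mode_ctx = mode;
    std::uint32_t cur = pc;
    while (blk.insn_count < MaxInsns && cur / 4 < code.size()) {
        const Kind k = code[cur / 4];
        if (k == Kind::Unhandled) break;   // end the block in front of it
        ++blk.insn_count;                  // IR for this instruction goes here
        cur += 4;
        if (k == Kind::Branch) {           // control transfer terminates the block
            blk.exit_kind = ExitKind::StaticBranch;
            blk.static_target = branch_target;
            return blk;
        }
    }
    blk.static_target = cur;   // fall through to the next untranslated PC
    return blk;                // insn_count == 0 here is the bail case
}
}  // namespace sketch
```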

Emit explicit endianness on every memory op (ir::MemEndian::Big for SPARC, Little for ARM). The interpreter and the JIT both honour it through the shared guest_memory.hpp.
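What the endian field means for a load can be sketched independently of the host byte order. This MemEndian enum and load32 are hypothetical stand-ins, not the helpers in guest_memory.hpp; the point is only that the same op with a different field yields a different value from the same bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {
enum class MemEndian { Big, Little };   // mirrors the idea of ir::MemEndian

// Assemble a 32-bit guest value byte by byte, so the result does not
// depend on the endianness of the machine running the emulator.
inline std::uint32_t load32(const std::uint8_t* p, MemEndian e) {
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        const std::size_t shift = (e == MemEndian::Big) ? (3 - i) * 8 : i * 8;
        v |= static_cast<std::uint32_t>(p[i]) << shift;
    }
    return v;
}
}  // namespace sketch
```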

Step 5 — Decide the canonical state representation¶

The run loop (Emulator::run_ir_quantum, emulator.cpp:950) runs IR blocks against a GuestState and reads the architectural state from it.

A fresh architecture is GuestState-native. The blob is the canonical state. No sync layer is needed: allocate a GuestState of ir_arch_->state_size(), hand gs.bytes().data() to the JIT and gs to the interpreter, and store micro-state (annul, delayed-mode-write, error, power-down) in arch-private fields outside the blob.

SPARC is the exception — and even it no longer syncs

SPARC predates the IR, so its reference interpreter (core::step), GDB stub, and per-instruction observer use core::CpuState. Rather than keep two copies in sync, state was unified: core::CpuState embeds an ir::GuestState int_state_ and exposes it via CpuState::guest_state() (cpu_state.hpp:302). The frontend emits LdState/StState against core::layout offsets, and core::step reads/writes the same bytes — one representation, no copy. The old sync_cpu_to_guest / sync_guest_to_cpu helpers (and sparc_sync.cpp) were removed by that unification. A new arch with no legacy core skips all of this and is GuestState-native from day one.

Step 6 — Wire it into the Emulator¶

The run loop holds the architecture behind the ir::IArchitecture pointer ir_arch_, so it never names a concrete ISA. Today initialize() constructs arch::sparc::SparcArchitecture when translation is on. Adding an arch means selecting the implementation there — e.g. from EmulatorConfig::soc_family or a new arch field — and the run loop picks up state_size(), mode_ctx_of, frontend(), pc_of, and take_exception through the interface. The BlockCache, IR interpreter, tiered JIT, clean-boundary gate, region builder, and code-flush machinery are all arch-agnostic and need no change.
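One possible shape for the selection step, as a sketch. The reduced IArchitecture, SparcArch, ArmArch, and make_architecture are all hypothetical; the real initialize() and EmulatorConfig fields are not reproduced here, and the ARM state size is a placeholder.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string_view>

namespace sketch {
struct IArchitecture {                       // reduced stand-in interface
    virtual ~IArchitecture() = default;
    virtual std::string_view name() const = 0;
    virtual std::size_t state_size() const = 0;
};

struct SparcArch final : IArchitecture {
    std::string_view name() const override { return "sparc"; }
    std::size_t state_size() const override { return 572; }
};

struct ArmArch final : IArchitecture {       // hypothetical second arch
    std::string_view name() const override { return "arm"; }
    std::size_t state_size() const override { return 76; }   // placeholder
};

// Hypothetical selector; returns nullptr for an unknown architecture name.
inline std::unique_ptr<IArchitecture> make_architecture(std::string_view arch) {
    if (arch == "sparc") return std::make_unique<SparcArch>();
    if (arch == "arm")   return std::make_unique<ArmArch>();
    return nullptr;
}
}  // namespace sketch
```

Everything downstream of the returned pointer sees only the interface, which is what lets the run loop stay unchanged.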

The neutral ExecutionEngine run loop (execution_engine.cpp) names arch::sparc nowhere: it reads no SPARC layout::NpcOff/PcOff blob offsets, and interrupt acknowledgement flows through the opaque IArchitecture::InterruptDecision::ack_mask (architecture.hpp:113) — the arch forms the controller bits (SPARC/GRLIB IRQMP: 1u << level), the engine passes them through without reconstructing a bitmask.

What is still SPARC-coupled in the engine today

The ExecutionEngine still holds std::vector<core::CpuState> cores_ and calls core::step inline (gated by arch_is_sparc_). Where it needs a delay-slot boundary it reads the SPARC lens accessor (sparc->npc()), and lince_runtime PUBLIC-links lince::core. A GuestState-native frontend does not use these paths; they run only under arch_is_sparc_. Relocating cores_ into the architecture (so the engine drops core/ entirely) is designed but deferred; see plans/arch-decoupling.md "Remaining". Until then, a second frontend coexists with the SPARC-gated paths rather than removing them.

For a GuestState-native arch you must also provide the canonical blob the run loop reads — for SPARC that is CpuState::guest_state(); a new arch supplies its own GuestState of state_size() bytes initialised by reset_state.

Step 7 — Validate by lockstep¶

Lockstep is the safety net (jit.md § Validation strategy). A new frontend is trusted only when it is bit-identical to a reference:

IR interpreter vs a reference core — if the arch has an independent reference interpreter, run both and memcmp the GuestState after every block (SPARC: tests/integration/test_ir_diff_lockstep.cpp + tests/support/ir_diff_harness.hpp). Without a second implementation, validate against the real guest's expected output (e.g. an RTEMS/bare-metal testsuite).
JIT vs IR interpreter — block-level lockstep (tests/unit/test_ir_jit.cpp): a block compiled and run leaves an identical blob + guest memory. This comes for free once translate_block is correct, because the JIT lowers the same IR the interpreter runs.
Full workload — boot a guest with translation = true and confirm the pass set matches the translation = false (Switch interpreter) reference path.

Per the plan, an architecture is trusted only after IR-vs-reference is bit-identical; the JIT is never trusted ahead of the interpreter.
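The per-block comparison at the heart of lockstep can be sketched as a memcmp with a fallback scan that reports where the two states diverge. Blob and first_mismatch are hypothetical names; the project's harness (ir_diff_harness.hpp) is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

namespace sketch {
using Blob = std::vector<std::uint8_t>;   // stand-in for GuestState bytes

// Returns the first differing byte offset, or nullopt when the blobs are
// bit-identical. A size mismatch is reported as a difference at offset 0.
inline std::optional<std::size_t> first_mismatch(const Blob& ref, const Blob& ir) {
    if (ref.size() != ir.size()) return std::size_t{0};
    if (ref.empty() || std::memcmp(ref.data(), ir.data(), ref.size()) == 0)
        return std::nullopt;              // fast path: identical
    for (std::size_t i = 0; i < ref.size(); ++i)
        if (ref[i] != ir[i]) return i;    // slow path: locate the divergence
    return std::nullopt;                  // not reached when memcmp differed
}
}  // namespace sketch
```

Reporting the offset rather than a bare pass/fail lets a failure be mapped back to a register through the layout header.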

The JIT is free¶

Once translate_block produces correct IR, the LLVM JIT compiles it with no arch-specific work: the lowering in src/jit/src/ir_jit.cpp consumes the arch-neutral IrBlock, the guest-memory helpers honour the endian field, and the tiered baseline/optimised pipeline (jit.md) applies unchanged. The one host assumption is little-endian (the inline-RAM byte-reverse, ADR-003), which is independent of the guest endianness — a little-endian guest's accesses take the helper path, a big-endian guest's take the inline llvm.bswap path.

Worked sketch — a minimal ARMv7-M frontend¶

The plan's T5 proof-of-seam (plans/phase11-arch-neutral-ir.md) recommends Cortex-M / ARMv7-M Thumb-2: no MMU, the simplest exception model.

Concern	ARMv7-M choice
GuestState	`r0..r15`, `xPSR`, banked SP (MSP/PSP) at fixed offsets
`mode_ctx`	Thumb is always 1 on M-profile; pack handler/thread mode + the E-bit
Endianness	`ir::MemEndian::Little` on every `LdGuest`/`StGuest`
Flags	compute N/Z/C/V into `xPSR` offset bits with `emit_st_state` (decision 52)
Block terminators	`B`/`BL`/`BX`, `IT`-block boundaries, `SVC` (→ `Exception`), mode change on exception entry/return
Exceptions	`take_exception` stacks the ARM exception frame and vectors via VTOR
Atomics	`LDREX`/`STREX` are block boundaries, like SPARC CASA (decision 53)
Canonical state	GuestState-native (no legacy core) — Step 5 short path

Scope T5 to a bare-metal blinky or a minimal RTEMS-ARM image; it is a proof of the seam, not a product ARM target. T5 is what confirms or corrects decisions 49–53 against a real second implementation — do not over-polish the abstraction before it exists.
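A possible starting point for the layout and flags rows of the table, as a sketch only. Every name in arm_sketch is hypothetical, and the ordering of xPSR, MSP, and PSP after r0..r15 is an arbitrary choice. The flag positions (N=31, Z=30, C=29, V=28) and the "carry means no borrow" rule for subtraction are from the ARM architecture.

```cpp
#include <cassert>
#include <cstdint>

namespace arm_sketch {
// Hypothetical ARMv7-M blob layout; none of these names exist in the tree.
constexpr std::uint32_t R0Off     = 0;             // r0..r15, 4 bytes each
constexpr std::uint32_t XpsrOff   = 16 * 4;        // 64
constexpr std::uint32_t MspOff    = XpsrOff + 4;   // 68, banked main SP
constexpr std::uint32_t PspOff    = MspOff + 4;    // 72, banked process SP
constexpr std::uint32_t StateSize = PspOff + 4;    // 76

constexpr std::uint32_t reg_offset(std::uint32_t r) { return R0Off + r * 4; }

// N/Z/C/V for r = a - b at the xPSR bit positions. In a frontend these
// expressions would be emitted as IR and stored at XpsrOff.
constexpr std::uint32_t flags_sub(std::uint32_t a, std::uint32_t b) {
    const std::uint32_t r = a - b;                       // wraps mod 2^32
    const std::uint32_t n = r >> 31;
    const std::uint32_t z = (r == 0) ? 1u : 0u;
    const std::uint32_t c = (a >= b) ? 1u : 0u;          // ARM: C = NOT borrow
    const std::uint32_t v = ((a ^ b) & (a ^ r)) >> 31;   // signed overflow
    return (n << 31) | (z << 30) | (c << 29) | (v << 28);
}
}  // namespace arm_sketch
```

ARM sets C when a subtraction does not borrow, which is the opposite of the SPARC convention. That difference is why the flag math stays in each frontend instead of becoming a shared IR concept.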

Checklist¶

GuestState layout header with reg_offset / window or bank math and StateSize.
IArchitecture: name, state_size, reset_state, mode_ctx_of, pc_of, frontend, take_exception.
IArchFrontend::translate_block emitting arch-neutral IR with explicit endianness, mode_change_kind/exit_mode_ctx on mode-changing blocks, per-PC stamping (set_cur_pc) on trapping ops, and delay-slot trap fixups if the ISA has delay slots.
Block terminators cover every control transfer, trap, atomic, and mode change; unhandled ops bail (insn_count == 0 / bail_at).
Canonical state: GuestState-native, or (legacy-core arch) the unified-blob route SPARC uses.
Emulator wiring selects the architecture when translation is on.
Lockstep: IR-vs-reference bit-identical, then JIT-vs-IR (free), then a full workload.
New module under src/arch/<arch>/; lince_ir and lince_jit unchanged.