Adding a guest-architecture frontend
A new guest ISA is a new frontend, not a new core. The arch-neutral IR
(jit.md), the IR interpreter, and the LLVM JIT are shared; an
architecture supplies only the code that turns its guest bytes into the
arch-neutral ir::IrBlock plus a handful of coarse services. This page is the
contributor procedure, with the committed SPARC V8 implementation
(src/arch/sparc/) as the worked reference.
The boundary is two interfaces in src/ir/include/lince/ir/architecture.hpp:
| Interface | What it supplies | Call frequency |
|---|---|---|
| ir::IArchFrontend | translate_block(state, bus, pc, mode) → IrBlock | once per untranslated (pc, mode) |
| ir::IArchitecture | state_size, reset_state, mode_ctx_of, pc_of, frontend, take_exception, name | once per block / quantum / event |
Neither is on the per-instruction path. The only per-instruction cost is the IR op interpreter (or, when JIT-compiled, native code) — both arch-neutral.
flowchart LR
subgraph New["src/arch/<arch>/ (you write this)"]
FE["ArchFrontend : IArchFrontend<br/>translate_block"]
AR["ArchArchitecture : IArchitecture<br/>state_size / mode_ctx_of /<br/>pc_of / take_exception"]
LO["arch_layout.hpp<br/>GuestState offsets + reg math"]
end
subgraph Shared["unchanged"]
IR["lince_ir<br/>IrBlock / interpreter / BlockCache"]
JIT["lince_jit<br/>lower_block / TieredJit"]
RT["lince_runtime<br/>run_ir_quantum"]
end
FE -->|emits IrBlock| IR
AR -->|state_size, take_exception| RT
LO -.->|offsets| FE
IR --> JIT
RT -->|holds IArchitecture*| AR
What a frontend must not do
These are the invariants that keep the IR shareable. Breaking one re-couples the ISA to the engine and defeats the seam.
- No register names in the IR. Guest registers are byte offsets into the GuestState blob, touched only via LdState/StState. SPARC register windows and ARM register banking are the frontend's choice of offset, invisible to the IR (decision 49).
- No flags register. Condition codes are explicit guest-state writes; the frontend computes each bit into its offset (SPARC: flags_add/flags_sub/flags_logical, sparc_frontend.cpp:84). There is no shared NZVC/NZCV concept (decision 52).
- No endianness in the bus. LdGuest/StGuest carry {size, endianness}; the swap happens in the op (interpreter, guest_memory.hpp) or lowered code (JIT). A big-endian SPARC and a little-endian ARM frontend emit the same ops with a different endian field (decision 51).
- No mode dynamism inside a block. Any instruction that changes the mode context (window rotation, processor-mode switch, trap entry, Thumb toggle) is a block terminator, so within a block the mode is constant and every mode-dependent offset is resolved at translate time (decision 50).
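The endianness invariant can be illustrated with a small stand-alone sketch. The names MemEndian and load_guest below are hypothetical stand-ins, not the actual guest_memory.hpp API; the sketch only shows that byte order is a property of each access, so one routine can serve both a big-endian and a little-endian frontend.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the endianness tag carried by LdGuest/StGuest.
enum class MemEndian { Little, Big };

// Assemble a `size`-byte value from guest memory.
// The byte order is an argument of the access, not a property of the bus.
inline std::uint64_t load_guest(const std::vector<std::uint8_t>& mem,
                                std::size_t addr, std::size_t size,
                                MemEndian endian) {
    assert(size == 1 || size == 2 || size == 4 || size == 8);
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < size; ++i) {
        // Big-endian: the first byte is most significant.
        // Little-endian: the last byte is most significant.
        const std::size_t idx = (endian == MemEndian::Big) ? i : size - 1 - i;
        value = (value << 8) | mem.at(addr + idx);
    }
    return value;
}
```

The same four guest bytes read as different values depending only on the endian argument, which is the property that lets two frontends share one op.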
The IR a frontend emits
src/ir/include/lince/ir/ir.hpp defines everything a frontend produces. An
IrBlock is a vector of IrInst plus one structured IrExit. Build it with the
member helpers — never hand-fill an IrInst:
| Helper | Emits | Notes |
|---|---|---|
| emit_const(imm) → Temp | Const | block-local value |
| emit_ld_state(off, size) → Temp | LdState | read guest reg (host order) |
| emit_st_state(off, size, src) | StState | write guest reg |
| emit_ld_guest(addr, size, endian) → Temp | LdGuest | guest memory load |
| emit_st_guest(addr, val, size, endian) | StGuest | guest memory store |
| emit_binary(Op, a, b) → Temp | ALU/Cmp | Add Sub And Or Xor Shl Shr Sar Mul UMulHi SMulHi UDiv SDiv CmpEq CmpLtU CmpLtS |
| emit_ternary(Op, a, b, c) → Temp | UDiv64/SDiv64 | three-operand divide (high\|low / divisor) |
| emit_select(cond, t, f) → Temp | Select | cond ? t : f |
| emit_trap_if(cond, code) | TrapIf | mid-block conditional trap at cur_pc |
| set_cur_pc(pc) | (none) | stamp the PC onto subsequent trapping ops |
Temp values are block-local SSA-free temporaries; they die at the block
boundary. Cross-instruction state flows through the GuestState blob, never
through temps. See jit.md § The IR data model for
the full op semantics.
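As a rough illustration of how these helpers compose, the sketch below translates a single "add register and immediate" instruction. IrInst, IrBlock, and Op here are heavily reduced, hypothetical stand-ins for the types in ir.hpp (only four ops, no exit), and translate_add_imm is an invented name; only the helper names come from the table above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, reduced stand-ins for the types in ir.hpp.
enum class Op { Const, LdState, StState, Add };
using Temp = std::uint32_t;  // block-local temporary id

struct IrInst {
    Op op;
    Temp dst;            // result temp (unused for StState)
    Temp a, b;           // operand temps
    std::uint32_t imm;   // constant, or GuestState byte offset
    std::uint8_t size;   // access size in bytes for LdState/StState
};

struct IrBlock {
    std::vector<IrInst> insts;
    Temp next_temp = 0;

    Temp emit_const(std::uint32_t imm) {
        insts.push_back({Op::Const, next_temp, 0, 0, imm, 0});
        return next_temp++;
    }
    Temp emit_ld_state(std::uint32_t off, std::uint8_t size) {
        insts.push_back({Op::LdState, next_temp, 0, 0, off, size});
        return next_temp++;
    }
    void emit_st_state(std::uint32_t off, std::uint8_t size, Temp src) {
        insts.push_back({Op::StState, 0, src, 0, off, size});
    }
    Temp emit_binary(Op op, Temp a, Temp b) {
        assert(op == Op::Add);  // only Add exists in this reduced sketch
        insts.push_back({op, next_temp, a, b, 0, 0});
        return next_temp++;
    }
};

// Translate a guest "add rs1, imm, rd". Registers appear only as byte
// offsets into the state blob; the IR never sees a register name.
inline void translate_add_imm(IrBlock& block, std::uint32_t rs1_off,
                              std::uint32_t imm, std::uint32_t rd_off) {
    const Temp lhs = block.emit_ld_state(rs1_off, 4);
    const Temp rhs = block.emit_const(imm);
    const Temp sum = block.emit_binary(Op::Add, lhs, rhs);
    block.emit_st_state(rd_off, 4, sum);
}
```

The result of the add leaves the block only through the final StState, which matches the rule that cross-instruction state flows through the blob rather than through temps.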
Set the block's identity and terminator:
- block.entry_pc: the guest PC the block starts at.
- block.mode_ctx: the mode context (see Step 3); together with entry_pc it keys the block cache.
- block.insn_count: guest instructions covered (the run loop bills sim-time from it).
- block.mode_change_kind: ir::ModeChangeKind (ir.hpp:112): None (default), StaticDelta (a translate-time-constant mode shift; the region compiler may chain across it under exit_mode_ctx), or Dynamic (an unknown post-change mode, which is a region terminator).
- block.exit_mode_ctx: the mode the static successor runs under; set it alongside StaticDelta.
- block.exit: an IrExit, one of:
| ExitKind | Fields | Meaning |
|---|---|---|
| FallThrough / StaticBranch | static_target (+ is_call) | continue at a fixed PC |
| CondBranch | cond, static_target, fallthrough_target | cond ? target : fallthrough |
| IndirectBranch | dyn_target (Temp) | computed target |
| Exception | exit_code | deliver an architectural trap |
| PowerDown | (none) | core halted pending interrupt |
A block whose entry has no translatable instruction returns insn_count == 0;
the run loop falls back to the per-instruction reference path for that PC. A
bail_at(pc) is the SPARC idiom: a FallThrough to the current PC with no ops,
which the run loop's insn_count == 0 check turns into a single fallback step.
Delay-slot traps (delay-slot ISAs only)
If the ISA has a branch delay slot and the slot can fault (load/store/divide),
a fault there must save the branch's resolved nPC, not trap_pc + 4. Record
it on the block via delay_trap_pc / delay_trap_npc / delay_trap_dynamic
(ir.hpp:156) — static for an always-taken target, dynamic when the slot has
already stored the resolved nPC into the nPC slot. SPARC does this for CALL,
BA, JMPL, and true conditional Bicc with a trapping delay slot
(sparc_frontend.cpp:689). A delay-trap block must stay the region entry;
build_jit_region refuses to bury one mid-region.
Step 1: Choose the GuestState layout
GuestState (src/ir/include/lince/ir/guest_state.hpp) is an opaque byte
region; the architecture owns its layout. Define the offsets in a header. The
SPARC layout (core::layout in src/core/include/lince/core/cpu_state.hpp:40,
re-exported through src/arch/sparc/include/lince/arch/sparc/sparc_layout.hpp) is
the model:
[0] 8 globals %g0..%g7 GlobalsBase
[32] NumWindows*16 windowed slots (window_slot() does the math) WindowedBase
[544] Y, PSR, WIM, TBR, PC, nPC, ASR17 SpecialBase
StateSize = 572
reg_offset(cwp, r) (cpu_state.hpp:74) resolves a register reference to a byte
offset using the block's cwp — which is constant within a block (decision 50),
so this is a translate-time computation, not a runtime indexed access. An ARM
frontend places r0..r15, CPSR, and the banked registers for the current mode
at fixed offsets the same way.
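The offset arithmetic can be sketched as follows. This is a hypothetical reconstruction, not the code in cpu_state.hpp: NumWindows = 8 is inferred from the layout above (544 = 32 + NumWindows * 16 * 4), and the slot scheme (each window owns its outs and locals, and its ins alias the outs of the next-higher window) is one plausible choice that the real window_slot() may not share.

```cpp
#include <cassert>
#include <cstdint>

namespace layout {
constexpr std::uint32_t NumWindows   = 8;   // inferred from the 544-byte SpecialBase
constexpr std::uint32_t GlobalsBase  = 0;
constexpr std::uint32_t WindowedBase = 32;  // 8 globals * 4 bytes
constexpr std::uint32_t SpecialBase  = WindowedBase + NumWindows * 16 * 4;
constexpr std::uint32_t StateSize    = SpecialBase + 7 * 4;  // seven special registers

// Assumed slot scheme: window w owns 16 slots (8 outs, then 8 locals);
// its ins are the outs of window (w + 1) mod NumWindows.
inline std::uint32_t window_slot(std::uint32_t cwp, std::uint32_t r) {
    if (r < 24) return cwp * 16 + (r - 8);              // %o0..%o7, %l0..%l7
    return ((cwp + 1) % NumWindows) * 16 + (r - 24);    // %i0..%i7
}

// Resolve register r (0..31) under window cwp to a GuestState byte offset.
// cwp is a translate-time constant, so this runs in the frontend, not at runtime.
inline std::uint32_t reg_offset(std::uint32_t cwp, std::uint32_t r) {
    assert(cwp < NumWindows && r < 32);
    if (r < 8) return GlobalsBase + r * 4;              // %g0..%g7, window-independent
    return WindowedBase + window_slot(cwp, r) * 4;
}
}  // namespace layout
```

Under this scheme the caller's %o0 in window 1 and the callee's %i0 in window 0 resolve to the same offset, so a SAVE needs no register copying: the next block is simply translated under a different cwp.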
Constraints:
- Power-of-two access sizes (1/2/4/8) keep both the interpreter and the JIT lowering simple. Avoid arbitrary (offset, size) reads.
- Keep mode_ctx small (it multiplies block-cache entries; see Step 3).
- The blob holds architectural integer state only. FP registers, cache-control registers, and step-loop micro-state (annul / pending-mode-write / error / power-down) are not in the blob, and the IR never touches them.
A typed view is optional sugar
SPARC ships SparcView (sparc_layout.hpp:37), an ergonomic accessor used by
tests and the lockstep harness to read architectural registers by name. The
frontend itself emits raw LdState/StState by offset; the view is not on the
translate path.
Step 2: Implement IArchitecture
A concrete IArchitecture (model: sparc_arch.hpp + sparc_arch.cpp):
class ArmArchitecture final : public ir::IArchitecture {
public:  // the overrides must be public to be callable through the interface
    std::string_view name() const override { return "arm"; }
    std::size_t state_size() const override { return layout::StateSize; }
    void reset_state(GuestState&) const override;               // power-on values
    ir::ModeCtx mode_ctx_of(const GuestState&) const override;  // Step 3
    std::uint32_t pc_of(const GuestState&) const override;      // read PC offset
    ir::IArchFrontend& frontend() override { return frontend_; }
    void take_exception(GuestState&, std::uint32_t code) override;
private:
    ArmFrontend frontend_;
};
- reset_state writes the power-on register values into the blob (SparcArchitecture::reset_state, sparc_arch.cpp:30: S=1, ET=0, CWP=0, impl/ver, pc=0, npc=4).
- pc_of reads the architectural PC from its blob offset (sparc_arch.cpp:49).
- take_exception is the arch-specific trap delivery: compute the vector, save the return state, switch mode. The IR only names the reason via ExitKind::Exception / exit_code; this method does the delivery. For SPARC, SparcArchitecture::take_exception (sparc_arch.cpp:53) performs the SPARC V8 §7.4 trap entry on the blob: set TBR.tt, rotate CWP, set PS/S/ET, save PC/nPC into the new window's %l1/%l2, and jump to the handler (honouring single-vector trapping via ASR17.SVT).
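The SPARC V8 trap-entry sequence named above can be sketched on a flat register struct. TrapRegs and take_trap are hypothetical names for illustration; the real take_exception works on GuestState offsets, and this sketch omits single-vector trapping (ASR17.SVT) and the error-mode case where ET is already 0.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flat view of the registers a SPARC V8 trap entry touches.
struct TrapRegs {
    std::uint32_t psr, tbr, pc, npc;
    std::uint32_t new_l1, new_l2;  // %l1/%l2 of the window entered by the trap
};

// PSR fields (SPARC V8): CWP = bits 4:0, ET = bit 5, PS = bit 6, S = bit 7.
constexpr std::uint32_t PsrCwpMask = 0x1fu;
constexpr std::uint32_t PsrEt = 1u << 5;
constexpr std::uint32_t PsrPs = 1u << 6;
constexpr std::uint32_t PsrS  = 1u << 7;

inline void take_trap(TrapRegs& r, std::uint8_t tt, std::uint32_t num_windows) {
    assert(num_windows >= 2 && num_windows <= 32);
    const std::uint32_t cwp = r.psr & PsrCwpMask;
    const std::uint32_t new_cwp = (cwp + num_windows - 1) % num_windows;
    const bool was_supervisor = (r.psr & PsrS) != 0;

    // ET <- 0, PS <- S, S <- 1, CWP <- CWP - 1 (mod NWINDOWS).
    r.psr &= ~(PsrEt | PsrPs | PsrCwpMask);
    r.psr |= PsrS | (was_supervisor ? PsrPs : 0u) | new_cwp;

    // Save the return state into the new window's locals.
    r.new_l1 = r.pc;
    r.new_l2 = r.npc;

    // TBR.tt is bits 11:4; the handler address is the resulting TBR value.
    r.tbr = (r.tbr & 0xfffff000u) | (static_cast<std::uint32_t>(tt) << 4);
    r.pc = r.tbr;
    r.npc = r.tbr + 4;
}
```

None of this appears in the IR: a block only exits with ExitKind::Exception and an exit_code, and the architecture performs the steps above when the run loop calls take_exception.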
Step 3: Define the mode context
mode_ctx_of extracts the small arch value that, with the entry PC, keys the
block cache. It must capture everything the frontend uses to statically resolve
offsets or decode:
| Arch | mode_ctx packs |
|---|---|
| SPARC V8 (today) | CWP (window pointer); S/PS/EF join as the privileged/FP paths land |
| ARMv7-M (suggested) | processor mode + Thumb (T) bit + data-endianness (E) bit |
SPARC's mode_ctx_of (sparc_arch.cpp:43) returns ModeCtx{PSR & CwpMask} —
only CWP affects register-offset resolution today. Because mode-changing
instructions are block terminators (decision 50), mode_ctx is constant for a
block's lifetime. Keep it within a byte or two: ModeCtx is part of the cache
key, so a wide mode multiplies cache entries.
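A minimal sketch of why the width of the mode context matters, assuming a cache keyed on the pair (entry PC, mode context). ModeCtx, Block, and BlockCache below are hypothetical stand-ins rather than the lince_ir types; the only fact taken from SPARC V8 is that PSR.CWP occupies bits 4:0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical stand-ins; the real ModeCtx and BlockCache live in lince_ir.
struct ModeCtx { std::uint32_t bits; };
constexpr std::uint32_t CwpMask = 0x1fu;  // PSR.CWP is bits 4:0 in SPARC V8

inline ModeCtx mode_ctx_of(std::uint32_t psr) { return ModeCtx{psr & CwpMask}; }

struct Block { std::uint32_t entry_pc; ModeCtx mode_ctx; };

class BlockCache {
public:
    // Returns the cached block, "translating" on first sight of (pc, mode).
    const Block& lookup(std::uint32_t pc, ModeCtx mode) {
        const auto key = std::make_pair(pc, mode.bits);
        auto it = blocks_.find(key);
        if (it == blocks_.end()) {
            ++translations_;
            it = blocks_.emplace(key, Block{pc, mode}).first;
        }
        assert(it->second.entry_pc == pc);  // key and block must agree
        return it->second;
    }
    std::size_t translations() const { return translations_; }

private:
    std::map<std::pair<std::uint32_t, std::uint32_t>, Block> blocks_;
    std::size_t translations_ = 0;
};
```

Because only CWP is packed into the key, two PSR values that differ only in S share one translation, while the same PC under a different CWP costs a second one. Every extra bit packed into the mode context can double the number of entries per PC.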
Step 4: Implement translate_block
Decode from pc until the first terminator and return the IrBlock. The SPARC
frontend (sparc_frontend.cpp:624) is the template. Its structure:
- Read the mode fields needed for offset resolution (SPARC: cwp = mode & CwpMask, sparc_frontend.cpp:627).
- Loop decoding instructions (SPARC reuses core::decode, capped at MaxInsns = 256, stamping each PC with set_cur_pc):
    - Fetch fault → an ExitKind::Exception block (instruction-access trap, sparc_frontend.cpp:643).
    - Straight-line op → emit its IR (translate_simple, sparc_frontend.cpp:478): ALU as emit_binary, loads/stores as emit_ld_guest/emit_st_guest with the arch's endianness, condition codes as explicit emit_st_state. ++insn_count, advance cur.
    - Trap-capable op → set_cur_pc(pc) then emit_trap_if(cond, tt) so a mid-block fault reports the exact PC (alignment, window over/underflow, divide-by-zero; sparc_frontend.cpp:302, :809, :386).
    - Control transfer → set block.exit and return, including the delay slot in this block if the ISA has one. SPARC handles Branch / Call / Jmpl each with its delay-slot semantics (sparc_frontend.cpp:650–793), translating branch + delay-slot as the trailing edge. CALL sets is_call = true so the region builder pulls in the return block.
    - Mode-changing op → emit its effect, set mode_change_kind and exit_mode_ctx, set the exit, return. SPARC SAVE/RESTORE shift CWP by a translate-time-constant delta → StaticDelta with exit_mode_ctx = ModeCtx{new_cwp} (sparc_frontend.cpp:799–833); the trap probe runs before any state change.
    - Anything not yet handled → bail_at(cur) and return so the run loop uses the per-instruction reference path (partial-but-correct). can_translate (sparc_frontend.cpp:191) gates the translatable set; SPARC currently bails on FP, atomics, alternate-space access, RETT/Ticc, the cc-setting multiply-step/divide forms, LDD/STD, and most special-register access.
- Set entry_pc, mode_ctx, insn_count (SPARC sets these incrementally as it goes).
Emit explicit endianness on every memory op (ir::MemEndian::Big for SPARC,
Little for ARM). The interpreter and the JIT both honour it through the shared
guest_memory.hpp.
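The shape of that loop can be sketched on a toy ISA. Everything below is hypothetical: Kind, Insn, and Block are invented for illustration, IR emission and delay slots are left out, and the toy ends a block just before an unsupported instruction, which is an assumption about mid-block bails rather than a description of the SPARC frontend.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy instruction stream: each entry occupies 4 guest bytes.
enum class Kind { Simple, Branch, Unsupported };
struct Insn { Kind kind; std::uint32_t target; };

enum class ExitKind { FallThrough, StaticBranch };
struct Block {
    std::uint32_t entry_pc = 0;
    std::uint32_t insn_count = 0;
    ExitKind exit_kind = ExitKind::FallThrough;
    std::uint32_t exit_target = 0;
};

constexpr std::uint32_t MaxInsns = 256;

// `code[i]` is the instruction at guest address 4 * i.
inline Block translate_block(const std::vector<Insn>& code, std::uint32_t pc) {
    assert(pc % 4 == 0);
    Block block;
    block.entry_pc = pc;
    std::uint32_t cur = pc;
    while (block.insn_count < MaxInsns && cur / 4 < code.size()) {
        const Insn& insn = code[cur / 4];
        if (insn.kind == Kind::Unsupported) break;  // bail: stop before it
        ++block.insn_count;                          // (IR emission elided)
        cur += 4;
        if (insn.kind == Kind::Branch) {             // a terminator ends the block
            block.exit_kind = ExitKind::StaticBranch;
            block.exit_target = insn.target;
            return block;
        }
    }
    // Cap reached, end of the toy program, or bail: fall through to `cur`.
    // If nothing was translated, insn_count == 0 and cur == pc.
    block.exit_target = cur;
    return block;
}
```

A block whose first instruction is unsupported comes back with insn_count == 0 and a FallThrough to its own entry PC, which is the bail shape the run loop turns into a single reference step.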
Step 5: Decide the canonical state representation
The run loop (Emulator::run_ir_quantum, emulator.cpp:950) runs IR blocks
against a GuestState and reads the architectural state from it.
- A fresh architecture is GuestState-native. The blob is the canonical state. No sync layer is needed: allocate a GuestState of ir_arch_->state_size(), hand gs.bytes().data() to the JIT and gs to the interpreter, and store micro-state (annul, delayed-mode-write, error, power-down) in arch-private fields outside the blob.
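A GuestState-native arrangement might look like the sketch below. The GuestState class here is a hypothetical stand-in for ir::GuestState (only bytes() is named in the text above; load32, store32, and CoreContext are invented for illustration). It shows the blob as the single canonical copy, with micro-state kept beside it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for ir::GuestState: an opaque, zero-initialised blob.
class GuestState {
public:
    explicit GuestState(std::size_t size) : bytes_(size, 0) {}
    std::vector<std::uint8_t>& bytes() { return bytes_; }

    // Host-order 32-bit accessors, mirroring LdState/StState at size 4.
    std::uint32_t load32(std::size_t off) const {
        assert(off + 4 <= bytes_.size());
        std::uint32_t v;
        std::memcpy(&v, bytes_.data() + off, sizeof v);
        return v;
    }
    void store32(std::size_t off, std::uint32_t v) {
        assert(off + 4 <= bytes_.size());
        std::memcpy(bytes_.data() + off, &v, sizeof v);
    }

private:
    std::vector<std::uint8_t> bytes_;
};

// Arch-private micro-state lives beside the blob, not inside it.
struct CoreContext {
    explicit CoreContext(std::size_t state_size) : gs(state_size) {}
    GuestState gs;            // canonical architectural state
    bool annul = false;       // step-loop micro-state, invisible to the IR
    bool power_down = false;
};
```

With this arrangement the interpreter and the JIT both read and write the same bytes, so there is nothing to copy between representations.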
SPARC is the exception — and even it no longer syncs
SPARC predates the IR, so its reference interpreter (core::step), GDB stub,
and per-instruction observer use core::CpuState. Rather than keep two copies
in sync, state was unified: core::CpuState embeds an
ir::GuestState int_state_ and exposes it via CpuState::guest_state()
(cpu_state.hpp:302). The frontend emits LdState/StState against
core::layout offsets, and core::step reads/writes the same bytes — one
representation, no copy. The old sync_cpu_to_guest / sync_guest_to_cpu
helpers (and sparc_sync.cpp) were removed by that unification. A new
arch with no legacy core skips all of this and is GuestState-native from day
one.
Step 6: Wire it into the Emulator
The run loop holds the architecture behind the ir::IArchitecture pointer
ir_arch_, so it never names a concrete ISA. Today initialize() constructs
arch::sparc::SparcArchitecture when translation is on. Adding an arch means
selecting the implementation there — e.g. from EmulatorConfig::soc_family or a
new arch field — and the run loop picks up state_size(), mode_ctx_of,
frontend(), pc_of, and take_exception through the interface. The
BlockCache, IR interpreter, tiered JIT, clean-boundary gate, region builder,
and code-flush machinery are all arch-agnostic and need no change.
The neutral ExecutionEngine run loop (execution_engine.cpp) names
arch::sparc nowhere: it reads no SPARC layout::NpcOff/PcOff blob offsets,
and interrupt acknowledgement flows through the opaque
IArchitecture::InterruptDecision::ack_mask (architecture.hpp:113) — the arch
forms the controller bits (SPARC/GRLIB IRQMP: 1u << level), the engine passes
them through without reconstructing a bitmask.
What is still SPARC-coupled in the engine today
The ExecutionEngine still holds std::vector<core::CpuState> cores_ and
calls core::step inline (gated by arch_is_sparc_), so where it needs a
delay-slot boundary it reads the SPARC lens accessor (sparc->npc()), and
lince_runtime PUBLIC-links lince::core. A GuestState-native frontend does
not use these paths — they run only under arch_is_sparc_. Relocating
cores_ into the architecture (so the engine drops core/ entirely) is
designed but deferred; see plans/arch-decoupling.md "Remaining". Until then,
a second frontend coexists with the SPARC-gated paths rather than removing
them.
For a GuestState-native arch you must also provide the canonical blob the run
loop reads — for SPARC that is CpuState::guest_state(); a new arch supplies its
own GuestState of state_size() bytes initialised by reset_state.
Step 7: Validate by lockstep
Lockstep is the safety net (jit.md § Validation strategy).
A new frontend is trusted only when it is bit-identical to a reference:
- IR interpreter vs a reference core: if the arch has an independent reference interpreter, run both and memcmp the GuestState after every block (SPARC: tests/integration/test_ir_diff_lockstep.cpp + tests/support/ir_diff_harness.hpp). Without a second implementation, validate against the real guest's expected output (e.g. an RTEMS/bare-metal testsuite).
- JIT vs IR interpreter: block-level lockstep (tests/unit/test_ir_jit.cpp): a block compiled and run leaves an identical blob + guest memory. This comes for free once translate_block is correct, because the JIT lowers the same IR the interpreter runs.
- Full workload: boot a guest with translation = true and confirm the pass set matches the translation = false (Switch interpreter) reference path.
Per the plan, an architecture is trusted only after IR-vs-reference is bit-identical; the JIT is never trusted ahead of the interpreter.
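The per-block comparison can be sketched as a small helper. first_divergence is a hypothetical name, not part of ir_diff_harness.hpp; it assumes both engines expose their state as equally sized byte blobs and reports where they first differ, which is usually more useful in a failing test than a bare memcmp result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Compare two GuestState-sized blobs after a block has run on both engines.
// Returns the offset of the first differing byte, or nullopt if identical.
inline std::optional<std::size_t> first_divergence(
        const std::vector<std::uint8_t>& reference,
        const std::vector<std::uint8_t>& candidate) {
    assert(reference.size() == candidate.size());
    if (reference.empty()) return std::nullopt;
    if (std::memcmp(reference.data(), candidate.data(), reference.size()) == 0) {
        return std::nullopt;  // fast path: bit-identical
    }
    for (std::size_t i = 0; i < reference.size(); ++i) {
        if (reference[i] != candidate[i]) return i;
    }
    return std::nullopt;  // not reached when memcmp reported a difference
}
```

Mapping the returned offset back through the layout header names the register that diverged, for example an offset inside the special-register area.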
The JIT is free
Once translate_block produces correct IR, the LLVM JIT compiles it with no
arch-specific work: the lowering in src/jit/src/ir_jit.cpp consumes the
arch-neutral IrBlock, the guest-memory helpers honour the endian field, and
the tiered baseline/optimised pipeline (jit.md) applies unchanged.
The one host assumption is little-endian (the inline-RAM byte-reverse, ADR-003),
which is independent of the guest endianness — a little-endian guest's accesses
take the helper path, a big-endian guest's take the inline llvm.bswap path.
Worked sketch: a minimal ARMv7-M frontend
The plan's T5 proof-of-seam (plans/phase11-arch-neutral-ir.md) recommends
Cortex-M / ARMv7-M Thumb-2: no MMU, the simplest exception model.
| Concern | ARMv7-M choice |
|---|---|
| GuestState | r0..r15, xPSR, banked SP (MSP/PSP) at fixed offsets |
| mode_ctx | Thumb is always 1 on M-profile; pack handler/thread mode + the E-bit |
| Endianness | ir::MemEndian::Little on every LdGuest/StGuest |
| Flags | compute N/Z/C/V into xPSR offset bits with emit_st_state (decision 52) |
| Block terminators | B/BL/BX, IT-block boundaries, SVC (→ Exception), mode change on exception entry/return |
| Exceptions | take_exception stacks the ARM exception frame and vectors via VTOR |
| Atomics | LDREX/STREX are block boundaries, like SPARC CASA (decision 53) |
| Canonical state | GuestState-native (no legacy core) — Step 5 short path |
Scope T5 to a bare-metal blinky or a minimal RTEMS-ARM image; it is a proof of the seam, not a product ARM target. T5 is what confirms or corrects decisions 49–53 against a real second implementation — do not over-polish the abstraction before it exists.
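For the Flags row, the values an ARM frontend would have to produce for an ADDS can be written out in plain C++. This is a sketch of the arithmetic only: a real frontend would express the same computation as IR ops and store the result into the xPSR offset, and the names nzcv_add and write_nzcv are hypothetical. The bit positions (N=31, Z=30, C=29, V=28) are the architectural APSR positions.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t N_BIT = 1u << 31;
constexpr std::uint32_t Z_BIT = 1u << 30;
constexpr std::uint32_t C_BIT = 1u << 29;
constexpr std::uint32_t V_BIT = 1u << 28;
constexpr std::uint32_t NZCV_MASK = N_BIT | Z_BIT | C_BIT | V_BIT;

// Flags produced by a 32-bit ADDS a, b.
inline std::uint32_t nzcv_add(std::uint32_t a, std::uint32_t b) {
    const std::uint32_t r = a + b;  // wraps modulo 2^32
    std::uint32_t flags = 0;
    if (r & 0x80000000u) flags |= N_BIT;                    // negative result
    if (r == 0) flags |= Z_BIT;                             // zero result
    if (r < a) flags |= C_BIT;                              // unsigned carry out
    if (~(a ^ b) & (a ^ r) & 0x80000000u) flags |= V_BIT;   // signed overflow
    return flags;
}

// Merge new flags into an existing xPSR value, leaving other bits alone.
inline std::uint32_t write_nzcv(std::uint32_t xpsr, std::uint32_t flags) {
    assert((flags & ~NZCV_MASK) == 0);
    return (xpsr & ~NZCV_MASK) | flags;
}
```

Each flag is an ordinary computed value merged into a state word, which is all the IR needs; no shared flags concept has to exist between the SPARC and ARM frontends.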
Checklist
- GuestState layout header with reg_offset / window or bank math and StateSize.
- IArchitecture: name, state_size, reset_state, mode_ctx_of, pc_of, frontend, take_exception.
- IArchFrontend::translate_block emitting arch-neutral IR with explicit endianness, mode_change_kind / exit_mode_ctx on mode-changing blocks, per-PC stamping (set_cur_pc) on trapping ops, and delay-slot trap fixups if the ISA has delay slots.
- Block terminators cover every control transfer, trap, atomic, and mode change; unhandled ops bail (insn_count == 0 / bail_at).
- Canonical state: GuestState-native, or (legacy-core arch) the unified-blob route SPARC uses.
- Emulator wiring selects the architecture when translation is on.
- Lockstep: IR-vs-reference bit-identical, then JIT-vs-IR (free), then a full workload.
- New module under src/arch/<arch>/; lince_ir and lince_jit unchanged.
See also
- Arch-neutral IR and the LLVM JIT: the IR data model, the run loop, and the JIT this frontend feeds.
- Execution model: the core::step reference cycle the run loop falls back to.
- decisions.md: decisions 49–53 (the neutrality invariants this page enforces).