Design decisions

This is the canonical decision log for Tero. It has two parts:

  1. Architectural Decision Records (ADRs) — the four frozen, capital-A architectural decisions (ADR-001..004) that govern the 1:1 real-time roadmap. These change only with explicit human approval and an ADR update; they are reproduced here as the canonical decision log from CLAUDE.md and plans/post-mvp-1to1-roadmap.md.
  2. Numbered implementation decisions (1–59) — judgment calls taken during implementation that go beyond the frozen invariants in design principles. Each entry is a fact not visible from reading the code alone — the reasoning, the context, the alternative considered. Cross-references from prose elsewhere use the form (Decision N); the numbering is historical and must not be renumbered.

For the chronological context (when and why a session's batch of decisions happened), see Status and the archive. For the non-negotiable invariants (zero singletons, errors-as-values, …) see Design principles.

How to use this page

| You want to … | Action |
| --- | --- |
| Understand a frozen architectural choice | Read the ADRs. |
| Find a specific (Decision N) reference from another doc | Search for **N. in this file. Numbers are stable. |
| See all decisions for one module | Use the section headers below. |
| Add a new decision | Append it to the appropriate section with the next free number, then reference it as Decision N from prose. Do not renumber existing entries. |
| Change an ADR | Get explicit human approval, edit plans/post-mvp-1to1-roadmap.md and CLAUDE.md first, then sync this page. |


Architectural Decision Records (ADRs)

These four decisions freeze the major choices for the 1:1 real-time roadmap (plans/post-mvp-1to1-roadmap.md). They are frozen: override only with explicit human approval and an ADR update in that file. Each ADR below states the decision, the rationale, the mechanism (with code where it exists), and the consequences.

Relationship to the frozen technical decisions

The ADRs sit alongside the frozen technical decisions in CLAUDE.md (language/toolchain, architecture principles, execution model, timing, multi-core, memory/bus, FT, caches, MMU, FPU). Those are reproduced for a human reader in Design principles; the ADRs here are the four that specifically govern the post-MVP performance roadmap.

ADR-001 — Dual execution mode (single-thread + multi-thread)

Decision. EmulatorConfig::execution_mode selects between ExecutionMode::SingleThread (default, SMP2-compatible) and ExecutionMode::MultiThread (each simulated core on its own host thread, standalone-only). The field lives in src/runtime/include/tero/runtime/emulator_config.hpp:35-48.

Rationale. 1:1 GR740 needs aggregate throughput beyond what a single host thread can deliver even with a top-tier JIT (measured per-core Dhrystone ≈ 0.82× of GR740's per-core target; one host thread tops out well below the quad-core aggregate). Multi-thread is mandatory to close the gap. Single-thread must remain available for the external SMP2 wrapper, which expects single-threaded model execution driven by its own scheduler.

Mechanism. Thread-safe primitives are runtime-gated, not conditionally compiled, so single-thread mode pays near-zero sync overhead while both modes live in one binary:

  • Mutexes are tero::GatedMutex (src/interfaces/include/tero/gated_mutex.hpp) — a std::mutex that no-ops when its gate is inactive. The gate is derived from execution_mode and set once at initialize(), before any worker thread starts.
  • The thread-per-core dispatcher uses std::barrier: start_workers spawns N−1 worker threads parked at a start barrier; each round the main thread releases them, runs core 0 itself, and rejoins at a done barrier (src/runtime/src/emulator.cpp:690-742, :855-875).
  • Atomics use the ordering the SPARC-TSO/x86-64-host invariant allows (Decision 53; ADR-003 fixes the host so SPARC TSO maps with zero fences).

This supersedes the original compile-time NullMutex template-parameter mechanism — the runtime gate is required by the config-by-struct / no-behaviour-build-flags rule (principle 4).
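The runtime gate described above can be sketched as follows. This is a minimal illustration, not the actual tero::GatedMutex: the class name GatedMutexSketch, its constructor, and the helper guarded_increment are hypothetical, and the real header may expose a different interface.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical sketch of a runtime-gated mutex (not the real tero::GatedMutex).
// The gate is fixed at construction, i.e. before any worker thread exists.
class GatedMutexSketch {
public:
    explicit GatedMutexSketch(bool gate_active) : active_(gate_active) {}

    // With the gate inactive these are a single predictable branch and no locking.
    void lock() { if (active_) { mutex_.lock(); } }
    void unlock() { if (active_) { mutex_.unlock(); } }
    bool try_lock() { return active_ ? mutex_.try_lock() : true; }

    bool gate_active() const { return active_; }

private:
    const bool active_;
    std::mutex mutex_;
};

// The sketch models BasicLockable, so std::lock_guard works unchanged.
inline int guarded_increment(GatedMutexSketch& m, int& counter) {
    std::lock_guard<GatedMutexSketch> guard(m);
    return ++counter;
}
```

Because the gate is a plain runtime bool, the same call sites serve both execution modes; only the value passed at initialisation differs.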

Consequences. A GDB stub or a per-instruction observer forces SingleThread (they need a single, well-ordered instruction stream). MultiThread runs the per-core JIT path with per-core IR caches and TieredJits (each with its own Tier-2 background O2 thread, P14-2). Byte-identical ST==MT is impossible for lock-contended workloads (ST round-robin has no contention); the achievable determinism gate is shared-state convergence, not bit-exactness. Orthogonal to translation: either execution method runs under either mode.

ADR-002 — Tiered JIT (baseline + optimising)

Decision. The JIT has two compilation tiers:

  • Tier 1 (baseline): fast translation of a block to LLVM IR with minimal passes (O0). Used immediately when a block crosses its baseline threshold so a cold block runs at once.
  • Tier 2 (optimising): full LLVM O2 pipeline, compiled on a background thread; promotion is atomic.

The knobs are EmulatorConfig::jit_baseline_threshold (default 32, interpret-first warmup), jit_promotion_threshold (default 100), jit_background_opt (default true), and jit_max_region_blocks (default 8) — all in emulator_config.hpp:181-218. TieredJit (src/jit/src/tiered_jit.cpp) drives two IrJits (Decision 54).

Rationale. 1:1 requires bounded jitter, not just average throughput. A single optimising tier introduces visible compilation pauses on cold paths (multi-millisecond stalls). Tier 1 provides immediate execution while Tier 2 runs out-of-band, the same pattern HotSpot, V8, and Graal use. Measured: the single synchronous O2 tier put RTEMS boot at 4.5 s and the sptest-JitIr suite at ~15 min; tiering cut those to 1.0 s / 6.3 min with steady-state throughput unchanged (Decision 54).

Consequences. Blocks interpret-first (run on the IR interpreter until proven hot), then compile fast at O0, then recompile at O2 in the background. Both tiers lower the identical IR, so they are interchangeable. Region chaining fuses a same-mode block region into one LLVM function (Decision 55), budget-bounded so a fused region yields at the exact quantum boundary the switch would (no SMP interleaving drift). Full engine detail: IR and LLVM JIT.
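The interpret-first, then O0, then O2 progression can be sketched as a pure function of a block's execution count. The names TierThresholds, Tier, and tier_for are hypothetical; the defaults mirror the documented jit_baseline_threshold and jit_promotion_threshold, and treating the baseline crossing as "greater than or equal" is an assumption. The sketch also ignores that promotion is asynchronous: in practice a block keeps running Tier 1 code until the O2 version is published.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the two documented thresholds.
struct TierThresholds {
    std::uint32_t baseline = 32;    // jit_baseline_threshold default
    std::uint32_t promotion = 100;  // jit_promotion_threshold default
};

enum class Tier { Interpret, Baseline, Optimised };

// Tier a block is eligible for after `executions` runs.
// Assumption: baseline is reached at >= threshold, promotion at > threshold.
constexpr Tier tier_for(std::uint32_t executions, TierThresholds t = {}) {
    if (executions > t.promotion) { return Tier::Optimised; }
    if (executions >= t.baseline) { return Tier::Baseline; }
    return Tier::Interpret;
}

static_assert(tier_for(0) == Tier::Interpret, "cold blocks are interpreted");
```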

ADR-003 — Host requirements

Decision. The tier-1 host for 1:1 acceptance is x86-64 Linux with ≥ 4 physical cores, AVX2, ≥ 16 GiB RAM, GCC ≥ 13 or Clang ≥ 17, LLVM ≥ 18. ARM64 Linux and macOS are tier-2 (functional but no 1:1 guarantee). The LLVM floor is enforced in CMake (CMakeLists.txt:110-114, FATAL_ERROR below 18).

Rationale.

  • x86-64 native TSO maps SPARC TSO trivially — zero memory fences needed (Decision 53). ARM's weak memory model would require explicit fences with measurable overhead, which is why the 1:1 acceptance host is fixed to x86-64.
  • AVX2 enables LLVM's vectorisation passes for hot helpers.
  • LLVM ≥ 18 because the JIT uses llvm::CodeGenOptLevel, which LLVM 18 introduced (LLVM 17 spells it llvm::CodeGenOpt::Level). The floor was raised from the original 17 to 18 after the code adopted the 18 API — a CI runner's bundled LLVM 17 passed the old >= 17 gate but then failed to compile (Decision 57).

Consequences. LLVM is a mandatory, unconditional dependency (find_package(LLVM REQUIRED CONFIG)), not gated behind a build option — the old TERO_ENABLE_JIT flag is gone (Decision 57). The strict host requirements are part of the build's contract, not the user's responsibility (CLAUDE.md toolchain section).

ADR-004 — No cycle-accurate timing

Decision. 1:1 means simulated seconds per wall-clock second, not simulated cycles per host cycle. Pipeline-level emulation is explicitly out of scope. Simulated-time fidelity is handled orthogonally by the CPI model (a single global EmulatorConfig::cpi, default 1.0; the planned bucket-resync refinement is P10).

Rationale. Cycle-accurate adds 5–10× host cost for fidelity that neither the RTEMS testsuite nor external-simulator integration require. A single global CPI captures the dominant simulated-time scaling without per-opcode hot-path cost; the bucket-resync model (when it lands) will capture the dominant variability (cache misses, write-buffer stalls, AHB contention) at scheduler boundaries, not per instruction.

Consequences. There is no per-opcode CPI table — Tero models a single global CPI. ns_per_insn = cpi * ns_per_cycle(cpu_clock_hz) (ns_per_insn_for, emulator_config.hpp:65), recomputed at Emulator::create (emulator.cpp:227). Caches and the SRMMU are not modelled (they are timing concerns Tero deliberately skips); the GPTIMER prescaler stays on the raw ns_per_cycle() (the peripheral/bus clock, unaffected by cpi). See Multicore and timing.
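The formula above works out as in the sketch below. The signatures are assumptions made for illustration (the real ns_per_insn_for in emulator_config.hpp may use different types or integer arithmetic); only the relation ns_per_insn = cpi * ns_per_cycle comes from this page.

```cpp
#include <cassert>
#include <cstdint>

// Nanoseconds per CPU clock cycle. Hypothetical signature.
inline double ns_per_cycle(std::uint64_t cpu_clock_hz) {
    assert(cpu_clock_hz != 0);
    return 1e9 / static_cast<double>(cpu_clock_hz);
}

// Simulated nanoseconds charged per retired instruction under a global CPI.
inline double ns_per_insn(double cpi, std::uint64_t cpu_clock_hz) {
    return cpi * ns_per_cycle(cpu_clock_hz);
}
```

For example, a 100 MHz clock gives 10 ns per cycle, so the default cpi of 1.0 charges 10 ns per instruction, while a GPTIMER prescaler driven from ns_per_cycle() stays at 10 ns per tick whatever the cpi.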


Bus and memory

1. SystemBus is non-copyable and non-movable. It owns RAM via unique_ptr<Ram> and peripherals cache raw pointers into it. Moving the bus would invalidate them. If you need to relocate a bus, construct a new one.

2. Big-endian translation lives in SystemBus, not in Ram. Ram holds raw bytes as they appear on the wire; SystemBus::{encode,decode}_be() does the BE shuffle at the typed access boundary. Keeps RAM trivially snapshot-able and matches how a real memory controller behaves.
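A host-independent way to do the shuffle at the typed access boundary is to assemble values byte by byte. The following is a sketch only: the free functions and their signatures are hypothetical and do not reproduce the real SystemBus::{encode,decode}_be().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Decode a big-endian value of 1, 2 or 4 bytes from raw RAM bytes.
inline std::uint32_t decode_be(const std::uint8_t* bytes, std::size_t size) {
    assert(size == 1 || size == 2 || size == 4);
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < size; ++i) {
        value = (value << 8) | bytes[i];  // lowest address holds the MSB
    }
    return value;
}

// Encode the low `size` bytes of `value` in big-endian order.
inline void encode_be(std::uint32_t value, std::uint8_t* bytes, std::size_t size) {
    assert(size == 1 || size == 2 || size == 4);
    for (std::size_t i = 0; i < size; ++i) {
        const unsigned shift = static_cast<unsigned>(8 * (size - 1 - i));
        bytes[i] = static_cast<std::uint8_t>(value >> shift);
    }
}
```

Because neither function reinterprets memory as a wider integer, the result is the same on little-endian and big-endian hosts.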

3. Single region per access. A byte-span access that straddles two regions returns ErrorCode::BusError. Real hardware latches one transaction against one target; the bus does not silently split.

4. MMIO requires 1 / 2 / 4-byte naturally aligned accesses. Anything else is rejected at the bus with BusError or AlignmentError. CPU alignment traps belong in the handlers, not in the bus.

5. Bus does not own peripherals. SystemBus::map_peripheral takes a non-owning IPeripheral*. The runtime owns unique_ptr<IPeripheral> and hands raw pointers to the bus.

26. 4 KB RAM mapped at physical address 0x0. Real GR712RC boots from ROM at address 0. The RTEMS idle loop does lda [%o0] 0x1c, %g0 where %o0 = 0xFFFFFFF0; SPARC V8 address wrap-around lands at physical address 0. Mapping 4 KB of RAM at address 0 avoids a spurious data_access_exception in the idle loop.


Decoder and ISA

8. FPop1/FPop2 decoded as InsnKind::FpOp. op = 10, op3 ∈ {0x34, 0x35}. The handler returns FpDisabled (tt = 0x04), the correct LEON3 behaviour when no FPU is present (PSR.EF = 0). Coprocessor opcodes (op3 ∈ {0x36, 0x37}) remain Unknown and map to IllegalInstruction.

9. Instruction-fetch vs data-access bus errors are distinguished. ExecStatus::InsnFetchError (tt=0x01, instruction_access_exception) for failed fetches; BusError (tt=0x09, data_access_exception) for load/store failures.

10. TADDccTV / TSUBccTV set icc and write rd even when trapping. SPARC V8 §B.30: the tagged-add/sub trap variant computes the result and condition codes first, then traps if V is set. The handler writes rd and icc before returning ExecStatus::TagOverflow. The spec makes the result "unpredictable"; deterministic write makes tests reproducible.

11. handlers.cpp split into category files. handlers_alu.cpp, handlers_branch.cpp, handlers_loadstore.cpp, handlers_regwin.cpp, handlers_special.cpp, with shared helpers (alu_op2, eval_cond) in handlers_internal.hpp. The public execute() dispatcher remains in handlers.cpp.

21. BA encoding uses disp22 = 0, not 1. Earlier test_bare_metal.cpp helpers encoded BA with disp22 = 1, which targets PC+4 (the delay slot itself). The fix sets disp22 = 0 so BA .+0 becomes a proper self-branch when intended.
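The arithmetic behind this fix can be checked with a small encoder. The helper names encode_ba and branch_target are hypothetical (they are not the helpers in test_bare_metal.cpp); the bit layout is the SPARC V8 Bicc format with op = 00, cond = 1000 for BA, and op2 = 010.

```cpp
#include <cassert>
#include <cstdint>

// Encode BA with a signed word displacement. Hypothetical helper.
inline std::uint32_t encode_ba(std::int32_t disp22, bool annul = false) {
    assert(disp22 >= -(1 << 21) && disp22 < (1 << 21));
    return (annul ? (1u << 29) : 0u)   // a bit
         | (0x8u << 25)                // cond = 1000 (branch always)
         | (0x2u << 22)                // op2 = 010 (Bicc)
         | (static_cast<std::uint32_t>(disp22) & 0x3FFFFFu);
}

// Target of a Bicc instruction located at `pc`: pc + 4 * sign_extend(disp22).
inline std::uint32_t branch_target(std::uint32_t pc, std::uint32_t insn) {
    const std::uint32_t disp = insn & 0x3FFFFFu;
    const std::uint32_t ext = (disp ^ 0x200000u) - 0x200000u;  // sign-extend 22 bits
    return pc + ext * 4u;
}
```

With disp22 = 1 the target is pc + 4, which is the delay slot; with disp22 = 0 the branch targets itself.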

30. Cache config registers use GR712RC-specific values. Both I$ and D$ were 0x08101004 (placeholder). Replaced with: I$ = 0x132308e8 (4-way, 16 KiB, LRU, snooping, MMU present), D$ = 0x1b2208f8 (4-way, 16 KiB, LRU, snooping, MMU present, write-through). Read-only registers accessed via ASI 0x02 at addresses 0x08 and 0x0C. GR712RC §5.2.

31. ASR17 reset value includes GR712RC-specific fields. Original value only had V8 mul/div (bit 20) and NWINDOWS-1=7. Added FPU type [11:10]=01 (GRFPU) and watchpoints [7:5]=010 (2 watchpoints). New value: (1U << 20) | (1U << 11) | (2U << 5) | 7U = 0x100847. GR712RC §4.2.

32. PSR reset includes LEON3 impl/version fields. After reset, PSR had only S=1 (impl=0, ver=0). Fixed to include impl=0xF (Gaisler) and ver=0x3 (LEON3FT). New reset: (0xFU << 28) | (0x3U << 24) | kSBit = 0xF3000080. These fields are read-only masked so WRPSR preserves them. GR712RC §4.1.
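The reset value and the read-only masking can be verified at compile time. This is a simplified sketch: the function write_psr is hypothetical and protects only the impl/ver fields, whereas the real write_psr_writable() may mask further bits.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kSBit = 1u << 7;              // PSR.S (supervisor)
constexpr std::uint32_t kImplVerMask = 0xFF000000u;   // impl [31:28], ver [27:24]
constexpr std::uint32_t kPsrReset = (0xFu << 28) | (0x3u << 24) | kSBit;

static_assert(kPsrReset == 0xF3000080u, "reset value from Decision 32");

// Simplified WRPSR: keep the read-only impl/ver fields, take the rest from
// the written value. Hypothetical helper for illustration only.
constexpr std::uint32_t write_psr(std::uint32_t current, std::uint32_t written) {
    return (current & kImplVerMask) | (written & ~kImplVerMask);
}
```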

37. enter_trap() clears annul_next_. SPARC V8 §5.1.2.2 specifies that the annul-bit mechanism is per-CTI and does not persist past a trap entry. CpuState::enter_trap() explicitly clears annul_next_ to prevent the first instruction of an ISR handler from being silently dropped when a hardware interrupt arrives after an annulled delay slot. This was the root cause of sp11's ErrorMode crash.

38. WRPSR is applied immediately, matching SIS. SPARC V8 §5.1.2.3 permits the S, ET, PS, and CWP changes from a WRPSR to be deferred up to three instructions, but that delay is implementation latitude — real software pads WRPSR with three NOPs, so the result is observably identical. The reference oracle (Gaisler SIS) applies the write immediately, and Tero matches it: write_psr_writable() (cpu_state.cpp:69) masks the read-only fields and writes the rest straight to psr_ in one shot, with no pending-write buffer. The earlier 3-instruction pending_psr_ / commit_psr_pipeline() model was removed because it desynced the register windows when a trap fired inside the window (smpschededf03).


Peripherals

12. MemCtrl (FTMCTRL) is a passive stub. MCFG1–MCFG4 are readable / writable with no side effects. MCFG3 bit 27 (reserved, reads as 1) is forced. No timing, no bank switching — just enough for the RTEMS memory probe.

13. IrqMP IFORCE write semantics. Writing IFORCE uses a clear-then-set protocol: the upper 16 bits clear bits, the lower 16 bits set bits (both masked to IRQ lines 1–15). Matches GRLIB behaviour where software can atomically set and clear force bits in one write.
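The clear-then-set protocol reduces to a few bit operations. The function name write_iforce and the constant kIrqLineMask are hypothetical; the ordering (clear first, then set, both masked to lines 1 to 15) follows the description above.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kIrqLineMask = 0xFFFEu;  // IRQ lines 1..15, line 0 unused

// New force-register value after a write of `value`. Hypothetical helper.
constexpr std::uint32_t write_iforce(std::uint32_t force, std::uint32_t value) {
    const std::uint32_t clear = (value >> 16) & kIrqLineMask;  // upper half clears
    const std::uint32_t set = value & kIrqLineMask;            // lower half sets
    return (force & ~clear) | set;                             // clear, then set
}
```

Because the set is applied last, a write that names the same line in both halves leaves that line forced.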

14. IrqMP pending_mask(0) includes IFR0. CPU 0's pending mask is IPEND | IFR0 | (current_mask & IFR0). For CPU N>0, pending_mask(N) is IPEND | (current_mask & IFRN). Matches the GR712RC single-CPU force register design.

15. GPTimer control register writable mask is 0x2B. Bits 0 (EN), 1 (RS), 3 (IE), 5 (CH) are directly writable. Bit 2 (LD) is write-only: writing 1 reloads the counter from the reload register, and the bit then reads back as 0. Bit 4 (IP) uses write-0-clear (writing 1 has no effect, writing 0 clears the pending bit). Bit 6 (DH) is read-only 0.
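The per-bit write rules can be sketched as one function. TimerSketch and write_ctrl are hypothetical names, and the LD behaviour shown (copy the reload register into the counter) is an assumption based on the usual GRLIB GPTIMER semantics rather than on Tero's source.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kCtrlWritable = 0x2Bu;  // EN, RS, IE, CH
constexpr std::uint32_t kCtrlLd = 1u << 2;      // write-only load strobe
constexpr std::uint32_t kCtrlIp = 1u << 4;      // interrupt pending (write-0-clear)

// Hypothetical register model for one sub-timer.
struct TimerSketch {
    std::uint32_t ctrl = 0;
    std::uint32_t counter = 0;
    std::uint32_t reload = 0;
};

inline void write_ctrl(TimerSketch& t, std::uint32_t value) {
    // IP stays set only if it was set and the write carries a 1 in that bit.
    const std::uint32_t ip = t.ctrl & value & kCtrlIp;
    t.ctrl = (value & kCtrlWritable) | ip;  // LD and DH are never stored
    if ((value & kCtrlLd) != 0u) {
        t.counter = t.reload;               // assumed LD semantics
    }
}
```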

16. GPTimer prescaler underflow logic. On tick(), the prescaler counter is decremented first; when it reaches zero, the prescaler value is reloaded and all enabled sub-timers are ticked.

17. GPTimer timer4 watchdog defaults. After reset, timer4 has EN=1, RS=0, IE=1 (control value 0x09, see Decision 29) and counter/reload both set to 0xFFFF. Matches the GR712RC default where the watchdog is armed and must be disabled or fed by software.

18. ApbUart uses std::queue<uint8_t> for the RX FIFO (max 8). No TX FIFO is modelled — transmit() drains immediately via ICharacterDevice. Status bit 31 (FA, FIFO available) always reads as 1 since the queue fits within 8 entries and never overflows.

19. PeripheralContext now includes ICharacterDevice*. Added for APBUart to inject console I/O. Default is nullptr; the runtime wires it to the configured character device.

20. All peripheral MMIO handlers reject non-word accesses. Byte and half-word reads/writes return AlignmentError. Stricter than GR712RC (which allows byte writes to the UART data register), but matches the MVP approach: defer narrowing to when RTEMS demands it.

22. hello_uart.S must enable CTRL.TE before transmitting. ApbUart drops writes to the data register unless TE is set in the control register. Any asm test that writes directly to APBUart MMIO must first st the TE bit into CTRL. Missing this is silent — the test runs but no output is produced.

27. GPTimer bootloader prescaler initialization. The emulator simulates the GR712RC ROM bootloader's timer setup by writing to the prescaler value and reload registers during initialize(). Without this, the prescaler counter starts at 0 and takes 0xFFFF ticks (≈ 65 ms) before the first underflow, greatly delaying the first timer interrupt.

29. GPTimer Timer 4 reset comment corrected. The comment said RS=1 but 0x09 = 0b1001 has RS=0 (bit 1 is clear). Correct value is EN=1, RS=0, IE=1. The code value was already correct; only the comment was fixed.

33. FTMCTRL P&P device ID corrected to 0x054. The AHB Plug&Play descriptor used device ID 0x00F (MCTRL) instead of 0x054 (FTMCTRL, fault-tolerant memory controller with EDAC). The GR712RC has FTMCTRL, not MCTRL. Config word changed from 0x0100f020 to 0x01054020. Affects RTEMS auto-probing.

35. DemoDmaDevice does endian-safe byte-wise XOR. The original implementation memcpy'd a uint32_t over dma_read and XORed the host-endian word, producing different results on LE vs BE hosts. The corrected version XORs each byte against the corresponding byte of the mask in BE order (MSB ↔ addr+0), matching what a SPARC ld; xor; st sequence produces.
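The endian-safe variant can be sketched as below. The function xor_be is hypothetical, and repeating the 32-bit mask every four bytes for buffers longer than one word is an assumption; only the byte ordering (mask MSB applied at addr+0) comes from the decision.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// XOR a byte buffer with a 32-bit mask laid out in big-endian order,
// so the mask's most significant byte lines up with data[0].
inline void xor_be(std::uint8_t* data, std::size_t len, std::uint32_t mask) {
    for (std::size_t i = 0; i < len; ++i) {
        const unsigned shift = static_cast<unsigned>(8 * (3 - (i % 4)));
        data[i] = static_cast<std::uint8_t>(data[i] ^ (mask >> shift));
    }
}
```

No word is ever loaded through a host-endian pointer, so the result does not depend on the host byte order.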


Runtime and scheduling

6. Warning set is stricter than the CLAUDE.md minimum. tero::warnings enables, on top of -Wall -Wextra -Wpedantic -Werror: -Wshadow -Wnon-virtual-dtor -Wold-style-cast -Wcast-align -Wunused -Woverloaded-virtual -Wconversion -Wsign-conversion -Wnull-dereference -Wdouble-promotion -Wformat=2. Everything builds with 0 warnings / 0 errors under this set.

7. tests/support/dummy_peripheral is a test fixture, not a module. Lives in the test tree; must not leak into tero_peripherals.

23. Emulator exposes injection, not construction, for services. set_character_device() and set_logger() swap the defaults at runtime. Lets the CLI wire StdoutCharDevice / StdoutLogger without forcing them into EmulatorConfig, and lets tests swap in CapturingCharDevice without touching config. SMP2 wrappers will use the same injection points.

25. Idle-time skip is bounded to 1 ms. When all cores are powered down, run_until() jumps simulated time forward to the next event. The GPTimer uses direct IInterruptSource::raise() calls rather than the EventScheduler, so next_event_time() does not see timer interrupts. Without a bound, time would jump to the deadline and the GPTimer's periodic interrupt would be delivered only once. The 1 ms bound (kMaxIdleNs) ensures timer interrupts arrive at roughly their expected rate.
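The bound amounts to taking the earliest of three candidate times. In this sketch the function idle_skip_target and the use of UINT64_MAX as a "no scheduled event" sentinel are assumptions; kMaxIdleNs is the constant named above, set to 1 ms.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kMaxIdleNs = 1000000;  // 1 ms

// Simulated time to jump to when every core is powered down.
// next_event_ns is UINT64_MAX when the scheduler has nothing queued (assumption).
inline std::uint64_t idle_skip_target(std::uint64_t now_ns,
                                      std::uint64_t deadline_ns,
                                      std::uint64_t next_event_ns) {
    assert(deadline_ns >= now_ns);
    return std::min({deadline_ns, next_event_ns, now_ns + kMaxIdleNs});
}
```

With no scheduled event and a 10 ms deadline, the loop therefore advances in 1 ms steps, giving the GPTimer a chance to raise its interrupt on each step.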

28. Secondary cores start in power-down mode. On real GR712RC, only CPU 0 starts executing at reset. Secondary cores are parked in power-down mode (wr %g0, %asr19) and wait for the primary to release them via IRQMP. The emulator sets is_powered_down = true for all cores except CPU 0 after loading the ELF image.

34. Emulator::add_peripheral injects ctx.bus = &bus_. Previously PeripheralContext::bus was left null for user-defined peripherals, which silently broke any DMA-capable custom peripheral attached through the public API. The runtime now always wires the bus; the rest of PeripheralContext (irq, scheduler, logger, chardev) was already handled.

36. Sptest pass criterion is "*** END OF TEST" in console. The previous test required both HaltReason != ErrorMode and the END OF TEST string. RTEMS sptests print END OF TEST and then unwind into a halt that triggers a trap-while-ET=0 (ErrorMode), so the strict check failed all 8 actually-passing tests. The functional pass criterion is the END OF TEST banner alone.

39. IrqMP acknowledge() gives force precedence over pending. Per GR712RC §8 (p.~115): when a processor takes an interrupt trap, the corresponding pending bit is auto-cleared. Tero models this via IrqMP::acknowledge(cpu, bit), called by Emulator::sample_interrupts before enter_trap. If the interrupt was set in a force register (IFR0 for CPU 0 or IFORCE[cpu]), the force bit is cleared instead of the shared pending bit. This matches real hardware where a forced interrupt has priority over an externally-asserted interrupt on the same line. Without auto-ack, the same level would be re-sampled on every quantum and the trap handler would never complete.
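The precedence rule can be sketched with a small state struct. IrqMpSketch, its fields, and the four-CPU array are hypothetical simplifications (the per-CPU force registers are modelled as one array); the rule itself, clear the force bit if set and otherwise the shared pending bit, is the one described above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified interrupt-controller state.
struct IrqMpSketch {
    std::uint32_t pending = 0;      // shared pending register
    std::uint32_t force[4] = {};    // per-CPU force registers
};

// Called when `cpu` takes the trap for interrupt `line`.
inline void acknowledge(IrqMpSketch& irq, unsigned cpu, unsigned line) {
    assert(cpu < 4 && line >= 1 && line <= 15);
    const std::uint32_t bit = 1u << line;
    if ((irq.force[cpu] & bit) != 0u) {
        irq.force[cpu] &= ~bit;     // forced interrupt is consumed first
    } else {
        irq.pending &= ~bit;        // otherwise clear the shared pending bit
    }
}
```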

40. GPTimer IRQ line pulses on every underflow, irrespective of sticky IP. The IP bit in the control register is a software-visible status that latches on underflow and is cleared by writing 0. The IRQ line to the IRQMP, however, is edge-triggered: process_timer_tick() now always calls irq->raise() when IE == 1 and the counter underflows, regardless of whether IP was already set (this decision removes the earlier !was_ip guard). Without the pulse on every underflow, the RTEMS clock driver would get only one interrupt: the IRQMP's auto-ack clears the pending bit, and the timer would never re-assert the line if it raised only on the 0→1 transition of the sticky IP bit.


GDB stub

41. GDB attach is late-binding by default; --gdb-wait is opt-in. initialize() opens the listening socket immediately and returns, even when gdb_stub_wait_for_client = false. The run loop calls GdbStub::poll_accept() once per quantum (non-blocking poll() on the listen FD), accepts new connections mid-run, and halts with HaltReason::Breakpoint so the CLI can drive process_until_resume(). A second connection while one is attached is drained and rejected (accept + immediate close) so the kernel backlog stays clear. Before this, --gdb-wait=false left the socket listening but never accepted, so target remote :PORT after the emulator started hung silently. The cost is one poll(fd, timeout=0) per quantum (~1 µs).

42. Hardcoded RTEMS layout constants with runtime validation, not DWARF parsing. rtems_layout::PerCpuEnvelopeSize=128, PerCpuExecutingOffset=32, ThreadObjectIdOffset=8, ThreadObjectNameOffset=12 are extracted from DWARF of hello-world.elf once and hardcoded in gdb_stub.hpp. Alternatives considered: (a) parse DWARF in the stub — days of work, large surface area; (b) ship a Python helper that GDB loads — couples the user setup to a script. The hardcoded path trades robustness for code size: each read is gated on three sanity checks (executing pointer non-zero, bus read succeeds, Object.id API bits ∈ {Internal, Classic, POSIX}), and any failure falls back to the legacy per-core TID model. A toolchain upgrade that shifts the layout fails the [!mayfail] live-guest test (test_gdb_stub_protocol.cpp) loudly — at which point the constants are refreshed against the new DWARF.

43. qSymbol state machine chains two symbol requests. The handshake asks first for _Per_CPU_Information (B1 — executing thread per core). On a non-zero resolve it transitions to AskingObjectsTable and asks for _Objects_Information_table (B2/B3/B4 root). Each round handles absent-symbol (empty <addr_hex>) independently: missing per-CPU disables thread-awareness entirely; missing objects-table degrades to B1-only. Reset-on-unsolicited qSymbol:: (GDB file reload) wipes both addresses so a stale value does not survive an image switch.

44. Stop-on-ErrorMode redirects through the GDB stub when attached. Previously, a core entering error_mode returned HaltReason::ErrorMode from run_until_unpaced straight to the caller — silent for any attached GDB. Now the run loop's error_mode check calls gdb_stub_->report_error_mode(core). If a client is attached and the error has not been reported, the stub arms a stop reply with the signal derived from TBR.tt and the loop returns HaltReason::Breakpoint. A per-client error_reported_ latch prevents an infinite re-notify when GDB continues a permanently-dead core: the second pass returns ErrorMode and the CLI's resume loop exits cleanly. Without the latch, c-after-crash spins forever.

45. GDB signal numbers are GDB's table, not Linux's. StopSignal hardcodes the values from gdb/include/gdb/signals.def: Ill=4, Trap=5, Fpe=8, Bus=10, Segv=11. SIGBUS diverges from Linux (7): GDB's signal 7 is SIGEMT, so sending the host <signal.h> value would silently mis-label alignment faults. The mapping from SPARC TT to StopSignal is in signal_from_tt(uint32_t): table-driven, unit-tested per TT class, with SIGTRAP reserved for software traps (ta, TT >= 0x80) so GDB treats them as breakpoint-class events rather than crashes.
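A mapping of this shape might look like the sketch below. The enum values and the TT >= 0x80 rule come from the decision; the individual TT-to-signal assignments for hardware traps (0x01, 0x07, 0x08, 0x09, default) are assumptions chosen for illustration and may not match Tero's actual table.

```cpp
#include <cassert>
#include <cstdint>

// GDB protocol signal numbers (not the host's <signal.h> values).
enum class StopSignal : std::uint8_t { Ill = 4, Trap = 5, Fpe = 8, Bus = 10, Segv = 11 };

// Map a SPARC V8 trap type to a GDB stop signal. Hardware-trap rows are assumed.
inline StopSignal signal_from_tt(std::uint32_t tt) {
    if (tt >= 0x80u) {
        return StopSignal::Trap;          // software traps (ta)
    }
    switch (tt) {
        case 0x07u: return StopSignal::Bus;   // mem_address_not_aligned
        case 0x08u: return StopSignal::Fpe;   // fp_exception
        case 0x01u:                           // instruction_access_exception
        case 0x09u: return StopSignal::Segv;  // data_access_exception
        default:    return StopSignal::Ill;   // illegal_instruction and the rest
    }
}
```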

46. Stop replies prefer the RTEMS Objects_Id when thread-awareness is active. send_stop_reply(core, sig) calls try_read_executing_id(core) per-invocation (no cache) and uses the resolved RTEMS thread ID in Tnnthread:<id>;. Falls back to the legacy core+1 if the read fails. Per-call reads avoid the cache-invalidation problem when RTEMS performs a context switch: the stub never has to track stores to Per_CPU.executing. The cost is 2 bus reads per stop event (~µs), negligible compared to the typical RSP round-trip.

49. B2 — full thread enumeration walks _Objects_Information_table[Classic][Tasks].local_table[]. With both qSymbol rounds resolved, qfThreadInfo reports every allocated Classic API task — not just the executing one per core. The walk is gated on a sanity check (Information.object_size == 400, matching sizeof(Thread_Control)); if it fails, the build configuration drifted (POSIX/SMP toggle) and the stub silently degrades to B1 enumeration rather than emit garbage TIDs. Cap of 256 slots prevents a corrupted maximum_id from looping wildly. NULL slots (deleted tasks) are skipped, not reported. The cache populated by qfThreadInfo powers qsThreadInfo (paginated emit under the 4000-byte packet budget), H g <tid> (TID → target translation, including non-executing tasks), and qC (current thread reporting). De-duplication by tid between the executing set and the table walk prevents IDLE/INIT from appearing twice.

50. B3 — g for a non-executing thread reconstructs from saved Context_Control; pc = saved o7. RTEMS only saves the callee-preserved subset (g5, g7, l0..l7, i0..i7, o6/sp, o7, psr) inside _CPU_Context_switch. The GDB register block requires 72 entries; everything RTEMS doesn't save is zero-filled (g0..g4, g6, o0..o5, all 32 FP regs, y, wim, tbr, fsr, csr). The synthetic PC is the saved o7 — the return address that _CPU_Context_switch will pop on resume — which is what bt needs to show "where the thread will continue". npc = pc + 4. G (write) for a non-executing target is rejected with E01: writing into a thread's saved context mid-flight races with the next dispatch in ways the stub cannot make safe. FP regs are deferred (Thread_Control.fp_context requires checking is_fp).

51. B4 — qThreadExtraInfo is name + state + priority, hex-encoded. The reply is the ASCII string "NAME [state] pri=N", hex-encoded for RSP. State decoding is in format_thread_state(uint32_t): 0 → "READY", exclusive flags (SUSPENDED, ZOMBIE, DORMANT, LIFE_CHANGING, DEBUGGER, INTERRUPTIBLE) listed by name, all STATES_WAITING_FOR_* bits collapsed into a single WAIT:<a>|<b>|... segment. Unknown bits surface as UNKNOWN so score/statesimpl.h drift is visible. Each enrichment is best-effort: a bus failure on state or priority degrades that field silently rather than failing the whole reply, so a partial answer (just name) still beats nothing. Priority reads only the low 32 bits of the Priority_Control priority (uint64_t) at offset +20 within Real_priority — Classic API priorities cap at 255 so the high half is always zero.
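The final encoding step for the reply can be sketched as a plain ASCII-to-hex conversion, which is the payload format the GDB remote protocol uses for qThreadExtraInfo. The function name hex_encode is hypothetical, and lowercase digits are a choice made here.

```cpp
#include <cassert>
#include <string>

// Hex-encode each byte of `text` as two lowercase hex digits.
inline std::string hex_encode(const std::string& text) {
    static const char kDigits[] = "0123456789abcdef";
    std::string out;
    out.reserve(text.size() * 2);
    for (unsigned char c : text) {
        out.push_back(kDigits[c >> 4]);
        out.push_back(kDigits[c & 0x0F]);
    }
    return out;
}
```

A partial reply that carries only the task name goes through the same function, so degraded fields need no special handling at this layer.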

Diagnostics

47. IEmulatorObserver::on_instruction(cpu, pc) — per-instruction hook for diagnostic probes. Added 2026-05-13 alongside the smpschededf02 stack-overflow investigation. Fires from run_until_unpaced between the GDB-stub break check and core::step(), with the about-to-execute PC. When no observer is installed, the runtime cost is one null-pointer test per instruction (well predicted; benchmarked at <1 % overhead on the existing [smptests] suite). When an observer is installed, each instruction pays one virtual call — acceptable for diagnostic test binaries (the smpschededf02 dispatch probe captures ~700 samples per 200 ms simulated, with negligible host overhead).

Alternatives considered: (a) make the hook PC-set-conditional in the runtime (filter inside Emulator), rejected because it pushes diagnostic policy into the core; (b) build a separate "trace mode" of the emulator, rejected because it doubles the surface area to maintain. The existing IEmulatorObserver already accepted the "empty default, opt-in by override" model for on_irq_*, on_trap_*, on_peripheral_attached; this is a natural sibling.

48. The smptests / sptests / fptests harness installs StdoutLogger(LogLevel::Error) by default. Added 2026-05-13. The default Info level lets a single misbehaving guest emit millions of lines (e.g. a tight loop writing PROM area emits one [WARN] [prom] ignored write per cycle), which during a batch CTest run buries the harness's own per-test outcome lines and inflates the build log by orders of magnitude. The integration harness — not individual unit tests — is the right place to install the quieter logger because the per-test outcome is captured via the UART and the CSV row, not via emulator log output. Tests that need to assert on emulator log content can install their own logger between Emulator::create and Emulator::initialize.

Arch-neutral IR and JIT

See IR and LLVM JIT for the engine and Adding a frontend for the contributor procedure. The design is frozen in plans/phase11-arch-neutral-ir.md (D1–D6) and plans/post-mvp-1to1-roadmap.md (the JIT ADRs); these entries are the searchable index.

49. Guest state is an opaque byte blob; IR ops touch it only through LdState/StState at (offset, size). Added 2026-05-24. The IR knows no register names. SPARC %g/%o/%l/%i, %psr, %y — and a future ARM's r0-r15, CPSR, banked registers — are byte offsets the frontend chooses (src/arch/sparc/.../sparc_layout.hpp). Alternative: virtual registers mapping 1:1 to SPARC registers (the original Phase 11.1 spec) — rejected because it bakes SPARC's register set into the IR and makes register windows / banking an IR concept instead of a frontend offset choice.

50. The block-cache key is (PhysAddr, ModeCtx), and mode-changing instructions are block terminators. Added 2026-05-24. Register offsets and decoding depend on an arch mode (SPARC CWP; ARM Thumb/mode/endianness). Keying on PC alone is insufficient. Because any instruction that changes the mode context (SPARC SAVE/RESTORE/trap/RETT/WRPSR; ARM mode switch / BX to Thumb) ends the block, the mode is constant within a block and the frontend resolves all mode-dependent offsets at translate time, so no runtime-indexed state access is needed. IrBlock::mode_change marks such blocks so the region compiler does not chain across them. Alternative: a runtime-indexed state-access op, rejected as unnecessary once mode changes are terminators (verified against the SPARC window cases).
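A composite key of this kind can be sketched with a standard hash map. BlockKey, BlockKeyHash, and BlockCacheSketch are hypothetical names, both fields are assumed to be 32-bit, and the int value stands in for whatever handle the real cache stores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical cache key: the same physical address under two mode
// contexts (for example two SPARC CWP values) names two distinct blocks.
struct BlockKey {
    std::uint32_t phys_addr;
    std::uint32_t mode_ctx;
    bool operator==(const BlockKey& other) const {
        return phys_addr == other.phys_addr && mode_ctx == other.mode_ctx;
    }
};

struct BlockKeyHash {
    std::size_t operator()(const BlockKey& k) const noexcept {
        const std::uint64_t packed =
            (static_cast<std::uint64_t>(k.phys_addr) << 32) | k.mode_ctx;
        return std::hash<std::uint64_t>{}(packed);
    }
};

// The mapped int is a placeholder for a translated-block handle.
using BlockCacheSketch = std::unordered_map<BlockKey, int, BlockKeyHash>;
```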

51. Endianness is an attribute of the guest-memory ops, not of the bus. Added 2026-05-24. LdGuest/StGuest carry {size, endianness}; the swap happens in the op (interpreter) or lowered code (JIT), centralised in src/ir/include/tero/ir/guest_memory.hpp. The SPARC frontend emits big-endian accesses, an ARM frontend little-endian. Alternative: a bus that byte-swaps for a fixed big-endian guest (the pre-IR behaviour) — rejected because it cannot serve two guest endiannesses; the bus now stores raw bytes.

52. The IR has no flags register; condition codes are explicit guest-state writes. Added 2026-05-24. SPARC icc (NZVC) and ARM CPSR (NZCV) differ. The frontend computes each flag bit into its guest-state offset (eager evaluation). Lazy flag evaluation (QEMU's cc_op) is a per-frontend optimisation layered later; it never enters the neutral IR.
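Eager evaluation for an add can be sketched as below. The struct Icc and the function icc_add are hypothetical; a real frontend would emit IR ops that store each bit at its guest-state offset, whereas this sketch only shows the flag arithmetic for a 32-bit ADDcc.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the four SPARC integer condition codes.
struct Icc {
    bool n;  // negative: result bit 31
    bool z;  // zero
    bool v;  // signed overflow
    bool c;  // carry out of bit 31
};

// Compute all four flags eagerly for r = a + b (32-bit wrap-around).
constexpr Icc icc_add(std::uint32_t a, std::uint32_t b) {
    const std::uint32_t r = a + b;
    return Icc{
        (r >> 31) != 0u,
        r == 0u,
        (((a ^ r) & (b ^ r)) >> 31) != 0u,  // operands agree in sign, result differs
        r < a                               // unsigned wrap means carry
    };
}
```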

53. Atomics are block boundaries; ordering is TSO. Added 2026-05-24. CASA/LDSTUB/SWAP (and ARM LDREX/STREX later) terminate a block, so atomicity holds even though the JIT introduces mid-region exits. TSO is satisfied trivially by single-threaded round-robin today, and maps to the x86-64 host with zero fences (ADR-003), which is why the host is fixed to x86-64 for 1:1 acceptance.

54. The JIT is tiered: an O0 baseline on the calling thread, an O2 optimised tier on a background thread (ADR-002). Added 2026-05-25. IrJit takes an OptLevel; TieredJit (src/jit/src/tiered_jit.cpp) drives two IrJits: Baseline is compiled immediately so a cold block runs at once, and hot blocks (> jit_promotion_threshold) are recompiled at O2 in the background and published atomically. Both tiers lower the identical IR, so they are interchangeable. Alternative: a single synchronous O2 tier (the first cut), rejected because per-block O2 codegen on the cold path dominated wall time (RTEMS boot 4.5 s; the sptest-JitIr suite ~15 min). Tiering cut those to 1.0 s / 6.3 min with steady-state throughput unchanged.

55. Region chaining fuses a block region into one LLVM function, not trampolines (12.4b). Added 2026-05-25. IrJit::compile_region lowers an entry block plus its same-mode static/conditional successors as one native function (one LLVM block per member; in-region branches chain directly, back-edges loop, all budget-bounded). Emulator::build_jit_region discovers the region by BFS over static/cond edges without touching the (evicting) ir_cache_. Alternative: per-block functions linked by runtime-patched trampolines (the Phase 12.4 sketch, the QEMU block-linking style) — rejected because keeping the region in one LLVM function lets LLVM optimise across it and generalises the 12.4a self-loop without a separate patch mechanism. A member that fails to lower falls back to the entry-only region, so chaining never does worse than single-block.
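A budget-bounded breadth-first discovery of a region could be sketched as below. This is not Emulator::build_jit_region: the BlockInfo summary, the plain map standing in for already-translated blocks, and the discover_region name are all assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <unordered_map>
#include <unordered_set>
#include <vector>

using Pc = std::uint32_t;

// Hypothetical block summary: static/conditional successor PCs only.
struct BlockInfo {
    std::vector<Pc> successors;
    bool mode_change = false;  // such blocks are not chained across
};

inline std::vector<Pc> discover_region(const std::unordered_map<Pc, BlockInfo>& blocks,
                                       Pc entry, std::size_t budget) {
    std::vector<Pc> region;
    std::unordered_set<Pc> seen{entry};
    std::deque<Pc> queue{entry};
    while (!queue.empty() && region.size() < budget) {
        const Pc pc = queue.front();
        queue.pop_front();
        const auto it = blocks.find(pc);
        if (it == blocks.end()) continue;      // unknown target: leave it out
        region.push_back(pc);
        if (it->second.mode_change) continue;  // do not chain past a mode change
        for (Pc next : it->second.successors)
            if (seen.insert(next).second) queue.push_back(next);  // back-edges are skipped
    }
    return region;
}
```

The returned list is in BFS order with the entry first, so a caller that fails to lower any member can fall back to the first element alone.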

56. A guest core that halts (trap with ET=0) is reported as HaltedMode, distinct from ErrorMode. Added 2026-05-25. On SPARC a trap taken while traps are disabled (ET=0) stops the processor; RTEMS uses exactly this as its deliberate shutdown (_CPU_Fatal_halt / _exit issue ta 0 with ET=0). That is the guest halting, not an emulator failure, so HaltReason::HaltedMode (and the bench's HALTED outcome, which is not a harness failure) name it; ErrorMode is reserved for the emulator's own internal errors. Diagnosed when a sustained GR740 SMP compute workload halted at an FP-disabled fatal (the guest's Init task lacked RTEMS_FLOATING_POINT) — the emulator's behaviour was correct end to end, which the old ErrorMode label obscured.

57. The execution method is a runtime bool translation; LLVM is a mandatory dependency. Added 2026-05-25. The four-way Dispatch enum (Switch / Threaded / Ir / JitIr) and the TERO_ENABLE_JIT compile flag are gone. EmulatorConfig carries translation (default true): false runs the core::step switch interpreter, true runs the tiered LLVM JIT with the IR interpreter as fallback. Both paths are always compiled in and chosen per Emulator, per the config-by-struct principle (no behaviour-selecting build flags). LLVM (≥ 18) is now found unconditionally by CMake — the floor was raised from 17 to 18 because the JIT adopted llvm::CodeGenOptLevel (LLVM 17 spells it llvm::CodeGenOpt::Level), which surfaced when a CI runner's bundled LLVM 17 passed the old >= 17 gate but then failed to compile. Alternatives considered: keep Ir exposed as its own mode (rejected — it is an implementation detail of the JIT fallback, not a method a user picks) and keep TERO_ENABLE_JIT as a build option (rejected — it gated behaviour, not just a dependency; making LLVM mandatory removes a whole untested build configuration).

58. GDB debugging works under binary translation, not only the interpreter. Added 2026-05-25. Breakpoints are an external BreakpointSet of PCs, never patched into guest memory, so the block translator is unaffected by them. run_ir_quantum is breakpoint-aware: with a stub attached it runs native until a block boundary, checks should_break at each entry, and single-steps (via core::step) through any block holding an interior breakpoint so it stops on the exact PC; build_jit_region also stops fusing blocks while a stub is attached so that interior-breakpoint check stays exact. Only a per-instruction observer (trace) still forces the interpreter. Alternative: keep forcing the Switch path whenever a stub is attached (the interim behaviour) — rejected because it changed the executed code under the debugger, defeating the point of debugging the path that actually runs in production.
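The per-block choice described above can be illustrated with a small classifier. It assumes fixed-width instructions and models the breakpoint set as an ordered set of PCs; BlockPlan and plan_block are hypothetical names, and the real should_break logic may consider more state (for example resuming from a breakpoint at the entry PC).

```cpp
#include <cstdint>
#include <set>

using BreakpointSet = std::set<std::uint32_t>;  // external set of PCs (assumed shape)

enum class BlockPlan { BreakAtEntry, SingleStep, RunNative };

inline BlockPlan plan_block(const BreakpointSet& bps, std::uint32_t entry,
                            std::uint32_t insn_count, std::uint32_t stride) {
    if (bps.count(entry)) return BlockPlan::BreakAtEntry;
    const std::uint32_t end = entry + insn_count * stride;  // one past the block
    const auto it = bps.upper_bound(entry);                 // first breakpoint after entry
    // A breakpoint strictly inside the block forces per-instruction stepping
    // so that execution stops on the exact PC.
    return (it != bps.end() && *it < end) ? BlockPlan::SingleStep
                                          : BlockPlan::RunNative;
}
```

Guest memory is never consulted or patched here, which is why the translator's output does not depend on which breakpoints are set.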

59. The threaded-code dispatcher (Phase 10.2) was removed. Added 2026-05-25. dispatch_threaded.{hpp,cpp} (th_skeleton<Body> tail-call chains, the ThreadedHandler typedef, the DecodeCacheEntry::fn pointer, and the CpuState chain counters) are deleted. Threaded code reached ~1.5× over the switch interpreter but missed its Phase 10.2 exit targets, and the arch-neutral IR JIT (decisions 49–55) supersedes it as the fast path. The decode cache keeps pc_tag + DecodedInsn for the Switch path; the flag helpers in handlers_internal.hpp, once shared between the threaded templates and the exec_alu switch, now serve the switch alone. Keeping a third, unused execution path would be dead weight against the project's no-premature-abstraction rule.

79. The IR interpreter carries the reference-path duties through a per-instruction step hook; stops resume via set_pc at straight-line boundaries only. Added 2026-06-10 (E0, plans/e0-ir-reference-path.md; ADR-006 stage gate). An IR-only architecture (no switch oracle — every post-SPARC frontend under ADR-006) still needs the three per-instruction reference duties the SPARC switch oracle provides: trace (IEmulatorObserver::on_instruction with honest pre-instruction state), GDB single-step / interior breakpoints, and lockstep state compare. ir::IStepHook supplies them on the interpreter: every builder-emitted op is stamped with its insn_index/pc (previously only trapping ops were), the interpreter fires the hook before the first op of each guest instruction (op-less instructions included, via the gap-filling boundary walk), and an honoured stop returns a BlockExit::boundary_stop whose committed prefix reuses the precise-trap "prefix committed" guarantee. Resumption is IArchitecture::set_pc(entry_pc + stride · n) — exact because a stop is only honoured at a straight-line boundary: IrBlock::no_stop_tail (stamped by the frontend, 1 on every SPARC delay-slot-bearing terminator) excludes the CTI→delay-slot shadow, where nPC ≠ PC+4 and an annulled slot must not be re-entered as a fresh block. Three consequences: run_ir_interpret_quantum's observer now fires interleaved with execution instead of pre-fired per block (each callback sees the committed state of every prior instruction); an IR-only frontend yields at the EXACT quantum boundary (the SMP determinism guard no longer needs an oracle) and single_step is a true one-instruction step; and GDB interior breakpoints run the block on the interpreter with a stop before the breakpoint PC instead of running past it. 
SPARC defaults are untouched: an observer still forces the switch path (the frozen oracle is the SPARC trace path), and the oracle still walks SPARC quantum tails and breakpoint blocks — the hook activates for SPARC only under force_ir_interpret (validation) and for the per-instruction lockstep harness (run_ir_diff(..., per_insn = true)), which compares the full blob at every interior boundary across an RTEMS boot.
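The resume rule in Decision 79 can be worked through in a few lines. The sketch assumes that no_stop_tail counts trailing instructions before which a stop is not honoured, which is one reading of the text above; BlockShape and resume_pc are hypothetical names.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical summary of a block for stop/resume purposes.
struct BlockShape {
    std::uint32_t entry_pc;
    std::uint32_t stride;        // bytes per guest instruction (4 on SPARC)
    std::uint32_t insn_count;
    std::uint32_t no_stop_tail;  // assumed: trailing instructions with no stop before them
};

// PC to resume at if a stop before instruction `n` is honoured, or nullopt
// when `n` is out of range or falls in the no-stop tail (for example the
// delay-slot shadow of a control-transfer instruction).
inline std::optional<std::uint32_t> resume_pc(const BlockShape& b, std::uint32_t n) {
    if (n + b.no_stop_tail >= b.insn_count) return std::nullopt;
    return b.entry_pc + b.stride * n;
}
```

For a five-instruction block at 0x40000000 with a one-instruction tail, a stop before instruction 2 resumes at 0x40000008, while a stop before instruction 4 (the delay slot) is refused.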

80. New guest architectures are IR-only; the Switch interpreter is the frozen SPARC oracle; the future frontend generator emits Tero IR, never LLVM IR (ADR-006). Added 2026-06-10 (full text: plans/multiarch-emugen-frontends.md; public summary: EmuGen). A new ISA implements exactly IArchFrontend + IArchitecture — its reference path is the IR interpreter (Decision 79), its fast path the tiered JIT; no per-arch switch interpreter is ever written. The SPARC Switch stays hand-written and frozen indefinitely: it is the most battle-tested validation asset in the project and costs ~zero to keep, while replicating it per ISA is the cost the decision avoids. The planned generator (EmuGen, gated on two committed architectures — rule of three) produces only what a hand-written frontend provides (decoder, translate_block, layout header) and targets Tero IR: generating LLVM IR directly — the alternative considered — would orphan the IR interpreter, the block cache, the tiering, and the GDB metadata in one stroke. Amended same day: the SPARC retrofit (E3) is mandatory, and the generated SPARC frontend goes to production once validated block-by-block against the hand-written one over the RTEMS corpus; LEON3 and LEON4 share that single SPARC V8 frontend.

Entity model, compose, and runtime decomposition

The 2026-06 refactor track (entity-model S0–S11 + A-series, compose, the cleanliness sweep). Full migration history: plans/entity-object-model.md; the searchable judgment calls are indexed here.

60. Everything in a machine is a tero::IEntity; the name is IEntity, not IObject or IModel. Added 2026-06-08. Peripherals, host-facing plugins, comms buses, CPUs, and the bus fabric share one base (src/interfaces/include/tero/ientity.hpp); "being a peripheral" means exposing IMmio, not deriving a separate root. IObject/IModel were rejected because the future SMP2 wrapper bridges to Smp::IObject/Smp::IModel and same-named types on both sides of that bridge guarantee ambiguity.

61. Capability lookup has two spellings: the get_interface<T>() member and the interface_cast<T>(IEntity*) free function. Added 2026-06-08. The free form is null-safe and usable on an IEntity*; the member reads better on a known entity. They are named differently because a same-named free function is shadowed by the member inside an entity's own methods — a real lookup failure hit during S0, not a style preference. Both subsume the per-protocol dynamic_cast<…Provider*> chains that predated them.
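The two spellings and the shadowing problem can be shown in a reduced form. This sketch uses dynamic_cast as a stand-in for whatever lookup the real IEntity performs, and the IMmio and Uart types here are simplified assumptions.

```cpp
struct IEntity {
    virtual ~IEntity() = default;
    // Member spelling: reads well on a known entity.
    template <class T> T* get_interface() { return dynamic_cast<T*>(this); }
};

// Free spelling: null-safe, usable on any IEntity*.
template <class T> T* interface_cast(IEntity* e) {
    return e ? e->get_interface<T>() : nullptr;
}

struct IMmio {  // simplified capability interface for the example
    virtual ~IMmio() = default;
    virtual unsigned read(unsigned addr) = 0;
};

struct Uart : IEntity, IMmio {
    unsigned read(unsigned) override { return 0x55; }
    // Inside a member function, an unqualified get_interface<T>(peer) would
    // find the member first and never consider a same-named free function,
    // so the free form needs its own name to be callable here.
    IMmio* peer_mmio(IEntity* peer) { return interface_cast<IMmio>(peer); }
};
```

A caller holding a possibly-null pointer uses interface_cast<IMmio>(p); a caller holding a known entity writes uart.get_interface<IMmio>().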

62. No string-keyed reflective properties; configuration stays typed. Added 2026-06-08. The TEMU-style set_property("freq", "80MHz") model was rejected: config-by-struct is a frozen principle, and typed fields fail at compile time where string keys fail at run time. Read-only introspection remains the IPublisher seam (→ SMP2 IPublication).

63. Wiring is typed interfaces + named slots, not IPort. Added 2026-06-08. A declarative edge is Connection{from_slot, peer, peer_slot} (peripheral_spec.hpp); the runtime resolves it in one generic pass — a find_port(from_slot) hit binds a port slot (signals, SpaceWire), otherwise the consumer's IConnectable::connect joins a shared bus (CAN, SPI, 1553). The old plan of routing all wiring through stateful IPort objects was rejected: a port is one capability among several, not the mechanism.

64. IRQ lines stay the PeripheralSpec::irqs shorthand, resolved to IrqBridges at attach() — not post-attach connection edges. Added 2026-06-08 (S3.C finding). Bridges must exist before the post-attach connection pass and require controller-first ordering; migrating IRQs to generic edges would reorder the lifecycle under every RTEMS guest for zero expressiveness gain.

65. Emulator is a facade; the subsystems are Soc, ExecutionEngine, and DebugServer, and teardown correctness is encoded in declaration order. Added 2026-06-08 (S4–S6). emulator.cpp shrank from 1806 lines to a thin delegation layer. engine_ is declared last so it destructs first (its workers touch the SoC bus through the per-core bridges); ~Soc stops plugins before freeing the buses they observe. Reordering the members of Emulator or Soc is a teardown bug — the invariant is documented at the member declarations and was proven by the full suite under ASan/UBSan + leak detection.
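The teardown rule relies on a language guarantee: non-static members are destroyed in reverse declaration order. The sketch below makes that observable with a logging member type; FacadeSketch and Tracker are illustrative stand-ins, and the position of the debug member is an assumption.

```cpp
#include <string>
#include <vector>

// Records destruction order so the invariant can be observed.
struct Tracker {
    std::vector<std::string>& log;
    std::string name;
    ~Tracker() { log.push_back(name); }
};

// Members destruct in reverse declaration order, so the member declared
// last (the engine) is torn down first, while the SoC it touches is alive.
struct FacadeSketch {
    Tracker soc;
    Tracker debug;
    Tracker engine;  // must stay last: destructs first
    explicit FacadeSketch(std::vector<std::string>& log)
        : soc{log, "soc"}, debug{log, "debug"}, engine{log, "engine"} {}
};
```

Destroying a FacadeSketch logs engine, then debug, then soc. Swapping two member declarations silently changes that order, which is why the invariant is documented at the declarations.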

66. core(idx) keeps returning core::CpuState&; the canonical state is the per-core arch-sized GuestState blob. Added 2026-06-08 (S7, revalidated A1–A3). An opaque accessor would have broken 82 call sites across 12 files for no functional gain. CpuState is a lens over the engine-owned blob (core_blobs_), so there is no CpuState⇄GuestState sync layer; non-SPARC architectures never construct the lens.

67. Trap and interrupt delivery sit behind three coarse ir::IArchitecture methods: set_pc, raise_block_exception, evaluate_interrupt. Added 2026-06-08 (S10). One virtual call per block exit or scheduling round — never per instruction — keeps the seam off the hot path. The whole SPARC fault tail (data-access vector, delay-slot trap PC/nPC, the ET=0 → error-mode rule of SPARC V8 §7.3) lives in src/arch/sparc/src/sparc_arch.cpp; the engine only halts the core when raise_block_exception returns false.

68. External interrupts are decided and delivered through the architecture. Added 2026-06-08/09 (S10 + arch-decoupling). evaluate_interrupt is a pure decision (level select + enable/priority gates + trap type + ack_mask, the controller's own encoding formed by the arch); deliver_interrupt performs the entry on the architecture's per-core state so micro-state outside the blob (the SPARC annul flag) is cleared exactly as the switch oracle does. The engine never reconstructs a controller bitmask.

69. A new ISA is a CpuArch enumerator plus one factory case; EmulatorConfig::arch_factory is the injection seam for out-of-tree or test frontends. Added 2026-06-08 (S7 + S11). make_architecture (src/runtime/src/architecture_factory.cpp) is the single extension point; the arch_factory std::function field lets an embedder plug an ir::IArchitecture without an enumerator. The toy frontend (tests/integration/test_toy_frontend.cpp) runs end-to-end through both run-loop paths on its own 64-byte state layout — the seam's proof.
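A reduced version of the extension point might look like this. The factory signature, the precedence of arch_factory over the enumerator, and the ConfigSketch and SparcArch types are assumptions for illustration; only the names make_architecture, CpuArch and arch_factory come from the text above.

```cpp
#include <functional>
#include <memory>
#include <string>

struct IArchitecture {
    virtual ~IArchitecture() = default;
    virtual std::string name() const = 0;
};

enum class CpuArch { SparcV8 };

struct ConfigSketch {
    CpuArch arch = CpuArch::SparcV8;
    // Injection seam: assumed to take precedence over the enumerator when set.
    std::function<std::unique_ptr<IArchitecture>()> arch_factory;
};

struct SparcArch : IArchitecture {
    std::string name() const override { return "sparc-v8"; }
};

inline std::unique_ptr<IArchitecture> make_architecture(const ConfigSketch& cfg) {
    if (cfg.arch_factory) return cfg.arch_factory();
    switch (cfg.arch) {
        case CpuArch::SparcV8: return std::make_unique<SparcArch>();
    }
    return nullptr;  // unreachable for valid enumerators
}
```

An in-tree ISA adds one enumerator and one case; a test or out-of-tree frontend assigns a lambda to arch_factory and needs no change to the enum.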

70. A peripheral's name() is its registry identity — the spec instance_name, injected by the runtime; authors implement device_class(). Added 2026-06-09. IPeripheral::name() is final: find_entity(x)->name() == x now holds for every entity kind in the one flat instance-name namespace (peripherals, plugins, comms buses; uniqueness enforced by validate_emulator_config). Previously peripherals self-reported their IP-core class ("apbuart") while everything else self-reported instance identity, so a peripheral did not know its own registry name. The injection is one call in Soc::build's factory loop — the funnel every config source passes through. IPeripheral grew state and a vtable slot, so ComponentAbiVersion was bumped 1 → 2 and v1 component libraries are rejected at dlopen time.
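The split between registry identity and device class can be sketched with a final override. The inject_instance_name method is a hypothetical name for the runtime's injection call, and the class names are simplified stand-ins for IEntity and IPeripheral.

```cpp
#include <string>
#include <utility>

struct IEntitySketch {
    virtual ~IEntitySketch() = default;
    virtual std::string name() const = 0;
};

class PeripheralSketch : public IEntitySketch {
public:
    // Registry identity: fixed by the runtime, not overridable by authors.
    std::string name() const final { return instance_name_; }
    // Authors implement this instead: the IP-core class, e.g. "apbuart".
    virtual std::string device_class() const = 0;
    // Hypothetical hook, called once when the peripheral is built from its spec.
    void inject_instance_name(std::string n) { instance_name_ = std::move(n); }

private:
    std::string instance_name_;
};

struct ApbUart : PeripheralSketch {
    std::string device_class() const override { return "apbuart"; }
};
```

Two instances of the same class then report the same device_class() but distinct name() values, matching how they are found in the registry.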

71. Silicon kits are Machine compositions in tero_compose; the runtime recipe functions were deleted. Added 2026-06-09. tero::compose::gr712rc_config() / gr740_config() (src/compose/src/kits.cpp) build the same EmulatorConfig the deleted tero::runtime recipes did, but through the typed object graph that also serves .tero scripts and component libraries — one assembly path instead of two. Deliberate API break; the migration is a namespace change.

72. The owning entity lists stay typed: peripherals_, buses_, plugins_. Added 2026-06-09. Merging them into one vector<unique_ptr<IEntity>> was rejected: the three lists encode real operational roles (peripherals tick every round, plugins have start/stop lifecycle and must destruct before the buses they observe, bus media are passive), and one flat vector turns the "plugins before buses" destruction guarantee from a compile-time declaration-order fact into a fragile runtime insertion-order one. find_entity already iterates all of them uniformly.

73. The AMBA AHB/APB hierarchy is a descriptive overlay; the hot dispatch path is unchanged. Added 2026-06-09. bus::AmbaBus ("ahb"/"apb" entities) and bus::MemorySpace/IMemoryAccess model the topology for inspection; rewiring the dispatch through them (and adding an ATC) was declined after reading the hot path: the RAM fast paths are RAM-typed (TSO atomics via std::atomic_ref on Ram's buffer) and the JIT already inlines a host-pointer RamWindow, so the erased indirection buys nothing. An ATC becomes the software TLB when the SRMMU (P3) lands.

74. The opcode histogram is emitted by clients; the library exposes data. Added 2026-06-09. ~ExecutionEngine used to fopen a CSV and read $TERO_HISTOGRAM_OUT — the last direct-I/O + env-config escape hatch in the core. Emulator::opcode_histogram() returns the counters and format_opcode_histogram_csv renders them (pure function, empty string when all-zero); tero-bench and tero-emu own the I/O policy through an explicit --histogram-out <path> flag (stderr when absent; the env var was removed 2026-06-10). The per-instruction counting stays compile-gated (TERO_OPCODE_HISTOGRAM) — that flag gates instrumentation cost, not behaviour.
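A pure renderer in the spirit of format_opcode_histogram_csv could look like the following. The column layout, the OpcodeCount struct and the function name are assumptions; the only behaviour taken from the text is that the function performs no I/O and returns an empty string when every counter is zero.

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct OpcodeCount {
    std::string mnemonic;
    std::uint64_t count;
};

// Pure renderer: no file or environment access; empty string when all-zero.
inline std::string format_histogram_csv(const std::vector<OpcodeCount>& counters) {
    std::string body;
    for (const auto& c : counters) {
        if (c.count == 0) continue;  // skip opcodes that never executed
        body += c.mnemonic + "," + std::to_string(c.count) + "\n";
    }
    return body.empty() ? std::string{} : "opcode,count\n" + body;
}
```

The caller decides where the text goes (a file named on the command line, or stderr), so the library itself never opens a file.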

75. Machine scripts have one grammar (v2): expressions fold at parse time, and the file can never become a program. Added 2026-06-10. let constants, bounded integer arithmetic (+ - * / << | & ~, parens, K/M/G suffixes) and read-only obj.prop references to earlier lines all constant-fold in the parser (src/compose/src/script.cpp); the Machine only ever sees literals. Conditionals, loops, strings and forward references are structurally impossible: a .tero file stays declarative, diffable, and reads like the datasheet's memory-map table. The v1 verbs (create / write / positional map / 4-token connect) were removed rather than aliased, because two grammars in examples and docs diverge. Interface tokens on connect edges are the real tero:: interface names, validated by entity-check against a known set.
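Parse-time folding of such expressions can be sketched with a small recursive-descent evaluator. This is not the parser in script.cpp: the Folder class, C-like operator precedence, unsigned 64-bit wrap-around in place of bounds checks, and the constants map standing in for earlier let lines are all assumptions.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Folds a constant expression to a literal. Names resolve only against
// constants defined earlier, so a forward reference fails.
class Folder {
public:
    Folder(const std::string& src, const std::map<std::string, std::uint64_t>& consts)
        : s_(src), consts_(consts) {}

    std::uint64_t fold() {
        const std::uint64_t v = bit_or();
        skip();
        if (i_ != s_.size()) throw std::runtime_error("trailing input");
        return v;
    }

private:
    static bool is(int (*pred)(int), char c) { return pred(static_cast<unsigned char>(c)) != 0; }
    void skip() { while (i_ < s_.size() && is(std::isspace, s_[i_])) ++i_; }
    bool eat(const std::string& tok) {
        skip();
        if (s_.compare(i_, tok.size(), tok) != 0) return false;
        i_ += tok.size();
        return true;
    }
    std::uint64_t bit_or()  { auto v = bit_and(); while (eat("|"))  v |= bit_and(); return v; }
    std::uint64_t bit_and() { auto v = shift();   while (eat("&"))  v &= shift();   return v; }
    std::uint64_t shift()   { auto v = sum();     while (eat("<<")) v <<= (sum() & 63); return v; }
    std::uint64_t sum() {
        auto v = product();
        for (;;) {
            if (eat("+")) v += product();
            else if (eat("-")) v -= product();
            else return v;
        }
    }
    std::uint64_t product() {
        auto v = unary();
        for (;;) {
            if (eat("*")) { v *= unary(); continue; }
            if (!eat("/")) return v;
            const auto d = unary();
            if (d == 0) throw std::runtime_error("division by zero");
            v /= d;
        }
    }
    std::uint64_t unary() { return eat("~") ? ~unary() : primary(); }
    std::uint64_t primary() {
        if (eat("(")) {
            const auto v = bit_or();
            if (!eat(")")) throw std::runtime_error("expected )");
            return v;
        }
        if (i_ < s_.size() && is(std::isdigit, s_[i_])) {
            std::size_t used = 0;
            std::uint64_t v = std::stoull(s_.substr(i_), &used, 0);
            i_ += used;
            if (eat("K")) v <<= 10; else if (eat("M")) v <<= 20; else if (eat("G")) v <<= 30;
            return v;
        }
        const std::size_t start = i_;
        while (i_ < s_.size() && (is(std::isalnum, s_[i_]) || s_[i_] == '_')) ++i_;
        const auto it = consts_.find(s_.substr(start, i_ - start));
        if (it == consts_.end()) throw std::runtime_error("unknown or forward reference");
        return it->second;
    }

    std::string s_;
    const std::map<std::string, std::uint64_t>& consts_;
    std::size_t i_ = 0;
};
```

With BASE defined as 0x80000000, the expression BASE + 4K * 2 folds to 0x80002000 before any machine object is built, and a name that has not been defined yet raises an error instead of deferring evaluation.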

76. PnP publication is edge-based: a bridge's slaves edges ARE the table. Added 2026-06-10. connect <bridge>.slaves <dev>:IAmbaPnp declares membership and slot (edge order = slot; slaves[N] pins a sparse one); a device with no edge stays off the table; one such edge switches the machine to explicit-publication mode, while a machine with none keeps the legacy address-derived auto placement (a minimal board needs no slot bookkeeping). This replaced the pnp_slot/pnp_publish device attributes: the bridge owns its slot table as in the silicon, membership stops being inferred from address ranges, and "unpublished" stops being a magic knob. The audit of the deleted hardcoded tables also corrected a design premise: the five IAmbaPnp implementors are the complete historically-published set, so no mass identity rollout was needed — identity resolution was already device-first (pnp_table.cpp), and further devices adopt IAmbaPnp per-class when a board publishes them with a manual-sourced GRLIB §3.4 ID.

77. The silicon kits are embedded .tero files; the C++ kit functions are loaders. Added 2026-06-10. src/compose/machines/gr712rc.tero / gr740.tero are the single source of truth: gr712rc_machine()/gr740_machine() parse configure-time-embedded copies (no runtime file lookup, no install-path dependency), and the same files install under share/tero/machines/ as the user's copy-and-derive starting point. Equivalence to the deleted C++ compositions was proven by dump() byte-equality on both SoCs before the swap; test_compose_kits (placement content) and RTEMS [boot] passed unchanged throughout. The alternative — shipping files alongside C++ compositions — was rejected: two definitions of one SoC drift.

78. PnP slots are internal; slaves edges are membership only. The host sink entity is StdoutMonitor. Added 2026-06-10. Amends Decision 76: a connect <bridge>.slaves <dev>:IAmbaPnp edge only publishes the device on that bridge's table — records compact in edge order and the slot number is an internal detail. GRLIB software (RTEMS, MKPROM, Linux) enumerates the table and matches records by vendor/device identity and BAR address, never by slot position, so the index pinned by slaves[N] and the pnp_ahb_slot property bought silicon byte-fidelity of the PnP area at the cost of order-dependence (moving an edge silently renumbered the table) and a stringly port-name hack (slaves[3] → port "slaves3"). Both forms were removed; slaves[N] now fails build() with a diagnostic (silently treating it as a structural edge would drop the device off the table unnoticed). Consequence accepted: the kits' published tables compact (GR740 APB 0,1,3,4,5 → 0..4) and the PnP MMIO region is no longer byte-identical to the silicon manuals' sparse layouts — test_compose_kits asserts the published device set, and RTEMS [boot] passed unchanged on both kits. In the same pass the Console component was renamed StdoutMonitor (ComponentKind::Monitor): the entity is exactly a host stdout sink, not hardware, and the honest name keeps future host sinks as new registry classes rather than properties.
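The reason slot position can be internal is that guest software matches on identity. The sketch below shows such a lookup over a compacted table; the PnpRecord layout, the find_bar helper and the numeric IDs used with it are placeholders, not GRLIB values.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

struct PnpRecord {
    std::uint16_t vendor;
    std::uint16_t device;
    std::uint32_t bar;  // base address the record advertises
};

// Guest-style enumeration: scan the whole table and match on identity,
// never on the record's position.
inline std::optional<std::uint32_t> find_bar(const std::vector<PnpRecord>& table,
                                             std::uint16_t vendor, std::uint16_t device) {
    for (const auto& r : table)
        if (r.vendor == vendor && r.device == device) return r.bar;
    return std::nullopt;
}
```

Two tables that publish the same records in a different edge order give the same answer for every identity lookup, which is the property that makes compaction safe for guests that enumerate this way.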