Design decisions
This is the canonical decision log for Tero. It has two parts:
- Architectural Decision Records (ADRs): the four frozen, capital-A
  architectural decisions (ADR-001..004) that govern the 1:1 real-time
  roadmap. These change only with explicit human approval and an ADR
  update; they are reproduced here as the canonical decision log from
  CLAUDE.md and plans/post-mvp-1to1-roadmap.md.
- Numbered implementation decisions (1–59): judgment calls taken during
  implementation that go beyond the frozen invariants in design
  principles. Each entry is a fact not visible from reading the code
  alone: the reasoning, the context, the alternative considered.
  Cross-references from prose elsewhere use the form (Decision N); the
  numbering is historical and must not be renumbered.
For the chronological context (when and why a session's batch of decisions happened), see Status and the archive. For the non-negotiable invariants (zero singletons, errors-as-values, …) see Design principles.
How to use this page
| You want to … | Action |
|---|---|
| Understand a frozen architectural choice | Read the ADRs. |
| Find a specific (Decision N) reference from another doc | Search for **N. in this file. Numbers are stable. |
| See all decisions for one module | Use the section headers below. |
| Add a new decision | Append it to the appropriate section with the next free number, then reference it as Decision N from prose. Do not renumber existing entries. |
| Change an ADR | Get explicit human approval, edit plans/post-mvp-1to1-roadmap.md and CLAUDE.md first, then sync this page. |
Sections
- Architectural Decision Records (ADRs) — ADR-001..004
- Bus and memory — Decisions 1–5, 26
- Decoder and ISA — Decisions 8–11, 21, 30–32, 37, 38
- Peripherals — Decisions 12–20, 22, 27, 29, 33, 35
- Runtime and scheduling — Decisions 6, 7, 23, 25, 28, 34, 36, 39, 40
- GDB stub — Decisions 41–46
- Diagnostics — Decisions 47, 48
- Arch-neutral IR and JIT — Decisions 49–59
- Entity model, compose, and runtime decomposition — Decisions 60–77
Architectural Decision Records (ADRs)
These four decisions freeze the major choices for the 1:1 real-time
roadmap (plans/post-mvp-1to1-roadmap.md). They are frozen: override
only with explicit human approval and an ADR update in that file. Each
ADR below states the decision, the rationale, the mechanism (with code
where it exists), and the consequences.
Relationship to the frozen technical decisions
The ADRs sit alongside the frozen technical decisions in CLAUDE.md
(language/toolchain, architecture principles, execution model, timing,
multi-core, memory/bus, FT, caches, MMU, FPU). Those are reproduced
for a human reader in Design principles; the
ADRs here are the four that specifically govern the post-MVP
performance roadmap.
ADR-001 — Dual execution mode (single-thread + multi-thread)
Decision. EmulatorConfig::execution_mode selects between
ExecutionMode::SingleThread (default, SMP2-compatible) and
ExecutionMode::MultiThread (each simulated core on its own host thread,
standalone-only). The field lives in
src/runtime/include/tero/runtime/emulator_config.hpp:35-48.
Rationale. 1:1 GR740 needs aggregate throughput beyond what a single host thread can deliver even with a top-tier JIT (measured per-core Dhrystone ≈ 0.82× of GR740's per-core target; one host thread tops out well below the quad-core aggregate). Multi-thread is mandatory to close the gap. Single-thread must remain available for the external SMP2 wrapper, which expects single-threaded model execution driven by its own scheduler.
Mechanism. Thread-safe primitives are runtime-gated, not conditionally compiled, so single-thread mode pays near-zero sync overhead while both modes live in one binary:
- Mutexes are tero::GatedMutex (src/interfaces/include/tero/gated_mutex.hpp), a std::mutex that no-ops when its gate is inactive. The gate is derived from execution_mode and set once at initialize(), before any worker thread starts.
- The thread-per-core dispatcher uses std::barrier: start_workers spawns N−1 worker threads parked at a start barrier; each round the main thread releases them, runs core 0 itself, and rejoins at a done barrier (src/runtime/src/emulator.cpp:690-742, :855-875).
- Atomics use the ordering the SPARC-TSO/x86-64-host invariant allows (Decision 53; ADR-003 fixes the host so SPARC TSO maps with zero fences).
This supersedes the original compile-time NullMutex template-parameter
mechanism — the runtime gate is required by the config-by-struct /
no-behaviour-build-flags rule
(principle 4).
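The runtime gate described above can be sketched roughly as follows. This is a minimal illustration, not the actual tero::GatedMutex: the class name GatedMutexSketch and the set_gate/gate_active members are hypothetical, and the real header may expose a different interface.

```cpp
#include <mutex>

// Hypothetical sketch of a runtime-gated mutex. With the gate inactive
// (single-thread mode) every operation is a cheap branch and the underlying
// std::mutex is never touched.
class GatedMutexSketch {
public:
    // Assumed contract: called once, before any worker thread starts, and
    // never while the mutex is held. That is why a plain bool is sufficient.
    void set_gate(bool active) noexcept { active_ = active; }
    bool gate_active() const noexcept { return active_; }

    void lock() {
        if (active_) { mutex_.lock(); }
    }
    void unlock() {
        if (active_) { mutex_.unlock(); }
    }
    bool try_lock() {
        return active_ ? mutex_.try_lock() : true;
    }

private:
    std::mutex mutex_;
    bool active_ = false;  // inactive by default: single-thread behaviour
};
```

Because the sketch provides lock(), unlock() and try_lock(), it can be used with std::lock_guard and std::unique_lock in the same way as std::mutex, so call sites would not need to know which mode is active.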
Consequences. A GDB stub or a per-instruction observer forces
SingleThread (they need a single, well-ordered instruction stream).
MultiThread runs the per-core JIT path with per-core IR caches and
TieredJits (each with its own Tier-2 background O2 thread, P14-2). Byte-identical ST==MT is
impossible for lock-contended workloads (ST round-robin has no
contention); the achievable determinism gate is shared-state convergence,
not bit-exactness. Orthogonal to translation: either execution method
runs under either mode.
ADR-002 — Tiered JIT (baseline + optimising)
Decision. The JIT has two compilation tiers:
- Tier 1 (baseline): fast translation of a block to LLVM IR with minimal passes (O0). Used immediately when a block crosses its baseline threshold so a cold block runs at once.
- Tier 2 (optimising): full LLVM O2 pipeline, compiled on a background thread; promotion is atomic.
The knobs are EmulatorConfig::jit_baseline_threshold (default 32,
interpret-first warmup), jit_promotion_threshold (default 100),
jit_background_opt (default true), and jit_max_region_blocks
(default 8) — all in emulator_config.hpp:181-218. TieredJit
(src/jit/src/tiered_jit.cpp) drives two IrJits (Decision 54).
Rationale. 1:1 requires bounded jitter, not just average throughput. A single optimising tier introduces visible compilation pauses on cold paths (multi-millisecond stalls). Tier 1 provides immediate execution while Tier 2 runs out-of-band — the same pattern HotSpot, V8, and Graal use. Measured: the single synchronous O2 tier put RTEMS boot at 4.5 s and the sptest-JitIr suite at ~15 min; the tier cut those to 1.0 s / 6.3 min with steady-state throughput unchanged (Decision 54).
Consequences. Blocks interpret-first (run on the IR interpreter until proven hot), then compile fast at O0, then recompile at O2 in the background. Both tiers lower the identical IR, so they are interchangeable. Region chaining fuses a same-mode block region into one LLVM function (Decision 55), budget-bounded so a fused region yields at the exact quantum boundary the switch would (no SMP interleaving drift). Full engine detail: IR and LLVM JIT.
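The interpret-first, then O0, then O2 progression can be illustrated with a small tier-selection function. This is a sketch under stated assumptions: Tier, TierThresholds and select_tier are hypothetical names, the defaults are the documented 32 and 100, the strict comparison for promotion follows the wording of Decision 54, and applying the same comparison to the baseline threshold is an assumption.

```cpp
#include <cstdint>

enum class Tier { Interpret, Baseline, Optimised };

// Defaults mirror the documented EmulatorConfig knobs.
struct TierThresholds {
    std::uint32_t baseline = 32;    // jit_baseline_threshold
    std::uint32_t promotion = 100;  // jit_promotion_threshold
};

// Hypothetical helper: which tier a block would run on after `exec_count`
// executions. The exact comparison used by the real code is not shown in
// this document.
constexpr Tier select_tier(std::uint32_t exec_count, const TierThresholds& t) {
    if (exec_count > t.promotion) { return Tier::Optimised; }
    if (exec_count > t.baseline) { return Tier::Baseline; }
    return Tier::Interpret;
}

static_assert(select_tier(0, TierThresholds{}) == Tier::Interpret,
              "a cold block stays on the IR interpreter");
static_assert(select_tier(101, TierThresholds{}) == Tier::Optimised,
              "a hot block is promoted to the optimising tier");
```

In the real engine the promotion is asynchronous (the O2 code is published atomically once the background compile finishes), which this synchronous sketch does not model.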
ADR-003 — Host requirements
Decision. The tier-1 host for 1:1 acceptance is x86-64 Linux with
≥ 4 physical cores, AVX2, ≥ 16 GiB RAM, GCC ≥ 13 or Clang ≥ 17, LLVM ≥
18. ARM64 Linux and macOS are tier-2 (functional but no 1:1
guarantee). The LLVM floor is enforced in CMake
(CMakeLists.txt:110-114, FATAL_ERROR below 18).
Rationale.
- x86-64 native TSO maps SPARC TSO trivially — zero memory fences needed (Decision 53). ARM's weak memory model would require explicit fences with measurable overhead, which is why the 1:1 acceptance host is fixed to x86-64.
- AVX2 enables LLVM's vectorisation passes for hot helpers.
- LLVM ≥ 18 because the JIT uses llvm::CodeGenOptLevel, which LLVM 18 introduced (LLVM 17 spells it llvm::CodeGenOpt::Level). The floor was raised from the original 17 to 18 after the code adopted the 18 API: a CI runner's bundled LLVM 17 passed the old >= 17 gate but then failed to compile (Decision 57).
Consequences. LLVM is a mandatory, unconditional dependency
(find_package(LLVM REQUIRED CONFIG)), not gated behind a build option —
the old TERO_ENABLE_JIT flag is gone (Decision 57). The strict host
requirements are part of the build's contract, not the user's
responsibility (CLAUDE.md toolchain section).
ADR-004 — No cycle-accurate timing
Decision. 1:1 means simulated seconds per wall-clock second, not
simulated cycles per host cycle. Pipeline-level emulation is explicitly
out of scope. Simulated-time fidelity is handled orthogonally by the CPI
model (a single global EmulatorConfig::cpi, default 1.0; the planned
bucket-resync refinement is P10).
Rationale. Cycle-accurate adds 5–10× host cost for fidelity that neither the RTEMS testsuite nor external-simulator integration require. A single global CPI captures the dominant simulated-time scaling without per-opcode hot-path cost; the bucket-resync model (when it lands) will capture the dominant variability (cache misses, write-buffer stalls, AHB contention) at scheduler boundaries, not per instruction.
Consequences. There is no per-opcode CPI table — Tero models a
single global CPI. ns_per_insn = cpi * ns_per_cycle(cpu_clock_hz)
(ns_per_insn_for, emulator_config.hpp:65), recomputed at
Emulator::create (emulator.cpp:227). Caches and the SRMMU are not
modelled (they are timing concerns Tero deliberately skips); the GPTIMER
prescaler stays on the raw ns_per_cycle() (the peripheral/bus clock,
unaffected by cpi). See Multicore and timing.
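As a worked example of the formula above, the following sketch computes nanoseconds per instruction from a global CPI and a clock frequency. The free functions and the double return type are assumptions for illustration; the real ns_per_insn_for may use a different signature or an integer nanosecond count.

```cpp
#include <cstdint>

// Hypothetical helper: duration of one clock cycle in nanoseconds.
constexpr double ns_per_cycle(std::uint64_t cpu_clock_hz) {
    return 1e9 / static_cast<double>(cpu_clock_hz);
}

// ns_per_insn = cpi * ns_per_cycle(cpu_clock_hz), with a single global cpi.
constexpr double ns_per_insn(double cpi, std::uint64_t cpu_clock_hz) {
    return cpi * ns_per_cycle(cpu_clock_hz);
}

// A 100 MHz clock at the default cpi of 1.0 gives 10 ns per instruction.
static_assert(ns_per_insn(1.0, 100000000ULL) == 10.0, "100 MHz, cpi 1.0");
```

Raising cpi scales simulated time per instruction only; a peripheral clocked from ns_per_cycle() directly, such as the GPTIMER prescaler, is unaffected.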
Bus and memory
1. SystemBus is non-copyable and non-movable.
It owns RAM via unique_ptr<Ram> and peripherals cache raw pointers
into it. Moving the bus would invalidate them. If you need to relocate
a bus, construct a new one.
2. Big-endian translation lives in SystemBus, not in Ram.
Ram holds raw bytes as they appear on the wire;
SystemBus::{encode,decode}_be() does the BE shuffle at the typed
access boundary. Keeps RAM trivially snapshot-able and matches how a
real memory controller behaves.
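The big-endian shuffle at the typed access boundary can be sketched for the 32-bit case as below. The function names encode_be32 and decode_be32 are hypothetical; the actual SystemBus::{encode,decode}_be() signatures are not shown in this document.

```cpp
#include <array>
#include <cstdint>

// Hypothetical 32-bit helper: most significant byte goes to the lowest
// address, independent of host endianness.
inline std::array<std::uint8_t, 4> encode_be32(std::uint32_t value) {
    return {{
        static_cast<std::uint8_t>(value >> 24),
        static_cast<std::uint8_t>(value >> 16),
        static_cast<std::uint8_t>(value >> 8),
        static_cast<std::uint8_t>(value),
    }};
}

// Inverse of encode_be32: rebuilds the typed value from raw RAM bytes.
inline std::uint32_t decode_be32(const std::array<std::uint8_t, 4>& bytes) {
    return (static_cast<std::uint32_t>(bytes[0]) << 24) |
           (static_cast<std::uint32_t>(bytes[1]) << 16) |
           (static_cast<std::uint32_t>(bytes[2]) << 8) |
           static_cast<std::uint32_t>(bytes[3]);
}
```

Using shifts rather than memcpy keeps the result identical on little-endian and big-endian hosts.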
3. Single region per access.
A byte-span access that straddles two regions returns
ErrorCode::BusError. Real hardware latches one transaction against
one target; the bus does not silently split.
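A minimal sketch of the single-region rule follows. The Region struct, the two-value ErrorCode enum and check_single_region are hypothetical stand-ins; the real bus lookup is likely structured differently.

```cpp
#include <cstdint>
#include <vector>

enum class ErrorCode { Ok, BusError };

struct Region {
    std::uint32_t base;
    std::uint32_t size;
};

// Hypothetical check: an access is accepted only if a single region contains
// every byte of [addr, addr + len). 64-bit sums avoid 32-bit wrap-around.
inline ErrorCode check_single_region(const std::vector<Region>& regions,
                                     std::uint32_t addr, std::uint32_t len) {
    const std::uint64_t first = addr;
    const std::uint64_t last = first + len;
    for (const Region& r : regions) {
        const std::uint64_t region_end = std::uint64_t{r.base} + r.size;
        if (first >= r.base && last <= region_end) {
            return ErrorCode::Ok;
        }
    }
    return ErrorCode::BusError;  // unmapped, or straddles two regions
}
```

With two adjacent 4 KiB regions, a 4-byte access at offset 0xFFE is rejected even though every byte is mapped, because no single region holds all four bytes.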
4. MMIO requires 1 / 2 / 4-byte naturally aligned accesses.
Anything else is rejected at the bus with BusError or
AlignmentError. CPU alignment traps belong in the handlers, not in
the bus.
5. Bus does not own peripherals.
SystemBus::map_peripheral takes a non-owning IPeripheral*. The
runtime owns unique_ptr<IPeripheral> and hands raw pointers to the
bus.
26. 4 KB RAM mapped at physical address 0x0.
Real GR712RC boots from ROM at address 0. The RTEMS idle loop does
lda [%o0] 0x1c, %g0 where %o0 = 0xFFFFFFF0; SPARC V8
address wrap-around lands at physical address 0. Mapping 4 KB of RAM
at address 0 avoids a spurious data_access_exception in the idle
loop.
Decoder and ISA
8. FPop1/FPop2 decoded as InsnKind::FpOp.
op = 10, op3 ∈ {0x34, 0x35}. The handler returns FpDisabled (tt =
0x04), the correct LEON3 behaviour when no FPU is present
(PSR.EF = 0). Coprocessor opcodes (op3 ∈ {0x36, 0x37}) remain
Unknown and map to IllegalInstruction.
9. Instruction-fetch vs data-access bus errors are distinguished.
ExecStatus::InsnFetchError (tt=0x01, instruction_access_exception)
for failed fetches; BusError (tt=0x09, data_access_exception) for
load/store failures.
10. TADDccTV / TSUBccTV set icc and write rd even when trapping.
SPARC V8 §B.30: the tagged-add/sub trap variant computes the result
and condition codes first, then traps if V is set. The handler writes
rd and icc before returning ExecStatus::TagOverflow. The spec
makes the result "unpredictable"; deterministic write makes tests
reproducible.
11. handlers.cpp split into category files.
handlers_alu.cpp, handlers_branch.cpp, handlers_loadstore.cpp,
handlers_regwin.cpp, handlers_special.cpp, with shared helpers
(alu_op2, eval_cond) in handlers_internal.hpp. The public
execute() dispatcher remains in handlers.cpp.
21. BA encoding uses disp22 = 0, not 1.
Earlier test_bare_metal.cpp helpers encoded BA with disp22 = 1,
which targets PC+4 (the delay slot itself). The fix sets disp22 = 0
so BA .+0 becomes a proper self-branch when intended.
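The difference between the two displacements can be checked with a short encoding sketch based on the SPARC V8 Bicc format. The helper names encode_ba and ba_target are hypothetical and are not the helpers in test_bare_metal.cpp.

```cpp
#include <cstdint>

// SPARC V8 Bicc format: op=00, a=0, cond=1000 (always), op2=010, disp22.
constexpr std::uint32_t encode_ba(std::uint32_t disp22) {
    return (0x8U << 25) | (0x2U << 22) | (disp22 & 0x3FFFFFU);
}

// Branch target: PC + 4 * sign_extend(disp22), with 32-bit wrap-around.
constexpr std::uint32_t ba_target(std::uint32_t pc, std::uint32_t disp22) {
    const std::uint32_t field = disp22 & 0x3FFFFFU;
    const std::uint32_t offset = (field ^ 0x200000U) - 0x200000U;  // sign-extend
    return pc + (offset << 2);
}

static_assert(encode_ba(0) == 0x10800000U, "BA .+0");
static_assert(ba_target(0x1000U, 0) == 0x1000U, "disp22 = 0 is a self-branch");
static_assert(ba_target(0x1000U, 1) == 0x1004U, "disp22 = 1 targets the delay slot");
```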
30. Cache config registers use GR712RC-specific values.
Both I$ and D$ were 0x08101004 (placeholder). Replaced with: I$ =
0x132308e8 (4-way, 16 KiB, LRU, snooping, MMU present), D$ =
0x1b2208f8 (4-way, 16 KiB, LRU, snooping, MMU present,
write-through). Read-only registers accessed via ASI 0x02 at addresses
0x08 and 0x0C. GR712RC §5.2.
31. ASR17 reset value includes GR712RC-specific fields.
Original value only had V8 mul/div (bit 20) and NWINDOWS-1=7. Added
FPU type [11:10]=01 (GRFPU) and watchpoints [7:5]=010 (2
watchpoints). New value: (1U << 20) | (1U << 11) | (2U << 5) | 7U =
0x100847. GR712RC §4.2.
32. PSR reset includes LEON3 impl/version fields.
After reset, PSR had only S=1 (impl=0, ver=0). Fixed to include
impl=0xF (Gaisler) and ver=0x3 (LEON3FT). New reset: (0xFU << 28)
| (0x3U << 24) | kSBit = 0xF3000080. These fields are read-only
masked so WRPSR preserves them. GR712RC §4.1.
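The reset value and the read-only masking can be verified with a few constants. In this sketch kSBit is taken to be bit 7 (which is what the stated result 0xF3000080 implies), while kReadOnlyMask and apply_wrpsr_sketch are hypothetical names; the real write path may mask additional reserved bits.

```cpp
#include <cstdint>

constexpr std::uint32_t kSBit = 1U << 7;               // supervisor bit
constexpr std::uint32_t kReadOnlyMask = 0xFF000000U;   // impl [31:28], ver [27:24]
constexpr std::uint32_t kPsrReset = (0xFU << 28) | (0x3U << 24) | kSBit;

static_assert(kPsrReset == 0xF3000080U, "matches the documented reset value");

// Hypothetical WRPSR model: impl/ver come from the current PSR, everything
// else comes from the written value.
constexpr std::uint32_t apply_wrpsr_sketch(std::uint32_t current,
                                           std::uint32_t value) {
    return (current & kReadOnlyMask) | (value & ~kReadOnlyMask);
}

static_assert(apply_wrpsr_sketch(kPsrReset, 0U) == 0xF3000000U,
              "writing zero cannot clear impl/ver");
```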
37. enter_trap() clears annul_next_.
SPARC V8 §5.1.2.2 specifies that the annul-bit mechanism is per-CTI and
does not persist past a trap entry. CpuState::enter_trap() explicitly
clears annul_next_ to prevent the first instruction of an ISR handler
from being silently dropped when a hardware interrupt arrives after an
annulled delay slot. This was the root cause of sp11's ErrorMode crash.
38. WRPSR is applied immediately, matching SIS.
SPARC V8 §5.1.2.3 permits the S, ET, PS, and CWP changes from a WRPSR
to be deferred up to three instructions, but that delay is implementation
latitude — real software pads WRPSR with three NOPs, so the result is
observably identical. The reference oracle (Gaisler SIS) applies the write
immediately, and Tero matches it: write_psr_writable()
(cpu_state.cpp:69) masks the read-only fields and writes the rest
straight to psr_ in one shot, with no pending-write buffer. The earlier
3-instruction pending_psr_ / commit_psr_pipeline() model was removed
because it desynced the register windows when a trap fired inside the
window (smpschededf03).
Peripherals
12. MemCtrl (FTMCTRL) is a passive stub. MCFG1–MCFG4 are readable / writable with no side effects. MCFG3 bit 27 (reserved, reads as 1) is forced. No timing, no bank switching — just enough for the RTEMS memory probe.
13. IrqMP IFORCE write semantics. Writing IFORCE uses a clear-then-set protocol: the upper 16 bits clear bits, the lower 16 bits set bits (both masked to IRQ lines 1–15). Matches GRLIB behaviour where software can atomically set and clear force bits in one write.
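The clear-then-set protocol can be sketched as a pure function over the force register. The names kIrqLineMask and write_iforce are hypothetical, and the sketch assumes that bit 16+n of the written word clears line n, which is how the description above reads.

```cpp
#include <cstdint>

constexpr std::uint32_t kIrqLineMask = 0xFFFEU;  // IRQ lines 1..15

// Hypothetical model of one IFORCE write: clear first, then set.
constexpr std::uint32_t write_iforce(std::uint32_t force, std::uint32_t value) {
    const std::uint32_t clear_bits = (value >> 16) & kIrqLineMask;
    const std::uint32_t set_bits = value & kIrqLineMask;
    return (force & ~clear_bits) | set_bits;
}

// Lines 1 and 2 forced; one write clears line 2 and sets line 3.
static_assert(write_iforce(0x0006U, 0x00040008U) == 0x000AU, "clear 2, set 3");
```

Because clearing happens before setting, a write that names the same line in both halves leaves that line forced.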
14. IrqMP pending_mask(0) includes IFR0.
CPU 0's pending mask is IPEND | IFR0 | (current_mask & IFR0). For
CPU N>0, pending_mask(N) is IPEND | (current_mask & IFRN). Matches
the GR712RC single-CPU force register design.
15. GPTimer control register writable mask is 0x2B.
Bits 0 (EN), 1 (RS), 3 (IE), 5 (CH) are directly writable. Bit 2 (LD)
is write-only: it loads the counter from the reload register, then
clears. Bit 4 (IP) uses write-0-clear (writing 1 has no effect, writing
0 clears the pending bit). Bit 6 (DH) is read-only 0.
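The bit rules above can be captured in a small write function. This is a sketch only: CtrlWrite and write_ctrl are hypothetical names, the load itself is reduced to a flag, and the IP handling follows the write-0-clear behaviour exactly as stated in this entry.

```cpp
#include <cstdint>

constexpr std::uint32_t kEn = 1U << 0;
constexpr std::uint32_t kRs = 1U << 1;
constexpr std::uint32_t kLd = 1U << 2;
constexpr std::uint32_t kIe = 1U << 3;
constexpr std::uint32_t kIp = 1U << 4;
constexpr std::uint32_t kCh = 1U << 5;
constexpr std::uint32_t kWritableMask = kEn | kRs | kIe | kCh;

static_assert(kWritableMask == 0x2BU, "matches the documented mask");

struct CtrlWrite {
    std::uint32_t ctrl;   // new stored control value
    bool load_requested;  // LD was written as 1 (never stored)
};

// Hypothetical model of one control-register write.
constexpr CtrlWrite write_ctrl(std::uint32_t ctrl, std::uint32_t value) {
    // Only IP survives from the old value; LD and DH are never stored.
    std::uint32_t next = (ctrl & kIp) | (value & kWritableMask);
    if ((value & kIp) == 0U) {
        next &= ~kIp;  // write-0-clear; writing 1 leaves IP unchanged
    }
    return CtrlWrite{next, (value & kLd) != 0U};
}
```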
16. GPTimer prescaler underflow logic.
On tick(), the prescaler counter is decremented first; when it
reaches zero, the prescaler value is reloaded and all enabled
sub-timers are ticked.
17. GPTimer timer4 watchdog defaults. After reset, timer4 has EN=RS=IE=1 and counter/reload both set to 0xFFFF. Matches the GR712RC default where the watchdog is armed and must be disabled or fed by software.
18. ApbUart uses std::queue<uint8_t> for the RX FIFO (max 8).
No TX FIFO is modelled — transmit() drains immediately via
ICharacterDevice. Status bit 31 (FA, FIFO available) always reads
as 1 since the queue fits within 8 entries and never overflows.
19. PeripheralContext now includes ICharacterDevice*.
Added for APBUart to inject console I/O. Default is nullptr; the
runtime wires it to the configured character device.
20. All peripheral MMIO handlers reject non-word accesses.
Byte and half-word reads/writes return AlignmentError. Stricter than
GR712RC (which allows byte writes to the UART data register), but
matches the MVP approach: defer narrowing to when RTEMS demands it.
22. hello_uart.S must enable CTRL.TE before transmitting.
ApbUart drops writes to the data register unless TE is set in the
control register. Any asm test that writes directly to APBUart MMIO
must first st the TE bit into CTRL. Missing this is silent — the
test runs but no output is produced.
27. GPTimer bootloader prescaler initialization.
The emulator simulates the GR712RC ROM bootloader's timer setup by
writing to the prescaler value and reload registers during
initialize(). Without this, the prescaler counter starts at 0 and
takes 0xFFFF ticks (≈ 65 ms) before the first underflow, greatly
delaying the first timer interrupt.
29. GPTimer Timer 4 reset comment corrected.
The comment said RS=1 but 0x09 = 0b1001 has RS=0 (bit 1 is clear).
Correct value is EN=1, RS=0, IE=1. The code value was already correct;
only the comment was fixed.
33. FTMCTRL P&P device ID corrected to 0x054.
The AHB Plug&Play descriptor used device ID 0x00F (MCTRL) instead of
0x054 (FTMCTRL, fault-tolerant memory controller with EDAC). The
GR712RC has FTMCTRL, not MCTRL. Config word changed from 0x0100f020
to 0x01054020. Affects RTEMS auto-probing.
35. DemoDmaDevice does endian-safe byte-wise XOR.
The original implementation memcpy'd a uint32_t over dma_read and
XORed the host-endian word, producing different results on LE vs BE
hosts. The corrected version XORs each byte against the corresponding
byte of the mask in BE order (MSB ↔ addr+0), matching what a SPARC
ld; xor; st sequence produces.
Runtime and scheduling
6. Warning set is stricter than the CLAUDE.md minimum.
tero::warnings enables, on top of -Wall -Wextra -Wpedantic
-Werror: -Wshadow -Wnon-virtual-dtor -Wold-style-cast -Wcast-align
-Wunused -Woverloaded-virtual -Wconversion -Wsign-conversion
-Wnull-dereference -Wdouble-promotion -Wformat=2. Everything builds
with 0 warnings / 0 errors under this set.
7. tests/support/dummy_peripheral is a test fixture, not a module.
Lives in the test tree; must not leak into tero_peripherals.
23. Emulator exposes injection, not construction, for services.
set_character_device() and set_logger() swap the defaults at
runtime. Lets the CLI wire StdoutCharDevice / StdoutLogger without
forcing them into EmulatorConfig, and lets tests swap in
CapturingCharDevice without touching config. SMP2 wrappers will use
the same injection points.
25. Idle-time skip is bounded to 1 ms.
When all cores are powered down, run_until() jumps simulated time
forward to the next event. The GPTimer uses direct
IInterruptSource::raise() calls rather than the EventScheduler, so
next_event_time() does not see timer interrupts. Without a bound,
time would jump to the deadline and the GPTimer's periodic interrupt
would be delivered only once. The 1 ms bound (kMaxIdleNs) ensures
timer interrupts arrive at roughly their expected rate.
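The bound can be expressed as a three-way minimum. In this sketch kMaxIdleNs is the documented 1 ms, while idle_skip_target and its parameters are hypothetical names rather than the actual run_until() internals.

```cpp
#include <algorithm>
#include <cstdint>

constexpr std::uint64_t kMaxIdleNs = 1000000;  // 1 ms

// Hypothetical helper: where simulated time jumps to when every core is
// powered down. The jump never exceeds kMaxIdleNs, so peripherals that raise
// interrupts outside the event scheduler are still polled regularly.
inline std::uint64_t idle_skip_target(std::uint64_t now_ns,
                                      std::uint64_t next_event_ns,
                                      std::uint64_t deadline_ns) {
    return std::min({next_event_ns, deadline_ns, now_ns + kMaxIdleNs});
}
```

With no scheduled event and a deadline 10 ms away, the loop would advance in ten 1 ms steps instead of one 10 ms jump.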
28. Secondary cores start in power-down mode.
On real GR712RC, only CPU 0 starts executing at reset. Secondary cores
are parked in power-down mode (wr %g0, %asr19) and wait for the
primary to release them via IRQMP. The emulator sets
is_powered_down = true for all cores except CPU 0 after loading the
ELF image.
34. Emulator::add_peripheral injects ctx.bus = &bus_.
Previously PeripheralContext::bus was left null for user-defined
peripherals, which silently broke any DMA-capable custom peripheral
attached through the public API. The runtime now always wires the bus;
the rest of PeripheralContext (irq, scheduler, logger, chardev) was
already handled.
36. Sptest pass criterion is "*** END OF TEST" in console.
The previous test required HaltReason != ErrorMode and the END OF
TEST string. RTEMS sptests print END OF TEST and then unwind into a
halt that triggers a trap-while-ET=0 (ErrorMode), so the strict check
failed all 8 actually-passing tests. The functional pass criterion is
the END OF TEST banner alone.
39. IrqMP acknowledge() gives force precedence over pending.
Per GR712RC §8 (p.~115): when a processor takes an interrupt trap, the
corresponding pending bit is auto-cleared. Tero models this via
IrqMP::acknowledge(cpu, bit), called by Emulator::sample_interrupts
before enter_trap. If the interrupt was set in a force register
(IFR0 for CPU 0 or IFORCE[cpu]), the force bit is cleared instead
of the shared pending bit. This matches real hardware where a forced
interrupt has priority over an externally-asserted interrupt on the
same line. Without auto-ack, the same level would be re-sampled on
every quantum and the trap handler would never complete.
40. GPTimer IRQ line pulses on every underflow, irrespective of
sticky IP.
The IP bit in the control register is a software-visible status that
latches on underflow and is cleared by writing 0. The IRQ line to the
IRQMP, however, is edge-triggered: process_timer_tick() now always
calls irq->raise() when IE == 1 and the counter underflows,
regardless of whether IP was already set (Decision 40 removes the
!was_ip guard). Without the pulse on every underflow, the RTEMS
clock driver would only get one interrupt because the IRQMP's
auto-ack clears the pending bit — and the timer would never re-assert
the line if it only raised on the 0→1 transition of the sticky IP
bit.
GDB stub
41. GDB attach is late-binding by default; --gdb-wait is opt-in.
initialize() opens the listening socket immediately and returns,
even when gdb_stub_wait_for_client = false. The run loop calls
GdbStub::poll_accept() once per quantum (non-blocking poll() on
the listen FD), accepts new connections mid-run, and halts with
HaltReason::Breakpoint so the CLI can drive process_until_resume().
A second connection while one is attached is drained and rejected
(accept + immediate close) so the kernel backlog stays clear.
Before this, --gdb-wait=false left the socket listening but never
accepted, so target remote :PORT after the emulator started hung
silently. The cost is one poll(fd, timeout=0) per quantum (~1 µs).
42. Hardcoded RTEMS layout constants with runtime validation, not
DWARF parsing.
rtems_layout::PerCpuEnvelopeSize=128, PerCpuExecutingOffset=32,
ThreadObjectIdOffset=8, ThreadObjectNameOffset=12 are extracted
from DWARF of hello-world.elf once and hardcoded in
gdb_stub.hpp. Alternatives considered: (a) parse DWARF in the stub —
days of work, large surface area; (b) ship a Python helper that GDB
loads — couples the user setup to a script. The hardcoded path
trades robustness for code size: each read is gated on three sanity
checks (executing pointer non-zero, bus read succeeds,
Object.id API bits ∈ {Internal, Classic, POSIX}), and any failure
falls back to the legacy per-core TID model. A toolchain upgrade that
shifts the layout fails the [!mayfail] live-guest test
(test_gdb_stub_protocol.cpp) loudly — at which point the constants
are refreshed against the new DWARF.
43. qSymbol state machine chains two symbol requests.
The handshake asks first for _Per_CPU_Information (B1 — executing
thread per core). On a non-zero resolve it transitions to
AskingObjectsTable and asks for _Objects_Information_table
(B2/B3/B4 root). Each round handles absent-symbol (empty <addr_hex>)
independently: missing per-CPU disables thread-awareness entirely;
missing objects-table degrades to B1-only. Reset-on-unsolicited
qSymbol:: (GDB file reload) wipes both addresses so a stale value
does not survive an image switch.
44. Stop-on-ErrorMode redirects through the GDB stub when attached.
Previously, a core entering error_mode returned
HaltReason::ErrorMode from run_until_unpaced straight to the
caller — silent for any attached GDB. Now the run loop's error_mode
check calls gdb_stub_->report_error_mode(core). If a client is
attached and the error has not been reported, the stub arms a
stop reply with the signal derived from TBR.tt and the loop returns
HaltReason::Breakpoint. A per-client error_reported_ latch
prevents an infinite re-notify when GDB continues a permanently-dead
core: the second pass returns ErrorMode and the CLI's resume loop
exits cleanly. Without the latch, c-after-crash spins forever.
45. GDB signal numbers are GDB's table, not Linux's.
StopSignal hardcodes the values from gdb/include/gdb/signals.def:
Ill=4, Trap=5, Fpe=8, Bus=10, Segv=11. SIGBUS diverges
from Linux (7); using the host <signal.h> value would silently
mis-label alignment faults as SIGUSR1. The mapping from SPARC TT
to StopSignal is in signal_from_tt(uint32_t) —
table-driven, unit-tested per TT class, with SIGTRAP reserved for
software traps (ta, TT >= 0x80) so GDB treats them as
breakpoint-class events rather than crashes.
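A table-driven mapping along these lines might look as follows. The StopSignal values and the rule that TT >= 0x80 maps to SIGTRAP come from the text above; the individual TT rows, the fallback to Segv for unlisted values, and the name signal_from_tt_sketch are assumptions made for illustration and may not match the real table.

```cpp
#include <cstdint>

// Values from GDB's signal table, not from the host <signal.h>.
enum class StopSignal : int { Ill = 4, Trap = 5, Fpe = 8, Bus = 10, Segv = 11 };

struct TtEntry {
    std::uint32_t tt;
    StopSignal signal;
};

// Assumed rows, using the SPARC V8 trap-type numbers.
constexpr TtEntry kTtTable[] = {
    {0x01, StopSignal::Segv},  // instruction_access_exception
    {0x02, StopSignal::Ill},   // illegal_instruction
    {0x03, StopSignal::Ill},   // privileged_instruction
    {0x07, StopSignal::Bus},   // mem_address_not_aligned
    {0x08, StopSignal::Fpe},   // fp_exception
    {0x09, StopSignal::Segv},  // data_access_exception
    {0x2A, StopSignal::Fpe},   // division_by_zero
};

inline StopSignal signal_from_tt_sketch(std::uint32_t tt) {
    if (tt >= 0x80) {
        return StopSignal::Trap;  // software traps (ta)
    }
    for (const TtEntry& entry : kTtTable) {
        if (entry.tt == tt) {
            return entry.signal;
        }
    }
    return StopSignal::Segv;  // assumed fallback for unlisted trap types
}
```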
46. Stop replies prefer the RTEMS Objects_Id when thread-awareness
is active.
send_stop_reply(core, sig) calls try_read_executing_id(core)
per-invocation (no cache) and uses the resolved RTEMS thread ID in
Tnnthread:<id>;. Falls back to the legacy core+1 if the read
fails. Per-call reads avoid the cache-invalidation problem when RTEMS
performs a context switch: the stub never has to track stores to
Per_CPU.executing. The cost is 2 bus reads per stop event
(~µs), negligible compared to the typical RSP round-trip.
49. B2 — full thread enumeration walks
_Objects_Information_table[Classic][Tasks].local_table[].
With both qSymbol rounds resolved, qfThreadInfo reports every
allocated Classic API task — not just the executing one per core.
The walk is gated on a sanity check (Information.object_size == 400,
matching sizeof(Thread_Control)); if it fails, the build
configuration drifted (POSIX/SMP toggle) and the stub silently
degrades to B1 enumeration rather than emit garbage TIDs. Cap of
256 slots prevents a corrupted maximum_id from looping wildly.
NULL slots (deleted tasks) are skipped, not reported. The cache
populated by qfThreadInfo powers qsThreadInfo (paginated emit
under the 4000-byte packet budget), H g <tid> (TID → target
translation, including non-executing tasks), and qC (current
thread reporting). De-duplication by tid between the executing
set and the table walk prevents IDLE/INIT from appearing twice.
50. B3 — g for a non-executing thread reconstructs from saved
Context_Control; pc = saved o7.
RTEMS only saves the callee-preserved subset (g5, g7, l0..l7,
i0..i7, o6/sp, o7, psr) inside _CPU_Context_switch. The GDB
register block requires 72 entries; everything RTEMS doesn't save
is zero-filled (g0..g4, g6, o0..o5, all 32 FP regs, y, wim, tbr,
fsr, csr). The synthetic PC is the saved o7 — the return
address that _CPU_Context_switch will pop on resume — which is
what bt needs to show "where the thread will continue". npc =
pc + 4. G (write) for a non-executing target is rejected with
E01: writing into a thread's saved context mid-flight races with
the next dispatch in ways the stub cannot make safe. FP regs are
deferred (Thread_Control.fp_context requires checking is_fp).
51. B4 — qThreadExtraInfo is name + state + priority, hex-encoded.
The reply is the ASCII string "NAME [state] pri=N", hex-encoded
for RSP. State decoding is in format_thread_state(uint32_t):
0 → "READY", exclusive flags (SUSPENDED, ZOMBIE, DORMANT,
LIFE_CHANGING, DEBUGGER, INTERRUPTIBLE) listed by name, all
STATES_WAITING_FOR_* bits collapsed into a single
WAIT:<a>|<b>|... segment. Unknown bits surface as UNKNOWN
so score/statesimpl.h drift is visible. Each enrichment is
best-effort: a bus failure on state or priority degrades that
field silently rather than failing the whole reply, so a partial
answer (just name) still beats nothing. Priority reads only the
low 32 bits of the Priority_Control priority (uint64_t) at
offset +20 within Real_priority — Classic API priorities cap at
255 so the high half is always zero.
Diagnostics
47. IEmulatorObserver::on_instruction(cpu, pc) — per-instruction
hook for diagnostic probes.
Added 2026-05-13 alongside the smpschededf02 stack-overflow
investigation. Fires from run_until_unpaced between the GDB-stub
break check and core::step(), with the about-to-execute PC. When
no observer is installed, the runtime cost is one null-pointer test
per instruction (well predicted; benchmarked at <1 % overhead on the
existing [smptests] suite). When an observer is installed, each
instruction pays one virtual call — acceptable for diagnostic test
binaries (the smpschededf02 dispatch probe captures ~700 samples per
200 ms simulated, with negligible host overhead).
Alternatives considered: (a) make the hook PC-set-conditional in the
runtime (filter inside Emulator), rejected because it pushes
diagnostic policy into the core; (b) build a separate "trace mode"
of the emulator, rejected because it doubles the surface area to
maintain. The existing IEmulatorObserver already accepted the
"empty default, opt-in by override" model for on_irq_*, on_trap_*,
on_peripheral_attached; this is a natural sibling.
48. The smptests / sptests / fptests harness installs
StdoutLogger(LogLevel::Error) by default.
Added 2026-05-13. The default Info level lets a single misbehaving
guest emit millions of lines (e.g. a tight loop writing PROM area
emits one [WARN] [prom] ignored write per cycle), which during a
batch CTest run buries the harness's own per-test outcome lines and
inflates the build log by orders of magnitude. The integration
harness — not individual unit tests — is the right place to install
the quieter logger because the per-test outcome is captured via the
UART and the CSV row, not via emulator log output. Tests that need
to assert on emulator log content can install their own logger
between Emulator::create and Emulator::initialize.
Arch-neutral IR and JIT
See IR and LLVM JIT for the engine and
Adding a frontend for the contributor procedure.
The design is frozen in plans/phase11-arch-neutral-ir.md (D1–D6) and
plans/post-mvp-1to1-roadmap.md (the JIT ADRs); these entries are the
searchable index.
49. Guest state is an opaque byte blob; IR ops touch it only through
LdState/StState at (offset, size).
Added 2026-05-24. The IR knows no register names. SPARC %g/%o/%l/%i,
%psr, %y — and a future ARM's r0-r15, CPSR, banked registers — are
byte offsets the frontend chooses (src/arch/sparc/.../sparc_layout.hpp).
Alternative: virtual registers mapping 1:1 to SPARC registers (the original
Phase 11.1 spec) — rejected because it bakes SPARC's register set into the
IR and makes register windows / banking an IR concept instead of a frontend
offset choice.
50. The block-cache key is (PhysAddr, ModeCtx), and mode-changing
instructions are block terminators.
Added 2026-05-24. Register offsets and decoding depend on an arch mode
(SPARC CWP; ARM Thumb/mode/endianness). Keying on PC alone is insufficient.
Because any instruction that changes the mode context (SPARC SAVE/RESTORE/
trap/RETT/WRPSR; ARM mode switch / BX to Thumb) ends the block, the mode
is constant within a block and the frontend resolves all mode-dependent
offsets at translate time — no runtime-indexed state access is needed.
IrBlock::mode_change marks such blocks so the region compiler does not chain
across them. Alternative: a runtime-indexed state-access op — rejected as
unnecessary once mode changes are terminators (verified against the SPARC
window cases).
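A composite cache key of this shape can be sketched with a standard hash map. The 32-bit widths chosen for PhysAddr and ModeCtx, the BlockKey and BlockKeyHash types, and the int payload are all assumptions; the real ir_cache_ is an evicting cache and is not modelled here.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

using PhysAddr = std::uint32_t;  // assumed width
using ModeCtx = std::uint32_t;   // e.g. the SPARC CWP; assumed representation

struct BlockKey {
    PhysAddr pc;
    ModeCtx mode;
    bool operator==(const BlockKey& other) const noexcept {
        return pc == other.pc && mode == other.mode;
    }
};

struct BlockKeyHash {
    std::size_t operator()(const BlockKey& key) const noexcept {
        // Pack both fields into one 64-bit value so neither is ignored.
        const std::uint64_t packed = (std::uint64_t{key.mode} << 32) | key.pc;
        return std::hash<std::uint64_t>{}(packed);
    }
};

// The int stands in for a translated block handle.
using BlockCache = std::unordered_map<BlockKey, int, BlockKeyHash>;
```

The same physical address translated under two different mode contexts yields two distinct entries, which is the property that keying on PC alone cannot provide.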
51. Endianness is an attribute of the guest-memory ops, not of the bus.
Added 2026-05-24. LdGuest/StGuest carry {size, endianness}; the swap
happens in the op (interpreter) or lowered code (JIT), centralised in
src/ir/include/tero/ir/guest_memory.hpp. The SPARC frontend emits
big-endian accesses, an ARM frontend little-endian. Alternative: a bus that
byte-swaps for a fixed big-endian guest (the pre-IR behaviour) — rejected
because it cannot serve two guest endiannesses; the bus now stores raw bytes.
52. The IR has no flags register; condition codes are explicit guest-state
writes.
Added 2026-05-24. SPARC icc (NZVC) and ARM CPSR (NZCV) differ. The
frontend computes each flag bit into its guest-state offset (eager
evaluation). Lazy flag evaluation (QEMU's cc_op) is a per-frontend
optimisation layered later; it never enters the neutral IR.
53. Atomics are block boundaries; ordering is TSO. Added 2026-05-24. CASA/LDSTUB/SWAP (and ARM LDREX/STREX later) terminate a block, so atomicity holds even though the JIT introduces mid-region exits. TSO is satisfied trivially by single-threaded round-robin today, and maps to the x86-64 host with zero fences (ADR-003), which is why the host is fixed to x86-64 for 1:1 acceptance.
54. The JIT is tiered: an O0 baseline on the calling thread, an O2
optimised tier on a background thread (ADR-002).
Added 2026-05-25. IrJit takes an OptLevel; TieredJit
(src/jit/src/tiered_jit.cpp) drives two IrJits — Baseline compiled
immediately so a cold block runs at once, hot blocks (> jit_promotion_threshold)
recompiled at O2 in the background and published atomically. Both tiers lower
the identical IR, so they are interchangeable. Alternative: a single
synchronous O2 tier (the first cut) — rejected because per-block O2 codegen on
the cold path dominated wall time (RTEMS boot 4.5 s; the sptest-JitIr suite
~15 min). Tiering cut those to 1.0 s and 6.3 min respectively, with
steady-state performance unchanged.
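The promotion and atomic-publish mechanics can be sketched as follows. This is not TieredJit itself: BlockFn, TieredEntry, run_and_count, and publish_optimised are hypothetical names, and the real compiled-block signature is not shown in this log. The caller runs whichever tier is currently published, and learns exactly once that the block has become hot.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

using BlockFn = int (*)();  // hypothetical compiled-block signature

struct TieredEntry {
    std::atomic<BlockFn>       code;     // tier currently published for this block
    std::atomic<std::uint32_t> hits{0};  // executions so far
    explicit TieredEntry(BlockFn baseline) : code(baseline) {}
};

// Runs the published code and stores its result. Returns true exactly once:
// on the execution that first makes hits exceed promotion_threshold, which is
// when the caller would queue the block for background recompilation.
inline bool run_and_count(TieredEntry& e, std::uint32_t promotion_threshold, int& result) {
    result = e.code.load(std::memory_order_acquire)();
    const std::uint32_t n = e.hits.fetch_add(1, std::memory_order_relaxed) + 1;
    return n == promotion_threshold + 1;
}

// Called by the background compiler once the optimised code is ready.
inline void publish_optimised(TieredEntry& e, BlockFn optimised) {
    e.code.store(optimised, std::memory_order_release);
}
```

Because both tiers lower the same IR, swapping the pointer is the whole hand-over; an execution already in flight simply finishes on the old tier.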
55. Region chaining fuses a block region into one LLVM function, not
trampolines (12.4b).
Added 2026-05-25. IrJit::compile_region lowers an entry block plus its
same-mode static/conditional successors as one native function (one LLVM
block per member; in-region branches chain directly, back-edges loop, all
budget-bounded). Emulator::build_jit_region discovers the region by BFS over
static/cond edges without touching the (evicting) ir_cache_. Alternative:
per-block functions linked by runtime-patched trampolines (the Phase 12.4
sketch, the QEMU block-linking style) — rejected because keeping the region in
one LLVM function lets LLVM optimise across it and generalises the 12.4a
self-loop without a separate patch mechanism. A member that fails to lower
falls back to the entry-only region, so chaining never does worse than
single-block.
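Region discovery as described can be sketched as a budget-bounded breadth-first walk. BlockInfo, BlockGraph, and discover_region are hypothetical names, and the graph here is a plain map rather than anything resembling ir_cache_. Only successors in the same mode as the entry block join the region.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical per-block summary: its mode and its static/conditional targets.
struct BlockInfo {
    std::uint32_t              mode;
    std::vector<std::uint64_t> successors;
};

using BlockGraph = std::unordered_map<std::uint64_t, BlockInfo>;

// Breadth-first walk from `entry`, keeping only same-mode successors and
// stopping once `budget` members have been collected.
inline std::vector<std::uint64_t> discover_region(const BlockGraph& g,
                                                  std::uint64_t entry,
                                                  std::size_t budget) {
    std::vector<std::uint64_t> region;
    const auto first = g.find(entry);
    if (first == g.end() || budget == 0) return region;

    const std::uint32_t mode = first->second.mode;
    std::unordered_set<std::uint64_t> seen{entry};
    std::deque<std::uint64_t> queue{entry};

    while (!queue.empty() && region.size() < budget) {
        const std::uint64_t pc = queue.front();
        queue.pop_front();
        region.push_back(pc);
        for (const std::uint64_t succ : g.at(pc).successors) {
            const auto it = g.find(succ);
            if (it == g.end() || it->second.mode != mode) continue;  // unknown or other mode
            if (seen.insert(succ).second) queue.push_back(succ);     // back-edges are not re-queued
        }
    }
    return region;
}
```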
56. A guest core that halts (trap with ET=0) is reported as HaltedMode,
distinct from ErrorMode.
Added 2026-05-25. On SPARC a trap taken while traps are disabled (ET=0) stops
the processor; RTEMS uses exactly this as its deliberate shutdown
(_CPU_Fatal_halt / _exit issue ta 0 with ET=0). That is the guest
halting, not an emulator failure, so HaltReason::HaltedMode (and the bench's
HALTED outcome, which is not a harness failure) name it; ErrorMode is
reserved for the emulator's own internal errors. Diagnosed when a sustained
GR740 SMP compute workload halted at an FP-disabled fatal (the guest's Init
task lacked RTEMS_FLOATING_POINT) — the emulator's behaviour was correct end
to end, which the old ErrorMode label obscured.
57. The execution method is a runtime bool translation; LLVM is a
mandatory dependency.
Added 2026-05-25. The four-way Dispatch enum (Switch / Threaded / Ir /
JitIr) and the TERO_ENABLE_JIT compile flag are gone. EmulatorConfig
carries translation (default true): false runs the core::step switch
interpreter, true runs the tiered LLVM JIT with the IR interpreter as
fallback. Both paths are always compiled in and chosen per Emulator, per the
config-by-struct principle (no behaviour-selecting build flags). LLVM (≥ 18) is
now found unconditionally by CMake — the floor was raised from 17 to 18 because
the JIT adopted llvm::CodeGenOptLevel (LLVM 17 spells it
llvm::CodeGenOpt::Level), which surfaced when a CI runner's bundled LLVM 17
passed the old >= 17 gate but then failed to compile. Alternatives
considered: keep Ir exposed
as its own mode (rejected — it is an implementation detail of the JIT fallback,
not a method a user picks) and keep TERO_ENABLE_JIT as a build option
(rejected — it gated behaviour, not just a dependency; making LLVM mandatory
removes a whole untested build configuration).
58. GDB debugging works under binary translation, not only the interpreter.
Added 2026-05-25. Breakpoints are an external BreakpointSet of PCs, never
patched into guest memory, so the block translator is unaffected by them.
run_ir_quantum is breakpoint-aware: with a stub attached it runs native until
a block boundary, checks should_break at each entry, and single-steps (via
core::step) through any block holding an interior breakpoint so it stops on
the exact PC; build_jit_region also stops fusing blocks while a stub is
attached so that interior-breakpoint check stays exact. Only a per-instruction
observer (trace) still forces the interpreter. Alternative: keep forcing the
Switch path whenever a stub is attached (the interim behaviour) — rejected
because it changed the executed code under the debugger, defeating the point of
debugging the path that actually runs in production.
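The per-block choice described above can be sketched as a three-way plan. This BreakpointSet is only a std::set alias, and BlockPlan and plan_block are hypothetical names; the real run_ir_quantum logic is not reproduced here. The sketch assumes a fixed instruction stride (4 bytes by default).

```cpp
#include <cassert>
#include <cstdint>
#include <set>

using BreakpointSet = std::set<std::uint64_t>;  // stand-in: an external set of PCs

enum class BlockPlan { RunNative, StopAtEntry, SingleStep };

// Decide how to execute the block [entry_pc, entry_pc + stride * insn_count).
inline BlockPlan plan_block(const BreakpointSet& bps, std::uint64_t entry_pc,
                            std::uint32_t insn_count, std::uint32_t stride = 4) {
    if (bps.count(entry_pc) != 0) return BlockPlan::StopAtEntry;
    // First breakpoint strictly after the entry PC, if any.
    const auto it = bps.upper_bound(entry_pc);
    const std::uint64_t end_pc = entry_pc + std::uint64_t{stride} * insn_count;
    if (it != bps.end() && *it < end_pc) return BlockPlan::SingleStep;  // interior breakpoint
    return BlockPlan::RunNative;
}
```

Guest memory is never consulted, which matches the point that breakpoints are not patched into it.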
59. The threaded-code dispatcher (Phase 10.2) was removed.
Added 2026-05-25. dispatch_threaded.{hpp,cpp} (th_skeleton<Body> tail-call
chains, the ThreadedHandler typedef, the DecodeCacheEntry::fn pointer, and
the CpuState chain counters) are deleted. Threaded code reached ~1.5× over the
switch interpreter but missed its Phase 10.2 exit targets, and the arch-neutral
IR JIT (Decisions 49–55) supersedes it as the fast path. The decode cache keeps
pc_tag + DecodedInsn for the Switch path; the flag helpers in
handlers_internal.hpp, once shared between the threaded templates and the
exec_alu switch, now serve the switch alone. Keeping a third, unused execution
path would be dead weight against the project's no-premature-abstraction rule.
79. The IR interpreter carries the reference-path duties through a
per-instruction step hook; stops resume via set_pc at straight-line
boundaries only.
Added 2026-06-10 (E0, plans/e0-ir-reference-path.md; ADR-006 stage gate).
An IR-only architecture (no switch oracle — every post-SPARC frontend under
ADR-006) still needs the three per-instruction reference duties the SPARC
switch oracle provides: trace (IEmulatorObserver::on_instruction with
honest pre-instruction state), GDB single-step / interior breakpoints, and
lockstep state compare. ir::IStepHook supplies them on the interpreter:
every builder-emitted op is stamped with its insn_index/pc (previously
only trapping ops were), the interpreter fires the hook before the first op
of each guest instruction (op-less instructions included, via the
gap-filling boundary walk), and an honoured stop returns a
BlockExit::boundary_stop whose committed prefix reuses the precise-trap
"prefix committed" guarantee. Resumption is IArchitecture::set_pc(entry_pc
+ stride · n) — exact because a stop is only honoured at a straight-line
boundary: IrBlock::no_stop_tail (stamped by the frontend, 1 on every
SPARC delay-slot-bearing terminator) excludes the CTI→delay-slot shadow,
where nPC ≠ PC+4 and an annulled slot must not be re-entered as a fresh
block. Three consequences: run_ir_interpret_quantum's observer now fires
interleaved with execution instead of pre-fired per block (each callback
sees the committed state of every prior instruction); an IR-only frontend
yields at the EXACT quantum boundary (the SMP determinism guard no longer
needs an oracle) and single_step is a true one-instruction step; and GDB
interior breakpoints run the block on the interpreter with a stop before
the breakpoint PC instead of running past it. SPARC defaults are untouched:
an observer still forces the switch path (the frozen oracle is the SPARC
trace path), and the oracle still walks SPARC quantum tails and breakpoint
blocks — the hook activates for SPARC only under force_ir_interpret
(validation) and for the per-instruction lockstep harness
(run_ir_diff(..., per_insn = true)), which compares the full blob at
every interior boundary across an RTEMS boot.
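The resume arithmetic can be illustrated with a small helper. BlockShape and resume_pc are hypothetical, and the sketch assumes one particular reading of no_stop_tail: that it counts the trailing instruction boundaries at which a stop is not honoured, so a value of 1 excludes the boundary just before the delay slot.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical summary of a translated block.
struct BlockShape {
    std::uint64_t entry_pc;
    std::uint32_t insn_count;
    std::uint32_t stride;        // bytes per instruction (4 on SPARC)
    std::uint32_t no_stop_tail;  // assumed: trailing boundaries where stops are refused
};

// If a stop before instruction n may be honoured, return the PC to resume at
// (entry_pc + stride * n); otherwise return nullopt.
inline std::optional<std::uint64_t> resume_pc(const BlockShape& b, std::uint32_t n) {
    // Use 64-bit arithmetic so the comparison cannot wrap.
    if (std::uint64_t{n} + b.no_stop_tail >= b.insn_count) return std::nullopt;
    return b.entry_pc + std::uint64_t{b.stride} * n;
}
```

With four instructions ending in a CTI plus delay slot, stops before instructions 0 to 2 are straight-line boundaries; a stop before instruction 3 would land in the shadow and is refused.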
80. New guest architectures are IR-only; the Switch interpreter is the
frozen SPARC oracle; the future frontend generator emits Tero IR, never
LLVM IR (ADR-006).
Added 2026-06-10 (full text: plans/multiarch-emugen-frontends.md;
public summary: EmuGen). A new ISA implements exactly
IArchFrontend + IArchitecture — its reference path is the IR
interpreter (Decision 79), its fast path the tiered JIT; no per-arch
switch interpreter is ever written. The SPARC Switch stays hand-written
and frozen indefinitely: it is the most battle-tested validation asset in
the project and costs ~zero to keep, while replicating it per ISA is the
cost the decision avoids. The planned generator (EmuGen, gated on two
committed architectures — rule of three) produces only what a hand-written
frontend provides (decoder, translate_block, layout header) and targets
Tero IR: generating LLVM IR directly — the alternative considered — would
orphan the IR interpreter, the block cache, the tiering, and the GDB
metadata in one stroke. Amended same day: the SPARC retrofit (E3) is
mandatory, and the generated SPARC frontend goes to production once
validated block-by-block against the hand-written one over the RTEMS
corpus; LEON3 and LEON4 share that single SPARC V8 frontend.
Entity model, compose, and runtime decomposition¶
The 2026-06 refactor track (entity-model S0–S11 + A-series, compose, the
cleanliness sweep). Full migration history: plans/entity-object-model.md;
the searchable judgment calls are indexed here.
60. Everything in a machine is a tero::IEntity; the name is IEntity,
not IObject or IModel.
Added 2026-06-08. Peripherals, host-facing plugins, comms buses, CPUs, and
the bus fabric share one base (src/interfaces/include/tero/ientity.hpp);
"being a peripheral" means exposing IMmio, not deriving a separate root.
IObject/IModel were rejected because the future SMP2 wrapper bridges to
Smp::IObject/Smp::IModel and same-named types on both sides of that
bridge guarantee ambiguity.
61. Capability lookup has two spellings: the get_interface<T>() member
and the interface_cast<T>(IEntity*) free function.
Added 2026-06-08. The free form is null-safe and usable on an IEntity*;
the member reads better on a known entity. They are named differently
because a same-named free function is shadowed by the member inside an
entity's own methods — a real lookup failure hit during S0, not a style
preference. Both subsume the per-protocol dynamic_cast<…Provider*>
chains that predated them.
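The two spellings can be sketched as below. This is an illustration only: the real lookup mechanism in ientity.hpp is not shown in this log, and the dynamic_cast body is an assumption made to keep the example short. Uart and self_check are hypothetical.

```cpp
#include <cassert>

struct IMmio        { virtual ~IMmio() = default; };
struct IConnectable { virtual ~IConnectable() = default; };

struct IEntity {
    virtual ~IEntity() = default;
    // Member form: reads well on a known, non-null entity.
    template <class T>
    T* get_interface() { return dynamic_cast<T*>(this); }  // assumed mechanism
};

// Free form: null-safe. Had it also been called get_interface, an unqualified
// call inside an entity's own member functions would find the member first
// and never consider this overload.
template <class T>
T* interface_cast(IEntity* e) { return e ? e->get_interface<T>() : nullptr; }

// Hypothetical peripheral that exposes IMmio.
struct Uart : IEntity, IMmio {
    // The distinct name keeps the free function reachable from inside the class.
    bool self_check() { return interface_cast<IMmio>(this) != nullptr; }
};
```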
62. No string-keyed reflective properties; configuration stays typed.
Added 2026-06-08. The TEMU-style set_property("freq", "80MHz") model was
rejected: config-by-struct is a frozen principle, and typed fields fail at
compile time where string keys fail at run time. Read-only introspection
remains the IPublisher seam (→ SMP2 IPublication).
63. Wiring is typed interfaces + named slots, not IPort.
Added 2026-06-08. A declarative edge is
Connection{from_slot, peer, peer_slot} (peripheral_spec.hpp); the
runtime resolves it in one generic pass — a find_port(from_slot) hit
binds a port slot (signals, SpaceWire), otherwise the consumer's
IConnectable::connect joins a shared bus (CAN, SPI, 1553). The old plan
of routing all wiring through stateful IPort objects was rejected: a
port is one capability among several, not the mechanism.
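The single generic pass can be sketched like this. Connection keeps the three fields named above, while Node, its string-valued ports, and resolve are hypothetical simplifications: a real port binds an object, not a string.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct Connection { std::string from_slot, peer, peer_slot; };

// Hypothetical consumer: named port slots plus a list of shared-bus peers.
struct Node {
    std::map<std::string, std::string> ports;        // slot name -> bound "peer.peer_slot"
    std::vector<std::string>           bus_members;  // peers joined through connect()

    std::string* find_port(const std::string& slot) {
        const auto it = ports.find(slot);
        return it == ports.end() ? nullptr : &it->second;
    }
    void connect(const std::string& peer) { bus_members.push_back(peer); }
};

// One pass over the declarative edges: a port hit binds the slot,
// anything else joins the shared bus.
inline void resolve(Node& consumer, const std::vector<Connection>& edges) {
    for (const Connection& c : edges) {
        if (std::string* port = consumer.find_port(c.from_slot)) {
            *port = c.peer + "." + c.peer_slot;
        } else {
            consumer.connect(c.peer);
        }
    }
}
```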
64. IRQ lines stay the PeripheralSpec::irqs shorthand, resolved to
IrqBridges at attach() — not post-attach connection edges.
Added 2026-06-08 (S3.C finding). Bridges must exist before the post-attach
connection pass and require controller-first ordering; migrating IRQs to
generic edges would reorder the lifecycle under every RTEMS guest for zero
expressiveness gain.
65. Emulator is a facade; the subsystems are Soc, ExecutionEngine,
and DebugServer, and teardown correctness is encoded in declaration
order.
Added 2026-06-08 (S4–S6). emulator.cpp shrank from 1806 lines to a thin
delegation layer. engine_ is declared last so it destructs first (its
workers touch the SoC bus through the per-core bridges); ~Soc stops
plugins before freeing the buses they observe. Reordering the members of
Emulator or Soc is a teardown bug — the invariant is documented at the
member declarations and was proven by the full suite under
ASan/UBSan + leak detection.
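The language rule behind this invariant is that non-static members are destroyed in reverse declaration order. The sketch below demonstrates it with hypothetical Tracked and Facade types; only the position of engine (declared last) reflects the text above, and the relative order of the other two members is arbitrary.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Records its name in a shared log when destroyed.
struct Tracked {
    std::vector<std::string>& log;
    std::string               name;
    Tracked(std::vector<std::string>& l, std::string n) : log(l), name(std::move(n)) {}
    ~Tracked() { log.push_back(name); }
};

// Hypothetical facade: members are destroyed in reverse declaration order.
struct Facade {
    Tracked soc;
    Tracked debug;
    Tracked engine;  // declared last, so destroyed first
    explicit Facade(std::vector<std::string>& log)
        : soc(log, "soc"), debug(log, "debug"), engine(log, "engine") {}
};
```

Moving the engine member above soc would compile cleanly and silently invert the teardown, which is why the order is treated as an invariant rather than a style choice.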
66. core(idx) keeps returning core::CpuState&; the canonical state is
the per-core arch-sized GuestState blob.
Added 2026-06-08 (S7, revalidated A1–A3). An opaque accessor would have
broken 82 call sites across 12 files for no functional gain. CpuState is
a lens over the engine-owned blob (core_blobs_), so there is no
CpuState⇄GuestState sync layer; non-SPARC architectures never construct
the lens.
67. Trap and interrupt delivery sit behind three coarse
ir::IArchitecture methods: set_pc, raise_block_exception,
evaluate_interrupt.
Added 2026-06-08 (S10). One virtual call per block exit or scheduling
round — never per instruction — keeps the seam off the hot path. The whole
SPARC fault tail (data-access vector, delay-slot trap PC/nPC, the ET=0 →
error-mode rule of SPARC V8 §7.3) lives in
src/arch/sparc/src/sparc_arch.cpp; the engine only halts the core when
raise_block_exception returns false.
68. External interrupts are decided and delivered through the
architecture.
Added 2026-06-08/09 (S10 + arch-decoupling). evaluate_interrupt is a pure
decision (level select + enable/priority gates + trap type + ack_mask,
the controller's own encoding formed by the arch);
deliver_interrupt performs the entry on the architecture's per-core state
so micro-state outside the blob (the SPARC annul flag) is cleared exactly
as the switch oracle does. The engine never reconstructs a controller
bitmask.
69. A new ISA is a CpuArch enumerator plus one factory case;
EmulatorConfig::arch_factory is the injection seam for out-of-tree or
test frontends.
Added 2026-06-08 (S7 + S11). make_architecture
(src/runtime/src/architecture_factory.cpp) is the single extension
point; the arch_factory std::function field lets an embedder plug an
ir::IArchitecture without an enumerator. The toy frontend
(tests/integration/test_toy_frontend.cpp) runs end-to-end through both
run-loop paths on its own 64-byte state layout — the seam's proof.
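The shape of the injection seam can be sketched as follows. Everything here is a simplified stand-in: Config, IArchitecture with a single name() method, SparcArch, and ToyArch are hypothetical, and this make_architecture only mimics the role of the real factory.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>

// Simplified stand-in for ir::IArchitecture.
struct IArchitecture {
    virtual ~IArchitecture() = default;
    virtual std::string name() const = 0;
};
struct SparcArch : IArchitecture { std::string name() const override { return "sparc-v8"; } };
struct ToyArch   : IArchitecture { std::string name() const override { return "toy"; } };

enum class CpuArch { SparcV8 };

// Hypothetical config: the std::function field overrides the enumerator when set.
struct Config {
    CpuArch arch = CpuArch::SparcV8;
    std::function<std::unique_ptr<IArchitecture>()> arch_factory;
};

inline std::unique_ptr<IArchitecture> make_architecture(const Config& cfg) {
    if (cfg.arch_factory) return cfg.arch_factory();  // out-of-tree or test frontend
    switch (cfg.arch) {
        case CpuArch::SparcV8: return std::make_unique<SparcArch>();
    }
    return nullptr;  // unknown enumerator
}
```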
70. A peripheral's name() is its registry identity — the spec
instance_name, injected by the runtime; authors implement
device_class().
Added 2026-06-09. IPeripheral::name() is final:
find_entity(x)->name() == x now holds for every entity kind in the one
flat instance-name namespace (peripherals, plugins, comms buses; uniqueness
enforced by validate_emulator_config). Previously peripherals
self-reported their IP-core class ("apbuart") while everything else
self-reported instance identity, so a peripheral did not know its own
registry name. The injection is one call in Soc::build's factory loop —
the funnel every config source passes through. IPeripheral grew state and
a vtable slot, so ComponentAbiVersion was bumped 1 → 2 and v1 component
libraries are rejected at dlopen time.
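A sketch of the split between identity and class. The interfaces are reduced to the two methods discussed, and set_instance_name is a hypothetical stand-in for whatever call the runtime uses to inject the name.

```cpp
#include <cassert>
#include <string>
#include <utility>

struct IEntity {
    virtual ~IEntity() = default;
    virtual const std::string& name() const = 0;
};

struct IPeripheral : IEntity {
    // Identity is owned by the base and cannot be overridden by authors.
    const std::string& name() const final { return instance_name_; }
    // Authors describe what kind of device this is.
    virtual std::string device_class() const = 0;
    // Hypothetical injection point used by the runtime's factory loop.
    void set_instance_name(std::string n) { instance_name_ = std::move(n); }

private:
    std::string instance_name_;
};

struct ApbUart : IPeripheral {
    std::string device_class() const override { return "apbuart"; }
};
```

An attempt to override name() in ApbUart fails to compile, so the registry identity cannot drift from the spec's instance name.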
71. Silicon kits are Machine compositions in tero_compose; the
runtime recipe functions were deleted.
Added 2026-06-09. tero::compose::gr712rc_config() / gr740_config()
(src/compose/src/kits.cpp) build the same EmulatorConfig the deleted
tero::runtime recipes did, but through the typed object graph that also
serves .tero scripts and component libraries — one assembly path instead
of two. Deliberate API break; the migration is a namespace change.
72. The owning entity lists stay typed: peripherals_, buses_,
plugins_.
Added 2026-06-09. Merging them into one vector<unique_ptr<IEntity>> was
rejected: the three lists encode real operational roles (peripherals tick
every round, plugins have start/stop lifecycle and must destruct before the
buses they observe, bus media are passive), and one flat vector turns the
"plugins before buses" destruction guarantee from a compile-time
declaration-order fact into a fragile runtime insertion-order one.
find_entity already iterates all of them uniformly.
73. The AMBA AHB/APB hierarchy is a descriptive overlay; the hot dispatch
path is unchanged.
Added 2026-06-09. bus::AmbaBus ("ahb"/"apb" entities) and
bus::MemorySpace/IMemoryAccess model the topology for inspection;
rewiring the dispatch through them (and adding an ATC) was declined after
reading the hot path: the RAM fast paths are RAM-typed (TSO atomics via
std::atomic_ref on Ram's buffer) and the JIT already inlines a
host-pointer RamWindow, so the erased indirection buys nothing. An ATC
becomes the software TLB when the SRMMU (P3) lands.
74. The opcode histogram is emitted by clients; the library exposes
data.
Added 2026-06-09. ~ExecutionEngine used to fopen a CSV and read
$TERO_HISTOGRAM_OUT — the last direct-I/O + env-config escape hatch in
the core. Emulator::opcode_histogram() returns the counters and
format_opcode_histogram_csv renders them (pure function, empty string
when all-zero); tero-bench and tero-emu own the I/O policy through an explicit
--histogram-out <path> flag (stderr when absent; the env var was removed
2026-06-10). The per-instruction counting stays compile-gated
(TERO_OPCODE_HISTOGRAM) — that flag gates instrumentation cost, not
behaviour.
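The "library exposes data, client owns I/O" split can be illustrated with a pure rendering function. OpcodeCounters and render_histogram_csv are hypothetical, and the column layout and the skipping of zero rows are assumptions; only the "empty string when all-zero" property comes from the text above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical counter snapshot: (opcode name, execution count) pairs.
using OpcodeCounters = std::vector<std::pair<std::string, std::uint64_t>>;

// Pure function: no file or environment access. Returns an empty string when
// every counter is zero; otherwise a header plus one row per non-zero opcode.
inline std::string render_histogram_csv(const OpcodeCounters& counters) {
    std::string out;
    for (const auto& [name, count] : counters) {
        if (count == 0) continue;
        if (out.empty()) out = "opcode,count\n";
        out += name + "," + std::to_string(count) + "\n";
    }
    return out;
}
```

A client would then write the returned string to the path given on its command line, or to stderr, without the library knowing which.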
75. Machine scripts have one grammar (v2): expressions fold at parse time,
and the file can never become a program.
Added 2026-06-10. let constants, bounded integer arithmetic
(+ - * / << | & ~, parens, K/M/G suffixes) and read-only obj.prop
references to earlier lines all constant-fold in the parser
(src/compose/src/script.cpp), so the Machine only ever sees literals.
Conditionals, loops, strings and forward references are structurally
impossible: a .tero file stays declarative, diffable, and reads like the
datasheet's memory-map table. The v1 verbs (create/write/positional
map/4-token connect) were removed rather than aliased, because two grammars
in examples and docs diverge. Interface tokens on connect edges are the real
tero:: interface names, validated by entity-check against a known set.
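Parse-time folding of a suffixed literal can be sketched as below. fold_literal is a hypothetical helper, not the parser in script.cpp. It handles decimal digits only, and it assumes K, M, and G are binary multiples (1024-based), which the text above does not state. Overflow yields no value, in the spirit of bounded arithmetic.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <string>

// Fold a token such as "64K" into a plain integer, or return nullopt when the
// token is malformed or the value does not fit in 64 bits.
inline std::optional<std::uint64_t> fold_literal(const std::string& tok) {
    constexpr std::uint64_t max = std::numeric_limits<std::uint64_t>::max();
    std::uint64_t value = 0;
    std::size_t i = 0;
    for (; i < tok.size() && std::isdigit(static_cast<unsigned char>(tok[i])); ++i) {
        const std::uint64_t d = static_cast<std::uint64_t>(tok[i] - '0');
        if (value > (max - d) / 10) return std::nullopt;  // value * 10 + d would overflow
        value = value * 10 + d;
    }
    if (i == 0) return std::nullopt;  // no digits at all

    unsigned shift = 0;
    if (i < tok.size()) {
        switch (tok[i]) {
            case 'K': shift = 10; break;  // assumed binary multiples
            case 'M': shift = 20; break;
            case 'G': shift = 30; break;
            default:  return std::nullopt;
        }
        ++i;
    }
    if (i != tok.size()) return std::nullopt;                        // trailing characters
    if (shift != 0 && value > (max >> shift)) return std::nullopt;   // scaled value overflows
    return value << shift;
}
```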
76. PnP publication is edge-based: a bridge's slaves edges ARE the
table.
Added 2026-06-10. connect <bridge>.slaves <dev>:IAmbaPnp declares
membership and slot (edge order = slot; slaves[N] pins a sparse one); a
device with no edge stays off the table; one such edge switches the machine
to explicit-publication mode, while a machine with none keeps the legacy
address-derived auto placement (a minimal board needs no slot bookkeeping).
This replaced the pnp_slot/pnp_publish device attributes: the bridge
owns its slot table as in the silicon, membership stops being inferred from
address ranges, and "unpublished" stops being a magic knob. The audit of
the deleted hardcoded tables also corrected a design premise: the five
IAmbaPnp implementors are the complete historically-published set, so no
mass identity rollout was needed — identity resolution was already
device-first (pnp_table.cpp), and further devices adopt IAmbaPnp
per-class when a board publishes them with a manual-sourced GRLIB §3.4 ID.
77. The silicon kits are embedded .tero files; the C++ kit functions are
loaders.
Added 2026-06-10. src/compose/machines/gr712rc.tero / gr740.tero are
the single source of truth: gr712rc_machine()/gr740_machine() parse
configure-time-embedded copies (no runtime file lookup, no install-path
dependency), and the same files install under share/tero/machines/ as
the user's copy-and-derive starting point. Equivalence to the deleted C++
compositions was proven by dump() byte-equality on both SoCs before the
swap; test_compose_kits (placement content) and RTEMS [boot] passed
unchanged throughout. The alternative — shipping files alongside C++
compositions — was rejected: two definitions of one SoC drift.
78. PnP slots are internal; slaves edges are membership only. The host
sink entity is StdoutMonitor.
Added 2026-06-10. Amends Decision 76: a connect <bridge>.slaves
<dev>:IAmbaPnp edge only publishes the device on that bridge's table —
records compact in edge order and the slot number is an internal detail.
GRLIB software (RTEMS, MKPROM, Linux) enumerates the table and matches
records by vendor/device identity and BAR address, never by slot position,
so the index pinned by slaves[N] and the pnp_ahb_slot property bought
silicon byte-fidelity of the PnP area at the cost of order-dependence
(moving an edge silently renumbered the table) and a stringly port-name
hack (slaves[3] → port "slaves3"). Both forms were removed; slaves[N]
now fails build() with a diagnostic (silently treating it as a structural
edge would drop the device off the table unnoticed). Consequence accepted:
the kits' published tables compact (GR740 APB 0,1,3,4,5 → 0..4) and the PnP
MMIO region is no longer byte-identical to the silicon manuals' sparse
layouts — test_compose_kits asserts the published device set, and RTEMS
[boot] passed unchanged on both kits. In the same pass the Console
component was renamed StdoutMonitor (ComponentKind::Monitor): the
entity is exactly a host stdout sink, not hardware, and the honest name
keeps future host sinks as new registry classes rather than properties.