Design principles
These are the non-negotiable invariants that shape every line of
Tero. They are frozen in CLAUDE.md (the architectural contract for
agents working on the code); this page restates them for a human reader
with the reasoning behind each one, what it costs, and — for each — the
concrete code that enforces it.
Authority
When this page and CLAUDE.md drift, CLAUDE.md is the contract;
this page is the explanation. If a rule needs to change, edit
CLAUDE.md first, then bring this page back into sync. Concrete
judgment calls beyond these principles — the implementation-time
trade-offs — live in Design decisions; the frozen
architectural decisions (dual exec mode, tiered JIT, host
requirements, no cycle-accurate modelling) live in
Design decisions log / ADRs.
The principles answer a single recurring question: "can this design couple two things that should be independent, or read state it should not?" Every rule below closes one such door.
1. Zero singletons, zero global mutable state
The test "Can I instantiate two Emulator objects in the same process
without them interfering?" must always be answered YES.
Why.
- An external simulation wrapper (SMP2) may want one model per simulated SoC in the same binary.
- Test executables instantiate dozens of bus/CPU pairs in parallel Catch2 sections.
- Global mutable state silently couples unrelated tests, producing flaky CI that costs days to diagnose.
How it is enforced.
- No static mutable variables anywhere. No singleton classes.
- Every dependency is owned by Emulator or injected through it. The Emulator owns its cores, bus, scheduler, peripherals, IRQ bridges, per-core IR caches and JITs, all as members (src/runtime/include/tero/runtime/emulator.hpp).
- Strong types are enum class aliases with zero storage (types.hpp), so even the "vocabulary" carries no shared state.
The cost is discipline: anything that looks like it wants to be a
global (a decode cache, an opcode histogram) is instead a per-Emulator
or per-CpuState member.
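The two-instance test can be illustrated with a minimal sketch. MiniEmulator and its members are hypothetical stand-ins, not Tero's real Emulator; the point is only that a cache held as a member cannot leak between instances.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for an emulator that owns its own decode cache.
struct MiniEmulator {
    // A member, not a static: each instance gets a private cache.
    std::unordered_map<std::uint32_t, std::uint32_t> decode_cache;
    std::uint64_t retired = 0;

    void step(std::uint32_t pc, std::uint32_t raw_insn) {
        decode_cache.emplace(pc, raw_insn);  // cached only in this instance
        ++retired;
    }
};

// Two instances in one process must not observe each other's state.
inline bool instances_are_independent() {
    MiniEmulator a;
    MiniEmulator b;
    a.step(0x40000000u, 0x01000000u);
    a.step(0x40000004u, 0x01000000u);
    return a.retired == 2 && a.decode_cache.size() == 2 &&
           b.retired == 0 && b.decode_cache.empty();
}
```

Had decode_cache been declared static, the second instance would start with two entries and the check would fail.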
2. Zero direct I/O from the core
No printf, no std::cout, no fopen, no sockets inside any
non-tero_app translation unit. Every byte that reaches the host's
console goes through ICharacterDevice; every log line goes through
ILogger.
Why.
- A simulator embedded inside a host simulation environment must redirect output through the host's logging service, not the process's stdout.
- Unit tests need to capture or suppress output without touching the code under test (tests/support/capturing_char_device).
- Replacing the implementation must be a one-line set_* call (Decision 23): Emulator::set_logger, set_character_device, set_uart_character_device.
The only exception is src/app/main.cpp, which is itself a default
implementation — the CLI is a consumer, not the core.
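A minimal sketch of how output injection makes capture possible. ICharDeviceSketch, CapturingDevice and UartSketch are hypothetical shapes invented for illustration; the real ICharacterDevice interface and the helper under tests/support may look different.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical interface shape; the real ICharacterDevice may differ.
struct ICharDeviceSketch {
    virtual ~ICharDeviceSketch() = default;
    virtual void put(char c) = 0;
};

// Test double that records bytes instead of writing to stdout.
struct CapturingDevice final : ICharDeviceSketch {
    std::string captured;
    void put(char c) override { captured.push_back(c); }
};

// Stand-in for core code: it only ever sees the interface.
class UartSketch {
public:
    void set_character_device(ICharDeviceSketch* dev) { dev_ = dev; }

    void transmit(std::string_view text) {
        if (dev_ == nullptr) return;  // no device attached: output is dropped
        for (char c : text) dev_->put(c);
    }

private:
    ICharDeviceSketch* dev_ = nullptr;
};

// Routes text through the sketch UART and returns what the device saw.
inline std::string capture_uart_output(std::string_view text) {
    CapturingDevice dev;
    UartSketch uart;
    uart.set_character_device(&dev);  // the one-line swap
    uart.transmit(text);
    return dev.captured;
}
```

The code under test never changes; only the object passed to the setter does.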
3. Time as a parameter, never wall-clock
Emulator never reads the host system clock to compute simulated
time. Time advances exclusively through:
- EmulatorConfig::cpu_clock_hz + EmulatorConfig::cpi → ns_per_insn (the rate; see ns_per_insn_for, src/runtime/include/tero/runtime/emulator_config.hpp:65).
- Emulator::run_for(SimTimeNs) (caller-driven duration).
- Emulator::run_until(SimTimeNs) (caller-driven deadline).
Inside run_until_unpaced, simulated time is a pure accumulation of
per-instruction deltas plus the idle-skip jump
(src/runtime/src/emulator.cpp:875, :897) — it is computed, never
sampled.
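The accumulation can be sketched as below. The formula (cpi × 10⁹ / clock frequency), the integer cpi, and both function names are assumptions made for illustration; the real ns_per_insn_for may round or represent the rate differently, and the idle-skip jump is omitted.

```cpp
#include <cassert>
#include <cstdint>

// Assumed formula: nanoseconds per instruction = cpi * 1e9 / clock_hz.
// An integer cpi is a simplification for this sketch.
constexpr std::uint64_t ns_per_insn_sketch(std::uint64_t cpu_clock_hz,
                                           std::uint64_t cpi) {
    return cpi * 1'000'000'000ull / cpu_clock_hz;
}

// Simulated time is a pure function of the start time, the deadline and
// the rate. Nothing here consults a host clock.
inline std::uint64_t run_until_sketch(std::uint64_t start_ns,
                                      std::uint64_t deadline_ns,
                                      std::uint64_t ns_per_insn) {
    if (ns_per_insn == 0) return start_ns;  // guard against a zero rate
    std::uint64_t now = start_ns;
    while (now < deadline_ns) {
        now += ns_per_insn;  // one instruction retired
    }
    return now;
}
```

Because the result depends only on its arguments, two runs with the same inputs end at the same simulated time regardless of host load.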
Why.
- Determinism: replays produce identical traces regardless of host load. This is what makes the Tero-vs-SIS lockstep comparator and the IR-vs-switch oracle (run_oracle_lockstep) meaningful.
- SMP2 compatibility: the simulation environment owns the clock; the model just consumes it.
The reconciled exception: PacingMode::Realtime
The original frozen rule was "the core must never read the host
clock." After human review (2026-04-28) wall-clock pacing was
moved into the core, gated by PacingMode. In
PacingMode::Realtime, run_until reads steady_clock::now() and calls
sleep_until so that 1 s simulated ≈ 1 s real
(src/runtime/src/emulator.cpp:571, :597). The intent of the
original rule — that an external scheduler can own time — is
preserved by PacingMode::Turbo, which never touches the host clock
and which the SMP2 wrapper hard-sets. Crucially, the host clock
affects only when work happens, never what the simulated clock
reads: sim_time_ is identical under both pacing modes.
4. Configuration by struct, not by files
Emulator::create() takes an EmulatorConfig, a plain C++ struct
(emulator_config.hpp). No parsing of YAML, JSON, INI, or environment
variables happens inside the core library.
Why.
- The CLI is one consumer; a future SMP2 wrapper, a Python binding, or an automated test all want to skip the file format entirely and build the struct directly.
- It keeps the surface area minimal, so changes are localised.
Consequence — no behaviour-selecting build flags. Because the config
is a struct, behaviour must be a runtime field of that struct, not a
compile-time #define. This is why:
- execution method is EmulatorConfig::translation (a bool), not a Dispatch enum or TERO_ENABLE_JIT flag (Decision 57);
- pacing is EmulatorConfig::pacing;
- thread mapping is EmulatorConfig::execution_mode (ADR-001).
Every alternative is always compiled in and chosen per Emulator. The
only option()s left in CMake gate a dependency or instrumentation
(TERO_BUILD_TESTS, TERO_OPCODE_HISTOGRAM), never guest-visible
behaviour.
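A sketch of what behaviour-as-fields looks like from a caller's side. The three field names, the translation default and the SingleThread default come from this page; the struct name, the cpu_clock_hz value and the pacing default are assumptions.

```cpp
#include <cassert>
#include <cstdint>

enum class PacingMode { Turbo, Realtime };
enum class ExecutionMode { SingleThread, MultiThread };

// Illustrative stand-in for EmulatorConfig, not the real definition.
struct EmulatorConfigSketch {
    std::uint64_t cpu_clock_hz = 100'000'000;  // assumed default
    bool translation = true;                   // binary translation on
    PacingMode pacing = PacingMode::Turbo;     // assumed default
    ExecutionMode execution_mode = ExecutionMode::SingleThread;
};

// Selecting the Switch interpreter is a field assignment, not a rebuild.
inline EmulatorConfigSketch make_oracle_config() {
    EmulatorConfigSketch cfg;
    cfg.translation = false;
    return cfg;
}
```

A test can therefore build one emulator with each setting inside the same binary and compare them.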
5. Errors as values at the public API boundary
Every method on the public surface returns Result<T> — a
tl::expected<T, ErrorCode> (types.hpp:67). Internal exceptions are
permitted but must never cross the library boundary.
tl::expected stands in for C++23 std::expected so the project stays
on C++20. ErrorCode is a small enum class
(Ok, BusError, InvalidAddress, AlignmentError, TrapGenerated,
InvalidConfig, ElfLoadError, IoError, JitError).
Why.
- Embedded simulation environments often disable C++ exceptions entirely.
- Result<T> makes the error path syntactically explicit at every call site.
- [[nodiscard]] (used aggressively across the API, including make_error, types.hpp:71) makes the compiler diagnose any discarded result.
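The call-site shape can be sketched without pulling in tl::expected. ResultSketch below is a deliberately small stand-in built on std::variant, and read_word_sketch is a hypothetical API; neither is Tero's real code, and ErrorCode is reduced to a subset.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <variant>

enum class ErrorCode { Ok, BusError, InvalidAddress, AlignmentError };

// Stand-in for Result<T>; the real type is tl::expected<T, ErrorCode>.
template <typename T>
class [[nodiscard]] ResultSketch {
public:
    ResultSketch(T value) : state_(std::move(value)) {}
    ResultSketch(ErrorCode error) : state_(error) {}

    bool has_value() const { return std::holds_alternative<T>(state_); }
    const T& value() const { return std::get<T>(state_); }
    ErrorCode error() const { return std::get<ErrorCode>(state_); }

private:
    std::variant<T, ErrorCode> state_;
};

// Hypothetical API: a misaligned word read is reported as a value.
inline ResultSketch<std::uint32_t> read_word_sketch(std::uint32_t addr) {
    if (addr % 4 != 0) return ErrorCode::AlignmentError;
    return std::uint32_t{0xCAFEF00Du};
}
```

No exception leaves the function, so the same call works in a build with exceptions disabled, and discarding the returned object triggers the [[nodiscard]] diagnostic.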
6. Strong types everywhere
PhysAddr, VirtAddr, CoreId, SimTimeNs, IrqLine, AccessSize
are all enum class over fixed-width integers (types.hpp:23-42). Raw
uint32_t appears only at the byte-level encode/decode boundary.
The types carry only the operations that make physical sense
(types.hpp:83-152):
| Type | Underlying | Allowed ops |
|---|---|---|
| PhysAddr / VirtAddr | uint32_t | addr + off, addr - off, addr - addr (distance), += |
| CoreId | uint8_t | to_underlying only (an identifier, not arithmetic) |
| IrqLine | uint8_t | to_underlying only |
| SimTimeNs | uint64_t | +, -, += (durations add) |
| AccessSize | uint8_t | bytes(size) → 1/2/4 |
You can subtract two addresses (a distance); you cannot add two
CoreIds (meaningless) — the operator simply does not exist, so the
mistake is a compile error.
Why.
- Eliminates an entire class of mix-up bugs ("I passed a virtual address where a physical was expected"). A function taking PhysAddr cannot be handed a VirtAddr or a bare integer.
- Costs zero at runtime: enum class is a typed alias, no wrapper object, no vtable.
- Reads correctly at the call site: read_physical(PhysAddr{0x40000000}, …) is unambiguous.
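The pattern can be reproduced in a few lines. This is a sketch of the technique (scoped enums over fixed-width integers, with only the meaningful operators defined), not a copy of types.hpp; the operator set shown is a subset.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

enum class PhysAddr : std::uint32_t {};
enum class VirtAddr : std::uint32_t {};
enum class CoreId : std::uint8_t {};

constexpr std::uint32_t to_underlying(PhysAddr a) {
    return static_cast<std::uint32_t>(a);
}

// address + offset yields an address
constexpr PhysAddr operator+(PhysAddr a, std::uint32_t off) {
    return static_cast<PhysAddr>(to_underlying(a) + off);
}

// address - address yields a distance, not an address
constexpr std::uint32_t operator-(PhysAddr a, PhysAddr b) {
    return to_underlying(a) - to_underlying(b);
}

// No implicit conversion exists between the address spaces or from a raw
// integer, so a mix-up cannot compile.
static_assert(!std::is_convertible_v<VirtAddr, PhysAddr>);
static_assert(!std::is_convertible_v<std::uint32_t, PhysAddr>);
// The strong type occupies exactly the space of its underlying integer.
static_assert(sizeof(PhysAddr) == sizeof(std::uint32_t));
```

No operator+ is declared for CoreId, so an expression adding two CoreId values is rejected by the compiler.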
7. The Switch interpreter is the correctness oracle
Tero has two execution methods, chosen at runtime by
EmulatorConfig::translation (both always compiled in):
- the Switch interpreter (core::step, translation = false): a fetch-decode-execute loop with a per-PC decode cache; and
- binary translation (translation = true, the default): the arch-neutral IR run through the tiered LLVM JIT, with the IR interpreter as fallback. See IR and LLVM JIT.
The invariant is not "no JIT"; it is that the Switch interpreter
remains the reference. It is kept deliberately simple
(src/core/src/step.cpp is ~120 lines) so it can be verified by
inspection, and every translated path is validated bit-identical against
it — at block granularity (Emulator::run_oracle_lockstep,
emulator.hpp) and across full RTEMS boots (the Tero-vs-SIS lockstep
comparator).
Why.
- Correctness and peripheral coverage come before performance, always (CLAUDE.md rule 4). A fast path that can diverge from a small, auditable oracle is worse than no fast path.
- Every optimisation introduces a category of hardware-modelling bugs that take days to track down — so each one (decode cache, IR, JIT tiers) must earn its place against the oracle, never replace it.
- The path is selected per-Emulator at runtime, so the SMP2-facing contract is unaffected and the oracle is always one config field away for debugging.
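The idea of lockstep validation can be sketched on a toy machine. Everything here (the one-instruction ISA, ToyState, the injected bug) is invented for illustration and says nothing about how run_oracle_lockstep is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Toy ISA: every instruction adds an immediate to an accumulator.
struct ToyState {
    std::uint32_t acc = 0;
    std::size_t pc = 0;
    bool operator==(const ToyState& o) const {
        return acc == o.acc && pc == o.pc;
    }
};

// Reference path: small enough to verify by inspection.
inline void reference_step(ToyState& s, const std::vector<std::uint32_t>& prog) {
    s.acc += prog[s.pc];
    ++s.pc;
}

// Path under test; bug_at injects a deliberate divergence at that pc.
inline void fast_step(ToyState& s, const std::vector<std::uint32_t>& prog,
                      std::optional<std::size_t> bug_at) {
    s.acc += prog[s.pc] + ((bug_at && *bug_at == s.pc) ? 1u : 0u);
    ++s.pc;
}

// Runs both paths side by side and returns the index of the first
// instruction after which their states differ, or nullopt if identical.
inline std::optional<std::size_t> lockstep(
    const std::vector<std::uint32_t>& prog,
    std::optional<std::size_t> bug_at = std::nullopt) {
    ToyState ref;
    ToyState fast;
    for (std::size_t i = 0; i < prog.size(); ++i) {
        reference_step(ref, prog);
        fast_step(fast, prog, bug_at);
        if (!(ref == fast)) return i;
    }
    return std::nullopt;
}
```

Comparing after every step reports the first point of divergence, which is what makes a small reference path useful as a debugging tool and not only as a pass/fail check.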
8. Round-robin single-threaded multicore (default)
In the default ExecutionMode::SingleThread, all cores execute on the
host's main thread, one quantum at a time, in core order
(run_until_unpaced's else branch, src/runtime/src/emulator.cpp:876).
Why this is the default.
- TSO is satisfied by construction. Only one core runs at a time, so Total Store Order — SPARC's memory model — holds trivially.
- Atomics are correct by construction. CASA, LDSTUB, SWAP cannot race when execution is cooperative.
- Determinism matters more than parallelism for the target RTEMS workloads; the host CPU is not the bottleneck for a 4-core SoC running sptests.
The escape hatch: ExecutionMode::MultiThread (ADR-001). For 1:1
GR740 throughput, each simulated core can run on its own host thread
(thread-per-core + std::barrier, start_workers/worker_loop,
src/runtime/src/emulator.cpp:690). Thread-safe primitives are
runtime-gated, not conditionally compiled: tero::GatedMutex
(src/interfaces/include/tero/gated_mutex.hpp) is a std::mutex that
no-ops when its gate is inactive, so SingleThread pays near-zero
sync overhead while both modes live in one binary. MultiThread is
standalone-only and never used by the SMP2 wrapper. See
Execution model and
Decisions / ADR-001.
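One way a runtime-gated mutex could work is sketched below. The design shown (a flag fixed at construction that decides whether lock and unlock touch the underlying std::mutex) is an assumption; the real tero::GatedMutex may gate differently.

```cpp
#include <cassert>
#include <mutex>

// Assumed design: the gate is decided before any worker thread starts and
// never changes afterwards, so reading it needs no synchronisation.
class GatedMutexSketch {
public:
    explicit GatedMutexSketch(bool active) : active_(active) {}

    void lock() {
        if (active_) mutex_.lock();  // inactive gate: a single branch
    }
    void unlock() {
        if (active_) mutex_.unlock();
    }

private:
    const bool active_;
    std::mutex mutex_;
};

// lock()/unlock() satisfy BasicLockable, so std::lock_guard works unchanged
// and shared code does not need to know which mode it runs in.
inline void guarded_increment(GatedMutexSketch& m, int& counter) {
    std::lock_guard<GatedMutexSketch> guard(m);
    ++counter;
}
```

With the gate inactive the cost per critical section is a predictable branch, which is consistent with the "near-zero sync overhead" described for SingleThread.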
9. Endianness lives at the typed-access boundary
SPARC is big-endian. Internal memory is raw bytes
(Ram holds a std::vector<std::byte> as it appears on the wire);
the byte-swap happens only when bytes become a typed value:
- in the bus typed accessors, SystemBus::{encode,decode}_be (src/bus/src/system_bus.cpp:57, :66) and the RAM fast paths (__builtin_bswap32, src/bus/src/ram.cpp:18);
- in the IR, endianness is an attribute of the guest-memory op (LdGuest/StGuest carry {size, endianness}, Decision 51), not of the bus, so the bus stores raw bytes and a future little-endian guest works without a bus change.
Why. Keeping RAM as raw bytes makes it trivially snapshot-able (P5
Save/Restore is a memcpy) and matches how real memory controllers
behave. Putting the swap in the op (not the bus) is what lets one bus
serve two guest endiannesses (Decision 51).
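A portable sketch of the typed-access boundary for a 32-bit word. The function names are hypothetical and the shift-based form is chosen for clarity; it is not the code in system_bus.cpp, which the page says also has a __builtin_bswap32 fast path.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Typed value -> raw bytes in big-endian order (most significant first).
inline std::array<std::byte, 4> encode_be32(std::uint32_t v) {
    std::array<std::byte, 4> out{};
    out[0] = static_cast<std::byte>((v >> 24) & 0xFFu);
    out[1] = static_cast<std::byte>((v >> 16) & 0xFFu);
    out[2] = static_cast<std::byte>((v >> 8) & 0xFFu);
    out[3] = static_cast<std::byte>(v & 0xFFu);
    return out;
}

// Raw bytes -> typed value. Memory itself is never swapped in place.
inline std::uint32_t decode_be32(const std::array<std::byte, 4>& b) {
    return (std::to_integer<std::uint32_t>(b[0]) << 24) |
           (std::to_integer<std::uint32_t>(b[1]) << 16) |
           (std::to_integer<std::uint32_t>(b[2]) << 8) |
           std::to_integer<std::uint32_t>(b[3]);
}
```

Because the stored bytes are independent of host byte order, copying the buffer is enough to snapshot it.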
10. SMP2-aligned lifecycle
The Emulator exposes a lifecycle that mirrors SMP2 model states even
though the SMP2 wrapper itself is a separate repository:
Run() (run_for/run_until), Initialize(), Reset(), Store(),
Restore() are implemented (P5 landed Save/Restore 2026-05-11); the rest
are stubs for the wrapper to delegate to.
Why. Aligning the API boundary now means the wrapper is a thin adapter, not a refactor — and the same shape is reusable by other simulation frameworks with minimal glue.
11. Extension by IPeripheral + declarative PeripheralSpec
Adding a custom peripheral must be one C++ file plus a single
PeripheralSpec pushed into cfg.peripherals (or a single
add_peripheral() call for REPL/test usage). No CMake surgery, no
recompilation of the runtime library, no global registration step.
Why.
- The customer story is "I have a proprietary IP block; model it next to your IRQMP." That story breaks the moment the integration cost exceeds a few lines.
- The reference is examples/demo-dma/: ~200 lines for a fully wired DMA-capable peripheral with an IRQ, registered both as a spec and via add_peripheral.
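The shape of such an extension might look like the sketch below. All types here (IPeripheralSketch, PeripheralSpecSketch, ConfigSketch, route) are hypothetical; the real IPeripheral and PeripheralSpec have their own signatures and fields, and examples/demo-dma/ remains the authoritative reference.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical interface; the real IPeripheral differs.
struct IPeripheralSketch {
    virtual ~IPeripheralSketch() = default;
    virtual std::uint32_t read(std::uint32_t offset) = 0;
    virtual void write(std::uint32_t offset, std::uint32_t value) = 0;
};

// The "one C++ file": a single 32-bit scratch register.
struct ScratchRegister final : IPeripheralSketch {
    std::uint32_t value = 0;
    std::uint32_t read(std::uint32_t) override { return value; }
    void write(std::uint32_t, std::uint32_t v) override { value = v; }
};

// Hypothetical declarative spec: where the device lives and who models it.
struct PeripheralSpecSketch {
    std::uint32_t base;
    std::uint32_t size;
    std::shared_ptr<IPeripheralSketch> device;
};

struct ConfigSketch {
    std::vector<PeripheralSpecSketch> peripherals;
};

// Stand-in for bus routing: find the spec that covers addr, report the
// offset inside it, and hand back the device (nullptr if unmapped).
inline IPeripheralSketch* route(const ConfigSketch& cfg, std::uint32_t addr,
                                std::uint32_t& offset) {
    for (const auto& spec : cfg.peripherals) {
        if (addr >= spec.base && addr - spec.base < spec.size) {
            offset = addr - spec.base;
            return spec.device.get();
        }
    }
    return nullptr;
}
```

Registration is data pushed into the config, so nothing outside the new file and the config needs to change.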
See also
- Design decisions log + ADRs — the frozen architectural decisions and every implementation-time judgment call
- Layers and modules — how the boundaries are drawn in CMake
- Execution model — how principles 7 and 8 play out in the run loop