
Design principles

These are the non-negotiable invariants that shape every line of Tero. They are frozen in CLAUDE.md (the architectural contract for agents working on the code); this page restates them for a human reader, giving for each one the reasoning behind it, what it costs, and the concrete code that enforces it.

Authority

When this page and CLAUDE.md drift, CLAUDE.md is the contract; this page is the explanation. If a rule needs to change, edit CLAUDE.md first, then bring this page back into sync. Concrete judgment calls beyond these principles — the implementation-time trade-offs — live in Design decisions; the frozen architectural decisions (dual exec mode, tiered JIT, host requirements, no cycle-accurate) live in Design decisions log / ADRs.

The principles answer a single recurring question: "can this design couple two things that should be independent, or read state it should not?" Every rule below closes one such door.

1. Zero singletons, zero global mutable state

The test: "Can I instantiate two Emulator objects in the same process without them interfering?" The answer must always be yes.

Why.

  • An external simulation wrapper (SMP2) may want one model per simulated SoC in the same binary.
  • Test executables instantiate dozens of bus/CPU pairs in parallel Catch2 sections.
  • Global mutable state silently couples unrelated tests, producing flaky CI that costs days to diagnose.

How it is enforced.

  • No static mutable variables anywhere. No singleton classes.
  • Every dependency is owned by Emulator or injected through it. The Emulator owns its cores, bus, scheduler, peripherals, IRQ bridges, per-core IR caches and JITs — all as members (src/runtime/include/tero/runtime/emulator.hpp).
  • Strong types are enum class types over fixed-width integers (types.hpp). They are plain values with no static storage behind them, so even the "vocabulary" carries no shared state.

The cost is discipline: anything that looks like it wants to be a global (a decode cache, an opcode histogram) is instead a per-Emulator or per-CpuState member.
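
As a minimal sketch of that discipline, the opcode histogram below is an ordinary member rather than a static. MiniEmulator and its methods are hypothetical stand-ins invented for this illustration; they are not Tero's Emulator API.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for an Emulator: everything it needs is a member.
class MiniEmulator {
public:
    // Records one executed opcode in this instance's own histogram.
    void record_opcode(std::uint8_t opcode) { ++histogram_[opcode]; }

    // Returns how often this instance has seen the opcode.
    std::uint64_t count(std::uint8_t opcode) const {
        const auto it = histogram_.find(opcode);
        return it == histogram_.end() ? 0 : it->second;
    }

private:
    // Per-instance, not `static`: two emulators never share this map.
    std::unordered_map<std::uint8_t, std::uint64_t> histogram_;
};

// The "two instances in one process" test from the text, written as a check.
inline void two_instance_check() {
    MiniEmulator a;
    MiniEmulator b;
    a.record_opcode(0x01);
    a.record_opcode(0x01);
    b.record_opcode(0x02);
    assert(a.count(0x01) == 2 && a.count(0x02) == 0);
    assert(b.count(0x01) == 0 && b.count(0x02) == 1);
}
```

Had histogram_ been declared static, the second assertion would fail, which is exactly the cross-instance coupling the rule forbids.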

2. Zero direct I/O from the core

No printf, no std::cout, no fopen, no sockets inside any non-tero_app translation unit. Every byte that reaches the host's console goes through ICharacterDevice; every log line goes through ILogger.

Why.

  • A simulator embedded inside a host simulation environment must redirect output through the host's logging service, not the process's stdout.
  • Unit tests need to capture or suppress output without touching the code under test (tests/support/capturing_char_device).
  • Replacing the implementation must be a one-line set_* call (Decision 23): Emulator::set_logger, set_character_device, set_uart_character_device.

The only exception is src/app/main.cpp, which is itself a default implementation — the CLI is a consumer, not the core.
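
The injection pattern can be sketched as follows. The interface shape (a single put_char method), CharDeviceSketch, CapturingDevice and UartSketch are all assumptions made for this example; the real ICharacterDevice and the capturing test device may look different.

```cpp
#include <cassert>
#include <string>

// Hypothetical shape of the output interface; the real ICharacterDevice may differ.
struct CharDeviceSketch {
    virtual ~CharDeviceSketch() = default;
    virtual void put_char(char c) = 0;
};

// Test double that stores bytes in memory instead of writing to a console.
struct CapturingDevice final : CharDeviceSketch {
    std::string captured;
    void put_char(char c) override { captured.push_back(c); }
};

// Stand-in for a core component such as a UART model: it only knows the interface.
class UartSketch {
public:
    // The one-line replacement point: swap the sink without touching the model.
    void set_character_device(CharDeviceSketch* device) { device_ = device; }

    void transmit(const std::string& text) {
        assert(device_ != nullptr && "a device must be injected before use");
        for (const char c : text) device_->put_char(c);
    }

private:
    CharDeviceSketch* device_ = nullptr;  // injected, never a global
};
```

A test can hand the model a CapturingDevice and inspect captured afterwards, while the CLI would hand it a device that forwards to the console.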

3. Time as a parameter, never wall-clock

Emulator never reads the host system clock to compute simulated time. Time advances exclusively through:

  • EmulatorConfig::cpu_clock_hz + EmulatorConfig::cpins_per_insn (the rate; see ns_per_insn_for, src/runtime/include/tero/runtime/emulator_config.hpp:65).
  • Emulator::run_for(SimTimeNs) (caller-driven duration).
  • Emulator::run_until(SimTimeNs) (caller-driven deadline).

Inside run_until_unpaced, simulated time is a pure accumulation of per-instruction deltas plus the idle-skip jump (src/runtime/src/emulator.cpp:875, :897) — it is computed, never sampled.
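
A minimal sketch of "computed, never sampled" follows. The struct, its field names and the rate formula (cycles per instruction times 10^9, divided by the clock frequency) are assumptions for illustration; the real ns_per_insn_for may round or scale differently, and idle-skip is left out.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative rate parameters; the names and the formula are assumptions.
struct RateSketch {
    std::uint64_t cpu_clock_hz;
    std::uint64_t cycles_per_insn;
};

// Nanoseconds of simulated time charged per instruction (integer division).
constexpr std::uint64_t ns_per_insn(const RateSketch& rate) {
    return rate.cycles_per_insn * 1'000'000'000ULL / rate.cpu_clock_hz;
}

// Advances a simulated clock one instruction at a time until the deadline.
// No host clock is consulted: the result depends only on the arguments.
inline std::uint64_t run_until_sketch(std::uint64_t sim_time_ns,
                                      std::uint64_t deadline_ns,
                                      const RateSketch& rate) {
    const std::uint64_t step = ns_per_insn(rate);
    assert(step > 0 && "rate must charge at least 1 ns per instruction");
    while (sim_time_ns < deadline_ns) {
        sim_time_ns += step;  // stand-in for executing one instruction
    }
    return sim_time_ns;
}
```

Because the function is pure, calling it twice with the same arguments gives the same simulated time regardless of host load.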

Why.

  • Determinism: replays produce identical traces regardless of host load. This is what makes the Tero-vs-SIS lockstep comparator and the IR-vs-switch oracle (run_oracle_lockstep) meaningful.
  • SMP2 compatibility: the simulation environment owns the clock; the model just consumes it.

The reconciled exception: PacingMode::Realtime

The original frozen rule was "the core must never read the host clock." After human review (2026-04-28) wall-clock pacing was moved into the core, gated by PacingMode. In PacingMode::Realtime, run_until reads steady_clock::now() and calls sleep_until so that 1 s simulated ≈ 1 s real (src/runtime/src/emulator.cpp:571, :597). The intent of the original rule, that an external scheduler can own time, is preserved by PacingMode::Turbo, which never touches the host clock and which the SMP2 wrapper hard-sets. Crucially, the host clock affects only when work happens, never what the simulated clock reads: sim_time_ is identical under both pacing modes.

4. Configuration by struct, not by files

Emulator::create() takes an EmulatorConfig plain C++ struct (emulator_config.hpp). No parsing of YAML, JSON, INI, or environment variables happens inside the core library.

Why.

  • The CLI is one consumer; a future SMP2 wrapper, a Python binding, or an automated test all want to skip the file format entirely and build the struct directly.
  • It keeps the surface area minimal, so changes are localised.

Consequence — no behaviour-selecting build flags. Because the config is a struct, behaviour must be a runtime field of that struct, not a compile-time #define. This is why:

  • execution method is EmulatorConfig::translation (a bool), not a Dispatch enum or TERO_ENABLE_JIT flag (Decision 57);
  • pacing is EmulatorConfig::pacing;
  • thread mapping is EmulatorConfig::execution_mode (ADR-001).

Both code paths are always compiled in and chosen per-Emulator. The only option()s left in CMake gate a dependency or instrumentation (TERO_BUILD_TESTS, TERO_OPCODE_HISTOGRAM), never guest-visible behaviour.
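
The idea of behaviour as a runtime field can be sketched like this. ConfigSketch is a hypothetical, heavily reduced stand-in for EmulatorConfig: the field names translation, pacing and execution_mode come from the text above, while the enum definitions, the clock default and the pacing default are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

enum class PacingSketch { Turbo, Realtime };
enum class ExecutionModeSketch { SingleThread, MultiThread };

// Reduced, hypothetical stand-in for EmulatorConfig.
struct ConfigSketch {
    std::uint64_t cpu_clock_hz = 100'000'000;    // assumed default
    bool translation = true;                     // binary translation by default
    PacingSketch pacing = PacingSketch::Turbo;   // assumed default
    ExecutionModeSketch execution_mode = ExecutionModeSketch::SingleThread;
};

// Both paths are compiled in; the struct field picks one at runtime.
inline std::string_view execution_method(const ConfigSketch& cfg) {
    assert(cfg.cpu_clock_hz > 0 && "a zero clock rate is not a usable config");
    return cfg.translation ? "binary translation" : "switch interpreter";
}
```

A test, a binding or the CLI builds the struct directly, for example by default-constructing it and setting translation to false to get the oracle path, with no file format in between.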

5. Errors as values at the public API boundary

Every method on the public surface returns Result<T> — a tl::expected<T, ErrorCode> (types.hpp:67). Internal exceptions are permitted but must never cross the library boundary.

template <typename T>
using Result = tl::expected<T, ErrorCode>;   // types.hpp:67

tl::expected stands in for C++23 std::expected so the project stays on C++20. ErrorCode is a small enum class (Ok, BusError, InvalidAddress, AlignmentError, TrapGenerated, InvalidConfig, ElfLoadError, IoError, JitError).

Why.

  • Embedded simulation environments often disable C++ exceptions entirely.
  • Result<T> makes the error path syntactically explicit at every call site.
  • [[nodiscard]] (used aggressively across the API, including make_error, types.hpp:71) enforces handling at compile time.
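
The call-site effect can be illustrated with a small sketch. To keep it free of third-party dependencies it uses ResultSketch, a std::variant-based stand-in written for this example, rather than tl::expected itself; read_word and its address limits are likewise hypothetical, and the ErrorCode enum is reduced to the members the example needs.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <variant>

enum class ErrorCode { Ok, BusError, InvalidAddress, AlignmentError };

// Stand-in for tl::expected<T, ErrorCode>, only so this sketch has no dependency.
template <typename T>
class [[nodiscard]] ResultSketch {
public:
    ResultSketch(T value) : data_(std::move(value)) {}
    ResultSketch(ErrorCode error) : data_(error) {}

    bool has_value() const { return std::holds_alternative<T>(data_); }
    const T& value() const {
        assert(has_value());
        return std::get<T>(data_);
    }
    ErrorCode error() const {
        assert(!has_value());
        return std::get<ErrorCode>(data_);
    }

private:
    std::variant<T, ErrorCode> data_;
};

// Hypothetical API function: failures are returned, never thrown.
inline ResultSketch<std::uint32_t> read_word(std::uint32_t addr) {
    if (addr % 4 != 0) return ErrorCode::AlignmentError;
    if (addr >= 0x1000) return ErrorCode::InvalidAddress;
    return addr * 2;  // dummy payload standing in for memory contents
}
```

A caller has to look at has_value() before touching value(), and ignoring the return value altogether draws a compiler warning because the class is marked [[nodiscard]].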

6. Strong types everywhere

PhysAddr, VirtAddr, CoreId, SimTimeNs, IrqLine, AccessSize are all enum class over fixed-width integers (types.hpp:23-42). Raw uint32_t appears only at the byte-level encode/decode boundary.

The types carry only the operations that make physical sense (types.hpp:83-152):

| Type | Underlying | Allowed ops |
| --- | --- | --- |
| PhysAddr / VirtAddr | uint32_t | addr + off, addr - off, addr - addr (distance), += |
| CoreId | uint8_t | to_underlying only (an identifier, not arithmetic) |
| IrqLine | uint8_t | to_underlying only |
| SimTimeNs | uint64_t | +, -, += (durations add) |
| AccessSize | uint8_t | bytes(size) → 1/2/4 |

You can subtract two addresses (a distance); you cannot add two CoreIds (meaningless) — the operator simply does not exist, so the mistake is a compile error.
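
A reduced sketch of the technique, covering only PhysAddr and CoreId, is shown below. The operator set follows the table above, but the exact signatures in types.hpp are not reproduced here, and is_addable is a helper invented for this example to make the "operator does not exist" claim checkable.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>

enum class PhysAddr : std::uint32_t {};
enum class CoreId : std::uint8_t {};

constexpr std::uint32_t to_underlying(PhysAddr a) { return static_cast<std::uint32_t>(a); }
constexpr std::uint8_t to_underlying(CoreId c) { return static_cast<std::uint8_t>(c); }

// address + offset -> address
constexpr PhysAddr operator+(PhysAddr a, std::uint32_t off) {
    return static_cast<PhysAddr>(to_underlying(a) + off);
}

// address - address -> distance in bytes
constexpr std::uint32_t operator-(PhysAddr a, PhysAddr b) {
    return to_underlying(a) - to_underlying(b);
}

// Detection helper (illustrative): true only if `T + T` compiles.
template <typename T, typename = void>
struct is_addable : std::false_type {};
template <typename T>
struct is_addable<T, std::void_t<decltype(std::declval<T>() + std::declval<T>())>>
    : std::true_type {};

// No operator+ was defined for CoreId, so adding two of them is a compile error.
static_assert(!is_addable<CoreId>::value, "CoreId + CoreId must not compile");

inline void strong_type_example() {
    const PhysAddr base{0x40000000};
    const PhysAddr next = base + 4u;
    assert(to_underlying(next) == 0x40000004u);
    assert(next - base == 4u);
}
```

Since a scoped enumeration has no implicit conversions to or from integers, a function that takes PhysAddr rejects both a CoreId and a bare uint32_t at compile time.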

Why.

  • Eliminates an entire class of mix-up bugs ("I passed a virtual address where a physical was expected"). A function taking PhysAddr cannot be handed a VirtAddr or a bare integer.
  • Costs zero at runtime: an enum class over a fixed-width integer has the same size and representation as that integer, with no wrapper object and no vtable.
  • Reads correctly at the call site: read_physical(PhysAddr{0x40000000}, …) is unambiguous.

7. The Switch interpreter is the correctness oracle

Tero has two execution methods, chosen at runtime by EmulatorConfig::translation (both always compiled in):

  • the Switch interpreter (core::step, translation = false) — a fetch-decode-execute loop with a per-PC decode cache; and
  • binary translation (translation = true, the default) — the arch-neutral IR run through the tiered LLVM JIT, with the IR interpreter as fallback. See IR and LLVM JIT.

The invariant is not "no JIT"; it is that the Switch interpreter remains the reference. It is kept deliberately simple (src/core/src/step.cpp is ~120 lines) so it can be verified by inspection, and every translated path is validated bit-identical against it — at block granularity (Emulator::run_oracle_lockstep, emulator.hpp) and across full RTEMS boots (the Tero-vs-SIS lockstep comparator).

Why.

  • Correctness and peripheral coverage come before performance, always (CLAUDE.md rule 4). A fast path that can diverge from a small, auditable oracle is worse than no fast path.
  • Every optimisation introduces a category of hardware-modelling bugs that take days to track down — so each one (decode cache, IR, JIT tiers) must earn its place against the oracle, never replace it.
  • The path is selected per-Emulator at runtime, so the SMP2-facing contract is unaffected and the oracle is always one config field away for debugging.
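
The lockstep idea can be sketched on a toy two-instruction machine. Everything here (the instruction set, CpuState, the step functions and run_lockstep) is invented for illustration and says nothing about the actual signature or granularity of Emulator::run_oracle_lockstep.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

enum class Op : std::uint8_t { AddImm, ShiftLeft1 };
struct Insn { Op op; std::uint32_t imm; };

struct CpuState {
    std::uint32_t acc = 0;
    std::uint32_t pc = 0;
};
inline bool operator==(const CpuState& a, const CpuState& b) {
    return a.acc == b.acc && a.pc == b.pc;
}

// Reference: a deliberately simple switch interpreter, one instruction per call.
inline void step_reference(CpuState& s, const std::vector<Insn>& prog) {
    assert(s.pc < prog.size());
    const Insn& insn = prog[s.pc];
    switch (insn.op) {
        case Op::AddImm:     s.acc += insn.imm; break;
        case Op::ShiftLeft1: s.acc <<= 1;       break;
    }
    ++s.pc;
}

// A "fast path" that is implemented differently but should agree with the reference.
inline void step_fast_ok(CpuState& s, const std::vector<Insn>& prog) {
    const Insn& insn = prog[s.pc++];
    s.acc = (insn.op == Op::AddImm) ? s.acc + insn.imm : s.acc * 2;
}

// A fast path with a planted bug: the shift moves by two bits instead of one.
inline void step_fast_buggy(CpuState& s, const std::vector<Insn>& prog) {
    const Insn& insn = prog[s.pc++];
    s.acc = (insn.op == Op::AddImm) ? s.acc + insn.imm : s.acc << 2;
}

using StepFn = void (*)(CpuState&, const std::vector<Insn>&);

// Runs both executors in lockstep and compares full state after every step.
// Returns the pc of the first diverging instruction, or nothing if they agree.
inline std::optional<std::uint32_t> run_lockstep(const std::vector<Insn>& prog, StepFn fast) {
    CpuState ref;
    CpuState dut;
    while (ref.pc < prog.size()) {
        const std::uint32_t pc = ref.pc;
        step_reference(ref, prog);
        fast(dut, prog);
        if (!(ref == dut)) return pc;
    }
    return std::nullopt;
}
```

The comparison stops at the first mismatch, so a divergence is reported at the instruction that caused it rather than thousands of instructions later.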

8. Round-robin single-threaded multicore (default)

In the default ExecutionMode::SingleThread, all cores execute on the host's main thread, one quantum at a time, in core order (run_until_unpaced's else branch, src/runtime/src/emulator.cpp:876).

Why this is the default.

  • TSO is satisfied by construction. Only one core runs at a time, so Total Store Order — SPARC's memory model — holds trivially.
  • Atomics are correct by construction. CASA, LDSTUB, SWAP cannot race when execution is cooperative.
  • Determinism matters more than parallelism for the target RTEMS workloads; the host CPU is not the bottleneck for a 4-core SoC running sptests.
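
A minimal sketch of the scheduling order, assuming a fixed quantum measured in instructions: CoreSketch, run_round_robin and the per-core target are hypothetical, and the real loop in run_until_unpaced is driven by simulated time instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct CoreSketch {
    std::uint64_t executed = 0;  // instructions retired so far
};

// Runs each core for `quantum` instructions, in core order, until every core
// has executed at least `target` instructions. Returns the order cores ran in.
inline std::vector<std::size_t> run_round_robin(std::vector<CoreSketch>& cores,
                                                std::uint64_t quantum,
                                                std::uint64_t target) {
    assert(quantum > 0);
    std::vector<std::size_t> schedule;
    bool work_left = true;
    while (work_left) {
        work_left = false;
        for (std::size_t id = 0; id < cores.size(); ++id) {
            if (cores[id].executed >= target) continue;
            cores[id].executed += quantum;  // stand-in for executing one quantum
            schedule.push_back(id);
            if (cores[id].executed < target) work_left = true;
        }
    }
    return schedule;
}
```

Only one core is ever inside its quantum, so there is no interleaving to reason about, and the schedule is a pure function of the inputs.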

The escape hatch: ExecutionMode::MultiThread (ADR-001). For 1:1 GR740 throughput, each simulated core can run on its own host thread (thread-per-core + std::barrier, start_workers/worker_loop, src/runtime/src/emulator.cpp:690). Thread-safe primitives are runtime-gated, not conditionally compiled: tero::GatedMutex (src/interfaces/include/tero/gated_mutex.hpp) is a std::mutex that no-ops when its gate is inactive, so SingleThread pays near-zero sync overhead while both modes live in one binary. MultiThread is standalone-only and never used by the SMP2 wrapper. See Execution model and Decisions / ADR-001.
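
The runtime-gating idea can be sketched as below. This is not the actual tero::GatedMutex: the constructor argument, the fixed-at-construction gate and the real_locks() counter (present only so the behaviour can be observed) are assumptions of this example.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical sketch of a runtime-gated mutex.
class GatedMutexSketch {
public:
    // `active` would be true in a multi-threaded mode and false otherwise.
    explicit GatedMutexSketch(bool active) : active_(active) {}

    void lock() {
        if (!active_) return;  // gate closed: no synchronisation cost
        mutex_.lock();
        ++real_locks_;         // counted while holding the lock
    }
    void unlock() {
        if (active_) mutex_.unlock();
    }

    // Instrumentation for this example only.
    std::size_t real_locks() const { return real_locks_; }

private:
    const bool active_;
    std::mutex mutex_;
    std::size_t real_locks_ = 0;
};

// The call site is identical in both modes; std::lock_guard only needs lock/unlock.
inline void gated_mutex_example() {
    GatedMutexSketch single_thread(false);
    GatedMutexSketch multi_thread(true);
    { std::lock_guard<GatedMutexSketch> guard(single_thread); }
    { std::lock_guard<GatedMutexSketch> guard(multi_thread); }
    assert(single_thread.real_locks() == 0);
    assert(multi_thread.real_locks() == 1);
}
```

The inactive path costs one predictable branch, which is how both modes can share one binary without a compile-time switch.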

9. Endianness lives at the typed-access boundary

SPARC is big-endian. Internal memory is raw bytes (Ram holds a std::vector<std::byte> as it appears on the wire); the byte-swap happens only when bytes become a typed value:

  • in the bus typed accessors, SystemBus::{encode,decode}_be (src/bus/src/system_bus.cpp:57, :66) and the RAM fast paths (__builtin_bswap32, src/bus/src/ram.cpp:18);
  • in the IR, endianness is an attribute of the guest-memory op (LdGuest/StGuest carry {size, endianness}, Decision 51), not of the bus — so the bus stores raw bytes and a future little-endian guest works without a bus change.

Why. Keeping RAM as raw bytes makes it trivially snapshot-able (P5 Save/Restore is a memcpy) and matches how real memory controllers behave. Putting the swap in the op (not the bus) is what lets one bus serve two guest endiannesses (Decision 51).
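
A portable sketch of the typed-access boundary for 32-bit values follows. The function names are hypothetical and the loop-based conversion is chosen for clarity; it is not the SystemBus::{encode,decode}_be code or the __builtin_bswap32 fast path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decodes a 32-bit big-endian value from raw bytes starting at `offset`.
inline std::uint32_t decode_be32(const std::vector<std::byte>& ram, std::size_t offset) {
    assert(offset + 4 <= ram.size());
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        value = (value << 8) | std::to_integer<std::uint32_t>(ram[offset + i]);
    }
    return value;
}

// Encodes a 32-bit value as big-endian raw bytes starting at `offset`.
inline void encode_be32(std::vector<std::byte>& ram, std::size_t offset, std::uint32_t value) {
    assert(offset + 4 <= ram.size());
    for (std::size_t i = 0; i < 4; ++i) {
        ram[offset + i] = static_cast<std::byte>((value >> (8 * (3 - i))) & 0xFF);
    }
}
```

The storage itself never changes representation: a snapshot copies the byte vector as-is, and only these two functions know about byte order.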

10. SMP2-aligned lifecycle

The Emulator exposes a lifecycle that mirrors SMP2 model states even though the SMP2 wrapper itself is a separate repository:

Publish() → Configure() → Connect() → Initialize() → Run() → Hold() → Store() → Restore()

Run() (run_for/run_until), Initialize(), Reset(), Store(), Restore() are implemented (P5 landed Save/Restore 2026-05-11); the rest are stubs for the wrapper to delegate to.

Why. Aligning the API boundary now means the wrapper is a thin adapter, not a refactor — and the same shape is reusable by other simulation frameworks with minimal glue.

11. Extension by IPeripheral + declarative PeripheralSpec

Adding a custom peripheral must be one C++ file plus a single PeripheralSpec pushed into cfg.peripherals (or a single add_peripheral() call for REPL/test usage). No CMake surgery, no recompilation of the runtime library, no global registration step.

Why.

  • The customer story is "I have a proprietary IP block; model it next to your IRQMP." That story breaks the moment the integration cost exceeds a few lines.
  • The reference is examples/demo-dma/: ~200 lines for a fully wired DMA-capable peripheral with an IRQ, registered both as a spec and via add_peripheral.
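
The shape of such an extension can be sketched as follows. None of these types are Tero's: PeripheralSketch, SpecSketch, MachineSketch and their members are hypothetical stand-ins for IPeripheral, PeripheralSpec and the Emulator, whose real signatures are not shown on this page.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <vector>

struct PeripheralSketch {  // hypothetical stand-in for IPeripheral
    virtual ~PeripheralSketch() = default;
    virtual std::uint32_t read(std::uint32_t offset) = 0;
    virtual void write(std::uint32_t offset, std::uint32_t value) = 0;
};

struct SpecSketch {  // hypothetical stand-in for PeripheralSpec
    std::uint32_t base;
    std::uint32_t size;
    std::function<std::unique_ptr<PeripheralSketch>()> factory;
};

// The "one C++ file": a scratch register that returns the last value written.
struct ScratchRegister final : PeripheralSketch {
    std::uint32_t value = 0;
    std::uint32_t read(std::uint32_t) override { return value; }
    void write(std::uint32_t, std::uint32_t v) override { value = v; }
};

// The declarative part: where the block lives and how to construct it.
inline SpecSketch scratch_spec(std::uint32_t base) {
    return SpecSketch{base, 0x100u, [] {
        return std::unique_ptr<PeripheralSketch>(new ScratchRegister());
    }};
}

// Minimal owner that instantiates specs and routes accesses by address.
class MachineSketch {
public:
    explicit MachineSketch(const std::vector<SpecSketch>& specs) {
        for (const auto& spec : specs) add_peripheral(spec);
    }
    void add_peripheral(const SpecSketch& spec) {
        mapped_.push_back(Mapping{spec.base, spec.size, spec.factory()});
    }
    std::uint32_t read(std::uint32_t addr) {
        Mapping* m = find(addr);
        assert(m != nullptr && "unmapped address");
        return m->device->read(addr - m->base);
    }
    void write(std::uint32_t addr, std::uint32_t value) {
        Mapping* m = find(addr);
        assert(m != nullptr && "unmapped address");
        m->device->write(addr - m->base, value);
    }

private:
    struct Mapping {
        std::uint32_t base;
        std::uint32_t size;
        std::unique_ptr<PeripheralSketch> device;
    };
    Mapping* find(std::uint32_t addr) {
        for (auto& m : mapped_) {
            if (addr >= m.base && addr - m.base < m.size) return &m;
        }
        return nullptr;
    }
    std::vector<Mapping> mapped_;
};
```

In this sketch the integration cost is the peripheral class plus one spec entry; nothing is registered globally, so two MachineSketch objects can map different peripherals at the same address.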

Custom peripheral guide

See also