
tero_runtime

The orchestration layer. It owns the Emulator facade and the three subsystems it delegates to: Soc (entity graph), ExecutionEngine (cores + run loop), and DebugServer (GDB stub + breakpoints). It also holds everything needed to assemble a working SoC from an EmulatorConfig: the event scheduler, the ELF loader, the CPU↔bus bridge, the AMBA Plug&Play table builder, the config validator, and the GDB remote stub. It is the only module that sees every other layer. The facade/subsystem split is described in runtime decomposition.

# src/runtime/CMakeLists.txt
target_link_libraries(tero_runtime
    PUBLIC  tero::interfaces tero::bus tero::peripherals tero::jit
    PRIVATE tero::core tero::defaults tero::arch_sparc tero::warnings)
# tero::core is PRIVATE (arch-decoupling L14): the public headers only
# forward-declare core::CpuState for the SPARC-only core(idx) accessor.

Responsibility

Compose the cores, bus, peripherals, scheduler, and translation engine into one runnable machine; advance simulated time (round-robin or thread-per-core); load images; expose memory + debug access. No CLI, no host I/O of its own — it is a user of the defaults.

Source layout

src/runtime/
├── include/tero/runtime/
│   ├── emulator.hpp          ← the public Emulator facade
│   ├── soc.hpp               ← Soc (entity graph + assembly) + IrqBridge
│   ├── execution_engine.hpp  ← ExecutionEngine (cores, run loop, clock, JIT, MT)
│   ├── debug_server.hpp      ← DebugServer (BreakpointSet + GdbStub lifetime)
│   ├── cpu.hpp               ← SparcCpu / Leon3 / Leon4 / GenericCpu entities
│   ├── architecture_factory.hpp ← make_architecture(CpuArch) → ir::IArchitecture
│   ├── emulator_config.hpp   ← EmulatorConfig, PacingMode, ExecutionMode, CpuArch
│   ├── peripheral_spec.hpp   ← PeripheralSpec / Connection / PeripheralFactory
│   ├── plugin_spec.hpp       ← PluginSpec / PluginFactory (host-facing plugins)
│   ├── soc_family.hpp        ← SocFamily enum + per-SoC base-address constants
│   ├── run_result.hpp        ← RunResult + HaltReason
│   ├── event_scheduler.hpp   ← EventScheduler (min-heap, header-only)
│   ├── elf_loader.hpp        ← load_elf_to_bus / flatten_elf_to_prom / is_elf32
│   ├── cpu_bus_bridge.hpp    ← CpuBusBridge (ICpuBus over SystemBus)
│   ├── config_validate.hpp   ← validate_emulator_config()
│   ├── pnp_table.hpp         ← AMBA PnP scratch builder (hardcoded per-SoC path)
│   ├── pnp_placement.hpp     ← per-device PnP placement for composed machines
│   ├── gdb_stub.hpp          ← GdbStub + RSP codec + rtems_layout offsets
│   └── version.hpp
└── src/
    ├── emulator.cpp          ← facade: create/initialize/load + delegation
    ├── soc.cpp               ← Soc::build assembly (RAM/PROM/specs/plugins/PnP)
    ├── execution_engine.cpp  ← engine initialize/reset, image entry, accessors
    ├── engine_run_loop.cpp   ← pacing, round-robin round, quantum dispatch
    ├── engine_translate.cpp  ← JIT quantum, IR-interpret quantum, region build
    ├── engine_irq_time.cpp   ← interrupt sampling, up-counter re-base
    ├── engine_mt.cpp         ← MultiThread workers + barriers (ADR-001)
    ├── engine_oracle.cpp     ← IR-vs-core::step oracle-lockstep harness
    ├── debug_server.cpp  cpu.cpp  architecture_factory.cpp
    ├── emulator_config.cpp  config_validate.cpp
    ├── cpu_bus_bridge.cpp  elf_loader.cpp  pnp_table.cpp
    ├── gdb_codec.cpp  gdb_stub.cpp  gdb_stub_transport.cpp
    └── version.cpp

Emulator — the public API

Emulator (emulator.hpp:29) is the entry point. It is created through a factory that validates the config; it is non-copyable and non-movable (own it through the returned unique_ptr). It is a facade: it owns a Soc, a DebugServer, and an ExecutionEngine (emulator.hpp:253,258,266) and forwards run / state / memory / debug calls to them — see runtime decomposition for the ownership and teardown contract.

static Result<std::unique_ptr<Emulator>> create(EmulatorConfig cfg);
~Emulator();

Service injection (before initialize())

| Method | Effect |
| --- | --- |
| set_logger(unique_ptr<ILogger>) | replace the default StdoutLogger |
| set_character_device(dev) | set the console UART0 chardev (= set_uart_character_device(0, dev)) |
| set_uart_character_device(index, dev) | set the chardev for any UART (index < config().uarts); nullptr silences it |
| set_observer(unique_ptr<IEmulatorObserver>) | install IRQ/trap/attach/per-instruction hooks; nullptr removes |

Lifecycle and run loop

Result<void> initialize();        // wire RAM, PROM, peripherals, PnP, bus routing, JIT
Result<void> reset();             // power-on reset (keeps the loaded image)
RunResult    run_for  (SimTimeNs duration);
RunResult    run_until(SimTimeNs deadline);
RunResult    single_step(CoreId core);          // exactly one instruction, no IRQ sampling
SimTimeNs    current_sim_time() const noexcept;

run_for/run_until honour config().pacing: Realtime slices the run into pacing_slice_ns chunks and sleeps on steady_clock; Turbo free-runs without ever reading the host clock. The facade delegates both to the ExecutionEngine, where run_until_unpaced (execution_engine.hpp:198) drives the round-robin (or thread-per-core) loop; run_core_quantum / run_core_batch execute one core's quantum via the powered-down fast-advance, the tiered-JIT/IR path (run_ir_quantum, execution_engine.hpp:200), the universal IR-interpret path (run_ir_interpret_quantum, execution_engine.hpp:210), or the per-core Switch path.
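
The slicing arithmetic behind Realtime pacing can be sketched as follows. This is a minimal illustration, not the engine's code: pacing_slices is a hypothetical helper, and plain std::uint64_t nanoseconds stand in for SimTimeNs.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper: split [now, deadline) into chunks of at most slice_ns.
// Returns the simulated-time end point of each chunk, in order.
std::vector<std::uint64_t> pacing_slices(std::uint64_t now,
                                         std::uint64_t deadline,
                                         std::uint64_t slice_ns) {
    std::vector<std::uint64_t> out;
    if (slice_ns == 0) return out;  // guard: a zero slice would never advance
    while (now < deadline) {
        // Clamp the last slice so it never overshoots the deadline.
        now += std::min(slice_ns, deadline - now);
        out.push_back(now);
    }
    return out;
}
```

A Realtime loop would run the unpaced loop up to each returned end point and then sleep until the host steady_clock catches up; Turbo would skip the slicing and run straight to the deadline.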

Image loading

Result<void> load_elf(const std::filesystem::path&);                       // SPARC BE ET_EXEC
Result<void> load_binary(PhysAddr base, std::span<const std::byte>, PhysAddr entry);
Result<void> load_ram_image(PhysAddr addr, std::span<const std::byte>);    // no CPU side effects
Result<void> load_ram_image_from_file(PhysAddr addr, const std::filesystem::path&);

External memory access

Result<void>          read_physical (PhysAddr, std::span<std::byte>);
Result<void>          write_physical(PhysAddr, std::span<const std::byte>);
Result<std::uint32_t> read_physical_u32 (PhysAddr);
Result<void>          write_physical_u32(PhysAddr, std::uint32_t);

These bypass the MMU (there is none yet). Virtual-address accessors are on the public-API roadmap; today the CPU's own read_virtual/write_virtual go through CpuBusBridge. A successful external write also flushes the translated-code caches (emulator.cpp:262-264) — an external writer issues no guest FLUSH, so the engine assumes translated code may be stale.

Peripherals, scheduling, introspection

Result<void> add_peripheral(std::unique_ptr<IPeripheral>, IrqLine);  // post-initialize sugar
void         schedule_event(SimTimeNs when, IEvent*);

const core::CpuState& core(std::size_t idx) const;    // SPARC lens over the blob
const ir::GuestState& core_blob(std::size_t idx) const;     // canonical register blob
const CoreControl&    core_control(std::size_t idx) const;  // run latches
IEntity&              cpu(std::size_t idx);            // CPU entity (ICpu / ISparc)
std::size_t           num_cores() const noexcept;
const EmulatorConfig& config() const noexcept;
bus::SystemBus&       bus() noexcept;
GdbStub*              gdb_stub() noexcept;            // nullptr unless a stub is bound
ICharacterDevice*     character_device() noexcept;    // UART0 chardev
ICharacterDevice*     uart_character_device(std::size_t) noexcept;
BreakpointSet&        breakpoints() noexcept;

add_peripheral is sugar for adding a device after initialize() (REPL / test usage); it delegates to Soc::add_peripheral (soc.hpp:122), which allocates the IrqBridge, attaches the peripheral, and maps its MMIO. For statically assembled configs, prefer the PeripheralSpec form in EmulatorConfig::peripherals. core(idx) is SPARC-typed; core_blob / cpu are the architecture-neutral views over the same state (emulator.hpp:196-216).

Diagnostic harnesses

The runtime exposes an IR-vs-Switch validator used by the --oracle-lockstep CLI mode: run_oracle_lockstep(deadline) returns the first OracleDivergence (execution_engine.hpp:83, aliased at emulator.hpp:101), replaying the production SMP round-robin while validating every clean-exit IR block against core::step on a scratch copy (engine_oracle.cpp). Turbo + translation only. The --lockstep CLI mode is implemented in tero_app instead: it runs two full Emulator instances (Switch vs IR) and diffs the GuestState blobs each instruction.

EmulatorConfig and the SoC kits

EmulatorConfig (emulator_config.hpp:117) is a plain struct (config by struct, not by file). Key fields:

| Field | Default | Meaning |
| --- | --- | --- |
| soc_family | Gr712rc | which SoC family the kits wire |
| num_cores | 1 | LEON cores to instantiate |
| ram_base / ram_size | 0x40000000 / 16 MiB | RAM region |
| cpu_clock_hz | 50 MHz (kits: 80 / 250 MHz) | the primary frequency knob |
| cpi | 1.0 | global cycles-per-instruction scalar |
| ns_per_insn | 20 | derived: cpi * ns_per_cycle(clock) (recomputed by create) |
| quantum | 1000 | instructions per core per scheduling round |
| quantum_batch | 1 | MultiThread quanta per cross-core barrier (Phase 14) |
| time_advance | Concurrent | per-core delta fold: Concurrent (max) vs Sum (ADR-005) |
| pacing | Realtime | Realtime vs Turbo time advance |
| execution_mode | SingleThread | round-robin vs thread-per-core (ADR-001) |
| translation | true | binary translation (JIT/IR) vs the Switch interpreter |
| arch / arch_factory | Sparc / empty | guest ISA selector / out-of-tree ir::IArchitecture override (S11) |
| force_ir_interpret | false | route the SPARC interpret path through the IR interpreter (S9 validation) |
| jit_baseline_threshold / jit_promotion_threshold | 32 / 100 | interpret-first / O2-promotion thresholds (ADR-002) |
| jit_background_opt / jit_max_region_blocks | true / 8 | background O2 tier; max blocks fused per region |
| gdb_stub_port / gdb_stub_wait_for_client | 0 / false | GDB stub binding |
| prom_base / prom_size / prom_image_* / prom_fill / reset_pc | 0x0 / 32 MiB / … | boot ROM |
| peripherals | {} | declarative PeripheralSpec list |
| buses | {} | declarative shared comms-bus media (CAN / SPI / 1553), built with can_bus() / spi_bus() / mil_std_bus() |
| plugins | {} | declarative PluginSpec list (host-facing observers) |
| character_devices | {} | non-owning ICharacterDevice* pool, indexed by chardev_index |
| pnp_placement | empty | per-device PnP placement for composed machines; empty → the hardcoded pnp_table path |
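
The time_advance fold can be illustrated with a short sketch. It assumes only what the table states (Concurrent takes the maximum per-core delta, Sum adds them); the enum and the function fold_round_delta are illustrative stand-ins, not the runtime's types.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <vector>

// Illustrative enum; the names mirror the table above.
enum class TimeAdvance { Concurrent, Sum };

// Fold the simulated time each core consumed in one round into one clock delta.
std::uint64_t fold_round_delta(const std::vector<std::uint64_t>& per_core_ns,
                               TimeAdvance mode) {
    if (per_core_ns.empty()) return 0;
    if (mode == TimeAdvance::Concurrent) {
        // Cores are modelled as running side by side: the slowest one sets the pace.
        return *std::max_element(per_core_ns.begin(), per_core_ns.end());
    }
    // Sum: cores are modelled as running back to back.
    return std::accumulate(per_core_ns.begin(), per_core_ns.end(), std::uint64_t{0});
}
```

For two cores that consumed 20000 ns and 15000 ns in a round, Concurrent advances the clock by 20000 ns and Sum by 35000 ns.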

The timing helpers ns_per_cycle(clock_hz) and ns_per_insn_for(clock_hz, cpi) (emulator_config.hpp:93,103) are the only place CPI enters the model; peripheral/timer clocks stay on ns_per_cycle (unaffected by cpi).
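
As a worked example of that arithmetic, here is a sketch under stated assumptions: the sketch_* names are hypothetical, and double is used for clarity because the real helpers' return types are not shown on this page.

```cpp
#include <cstdint>

// Illustrative restatement of the arithmetic only; the real helpers live in
// emulator_config.hpp and may use different types and rounding.
constexpr double sketch_ns_per_cycle(std::uint64_t clock_hz) {
    return 1e9 / static_cast<double>(clock_hz);
}

constexpr double sketch_ns_per_insn(std::uint64_t clock_hz, double cpi) {
    return cpi * sketch_ns_per_cycle(clock_hz);
}

// The table defaults: 50 MHz at cpi 1.0 gives 20 ns per instruction.
static_assert(sketch_ns_per_insn(50'000'000, 1.0) == 20.0);
```

With the kit clocks, one cycle is 12.5 ns at 80 MHz and 4 ns at 250 MHz; raising cpi scales instruction time but leaves the per-cycle figure used by peripherals unchanged.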

Frequency-independent direct-ELF guests

The kits model real silicon frequencies (GR712RC 80 MHz, GR740 250 MHz). A direct-ELF guest needs no rebuild when the clock changes: initialize() simulates the bootloader and re-derives the GPTIMER scaler so RTEMS still sees a 1 MHz timer tick. Only mkprom2 guests bake in a fixed -freq and must be re-wrapped to match the configured clock.
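
The scaler re-derivation can be sketched as below. It assumes the usual GPTIMER prescaler behaviour, one tick every (reload + 1) CPU clocks; scaler_reload_for is a hypothetical name, and how initialize() actually computes the value is not shown here.

```cpp
#include <cstdint>

// Assumption: the prescaler emits one timer tick every (reload + 1) CPU clocks,
// so a 1 MHz tick needs reload = cpu_clock_hz / 1 MHz - 1.
constexpr std::uint32_t scaler_reload_for(std::uint64_t cpu_clock_hz,
                                          std::uint64_t tick_hz = 1'000'000) {
    if (tick_hz == 0 || cpu_clock_hz < tick_hz) {
        return 0;  // cannot divide down to the requested tick; run the scaler flat out
    }
    return static_cast<std::uint32_t>(cpu_clock_hz / tick_hz - 1);
}
```

Under that assumption the GR712RC kit (80 MHz) gets a reload of 79 and the GR740 kit (250 MHz) gets 249, so the guest sees the same 1 MHz tick on both.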

The ready-to-run configs are kits: Machine compositions in tero_compose (src/compose/kits.cpp), no longer functions in this module. They are tero::compose::gr712rc_config() (2 cores) and tero::compose::gr740_config() (4 cores), plus their *_uniprocessor_config() variants. What stays in tero_runtime are the BusSpec helpers can_bus() / spi_bus() / mil_std_bus() (emulator_config.cpp:14-28).

SocFamily (soc_family.hpp:10) and the per-SoC base-address constants (namespace gr712rc / gr740) live in soc_family.hpp.

PeripheralSpec — declarative device assembly

PeripheralSpec (peripheral_spec.hpp:63) is the data the Soc reads to build a peripheral during Soc::build:

| Field | Meaning |
| --- | --- |
| instance_name | unique, non-empty; used in Connection, observer events, logs |
| factory | PeripheralFactory lambda, run once during initialize() |
| irqs | IRQ lines (each [1,31]); one IrqBridge per entry, mask 1u << irq |
| chardev_index | optional index into character_devices |
| connections | Connection edges (from_slot → peer.peer_slot), resolved at connect_ports |

The IrqBridge (soc.hpp:49) is the SoC-agnostic adapter that converts an IInterruptSource::raise()/lower() into the IInterruptController::external_assert/clear(mask) of whichever controller (IrqMP / IrqAMP) was wired at build.
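
The adapter shape can be sketched in a few lines. SketchController and SketchIrqBridge are hypothetical stand-ins; only the raise/lower and external_assert/clear names and the 1u << irq mask come from the text above, and the method name external_clear is an assumption about how "clear" is spelled.

```cpp
#include <cstdint>

// Hypothetical stand-in for an interrupt controller (IrqMP / IrqAMP in the text).
struct SketchController {
    std::uint32_t pending = 0;
    void external_assert(std::uint32_t mask) { pending |= mask; }
    void external_clear(std::uint32_t mask)  { pending &= ~mask; }
};

// One bridge per IRQ line: it remembers the controller and its own bit mask.
class SketchIrqBridge {
public:
    SketchIrqBridge(SketchController& ctrl, unsigned irq)
        : ctrl_(ctrl), mask_(1u << irq) {}  // irq is expected to be in [1,31]

    void raise() { ctrl_.external_assert(mask_); }
    void lower() { ctrl_.external_clear(mask_); }

private:
    SketchController& ctrl_;
    std::uint32_t mask_;
};
```

The peripheral only ever sees raise() and lower(), which is what keeps it independent of the controller wired at build.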

config_validate

validate_emulator_config(cfg, logger = nullptr) (config_validate.hpp:45) runs at Emulator::create time, before any factory. It enforces: non-empty instance names, non-null factories, unique names, IRQ range [1,31], chardev_index bounds, and Connection field non-emptiness + peer existence. It returns ErrorCode::InvalidConfig on the first violation. Port name resolution and MMIO-overlap checking happen later (at initialize() / map_peripheral), since they need the peripheral object.
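
A first-violation check over the spec list might look like the sketch below. SketchSpec is an illustrative subset of PeripheralSpec, first_violation is a hypothetical name, Connection checks are omitted, and the order in which the real validator applies its rules is an assumption.

```cpp
#include <cstddef>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Illustrative subset of PeripheralSpec; field names follow the table above.
struct SketchSpec {
    std::string instance_name;
    bool has_factory = true;                   // stands in for a non-null factory
    std::vector<unsigned> irqs;
    std::optional<std::size_t> chardev_index;
};

// Returns a description of the first violation, or std::nullopt if all specs pass.
std::optional<std::string> first_violation(const std::vector<SketchSpec>& specs,
                                           std::size_t chardev_count) {
    std::set<std::string> seen;
    for (const auto& s : specs) {
        if (s.instance_name.empty()) return "empty instance_name";
        if (!s.has_factory) return "null factory: " + s.instance_name;
        if (!seen.insert(s.instance_name).second)
            return "duplicate name: " + s.instance_name;
        for (unsigned irq : s.irqs)
            if (irq < 1 || irq > 31) return "irq outside [1,31]: " + s.instance_name;
        if (s.chardev_index && *s.chardev_index >= chardev_count)
            return "chardev_index out of range: " + s.instance_name;
    }
    return std::nullopt;
}
```

Everything checked here needs only the config data, which is why it can run before any factory; checks that need the constructed peripheral wait until initialize().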

EventScheduler

EventScheduler (event_scheduler.hpp:22) is a min-heap of (SimTimeNs when, IEvent*), implementing IScheduler. The ExecutionEngine owns it, and it is live from the engine's construction so Soc::build can hand it to peripherals before engine_.initialize runs (execution_engine.hpp:167-169). The run loop calls fire_pending(now) to dispatch all matured events in chronological order (events may re-schedule themselves); next_event_time() powers idle-time skipping — when all cores idle, the clock jumps straight to the next event. Not thread-safe (the round-boundary work is serialised).
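
A minimal min-heap scheduler with the same three operations is sketched below. It is an illustration, not the header's code: std::function stands in for IEvent*, plain std::uint64_t for SimTimeNs, and the FIFO tie-break among equal times is an assumption.

```cpp
#include <cstdint>
#include <functional>
#include <optional>
#include <queue>
#include <utility>
#include <vector>

class SketchScheduler {
public:
    using Callback = std::function<void(std::uint64_t now)>;  // stands in for IEvent*

    void schedule(std::uint64_t when, Callback cb) {
        heap_.push(Entry{when, seq_++, std::move(cb)});
    }

    // Earliest pending time; a run loop can jump here when every core is idle.
    std::optional<std::uint64_t> next_event_time() const {
        if (heap_.empty()) return std::nullopt;
        return heap_.top().when;
    }

    // Fire every event with when <= now, earliest first.
    void fire_pending(std::uint64_t now) {
        while (!heap_.empty() && heap_.top().when <= now) {
            Entry e = heap_.top();  // copy out before pop: the callback may schedule
            heap_.pop();
            e.cb(now);
        }
    }

private:
    struct Entry {
        std::uint64_t when;
        std::uint64_t seq;  // insertion order, used to break ties (assumption)
        Callback cb;
    };
    struct Later {
        bool operator()(const Entry& a, const Entry& b) const {
            return a.when != b.when ? a.when > b.when : a.seq > b.seq;
        }
    };
    std::priority_queue<Entry, std::vector<Entry>, Later> heap_;
    std::uint64_t seq_ = 0;
};
```

A callback that re-schedules itself must pick a time later than now, otherwise fire_pending would keep firing it within the same call.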

CpuBusBridge

CpuBusBridge (cpu_bus_bridge.hpp:16) adapts bus::SystemBus (physical, byte-oriented) to ICpuBus (virtual, instruction-aware). With no MMU, it is a 1:1 passthrough; when SRMMU lands this is the natural injection point for translation + fault generation. Two things make it more than a passthrough:

  • It overrides the ICpuBus atomics (atomic_swap_u32, atomic_cas_u32, atomic_ldstub) with a true atomic RMW on the backing store — correct once cores run on separate host threads.
  • It owns a per-core SystemBus::RamFastCache (cpu_bus_bridge.hpp:43): there is one bridge per core (Phase 13 Inc 5b), so the RAM fast-path cache is never shared across host threads.
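
What "a true atomic RMW" means for LDSTUB can be shown with a short sketch. It uses std::atomic<std::uint8_t> for portability; how the real bridge reaches the backing store is not shown on this page, and the sketch_* names are hypothetical.

```cpp
#include <atomic>
#include <cstdint>

// LDSTUB semantics: return the old byte and leave 0xFF in memory,
// as one indivisible read-modify-write.
inline std::uint8_t sketch_atomic_ldstub(std::atomic<std::uint8_t>& byte) {
    return byte.exchange(0xFF, std::memory_order_seq_cst);
}

// Typical guest use: a test-and-set lock attempt.
inline bool sketch_try_lock(std::atomic<std::uint8_t>& lock) {
    return sketch_atomic_ldstub(lock) == 0;  // 0 means the lock was free
}
```

A separate load followed by a store would let two host threads both read 0 and both believe they own the lock; the single exchange rules that out.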

ELF loader

elf_loader.hpp parses SPARC big-endian ELF32:

  • load_elf_to_bus(path, bus) (elf_loader.hpp:27) validates ELFCLASS32 / ELFDATA2MSB / EM_SPARC / executable, copies each PT_LOAD segment's p_filesz bytes to PhysAddr{p_paddr}, zero-fills the BSS tail, and returns {entry_point, lowest/highest loaded addr}. Errors map to ErrorCode::ElfLoadError.
  • is_elf32(blob) (elf_loader.hpp:34) discriminates a flat PROM image from an mkprom2 ELF wrapper.
  • flatten_elf_to_prom(elf_blob, prom_base, prom_buffer) (elf_loader.hpp:52) flattens an mkprom2 .rom (always ELF) into a PROM-shaped buffer by LMA — mirroring how GRMON / real silicon load the same ELF. Emulator::initialize() calls this to consume mkprom2 output directly.
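
The header checks listed for load_elf_to_bus can be sketched against the standard ELF32 layout. The function name and the std::vector input are illustrative; the field offsets and constants are those of the ELF specification.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Read a big-endian 16-bit field at a byte offset.
inline std::uint16_t be16(const std::vector<std::uint8_t>& b, std::size_t off) {
    return static_cast<std::uint16_t>((b[off] << 8) | b[off + 1]);
}

// True when the blob starts with an ELF32 big-endian SPARC executable header.
bool sketch_is_sparc_exec_elf32(const std::vector<std::uint8_t>& b) {
    constexpr std::size_t kEhdrSize = 52;   // sizeof(Elf32_Ehdr)
    if (b.size() < kEhdrSize) return false;
    if (b[0] != 0x7F || b[1] != 'E' || b[2] != 'L' || b[3] != 'F') return false;
    if (b[4] != 1) return false;            // EI_CLASS  == ELFCLASS32
    if (b[5] != 2) return false;            // EI_DATA   == ELFDATA2MSB
    if (be16(b, 16) != 2) return false;     // e_type    == ET_EXEC
    if (be16(b, 18) != 2) return false;     // e_machine == EM_SPARC
    return true;
}
```

Only after these checks pass does it make sense to walk the program headers and copy PT_LOAD segments.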

PnP table

pnp_table.hpp builds the GRLIB AMBA Plug&Play scratch areas the RTEMS ambapp_scan driver walks. pnp_ram_regions(cfg) (pnp_table.hpp:33) returns the regions to map (AHB-PnP at 0xFFFFF000, plus one APB-PnP per bridge: GR740 one, GR712RC two); write_pnp_entries populates them with master/slave entries, deriving each device's identity from the live entities through IAmbaPnp (soc.cpp:305-308). For a machine composed through tero_compose, EmulatorConfig::pnp_placement (pnp_placement.hpp) carries per-device placement and the table is derived from the composed graph instead; the kits leave it empty and run the hardcoded path byte-identical.

GDB stub

GdbStub (gdb_stub.hpp:247) turns Tero into a target for sparc-rtems5-gdb over TCP, speaking the RSP subset RTEMS debugging needs. The user-facing reference (packets, attach flow, signal mapping) is the Debugging with GDB guide; this is the module structure.

| File | Responsibility |
| --- | --- |
| gdb_stub.hpp | public API, RSP codec helpers, StopSignal, signal_from_tt, ResumeAction, rtems_layout::* |
| gdb_codec.cpp | encode_packet/decode_packet, hex parsing, run-length expansion |
| gdb_stub.cpp | packet dispatch + handlers (?, g/G, m/M, Z/z, H, qC, qfThreadInfo/qsThreadInfo, qSymbol, qThreadExtraInfo, c/s/D/k) |
| gdb_stub_transport.cpp | TCP listener (start_listening, wait_for_client, poll_accept), packet I/O, RTEMS RAM reads, report_error_mode, send_stop_reply |
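
For orientation, RSP packet framing is simple enough to sketch. The function below is a hypothetical stand-in, not the signature of encode_packet in gdb_codec.cpp, and it omits escaping and run-length coding.

```cpp
#include <cstdint>
#include <string>

// Frame a payload as "$<payload>#<two hex digits>", where the digits are the
// modulo-256 sum of the payload bytes.
std::string sketch_encode_packet(const std::string& payload) {
    static const char* kHex = "0123456789abcdef";
    std::uint8_t sum = 0;
    for (unsigned char c : payload) {
        sum = static_cast<std::uint8_t>(sum + c);  // wraps modulo 256
    }
    std::string out = "$" + payload + "#";
    out += kHex[sum >> 4];
    out += kHex[sum & 0xF];
    return out;
}
```

For example, the payload OK sums to 0x4F + 0x4B = 0x9A, so it goes on the wire as $OK#9a.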

The stub's lifetime and the shared BreakpointSet live on DebugServer (debug_server.hpp:31); Emulator::initialize calls DebugServer::start last, after the SoC and the engine are up (emulator.cpp:172).

Integration with the run loop (engine_run_loop.cpp):

  • run_until_unpaced calls poll_accept() once per quantum for late-binding attaches; a fresh accept returns HaltReason::Breakpoint so the CLI can drive process_until_resume().
  • On a core entering error mode with a client attached, report_error_mode(core) arms a T<sig> reply (signal_from_tt(TBR.tt)) and the loop returns Breakpoint instead of HaltedMode. A per-client latch prevents re-notifying.
  • Per-step should_break(core, pc) is the hot-path hook: software breakpoint set membership, single-step completion, and a once-per-quantum Ctrl-C poll.

RTEMS thread-awareness (rtems_layout::*, gdb_stub.hpp:103-191) reads thread state directly from guest RAM using offsets verified against RCC 1.3.2 / RTEMS 5.3 via DWARF. The two-round qSymbol handshake resolves _Per_CPU_Information (executing-thread enumeration) and _Objects_Information_table (full Classic-task enumeration via enumerate_classic_tasks). Every read is validated; on a check failure the stub degrades to the per-core thread model rather than reporting garbage. All thread-awareness state lives on GdbStub and resets on client detach.

GDB forces the Switch path

With a stub attached, the translation path stays native until a breakpoint, then single-steps the breakpoint-bearing block via run_ir_quantum. MultiThread dispatch is bypassed whenever a stub (or a trace observer) is attached — mt_dispatch_now() returns false (engine_mt.cpp:17-20) — so the debug control flow stays serial and exact.

RunResult

enum class HaltReason : std::uint8_t {  // run_result.hpp:10
    DurationExpired,   // run_for budget reached
    DeadlineReached,   // run_until deadline reached
    HaltedMode,        // a guest core took a trap with ET=0 (deliberate shutdown / fault)
    ErrorMode,         // internal emulator error (reserved; distinct from HaltedMode)
    Breakpoint,        // GDB stub stopped (breakpoint / single-step / attach)
};

struct RunResult {                      // run_result.hpp:29
    std::uint64_t instructions_executed{0};   // across all cores
    SimTimeNs     time_elapsed{SimTimeNs{0}};
    HaltReason    reason{HaltReason::DurationExpired};
};

HaltedMode is the guest's doing, not a Tero failure

A SPARC core that takes a trap while ET=0 halts — which is what RTEMS _CPU_Fatal_halt / _exit do via ta 0. The emulator behaved correctly and simply has nothing more to run. ErrorMode is reserved for an internal emulator error. The CLI loop pattern is while (run_until(...).reason == Breakpoint) stub->process_until_resume();.
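
The CLI loop pattern can be compiled in isolation with stand-ins. FakeEmulator and FakeStub are hypothetical; only the HaltReason values and the run_until / process_until_resume names come from this page.

```cpp
#include <cstdint>

enum class HaltReason : std::uint8_t {
    DurationExpired, DeadlineReached, HaltedMode, ErrorMode, Breakpoint
};

// Hypothetical emulator: reports Breakpoint a fixed number of times, then halts.
struct FakeEmulator {
    int stops_left;
    HaltReason run_until() {
        return stops_left-- > 0 ? HaltReason::Breakpoint : HaltReason::HaltedMode;
    }
};

// Hypothetical stub: the real one would serve packets until the client resumes.
struct FakeStub {
    int resumes = 0;
    void process_until_resume() { ++resumes; }
};

// Keep handing control to the stub while the run stops on a debug event.
HaltReason drive(FakeEmulator& emu, FakeStub& stub) {
    HaltReason r;
    while ((r = emu.run_until()) == HaltReason::Breakpoint) {
        stub.process_until_resume();
    }
    return r;
}
```

Any reason other than Breakpoint ends the loop, so a guest that halts deliberately falls out with HaltedMode and the caller can report it as a normal exit.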

Execution-mode and multithread machinery

PacingMode (emulator_config.hpp:29) and ExecutionMode (emulator_config.hpp:44) are runtime fields, not compile flags. Under MultiThread the ExecutionEngine spawns one worker thread per core 1..N-1 (main thread runs core 0 and all serialized round-boundary work), synchronised by two std::barriers, with per-core CpuBusBridge, ir_cache_, tiered_jit_, and ir_interp_ so concurrent cores never share mutable translation state (execution_engine.hpp:282-320). start_workers / stop_workers / worker_loop (engine_mt.cpp) manage the pool; request_code_flush latches a FLUSH for the main thread to drain at the serial boundary. See multicore and timing.
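
The latch-and-drain idea behind request_code_flush can be sketched with a single atomic flag. SketchFlushLatch is hypothetical and the real engine may track more than one bit; the point is only that workers set the flag and the main thread consumes it at the serial boundary.

```cpp
#include <atomic>

class SketchFlushLatch {
public:
    // Called from any worker thread when its core executes a FLUSH.
    void request() { pending_.store(true, std::memory_order_release); }

    // Called by the main thread at the serial round boundary.
    // Returns true if at least one flush was requested since the last drain.
    bool drain() { return pending_.exchange(false, std::memory_order_acq_rel); }

private:
    std::atomic<bool> pending_{false};
};
```

Several requests within one round collapse into a single drain, so the translated-code caches are flushed at most once per boundary in this sketch.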

What is intentionally not in tero_runtime

  • No CLI parsing — that is tero_app.
  • No file format other than ELF (raw images go through load_binary / load_ram_image).
  • No host-service implementations — the runtime uses the defaults, it does not define them.

See also