# tero_runtime
The orchestration layer. It owns the Emulator facade and the three subsystems it delegates to (Soc for the entity graph, ExecutionEngine for the cores and run loop, DebugServer for the GDB stub and breakpoints), plus everything needed to assemble a working SoC from an EmulatorConfig: the event scheduler, the ELF loader, the CPU↔bus bridge, the AMBA Plug&Play table builder, the config validator, and the GDB remote stub. It is the only module that sees every other layer. The facade/subsystem split is described in runtime decomposition.
```cmake
# src/runtime/CMakeLists.txt
target_link_libraries(tero_runtime
  PUBLIC  tero::interfaces tero::bus tero::peripherals tero::jit
  PRIVATE tero::core tero::defaults tero::arch_sparc tero::warnings)
# tero::core is PRIVATE (arch-decoupling L14): the public headers only
# forward-declare core::CpuState for the SPARC-only core(idx) accessor.
```
Responsibility
Compose the cores, bus, peripherals, scheduler, and translation engine into one runnable machine; advance simulated time (round-robin or thread-per-core); load images; expose memory and debug access. It has no CLI and no host I/O of its own: it consumes the host-service implementations from the defaults layer.
## Source layout
```text
src/runtime/
├── include/tero/runtime/
│ ├── emulator.hpp ← the public Emulator facade
│ ├── soc.hpp ← Soc (entity graph + assembly) + IrqBridge
│ ├── execution_engine.hpp ← ExecutionEngine (cores, run loop, clock, JIT, MT)
│ ├── debug_server.hpp ← DebugServer (BreakpointSet + GdbStub lifetime)
│ ├── cpu.hpp ← SparcCpu / Leon3 / Leon4 / GenericCpu entities
│ ├── architecture_factory.hpp ← make_architecture(CpuArch) → ir::IArchitecture
│ ├── emulator_config.hpp ← EmulatorConfig, PacingMode, ExecutionMode, CpuArch
│ ├── peripheral_spec.hpp ← PeripheralSpec / Connection / PeripheralFactory
│ ├── plugin_spec.hpp ← PluginSpec / PluginFactory (host-facing plugins)
│ ├── soc_family.hpp ← SocFamily enum + per-SoC base-address constants
│ ├── run_result.hpp ← RunResult + HaltReason
│ ├── event_scheduler.hpp ← EventScheduler (min-heap, header-only)
│ ├── elf_loader.hpp ← load_elf_to_bus / flatten_elf_to_prom / is_elf32
│ ├── cpu_bus_bridge.hpp ← CpuBusBridge (ICpuBus over SystemBus)
│ ├── config_validate.hpp ← validate_emulator_config()
│ ├── pnp_table.hpp ← AMBA PnP scratch builder (hardcoded per-SoC path)
│ ├── pnp_placement.hpp ← per-device PnP placement for composed machines
│ ├── gdb_stub.hpp ← GdbStub + RSP codec + rtems_layout offsets
│ └── version.hpp
└── src/
    ├── emulator.cpp ← facade: create/initialize/load + delegation
    ├── soc.cpp ← Soc::build assembly (RAM/PROM/specs/plugins/PnP)
    ├── execution_engine.cpp ← engine initialize/reset, image entry, accessors
    ├── engine_run_loop.cpp ← pacing, round-robin round, quantum dispatch
    ├── engine_translate.cpp ← JIT quantum, IR-interpret quantum, region build
    ├── engine_irq_time.cpp ← interrupt sampling, up-counter re-base
    ├── engine_mt.cpp ← MultiThread workers + barriers (ADR-001)
    ├── engine_oracle.cpp ← IR-vs-core::step oracle-lockstep harness
    ├── debug_server.cpp cpu.cpp architecture_factory.cpp
    ├── emulator_config.cpp config_validate.cpp
    ├── cpu_bus_bridge.cpp elf_loader.cpp pnp_table.cpp
    ├── gdb_codec.cpp gdb_stub.cpp gdb_stub_transport.cpp
    └── version.cpp
```
## Emulator: the public API
Emulator (emulator.hpp:29) is the entry point. It is created through the Emulator::create factory, which validates the config; it is non-copyable and non-movable, so own it through the returned unique_ptr. It is a facade: it owns a Soc, a DebugServer, and an ExecutionEngine (emulator.hpp:253,258,266) and forwards run, state, memory, and debug calls to them. See runtime decomposition for the ownership and teardown contract.
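The ownership rules above (factory-created, validated first, non-copyable, non-movable, owned through a unique_ptr) can be illustrated with a minimal, self-contained sketch. ToyConfig, ToyEmulator, and the single validation rule are hypothetical stand-ins, not Tero's types; the real create reports failures through its own result type, whereas this sketch simply returns nullptr.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for EmulatorConfig: only what the sketch needs.
struct ToyConfig {
    std::size_t num_cores = 1;
};

// Hypothetical facade showing the ownership pattern; not Tero's Emulator.
class ToyEmulator {
public:
    // Factory: validate first, construct only if the config is acceptable.
    // Returns nullptr on an invalid config (the real API uses its own error type).
    static std::unique_ptr<ToyEmulator> create(const ToyConfig& cfg) {
        if (cfg.num_cores == 0) {
            return nullptr;
        }
        // std::make_unique cannot reach the private constructor, so use new.
        return std::unique_ptr<ToyEmulator>(new ToyEmulator(cfg));
    }

    // Non-copyable and non-movable: callers always hold it through the unique_ptr.
    ToyEmulator(const ToyEmulator&) = delete;
    ToyEmulator& operator=(const ToyEmulator&) = delete;
    ToyEmulator(ToyEmulator&&) = delete;
    ToyEmulator& operator=(ToyEmulator&&) = delete;

    std::size_t num_cores() const noexcept { return cfg_.num_cores; }

private:
    explicit ToyEmulator(const ToyConfig& cfg) : cfg_(cfg) {}
    ToyConfig cfg_;
};
```

Deleting the move operations as well as the copies means the object's address never changes after create returns, which is what makes "own it through the unique_ptr" the only way to hold it.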
### Service injection (before initialize())
| Method | Effect |
|---|---|
| set_logger(unique_ptr<ILogger>) | replace the default StdoutLogger |
| set_character_device(dev) | set the console UART0 chardev (= set_uart_character_device(0, dev)) |
| set_uart_character_device(index, dev) | set the chardev for any UART (index < config().uarts); nullptr silences it |
| set_observer(unique_ptr<IEmulatorObserver>) | install IRQ/trap/attach/per-instruction hooks; nullptr removes them |
### Lifecycle and run loop
```cpp
Result<void> initialize();              // wire RAM, PROM, peripherals, PnP, bus routing, JIT
Result<void> reset();                   // power-on reset (keeps the loaded image)
RunResult    run_for  (SimTimeNs duration);
RunResult    run_until(SimTimeNs deadline);
RunResult    single_step(CoreId core);  // exactly one instruction, no IRQ sampling
SimTimeNs    current_sim_time() const noexcept;
```
run_for/run_until honour config().pacing: Realtime slices the run
into pacing_slice_ns chunks and sleeps on steady_clock; Turbo
free-runs without ever reading the host clock. The facade delegates both to
the ExecutionEngine, where run_until_unpaced
(execution_engine.hpp:198) drives the round-robin (or thread-per-core)
loop; run_core_quantum / run_core_batch execute one core's quantum via
the powered-down fast-advance, the tiered-JIT/IR path (run_ir_quantum,
execution_engine.hpp:200), the universal IR-interpret path
(run_ir_interpret_quantum, execution_engine.hpp:210), or the per-core
Switch path.
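The Realtime/Turbo split can be sketched as a thin pacing wrapper around an unpaced run function. This is a simplified model under stated assumptions: simulated time is a plain nanosecond count, run_unpaced is a hypothetical callback standing in for run_until_unpaced, and the wrapper anchors simulated time to the host steady_clock once at the start of the call.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

enum class Pacing { Realtime, Turbo };

// Hypothetical sketch, not Tero's engine: advance simulated time from `now_ns`
// to `deadline_ns`. `run_unpaced(target)` stands in for run_until_unpaced.
// Returns how many unpaced calls were made, to make the slicing visible.
int run_until_paced(std::uint64_t now_ns, std::uint64_t deadline_ns,
                    std::uint64_t slice_ns, Pacing pacing,
                    const std::function<void(std::uint64_t)>& run_unpaced) {
    if (pacing == Pacing::Turbo) {
        run_unpaced(deadline_ns);  // free-run: the host clock is never read
        return 1;
    }
    const std::uint64_t step = std::max<std::uint64_t>(slice_ns, 1);  // avoid a zero slice
    const auto host_start = std::chrono::steady_clock::now();
    const std::uint64_t sim_start = now_ns;
    int calls = 0;
    while (now_ns < deadline_ns) {
        const std::uint64_t target = std::min(deadline_ns, now_ns + step);
        run_unpaced(target);
        ++calls;
        now_ns = target;
        // Sleep until host time has caught up with the simulated time reached so far.
        std::this_thread::sleep_until(host_start + std::chrono::nanoseconds(now_ns - sim_start));
    }
    return calls;
}
```

Sleeping until an absolute host deadline (rather than sleeping for a fixed slice) keeps drift from accumulating when a slice takes longer than expected to emulate.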
### Image loading
```cpp
Result<void> load_elf(const std::filesystem::path&);  // SPARC BE ET_EXEC
Result<void> load_binary(PhysAddr base, std::span<const std::byte>, PhysAddr entry);
Result<void> load_ram_image(PhysAddr addr, std::span<const std::byte>);  // no CPU side effects
Result<void> load_ram_image_from_file(PhysAddr addr, const std::filesystem::path&);
```
### External memory access
```cpp
Result<void>          read_physical (PhysAddr, std::span<std::byte>);
Result<void>          write_physical(PhysAddr, std::span<const std::byte>);
Result<std::uint32_t> read_physical_u32 (PhysAddr);
Result<void>          write_physical_u32(PhysAddr, std::uint32_t);
```
These bypass the MMU (there is none yet). Virtual-address accessors are on the public-API roadmap; today the CPU's own read_virtual/write_virtual go through CpuBusBridge. A successful external write also flushes the translated-code caches (emulator.cpp:262-264): an external writer issues no guest FLUSH, so the engine must assume any translated code may now be stale.
### Peripherals, scheduling, introspection
```cpp
Result<void> add_peripheral(std::unique_ptr<IPeripheral>, IrqLine);  // post-initialize sugar
void schedule_event(SimTimeNs when, IEvent*);
const core::CpuState& core(std::size_t idx) const;         // SPARC lens over the blob
const ir::GuestState& core_blob(std::size_t idx) const;    // canonical register blob
const CoreControl&    core_control(std::size_t idx) const; // run latches
IEntity&              cpu(std::size_t idx);                // CPU entity (ICpu / ISparc)
std::size_t           num_cores() const noexcept;
const EmulatorConfig& config() const noexcept;
bus::SystemBus&       bus() noexcept;
GdbStub*              gdb_stub() noexcept;                 // nullptr unless a stub is bound
ICharacterDevice*     character_device() noexcept;         // UART0 chardev
ICharacterDevice*     uart_character_device(std::size_t) noexcept;
BreakpointSet&        breakpoints() noexcept;
```
add_peripheral is sugar for adding a device after initialize() (REPL
/ test usage); it delegates to Soc::add_peripheral (soc.hpp:122), which
allocates the IrqBridge, attaches the peripheral, and maps its MMIO. For
statically assembled configs, prefer the PeripheralSpec form in
EmulatorConfig::peripherals. core(idx) is SPARC-typed; core_blob /
cpu are the architecture-neutral views over the same state
(emulator.hpp:196-216).
### Diagnostic harnesses
The runtime exposes an IR-vs-Switch validator used by the --oracle-lockstep CLI mode: run_oracle_lockstep(deadline) returns the first OracleDivergence (execution_engine.hpp:83, aliased at emulator.hpp:101). It replays the production SMP round-robin while validating every clean-exit IR block against core::step on a scratch copy (engine_oracle.cpp), and it runs only with Turbo pacing and translation enabled. The --lockstep CLI mode is implemented in tero_app instead: it runs two full Emulator instances (Switch vs IR) and diffs their GuestState blobs after every instruction.
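The core idea of both modes (advance a reference and a candidate from the same state, each on its own copy, and report the first point where they disagree) can be shown on a toy machine. ToyState, Divergence, and lockstep are hypothetical; the real oracle advances production state through IR and checks each block against core::step on a scratch copy, whereas this simplification just runs two steppers side by side.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>

// Hypothetical toy guest state: a program counter and one accumulator.
struct ToyState {
    std::uint32_t pc = 0;
    std::uint32_t acc = 0;
    bool operator==(const ToyState& o) const { return pc == o.pc && acc == o.acc; }
};

struct Divergence {
    std::uint64_t step;   // index of the first step whose results differ
    ToyState reference;   // state produced by the reference stepper
    ToyState candidate;   // state produced by the candidate under test
};

using Stepper = std::function<void(ToyState&)>;

// Run both steppers from the same start state, each on its own copy,
// and return the first divergence within `max_steps`, if any.
std::optional<Divergence> lockstep(ToyState start, const Stepper& reference,
                                   const Stepper& candidate, std::uint64_t max_steps) {
    ToyState ref = start;
    ToyState cand = start;
    for (std::uint64_t i = 0; i < max_steps; ++i) {
        reference(ref);
        candidate(cand);
        if (!(ref == cand)) {
            return Divergence{i, ref, cand};
        }
    }
    return std::nullopt;
}
```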
## EmulatorConfig and the SoC kits
EmulatorConfig (emulator_config.hpp:117) is a plain struct (config by struct, not by file). Key fields:
| Field | Default | Meaning |
|---|---|---|
| soc_family | Gr712rc | which SoC family the kits wire |
| num_cores | 1 | LEON cores to instantiate |
| ram_base / ram_size | 0x40000000 / 16 MiB | RAM region |
| cpu_clock_hz | 50 MHz (kits: 80 / 250 MHz) | the primary frequency knob |
| cpi | 1.0 | global cycles-per-instruction scalar |
| ns_per_insn | 20 | derived cpi * ns_per_cycle(clock) (recomputed by create) |
| quantum | 1000 | instructions per core per scheduling round |
| quantum_batch | 1 | MultiThread quanta per cross-core barrier (Phase 14) |
| time_advance | Concurrent | per-core delta fold: Concurrent (max) vs Sum (ADR-005) |
| pacing | Realtime | Realtime vs Turbo time advance |
| execution_mode | SingleThread | round-robin vs thread-per-core (ADR-001) |
| translation | true | binary translation (JIT/IR) vs the Switch interpreter |
| arch / arch_factory | Sparc / empty | guest ISA selector / out-of-tree ir::IArchitecture override (S11) |
| force_ir_interpret | false | route the SPARC interpret path through the IR interpreter (S9 validation) |
| jit_baseline_threshold / jit_promotion_threshold | 32 / 100 | interpret-first / O2-promotion thresholds (ADR-002) |
| jit_background_opt / jit_max_region_blocks | true / 8 | background O2 tier; max blocks fused per region |
| gdb_stub_port / gdb_stub_wait_for_client | 0 / false | GDB stub binding |
| prom_base / prom_size / prom_image_* / prom_fill / reset_pc | 0x0 / 32 MiB / … | boot ROM |
| peripherals | {} | declarative PeripheralSpec list |
| buses | {} | declarative shared comms-bus media (CAN / SPI / 1553), built with can_bus() / spi_bus() / mil_std_bus() |
| plugins | {} | declarative PluginSpec list (host-facing observers) |
| character_devices | {} | non-owning ICharacterDevice* pool, indexed by chardev_index |
| pnp_placement | empty | per-device PnP placement for composed machines; empty → the hardcoded pnp_table path |
The timing helpers ns_per_cycle(clock_hz) and ns_per_insn_for(clock_hz,
cpi) (emulator_config.hpp:93,103) are the only place CPI enters the
model; peripheral/timer clocks stay on ns_per_cycle (unaffected by
cpi).
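As a worked example of the two timing helpers, here is a minimal sketch that assumes both return nanoseconds as a double; the real signatures and rounding in emulator_config.hpp may differ. With the defaults (50 MHz, cpi 1.0) it reproduces the ns_per_insn default of 20.

```cpp
#include <cassert>
#include <cstdint>

// Sketch only: return types and rounding are assumptions, not Tero's declarations.
// Nanoseconds per clock cycle at `clock_hz`; peripheral/timer clocks use this.
constexpr double ns_per_cycle(std::uint64_t clock_hz) {
    return 1e9 / static_cast<double>(clock_hz);
}

// Nanoseconds per instruction: the only place CPI enters the model.
constexpr double ns_per_insn_for(std::uint64_t clock_hz, double cpi) {
    return cpi * ns_per_cycle(clock_hz);
}
```

For example, the GR712RC kit clock (80 MHz) at cpi 1.0 gives 12.5 ns per instruction; raising cpi to 2.0 doubles ns_per_insn but leaves ns_per_cycle, and therefore every timer clock, unchanged.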
Frequency-independent direct-ELF guests
The kits model real silicon frequencies (GR712RC 80 MHz, GR740 250
MHz). A direct-ELF guest needs no rebuild when the clock changes:
initialize() simulates the bootloader and re-derives the GPTIMER
scaler so RTEMS still sees a 1 MHz timer tick. Only mkprom2 guests bake
a fixed -freq and must be wrapped to match.
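The scaler re-derivation reduces to simple arithmetic if we assume the GRLIB GPTIMER convention that the prescaler reloads from its reload value and emits one tick every reload + 1 system-clock cycles. Under that assumption, a 1 MHz tick needs reload = clock_hz / 1 MHz - 1. The helper name is hypothetical, and the sketch assumes the clock is an exact multiple of 1 MHz.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: prescaler reload value that yields a 1 MHz GPTIMER tick,
// assuming one tick per (reload + 1) system-clock cycles.
// Precondition: clock_hz is at least 1 MHz and a whole multiple of it.
constexpr std::uint32_t gptimer_scaler_for_1mhz(std::uint64_t clock_hz) {
    return static_cast<std::uint32_t>(clock_hz / 1'000'000 - 1);
}
```

Under this convention the GR712RC kit (80 MHz) would program 79 and the GR740 kit (250 MHz) 249, which is why a direct-ELF guest keeps its 1 MHz tick across clock changes.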
The ready-to-run configs are kits: Machine compositions in tero_compose (src/compose/kits.cpp), no longer functions in this module. They are tero::compose::gr712rc_config() (2 cores) and tero::compose::gr740_config() (4 cores), plus their *_uniprocessor_config() variants. What stays in tero_runtime are the BusSpec helpers can_bus() / spi_bus() / mil_std_bus() (emulator_config.cpp:14-28).
SocFamily (soc_family.hpp:10) and the per-SoC base-address constants
(namespace gr712rc / gr740) live in soc_family.hpp.
## PeripheralSpec: declarative device assembly
PeripheralSpec (peripheral_spec.hpp:63) is the data the Soc reads to build a peripheral during Soc::build:
| Field | Meaning |
|---|---|
| instance_name | unique, non-empty; used in Connection, observer events, logs |
| factory | PeripheralFactory lambda, run once during initialize() |
| irqs | IRQ lines (each [1,31]); one IrqBridge per entry, mask 1u << irq |
| chardev_index | optional index into character_devices |
| connections | Connection edges (from_slot ← peer.peer_slot), resolved at connect_ports |
The IrqBridge (soc.hpp:49) is the SoC-agnostic adapter that
converts an IInterruptSource::raise()/lower() into the
IInterruptController::external_assert/clear(mask) of whichever controller
(IrqMP / IrqAMP) was wired at build.
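The adapter can be sketched with a hypothetical controller interface that mirrors the shape described above: raise()/lower() on the source side, a masked assert/clear on the controller side. IInterruptControllerSketch, ToyController, and IrqBridgeSketch, and their exact signatures, are assumptions for illustration; only mask = 1u << irq comes from the PeripheralSpec table.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical controller-side interface (shape only, not Tero's header).
struct IInterruptControllerSketch {
    virtual ~IInterruptControllerSketch() = default;
    virtual void external_assert(std::uint32_t mask) = 0;
    virtual void external_clear(std::uint32_t mask) = 0;
};

// A trivial controller that just records pending lines in a bitmask.
struct ToyController : IInterruptControllerSketch {
    std::uint32_t pending = 0;
    void external_assert(std::uint32_t mask) override { pending |= mask; }
    void external_clear(std::uint32_t mask) override { pending &= ~mask; }
};

// One bridge per IRQ line: translates raise()/lower() into a masked
// assert/clear on whichever controller was wired at build time.
class IrqBridgeSketch {
public:
    IrqBridgeSketch(IInterruptControllerSketch& ctrl, unsigned irq)
        : ctrl_(ctrl), mask_(1u << irq) {}  // irq assumed already validated to [1, 31]
    void raise() { ctrl_.external_assert(mask_); }
    void lower() { ctrl_.external_clear(mask_); }

private:
    IInterruptControllerSketch& ctrl_;
    std::uint32_t mask_;
};
```

Because the bridge only knows the abstract controller, the same peripheral code works whether IrqMP or IrqAMP sits behind it.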
## config_validate
validate_emulator_config(cfg, logger = nullptr) (config_validate.hpp:45) runs at Emulator::create time, before any factory is invoked. It enforces non-empty instance names, non-null factories, unique names, the IRQ range [1,31], chardev_index bounds, and, for every Connection, non-empty fields and an existing peer. It returns ErrorCode::InvalidConfig on the first violation. Port-name resolution and MMIO-overlap checking happen later (at initialize() / map_peripheral), because they need the constructed peripheral object.
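The create-time checks listed above can be sketched as one pass that stops at the first violation. ToySpec, ToyConnection, validate, and the string results are simplified, hypothetical stand-ins for PeripheralSpec, Connection, validate_emulator_config, and ErrorCode::InvalidConfig; the real validator's check order and messages may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <set>
#include <string>
#include <vector>

struct ToyConnection {
    std::string from_slot, peer, peer_slot;
};

struct ToySpec {
    std::string instance_name;
    std::function<void()> factory;  // stands in for PeripheralFactory
    std::vector<unsigned> irqs;
    std::optional<std::size_t> chardev_index;
    std::vector<ToyConnection> connections;
};

// Returns a description of the first violation, or std::nullopt if valid.
std::optional<std::string> validate(const std::vector<ToySpec>& specs,
                                    std::size_t num_character_devices) {
    std::set<std::string> names;
    for (const auto& s : specs) {
        if (s.instance_name.empty()) return "empty instance_name";
        if (!s.factory) return s.instance_name + ": null factory";
        if (!names.insert(s.instance_name).second) return s.instance_name + ": duplicate name";
        for (unsigned irq : s.irqs)
            if (irq < 1 || irq > 31) return s.instance_name + ": IRQ out of [1,31]";
        if (s.chardev_index && *s.chardev_index >= num_character_devices)
            return s.instance_name + ": chardev_index out of range";
    }
    // Peer existence needs the complete name set, so connections are checked last.
    for (const auto& s : specs)
        for (const auto& c : s.connections) {
            if (c.from_slot.empty() || c.peer.empty() || c.peer_slot.empty())
                return s.instance_name + ": empty Connection field";
            if (names.count(c.peer) == 0) return s.instance_name + ": unknown peer " + c.peer;
        }
    return std::nullopt;
}
```

Only data already in the config is inspected; nothing is constructed, which is why port names and MMIO overlaps cannot be checked at this stage.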
## EventScheduler
EventScheduler (event_scheduler.hpp:22) is a min-heap of (SimTimeNs when, IEvent*) entries, implementing IScheduler. The ExecutionEngine owns it, and it is live from the engine's construction so Soc::build can hand it to peripherals before engine_.initialize runs (execution_engine.hpp:167-169). The run loop calls fire_pending(now) to dispatch all matured events in chronological order (events may re-schedule themselves); next_event_time() powers idle-time skipping: when all cores are idle, the clock jumps straight to the next event. It is not thread-safe, and does not need to be, because the round-boundary work that uses it is serialised.
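A minimal sketch of the min-heap shape, fire_pending, and next_event_time follows. IEventSketch with a single fire(when, scheduler) method, the uint64_t time type, and the FIFO tie-break for equal timestamps are assumptions, not necessarily what event_scheduler.hpp does.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <queue>
#include <vector>

class SchedulerSketch;

// Hypothetical event interface: `when` is the event's own timestamp,
// and fire() may reschedule the event on `sched`.
struct IEventSketch {
    virtual ~IEventSketch() = default;
    virtual void fire(std::uint64_t when, SchedulerSketch& sched) = 0;
};

class SchedulerSketch {
public:
    void schedule(std::uint64_t when, IEventSketch* ev) { heap_.push({when, seq_++, ev}); }

    // Dispatch every event with when <= now in chronological order. The heap
    // top is re-checked after each dispatch, so an event rescheduled inside
    // the window fires again in the same call.
    void fire_pending(std::uint64_t now) {
        while (!heap_.empty() && heap_.top().when <= now) {
            Entry e = heap_.top();
            heap_.pop();
            e.ev->fire(e.when, *this);
        }
    }

    // Earliest pending time; an idle run loop can jump the clock straight here.
    std::optional<std::uint64_t> next_event_time() const {
        if (heap_.empty()) return std::nullopt;
        return heap_.top().when;
    }

private:
    struct Entry {
        std::uint64_t when;
        std::uint64_t seq;  // FIFO tie-break for equal timestamps
        IEventSketch* ev;
        bool operator>(const Entry& o) const {
            return when != o.when ? when > o.when : seq > o.seq;
        }
    };
    // std::greater turns std::priority_queue (a max-heap) into a min-heap.
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap_;
    std::uint64_t seq_ = 0;
};
```

Passing the event its own timestamp rather than the loop's now lets a periodic event reschedule at when + period without accumulating drift.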
## CpuBusBridge
CpuBusBridge (cpu_bus_bridge.hpp:16) adapts bus::SystemBus (physical, byte-oriented) to ICpuBus (virtual, instruction-aware). With no MMU, it is a 1:1 passthrough; when SRMMU support lands, this is the natural injection point for address translation and fault generation. Two things make it more than a passthrough:
- It overrides the ICpuBus atomics (atomic_swap_u32, atomic_cas_u32, atomic_ldstub) with a true atomic read-modify-write on the backing store, which keeps them correct once cores run on separate host threads.
- It owns a per-core SystemBus::RamFastCache (cpu_bus_bridge.hpp:43): there is one bridge per core (Phase 13 Inc 5b), so the RAM fast-path cache is never shared across host threads.
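The three atomic overrides can be sketched against a toy word-addressed backing store whose words are std::atomic<std::uint32_t>. That store is an assumption made to keep the example self-contained (the real bridge works on the SystemBus RAM); byte numbering follows SPARC big-endian order, and the CAS returns the old memory value, as SPARC's CASA does.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Toy backing store: word-addressed RAM with atomic words (an assumption;
// the real bridge operates on the SystemBus RAM backing store).
class AtomicRamSketch {
public:
    explicit AtomicRamSketch(std::size_t words)
        : words_(new std::atomic<std::uint32_t>[words]), n_(words) {
        for (std::size_t i = 0; i < n_; ++i) words_[i].store(0);
    }

    std::uint32_t load_u32(std::uint32_t addr) const { return word(addr).load(); }

    // SWAP: store `value`, return the previous word.
    std::uint32_t atomic_swap_u32(std::uint32_t addr, std::uint32_t value) {
        return word(addr).exchange(value);
    }

    // CAS: if mem == expected, store desired; always return the old memory value.
    std::uint32_t atomic_cas_u32(std::uint32_t addr, std::uint32_t expected, std::uint32_t desired) {
        std::uint32_t old = expected;
        word(addr).compare_exchange_strong(old, desired);
        return old;  // on failure, compare_exchange wrote the actual value into `old`
    }

    // LDSTUB: set the addressed byte to 0xFF, return its previous value.
    std::uint8_t atomic_ldstub(std::uint32_t addr) {
        const unsigned shift = (3 - (addr & 3u)) * 8;  // big-endian: byte 0 is the MSB
        std::uint32_t old = word(addr).load();
        while (!word(addr).compare_exchange_weak(old, old | (0xFFu << shift))) {
            // `old` now holds the current word; retry with it.
        }
        return static_cast<std::uint8_t>((old >> shift) & 0xFFu);
    }

private:
    std::atomic<std::uint32_t>& word(std::uint32_t addr) const {
        assert(addr / 4 < n_);
        return words_[addr / 4];
    }
    std::unique_ptr<std::atomic<std::uint32_t>[]> words_;
    std::size_t n_;
};
```

The byte-wide LDSTUB is built from a word-wide CAS loop, so neighbouring bytes written concurrently by another host thread are never clobbered.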
## ELF loader
elf_loader.hpp parses SPARC big-endian ELF32:
- load_elf_to_bus(path, bus) (elf_loader.hpp:27) validates ELFCLASS32 / ELFDATA2MSB / EM_SPARC / executable, copies each PT_LOAD segment's p_filesz bytes to PhysAddr{p_paddr}, zero-fills the BSS tail, and returns {entry_point, lowest/highest loaded addr}. Errors map to ErrorCode::ElfLoadError.
- is_elf32(blob) (elf_loader.hpp:34) discriminates a flat PROM image from an mkprom2 ELF wrapper.
- flatten_elf_to_prom(elf_blob, prom_base, prom_buffer) (elf_loader.hpp:52) flattens an mkprom2 .rom (always ELF) into a PROM-shaped buffer by LMA, mirroring how GRMON and real silicon load the same ELF. Emulator::initialize() calls this to consume mkprom2 output directly.
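The header checks and the per-segment copy can be sketched over in-memory buffers. The offsets and constants are the standard ELF32 ones (EI_CLASS and EI_DATA at bytes 4 and 5, e_type at 16, e_machine at 18, e_entry at 24; ELFCLASS32 = 1, ELFDATA2MSB = 2, ET_EXEC = 2, EM_SPARC = 2). The function names are hypothetical, the exact test is_elf32 performs is an assumption, and program-header parsing is omitted for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Big-endian field readers (SPARC ELF files are ELFDATA2MSB).
static std::uint32_t be16(const Bytes& b, std::size_t off) {
    return (std::uint32_t{b[off]} << 8) | b[off + 1];
}
static std::uint32_t be32(const Bytes& b, std::size_t off) {
    return (be16(b, off) << 16) | be16(b, off + 2);
}

// In the spirit of is_elf32 (exact checks assumed): ELF magic plus ELFCLASS32.
bool looks_like_elf32(const Bytes& b) {
    return b.size() >= 52 && b[0] == 0x7F && b[1] == 'E' && b[2] == 'L' && b[3] == 'F' &&
           b[4] == 1 /* ELFCLASS32 */;
}

// Validates a SPARC big-endian executable header and returns its entry point.
std::optional<std::uint32_t> sparc_exec_entry(const Bytes& b) {
    if (!looks_like_elf32(b)) return std::nullopt;
    if (b[5] != 2 /* ELFDATA2MSB */) return std::nullopt;
    if (be16(b, 16) != 2 /* ET_EXEC */) return std::nullopt;
    if (be16(b, 18) != 2 /* EM_SPARC */) return std::nullopt;
    return be32(b, 24);  // e_entry
}

// One PT_LOAD segment: copy p_filesz bytes, then zero-fill up to p_memsz (the BSS tail).
void load_segment(Bytes& ram, std::uint32_t ram_base, std::uint32_t p_paddr,
                  const std::uint8_t* file_data, std::uint32_t p_filesz, std::uint32_t p_memsz) {
    const std::size_t off = p_paddr - ram_base;
    assert(p_filesz <= p_memsz && off + p_memsz <= ram.size());
    std::copy(file_data, file_data + p_filesz, ram.begin() + off);
    std::fill(ram.begin() + off + p_filesz, ram.begin() + off + p_memsz, std::uint8_t{0});
}
```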
## PnP table
pnp_table.hpp builds the GRLIB AMBA Plug&Play scratch areas that the RTEMS ambapp_scan driver walks. pnp_ram_regions(cfg) (pnp_table.hpp:33) returns the regions to map: AHB-PnP at 0xFFFFF000, plus one APB-PnP area per APB bridge (one on GR740, two on GR712RC). write_pnp_entries populates them with master/slave entries, deriving each device's identity from the live entities through IAmbaPnp (soc.cpp:305-308). For a machine composed through tero_compose, EmulatorConfig::pnp_placement (pnp_placement.hpp) carries per-device placement, and the table is derived from the composed graph instead; the kits leave it empty and take the hardcoded path, whose output stays byte-identical.
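The words that write_pnp_entries places in these areas follow the GRLIB AMBA Plug&Play record layout. The encoders below are a sketch based on our reading of the GRLIB documentation (identification word: vendor in bits 31:24, device in 23:12, version in 9:5, IRQ in 4:0; bank address register: address in 31:20, prefetchable bit 17, cacheable bit 16, mask in 15:4, type in 3:0). The function names are hypothetical and not Tero's.

```cpp
#include <cassert>
#include <cstdint>

// Identification word (word 0 of an AHB or APB PnP record).
constexpr std::uint32_t pnp_id_word(std::uint32_t vendor, std::uint32_t device,
                                    std::uint32_t version, std::uint32_t irq) {
    return ((vendor & 0xFFu) << 24) | ((device & 0xFFFu) << 12) |
           ((version & 0x1Fu) << 5) | (irq & 0x1Fu);
}

// Bank address register: `addr` and `mask` are the top 12 address bits
// (1 MiB granularity for AHB); `type` selects the kind of address space.
constexpr std::uint32_t pnp_bar_word(std::uint32_t addr, std::uint32_t mask,
                                     bool prefetch, bool cacheable, std::uint32_t type) {
    return ((addr & 0xFFFu) << 20) | (prefetch ? 1u << 17 : 0u) |
           (cacheable ? 1u << 16 : 0u) | ((mask & 0xFFFu) << 4) | (type & 0xFu);
}
```

For example, pnp_id_word(0x01, 0x00D, 3, 0) yields 0x0100D060, and a 256 MiB cacheable, prefetchable memory bank at 0x40000000 (mask 0xF00, type 2) encodes as 0x4003F002.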
## GDB stub
GdbStub (gdb_stub.hpp:247) turns Tero into a target for sparc-rtems5-gdb over TCP, speaking the RSP subset that RTEMS debugging needs. The user-facing reference (packets, attach flow, signal mapping) is the Debugging with GDB guide; this section covers the module structure.
| File | Responsibility |
|---|---|
| gdb_stub.hpp | public API, RSP codec helpers, StopSignal, signal_from_tt, ResumeAction, rtems_layout::* |
| gdb_codec.cpp | encode_packet/decode_packet, hex parsing, run-length expansion |
| gdb_stub.cpp | packet dispatch + handlers (?, g/G, m/M, Z/z, H, qC, qfThreadInfo/qsThreadInfo, qSymbol, qThreadExtraInfo, c/s/D/k) |
| gdb_stub_transport.cpp | TCP listener (start_listening, wait_for_client, poll_accept), packet I/O, RTEMS RAM reads, report_error_mode, send_stop_reply |
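For reference, the framing and run-length rules that gdb_codec.cpp implements come from the GDB Remote Serial Protocol: a packet is $data#cc, where cc is the modulo-256 sum of the data bytes as two hex digits, and X*n stands for X followed by n - 29 further copies of X. The sketch below is an independent illustration of those rules (it performs no escaping), not the code in gdb_codec.cpp; all three function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <optional>
#include <string>

// Frame a payload as $<data>#<checksum>; the checksum is the modulo-256 sum
// of the payload bytes, written as two lowercase hex digits. No escaping.
std::string encode_packet_sketch(const std::string& data) {
    unsigned sum = 0;
    for (unsigned char c : data) sum = (sum + c) & 0xFFu;
    char cs[3];
    std::snprintf(cs, sizeof cs, "%02x", sum);
    return "$" + data + "#" + cs;
}

// Expand run-length encoding: "X*n" means X followed by (n - 29) more copies of X.
std::string expand_rle_sketch(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '*' && !out.empty() && i + 1 < in.size()) {
            const int repeat = static_cast<unsigned char>(in[i + 1]) - 29;
            out.append(static_cast<std::size_t>(repeat > 0 ? repeat : 0), out.back());
            ++i;  // skip the count character
        } else {
            out.push_back(in[i]);
        }
    }
    return out;
}

// Parse "$data#cc" and verify the checksum; the returned payload may still be RLE-encoded.
std::optional<std::string> decode_packet_sketch(const std::string& pkt) {
    const auto hash = pkt.rfind('#');
    if (pkt.empty() || pkt[0] != '$' || hash == std::string::npos || hash + 3 != pkt.size())
        return std::nullopt;
    const std::string data = pkt.substr(1, hash - 1);
    const unsigned expected = std::stoul(pkt.substr(hash + 1, 2), nullptr, 16);
    unsigned sum = 0;
    for (unsigned char c : data) sum = (sum + c) & 0xFFu;
    if (sum != expected) return std::nullopt;
    return data;
}
```

The checksum covers the payload exactly as transmitted, so a decoder verifies it before expanding any run-length sequences.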
The stub's lifetime and the shared BreakpointSet live on DebugServer
(debug_server.hpp:31); Emulator::initialize calls DebugServer::start
last, after the SoC and the engine are up (emulator.cpp:172).
Integration with the run loop (engine_run_loop.cpp):
- run_until_unpaced calls poll_accept() once per quantum for late-binding attaches; a fresh accept returns HaltReason::Breakpoint so the CLI can drive process_until_resume().
- On a core entering error mode with a client attached, report_error_mode(core) arms a T<sig> reply (signal_from_tt(TBR.tt)) and the loop returns Breakpoint instead of HaltedMode. A per-client latch prevents re-notifying.
- Per-step should_break(core, pc) is the hot-path hook: software-breakpoint set membership, single-step completion, and a once-per-quantum Ctrl-C poll.
RTEMS thread-awareness (rtems_layout::*, gdb_stub.hpp:103-191) reads
thread state directly from guest RAM using offsets verified against RCC
1.3.2 / RTEMS 5.3 via DWARF. The two-round qSymbol handshake resolves
_Per_CPU_Information (executing-thread enumeration) and
_Objects_Information_table (full Classic-task enumeration via
enumerate_classic_tasks). Every read is validated; on a check failure
the stub degrades to the per-core thread model rather than reporting
garbage. All thread-awareness state lives on GdbStub and resets on
client detach.
GDB forces the Switch path
With a stub attached, the translation path stays native until a
breakpoint, then single-steps the breakpoint-bearing block via
run_ir_quantum. MultiThread dispatch is bypassed whenever a stub (or
a trace observer) is attached — mt_dispatch_now() returns false
(engine_mt.cpp:17-20) — so the debug control flow stays serial and
exact.
## RunResult
```cpp
enum class HaltReason : std::uint8_t {  // run_result.hpp:10
    DurationExpired,  // run_for budget reached
    DeadlineReached,  // run_until deadline reached
    HaltedMode,       // a guest core took a trap with ET=0 (deliberate shutdown / fault)
    ErrorMode,        // internal emulator error (reserved; distinct from HaltedMode)
    Breakpoint,       // GDB stub stopped (breakpoint / single-step / attach)
};

struct RunResult {  // run_result.hpp:29
    std::uint64_t instructions_executed{0};  // across all cores
    SimTimeNs     time_elapsed{SimTimeNs{0}};
    HaltReason    reason{HaltReason::DurationExpired};
};
```
HaltedMode is the guest's doing, not a Tero failure
A SPARC core that takes a trap while ET=0 enters error mode and halts; RTEMS _CPU_Fatal_halt / _exit do exactly this, deliberately, via ta 0. The emulator behaved correctly and simply has nothing more to run. ErrorMode is reserved for an internal emulator error. The CLI loop pattern is:
```cpp
while (run_until(...).reason == HaltReason::Breakpoint) stub->process_until_resume();
```
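The loop pattern in the note above can be made concrete against toy stand-ins. ToyMachine, ToyStub, drive, and the scripted stop sequence are hypothetical, and RunResult is trimmed to two fields; only the shape (hand control to the stub after every Breakpoint stop, exit on any other HaltReason) comes from the text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class HaltReason : std::uint8_t { DurationExpired, DeadlineReached, HaltedMode, ErrorMode, Breakpoint };

struct RunResult {  // trimmed: the real struct also carries time_elapsed
    std::uint64_t instructions_executed{0};
    HaltReason reason{HaltReason::DurationExpired};
};

// Hypothetical machine that replays a scripted sequence of stop reasons.
struct ToyMachine {
    std::vector<HaltReason> script;
    std::size_t next = 0;
    RunResult run_until(std::uint64_t /*deadline_ns*/) {
        return RunResult{1000, script.at(next++)};
    }
};

// Hypothetical stub: counts how often the debugger was handed control.
struct ToyStub {
    int sessions = 0;
    void process_until_resume() { ++sessions; }  // a real stub serves RSP packets here
};

// Keep running while the stop belongs to the debugger; return the final reason.
HaltReason drive(ToyMachine& m, ToyStub& stub, std::uint64_t deadline_ns) {
    RunResult r = m.run_until(deadline_ns);
    while (r.reason == HaltReason::Breakpoint) {
        stub.process_until_resume();
        r = m.run_until(deadline_ns);
    }
    return r.reason;
}
```

A HaltedMode result ends the loop like any other non-Breakpoint reason: it is the guest's own shutdown, not a debugger stop.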
## Execution-mode and multithread machinery
PacingMode (emulator_config.hpp:29) and ExecutionMode
(emulator_config.hpp:44) are runtime fields, not compile flags. Under
MultiThread the ExecutionEngine spawns one worker thread per core
1..N-1 (main thread runs core 0 and all serialized round-boundary work),
synchronised by two std::barriers, with per-core CpuBusBridge,
ir_cache_, tiered_jit_, and ir_interp_ so concurrent cores never
share mutable translation state (execution_engine.hpp:282-320).
start_workers / stop_workers / worker_loop (engine_mt.cpp) manage
the pool; request_code_flush latches a FLUSH for the main thread to drain
at the serial boundary. See
multicore and timing.
## What is intentionally not in tero_runtime
- No CLI parsing: that is tero_app.
- No file format other than ELF (raw images go through load_binary / load_ram_image).
- No host-service implementations: the runtime uses the defaults, it does not define them.
## See also
- Architecture: runtime decomposition — the Emulator/Soc/ExecutionEngine/DebugServer split, init order, teardown.
- Architecture: execution model — the run loop, quanta, pacing, idle-skip.
- Architecture: multicore and timing — round-robin vs thread-per-core, the up-counter, CPI.
- Core · Bus · Peripherals · Defaults — the layers the runtime composes.
- Configuration guide · CLI reference · Debugging with GDB.