Configuration
The Emulator is configured through a single plain-old struct,
tero::runtime::EmulatorConfig, defined in
src/runtime/include/tero/runtime/emulator_config.hpp.
There are no config files and no global state — the client (the CLI, your
program, or a future SMP2 wrapper) builds this struct from whatever
source it likes. This is the configuration-by-struct principle.
The recommended pattern is to start from a kit, then override:
auto cfg = tero::compose::gr712rc_config(); // full GR712RC board
cfg.num_cores = 2;
cfg.cpu_clock_hz = 100'000'000; // 100 MHz
cfg.pacing = tero::runtime::PacingMode::Turbo;
auto emu = tero::runtime::Emulator::create(cfg); // validates the result
The kit pattern
A kit is a function returning an EmulatorConfig pre-populated for a
specific SoC: tero::compose::gr712rc_config() / gr740_config()
(header tero/compose/kits.hpp, target tero_compose). Each kit is a
~30-line loader that parses an embedded copy of the corresponding .tero
machine-script file (src/compose/machines/gr712rc.tero /
src/compose/machines/gr740.tero). Those files are also installed as
share/tero/machines/gr712rc.tero and share/tero/machines/gr740.tero
for copy-and-derive use. The old hand-written runtime recipes
(tero::runtime::gr712rc_config() etc.) were deleted; the kits are
their drop-in replacement. You can inspect, modify, replace, or extend
any entry of the returned config.
| Kit | Cores | RAM | Clock | PROM | Notes |
|---|---|---|---|---|---|
| gr712rc_config() | 2 | 16 MiB @ 0x40000000 | 80 MHz | 32 MiB @ 0x0 | Dual-core LEON3FT. Full silicon set: IRQMP, MemCtrl, GPTimer, 6× APBUART, 2× GRGPIO, 2× OCCAN, CANMUX, 6× GRSPW2, SPICTRL, B1553BRM, AHBSTAT, GRTIMER, clock-gate/GPR façades. |
| gr712rc_uniprocessor_config() | 1 | as above | as above | as above | Same layout, single core. |
| gr740_config() | 4 | 256 MiB @ 0x00000000 | 250 MHz | none | Quad-core LEON4FT. Full silicon set: IRQAMP, MemCtrl, 5× GPTIMER, 2× APBUART, 2× GRGPIO, 2× GRCAN, SPWROUTER, SPICTRL, GR1553B, 2× AHBSTAT, MEMSCRUB, L2CACHE/GRIOMMU façades, and more. |
| gr740_uniprocessor_config() | 1 | as above | as above | none | Same layout, single core. |
For the GR712RC peripheral/IRQ map a kit builds, see GR712RC reference; the full per-kit instance tables are in the tero_compose module reference. For how peripherals are declared and wired, see the peripheral reference.
Other ways to build the config
The kit is one of four routes to an EmulatorConfig: kit as-is, kit +
modification, a .tero board script (tero-emu --machine), or a
hand-written struct. The four are compared, with examples, in
Assembling a machine.
Fields
Every field below is a member of EmulatorConfig. Defaults are the bare
struct defaults; the kits override some of them (noted where relevant).
SoC and memory
| Field | Type | Default | Meaning |
|---|---|---|---|
| soc_family | SocFamily | Gr712rc | Which SoC to model (Gr712rc / Gr740). Drives IRQ-controller selection and the kit-built peripheral map. |
| num_cores | uint32_t | 1 | Number of LEON cores. GR712RC: 1–2; GR740: 1–4. Secondary cores boot in power-down (released later by software via the IRQ(A)MP). |
| ram_base | uint32_t | 0x40000000 | Physical base address of RAM. GR740 kit uses 0x00000000. |
| ram_size | uint32_t | 16 MiB | RAM size in bytes. GR740 kit uses 256 MiB. |
| reset_pc | uint32_t | 0x00000000 | Initial PC for every core after reset (matches LEON3 reset). With PROM enabled, address 0 is in PROM. |
| entry_point_override | uint32_t | 0 | If non-zero, set the PC here instead of the ELF e_entry. |
Timing and clock
| Field | Type | Default | Meaning |
|---|---|---|---|
| cpu_clock_hz | uint64_t | 50_000_000 | System (CPU/AHB/peripheral) clock in Hz. Primary frequency knob: ns_per_insn and the GPTIMER prescaler both derive from it. The bare struct default is 50 MHz; the kits override it (GR712RC 80 MHz, GR740 250 MHz). |
| cpi | double | 1.0 | Cycles per instruction (TEMU-style; must be > 0). 1.0 = one cycle each. Raise to slow the core in simulated time; set below 1 for an IPC target (cpi = 1/ipc). Only the CPU's sim-time advance scales; peripheral/timer clocks stay on the bus clock. |
| ns_per_insn | uint64_t | 20 | Simulated ns advanced per executed instruction. Derived as cpi × ns_per_cycle(cpu_clock_hz) and recomputed by Emulator::create, so you normally set cpu_clock_hz/cpi and leave this alone. Consumed on the hot path. |
| pacing | PacingMode | Realtime | How simulated time tracks wall-clock time. See Pacing. |
| pacing_slice_ns | uint64_t | 10_000_000 | In Realtime mode, the simulated-time chunk between two sleep_until calls (10 ms). Ignored in Turbo. |
| time_advance | TimeAdvance | Concurrent | How per-core simulated-time deltas fold into the global clock each round. Concurrent (max: shared timeline, each core at its full rated clock, ADR-005) or Sum (accumulate: the historical bit-exact model). Identical for N=1. See Multi-core and timing. |
Scheduling and execution
| Field | Type | Default | Meaning |
|---|---|---|---|
| quantum | uint32_t | 1000 | Instructions each core executes per scheduling round. Smaller = finer cross-core interleaving at higher overhead. |
| quantum_batch | uint32_t | 1 | MultiThread only: how many quanta a worker runs back-to-back before the cross-core barrier (IRQ delivery, scheduler, peripheral tick). Larger amortises the barrier (more throughput) at coarser inter-processor IRQ/event delivery. No effect in SingleThread. |
| execution_mode | ExecutionMode | SingleThread | How simulated cores map onto host threads (ADR-001). See Execution mode. |
| translation | bool | true | Execution method. true = binary translation (IR → tiered LLVM JIT, IR interpreter fallback); false = the core::step Switch interpreter (reference/oracle). Runtime-selected, no rebuild. See Execution method. |
| arch | CpuArch | Sparc | Guest ISA the execution engine translates and runs. Orthogonal to soc_family (both SoCs are Sparc). Sparc is the only enumerator today. |
| arch_factory | function<unique_ptr<IArchitecture>()> | empty | Optional architecture override: when set, the engine builds the guest ir::IArchitecture from this factory instead of the in-tree make_architecture(arch) and treats it as a non-SPARC frontend (universal IR interpret path). Production leaves it empty. |
| force_ir_interpret | bool | false | Route the non-JIT interpret path through the arch-neutral IR interpreter instead of the SPARC core::step oracle. Only meaningful when translation is off (or an observer forces interpretation); a differential-validation tool. |
JIT tier knobs (ignored unless translation)
| Field | Type | Default | Meaning |
|---|---|---|---|
| jit_baseline_threshold | uint32_t | 32 | Executions of a (pc, mode) block on the IR interpreter before it is Baseline-compiled (interpret-first tiering). 0 = compile on first sight. Higher keeps cold/rare blocks on the interpreter, avoiding compile latency they never amortise (the dominant cost on boot). |
| jit_promotion_threshold | uint32_t | 100 | Executions before a block is promoted to the Optimised (O2) tier. Lower = faster steady state, more background compiles. 0 treated as 1. |
| jit_background_opt | bool | true | Enable the background O2 tier. false = Baseline-only (O0, no background thread): lowest, most deterministic compile latency, lower steady-state throughput. |
| jit_max_region_blocks | uint32_t | 8 | Max basic blocks fused into one JIT region. 1 disables chaining beyond the self-loop; larger fuses longer paths (fewer dispatcher round-trips) at higher compile cost. |
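As a rough illustration of how the three tier knobs interact, the sketch below maps an execution count to a tier. It assumes a single cumulative per-block counter drives both thresholds, which the table does not confirm; JitKnobs, Tier, and tier_for are hypothetical names, not Tero API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

enum class Tier { Interpreter, Baseline, Optimised };

// Stand-in for the three JIT fields of EmulatorConfig, with the documented defaults.
struct JitKnobs {
    std::uint32_t jit_baseline_threshold = 32;
    std::uint32_t jit_promotion_threshold = 100;
    bool jit_background_opt = true;
};

// Hypothetical decision function. ASSUMPTION: one cumulative execution
// counter per block is compared against both thresholds.
Tier tier_for(std::uint64_t executions, const JitKnobs& k) {
    // Interpret-first tiering: stay on the IR interpreter until the
    // baseline threshold is reached (0 means compile on first sight).
    if (executions < k.jit_baseline_threshold) return Tier::Interpreter;
    // A promotion threshold of 0 is treated as 1.
    const std::uint64_t promote =
        std::max<std::uint32_t>(k.jit_promotion_threshold, 1);
    // Promotion to O2 only happens when the background tier is enabled.
    if (k.jit_background_opt && executions >= promote) return Tier::Optimised;
    return Tier::Baseline;
}
```

Under these assumptions and the default knobs, a block stays on the interpreter for its first 32 executions, runs as Baseline code afterwards, and becomes eligible for O2 at 100 executions. With jit_background_opt = false it never leaves Baseline.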
GDB stub
| Field | Type | Default | Meaning |
|---|---|---|---|
| gdb_stub_port | uint16_t | 0 | If non-zero, bind the GDB stub on 127.0.0.1:port during initialize(). 0 disables it (no socket; one null-pointer test on the hot path). |
| gdb_stub_wait_for_client | bool | false | If true and the stub is bound, initialize() blocks until a GDB client connects. Useful to break on the very first instruction. |
PROM (boot ROM)
| Field | Type | Default | Meaning |
|---|---|---|---|
| prom_base | uint32_t | 0x00000000 | Physical base of the PROM region. |
| prom_size | uint32_t | 32 MiB | PROM size in bytes. 0 disables PROM entirely (GR740 kit sets 0). Must be a multiple of 4 when non-zero. |
| prom_image_path | path | empty | File loaded into PROM at offset 0 (binary or mkprom2 ELF). Must be ≤ prom_size. |
| prom_image_blob | vector<byte> | empty | Inline alternative to prom_image_path. If both are set, the blob wins. |
| prom_fill | byte | 0x00 | Fill byte for PROM space not covered by the image. |
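The PROM rules in this table (size multiple of 4, image no larger than the PROM, fill byte for the remainder) can be sketched as follows. build_prom is a hypothetical helper written for illustration; it takes the image bytes directly and does not model file loading or mkprom2 ELF parsing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: returns the PROM contents, or std::nullopt when the
// request violates one of the documented constraints.
std::optional<std::vector<std::uint8_t>> build_prom(
        std::uint32_t prom_size, std::uint8_t prom_fill,
        const std::vector<std::uint8_t>& image) {
    if (prom_size == 0) return std::vector<std::uint8_t>{};  // PROM disabled
    if (prom_size % 4 != 0) return std::nullopt;    // must be a multiple of 4
    if (image.size() > prom_size) return std::nullopt;  // image must fit
    // Pre-fill the whole region, then lay the image down at offset 0.
    std::vector<std::uint8_t> prom(prom_size, prom_fill);
    std::copy(image.begin(), image.end(), prom.begin());
    return prom;
}
```

For an 8-byte PROM with fill 0xFF and a 3-byte image, the first three bytes come from the image and the remaining five are 0xFF.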
Raw memory images
| Field | Type | Default | Meaning |
|---|---|---|---|
| memory_images | vector<MemoryImage> | empty | Raw binaries copied to physical addresses at the end of initialize(), once every memory region is mounted. Applied in declaration order (later entries overwrite overlapping earlier ones). CLI: repeatable --bin <path>@<addr>. |
Each EmulatorConfig::MemoryImage carries path (file read at
initialize()), blob (inline alternative — wins when both are set,
the prom_image_* convention), and base (the physical address the
first byte lands on). The bytes are copied verbatim: no format
interpretation (an ELF magic only logs a warning — ELF guests go
through load_elf() or prom_image_*), and no effect on PC or any
other CPU state. Targets must be writable mapped memory — primary
RAM, FTAHBRAM, extra Ram entities modelling flash banks; the
read-only PROM window keeps its own prom_image_* pair. RAM content
survives reset() (warm-reset semantics), so images are applied
exactly once. This is the flight-configuration shape: bootloader in
PROM via prom_image_path, FSW images in flash banks via
memory_images.
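The declaration-order rule is easiest to see with overlapping images. The sketch below uses a simplified stand-in for MemoryImage (inline blob and base only, no path) and a hypothetical apply_images helper that copies into a single flat RAM buffer; the real runtime resolves targets across every mapped region.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for EmulatorConfig::MemoryImage (the path field is omitted).
struct MemoryImage {
    std::vector<std::uint8_t> blob;  // raw bytes, copied verbatim
    std::uint32_t base;              // physical address of the first byte
};

// Hypothetical helper: copy each image into RAM in declaration order.
// Returns false as soon as an image does not fit inside the RAM window.
bool apply_images(std::vector<std::uint8_t>& ram, std::uint32_t ram_base,
                  const std::vector<MemoryImage>& images) {
    for (const MemoryImage& img : images) {
        if (img.base < ram_base) return false;
        const std::uint64_t offset = std::uint64_t{img.base} - ram_base;
        if (offset + img.blob.size() > ram.size()) return false;
        // Later images simply overwrite whatever earlier ones wrote.
        std::copy(img.blob.begin(), img.blob.end(),
                  ram.begin() + static_cast<std::ptrdiff_t>(offset));
    }
    return true;
}
```

If a 4-byte image at 0x40000000 is followed by a 2-byte image at 0x40000002, the last two bytes of the first image are replaced by the second.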
Peripherals and devices
| Field | Type | Default | Meaning |
|---|---|---|---|
| peripherals | vector<PeripheralSpec> | empty | Declarative peripheral list. The kits populate it with the full silicon set; you may modify, replace, or extend any entry. create validates it. An empty vector boots with only RAM and PROM. |
| character_devices | vector<ICharacterDevice*> | empty | Non-owning pool referenced by PeripheralSpec::chardev_index. The kits size it to the UART count (filled with nullptr). Lifetime contract: each pointer must outlive the Emulator. In practice you set these via set_uart_character_device(i, ...), which owns the device for you. See UART and console. |
| buses | vector<BusSpec> | empty | Declarative shared comms-bus media (CAN / SPI / MIL-STD-1553B), one flat list. Peripherals join a bus through their PeripheralSpec::connections edges. Build entries with the can_bus() / spi_bus() / mil_std_bus() helpers. The kits declare the silicon segments. |
| plugins | vector<PluginSpec> | empty | Host-facing plugins (sniffers, monitors, connectors). They model no silicon and own no MMIO; attached after the buses are connected. Standalone-only. See Comms sniffers. |
| pnp_placement | optional<PnpPlacement> | absent | Per-device AMBA Plug&Play placement for a composed machine. When set, the runtime derives the GRLIB PnP table generatively from it; when absent the historic hardcoded per-SoC layout is written, byte-identical. Produced by the tero_compose builder (the kits and .tero scripts set it). |
Execution method
Both methods are compiled into every build and chosen at runtime by
translation:
- translation = true (default): binary translation. Each block is lowered to architecture-neutral IR and run through the tiered LLVM JIT (Baseline O0 once a block reaches jit_baseline_threshold executions, hot blocks promoted to O2 on a background thread), with the IR interpreter as the fallback for anything the JIT cannot lower. The fast path.
- translation = false: the core::step Switch interpreter, one instruction at a time. The reference path and correctness oracle; every translated path is validated bit-identical against it.
auto cfg = tero::compose::gr712rc_config();
cfg.translation = false; // force the interpreter (debugging)
// JIT tuning (only relevant when translation == true):
cfg.jit_background_opt = false; // Baseline-only, deterministic latency
cfg.jit_promotion_threshold = 50; // promote hot blocks to O2 sooner
LLVM (≥ 18) is a mandatory build dependency, so the JIT is always available — there is no build-time switch. A GDB stub keeps using the translation path (it single-steps only breakpoint-bearing blocks); a per-instruction observer (instruction trace) forces the interpreter. Deep dive: IR and LLVM JIT.
Execution mode (host threading)
Orthogonal to the execution method, execution_mode controls host
threading:
- ExecutionMode::SingleThread (default): all cores advance cooperatively in one host thread (round-robin quantum). The only SMP2-compatible mode; synchronisation primitives are inert, so it pays no locking overhead. TSO (SPARC Total Store Order: every core observes writes in program order) and atomic read-modify-write are correct by construction, since the single host thread serialises all memory operations.
- ExecutionMode::MultiThread: each simulated core runs on its own host thread, for standalone 1:1 throughput. Activates the thread-safe foundations (gated locks, atomic SPARC RMW, cross-core IRQ queue). CLI flag: --mt.

Either execution method can run under either mode.
cfg.execution_mode = tero::runtime::ExecutionMode::MultiThread;
cfg.quantum_batch = 4; // amortise the cross-core barrier (tune per workload)
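The cooperative round-robin used by SingleThread can be pictured with a toy loop. This is only a sketch under simplifying assumptions: Core and run_rounds are hypothetical, each core is reduced to an instruction counter, and IRQ delivery and peripheral ticks between rounds are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-in for a simulated core: just a power state and a counter.
struct Core {
    bool powered_down = false;   // secondary cores boot in power-down
    std::uint64_t executed = 0;  // instructions retired so far
};

// Hypothetical scheduler loop: one host thread visits every core in turn
// and lets each running core execute `quantum` instructions per round.
void run_rounds(std::vector<Core>& cores, std::uint32_t quantum,
                std::uint32_t rounds) {
    assert(quantum > 0);  // quantum == 0 is rejected by validation
    for (std::uint32_t r = 0; r < rounds; ++r) {
        for (Core& core : cores) {
            if (core.powered_down) continue;  // skipped until released
            core.executed += quantum;
        }
    }
}
```

Because the cores never run at the same host instant, no locking is needed in this model, which is the property the SingleThread description relies on.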
Clock frequency
GR712RC and GR740 are modelled with a single system-clock domain (one
clock drives the cores, the AMBA buses, and the on-chip peripherals).
cpu_clock_hz is that one knob. Both the per-instruction simulated time
(ns_per_insn, via cpi) and the GPTIMER prescaler period derive from it
via ns_per_cycle(hz) = round(1e9 / hz), so changing the frequency in one
place keeps the CPU and the timers consistent.
auto cfg = tero::compose::gr712rc_config();
cfg.cpu_clock_hz = 100'000'000; // 100 MHz
// ns_per_insn is recomputed by Emulator::create from cpu_clock_hz and cpi.
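The derivation can be checked by hand with a small sketch. The ns_per_cycle formula is the one quoted above; the ns_per_insn helper and its rounding of cpi × ns_per_cycle to an integer are assumptions made for illustration, since the exact rounding used by Emulator::create is not stated here.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// ns_per_cycle(hz) = round(1e9 / hz), as quoted in the text.
std::uint64_t ns_per_cycle(std::uint64_t hz) {
    assert(hz > 0);
    return static_cast<std::uint64_t>(
        std::llround(1e9 / static_cast<double>(hz)));
}

// ASSUMPTION: cpi * ns_per_cycle is rounded to the nearest integer ns.
std::uint64_t ns_per_insn(std::uint64_t hz, double cpi) {
    assert(cpi > 0.0);
    return static_cast<std::uint64_t>(
        std::llround(cpi * static_cast<double>(ns_per_cycle(hz))));
}
```

At the struct default of 50 MHz with cpi = 1.0 this gives 20 ns, matching the documented ns_per_insn default. At 100 MHz it gives 10 ns, and at 250 MHz 4 ns. Frequencies such as 80 MHz do not divide 1e9 into a whole number of nanoseconds (12.5 ns), so the result there depends on the rounding mode.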
Note: direct-ELF guests need no rebuild on a clock change.
Emulator::initialize() simulates the bootloader and re-derives the
GPTIMER scaler from cpu_clock_hz, so RTEMS still sees a 1 MHz timer
tick at any frequency, and direct-ELF sptest/smptest images work
unchanged when you change the clock. Only mkprom2 guests bake a
fixed -freq into their bootloader and must be wrapped to match the
clock you configure. The kits model the real silicon frequencies
(80 MHz / 250 MHz).
Validation
Emulator::create(cfg) runs validate_emulator_config(cfg) and rejects
configurations that would produce an unbootable machine, returning an
ErrorCode:
- num_cores == 0 or num_cores > 4 → InvalidConfig
- ram_size == 0 → InvalidConfig
- quantum == 0 → InvalidConfig
- cpi <= 0 → InvalidConfig
- A peripheral spec with an empty instance name, a null factory, a duplicate name, an IRQ outside [1, 31], an out-of-range chardev_index, or a port connection to a non-existent peer → InvalidConfig
- A ram_base + ram_size that wraps the 32-bit address space → InvalidAddress
Port name resolution happens later (at the connect_ports lifecycle
phase, which needs the peripheral object). Fields not checked above are
accepted at face value: set ns_per_insn = 0 and the clock never advances
— that's your problem, not the validator's.
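The scalar checks in the list above can be sketched as a standalone function. This is not validate_emulator_config itself: Config is a cut-down stand-in for EmulatorConfig, ErrorCode::Ok is a hypothetical success value, the check order is a guess, and the peripheral checks are omitted. A region ending exactly at 2^32 is treated as valid here, which is also an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in error codes; Ok is hypothetical, the other two are from the text.
enum class ErrorCode { Ok, InvalidConfig, InvalidAddress };

// Cut-down stand-in for EmulatorConfig with the documented defaults.
struct Config {
    std::uint32_t num_cores = 1;
    std::uint32_t ram_base = 0x40000000u;
    std::uint32_t ram_size = 16u * 1024u * 1024u;
    std::uint32_t quantum = 1000;
    double cpi = 1.0;
};

ErrorCode validate(const Config& c) {
    if (c.num_cores == 0 || c.num_cores > 4) return ErrorCode::InvalidConfig;
    if (c.ram_size == 0) return ErrorCode::InvalidConfig;
    if (c.quantum == 0) return ErrorCode::InvalidConfig;
    // Written as !(cpi > 0) so that NaN is rejected too (an extra of this sketch).
    if (!(c.cpi > 0.0)) return ErrorCode::InvalidConfig;
    // Do the addition in 64 bits so the wrap is detectable.
    const std::uint64_t end = std::uint64_t{c.ram_base} + c.ram_size;
    if (end > (std::uint64_t{1} << 32)) return ErrorCode::InvalidAddress;
    return ErrorCode::Ok;
}
```

For example, ram_base = 0xF0000000 with ram_size = 0x20000000 ends past the 32-bit limit and is reported as InvalidAddress, while the default configuration passes.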
Mutating runtime services
Some settings live behind setters, not config fields, because they need a
service implementation rather than a value. Call them before
initialize(), because peripherals cache the pointers at attach time.
| Setter | Replaces | Default |
|---|---|---|
| set_logger(unique_ptr<ILogger>) | All logging output | StdoutLogger |
| set_character_device(unique_ptr<ICharacterDevice>) | Console UART (UART0) chardev | StdoutCharDevice |
| set_uart_character_device(i, unique_ptr<ICharacterDevice>) | UART i's chardev | nullptr for aux UARTs |
| set_observer(unique_ptr<IEmulatorObserver>) | Per-instruction/IRQ/trap observer (forces Switch) | none |
Pacing
pacing controls how the Emulator advances simulated time relative to
host wall-clock time:
- PacingMode::Realtime (default): each run_for/run_until is sliced into pacing_slice_ns chunks (default 10 ms). Between chunks the emulator computes the wall-clock instant matching current_sim_time() and waits for it with std::this_thread::sleep_until on std::chrono::steady_clock. The simulated MHz follows cpu_clock_hz/cpi, so a faster simulated clock means the host must execute more instructions per real second to stay on schedule.
- PacingMode::Turbo: free-running. The emulator never reads the host clock and runs the whole budget as fast as the host allows. The right choice for tests, batch runs, and benchmarks, and mandatory for the SMP2 wrapper (where an external scheduler dictates time).
If the host can't keep up in Realtime, the simulation falls behind
silently — sleeping cannot make it faster. Tighten pacing_slice_ns for
finer pacing at higher syscall cost (1 ms is a sensible lower bound).
cfg.pacing = tero::runtime::PacingMode::Turbo; // free-running
// or:
cfg.pacing_slice_ns = 1'000'000; // tighter Realtime (1 ms)
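The Realtime slicing described above can be sketched with the standard library alone. The run_for function below is a hypothetical stand-in, not the Emulator method of the same name: the execute callback represents the engine running a chunk of simulated time, and the return value (number of chunks) exists only to make the behaviour observable.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

enum class PacingMode { Realtime, Turbo };  // stand-in for tero::runtime::PacingMode

// Hypothetical pacing loop. Returns the number of chunks executed.
std::uint64_t run_for(std::uint64_t budget_ns, std::uint64_t slice_ns,
                      PacingMode mode,
                      const std::function<void(std::uint64_t)>& execute) {
    if (mode == PacingMode::Turbo) {
        execute(budget_ns);  // whole budget at once, host clock never read
        return 1;
    }
    assert(slice_ns > 0);
    const auto wall_start = std::chrono::steady_clock::now();
    std::uint64_t sim_elapsed = 0;
    std::uint64_t slices = 0;
    while (sim_elapsed < budget_ns) {
        const std::uint64_t chunk = std::min(slice_ns, budget_ns - sim_elapsed);
        execute(chunk);
        sim_elapsed += chunk;
        ++slices;
        // Wait for the wall-clock instant that matches the simulated time.
        // If the host is already late, sleep_until returns immediately.
        const auto offset = std::chrono::nanoseconds(
            static_cast<std::chrono::nanoseconds::rep>(sim_elapsed));
        std::this_thread::sleep_until(wall_start + offset);
    }
    return slices;
}
```

Sleeping until an absolute deadline, rather than for a relative duration, keeps per-slice jitter from accumulating. It also shows why a slow host falls behind silently: a deadline in the past makes the sleep a no-op.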
Configuring through the CLI
tero-emu builds an EmulatorConfig from a SoC kit (or a .tero
board script via --machine — see Assembling a
machine) plus the flags in
the CLI reference. It exposes the fields you typically override
at run time (--ram, --cores, --mhz, --cpi, --turbo, --mt,
--gdb-port, --gdb-wait, --quantum, and the JIT diagnostic flags);
the rest stay at their kit/struct defaults. To tweak ram_base,
pacing_slice_ns, entry_point_override, the prom_* fields, or the
peripheral list, embed Tero as a library or extend
src/app/src/main.cpp.