Testing

Tero's primary goal is to run the RTEMS 5 / 6 leon3 BSP testsuite at high pass rates across uniprocessor and SMP configurations. Everything in the test strategy serves that goal: the C++ unit/integration tests guard the ISA and peripheral models, and the RTEMS guest-program suites measure the metric that actually matters — does real RTEMS finish its tests on Tero?

Coverage at a glance

The machine-generated per-ELF results live in Test results. The table below summarises which configurations are exercised on every run:

| Suite | SoC | Cores | Execution path | What it validates |
|---|---|---|---|---|
| sptests | GR712RC | 1 | JIT + Switch | Uniprocessor SPARC ISA, peripherals, RTEMS scheduling |
| sptests | GR740 | 1 | JIT | Same tests on LEON4 / IRQAMP |
| smptests | GR712RC | 2 | JIT + Switch | SMP IRQ wakeup, locks, atomic primitives |
| smptests | GR740 | 4 | JIT + Switch + MultiThread | True multi-core concurrency |
| fptests | GR712RC | 1 | JIT | FPU correctness (SoftFloat 3e, hard gate) |
| fptests | GR740 | 1 | JIT | FPU on LEON4 |

All results are scored by the official RTEMS algorithm (BEGIN + END markers required) and validated against Gaisler's SIS simulator as the oracle. See RTEMS known failures for the root-cause analysis of every non-PASS.


This page is the map of the test strategy: the test pyramid, the Gaisler/RTEMS SIS oracle, how to build the guest programs, how to run everything through CTest, the per-configuration matrix, and the scoring rules.

Testing philosophy (from CLAUDE.md)

  • Correctness over performance, always. The Switch interpreter is the naive correctness oracle; the JIT and IR interpreter must reproduce its behaviour bit-for-bit. Every optimisation phase exits with the testsuite pass rate ≥ the rate at phase entry.
  • Every handler needs ≥ 3 tests (normal, edge, flags/trap).
  • Test names describe behaviour: TEST_CASE("ADD sets condition codes on overflow"), not test_add_3.
  • One PR = one module or one feature.
  • Two Emulator objects must coexist in one process — no singletons, no global mutable state. The unit suite enforces this implicitly by constructing many emulators per run.

The test pyramid

flowchart TD
    subgraph host["C++ test binaries (Catch2, ctest)"]
        U["Unit tests — tests/unit/<br/>fast, no I/O, one module each<br/>(decoder, handlers, FPU, peripherals, bus, …)"]
        I["Integration tests — tests/integration/<br/>full Emulator + hand-encoded or cross-compiled guests<br/>(bare-metal, DMA, GDB stub, SMP atomics)"]
    end
    subgraph guests["RTEMS guest-program suites (the pass-rate metric)"]
        SP["sptests — uniprocessor (sp*)"]
        SMP["smptests — SMP (smp*) N=2 / N=4 / MT"]
        FP["fptests — fptest01 (FPU)"]
    end
    ORACLE[["Gaisler/RTEMS SIS<br/>reference simulator (oracle)"]]

    U --> I --> SP & SMP & FP
    SP & SMP & FP -. "scored identically &<br/>diffed against" .-> ORACLE

  • Unit tests (tests/unit/) are the wide base: fast, deterministic, no I/O, one module or one concern per file. They feed instructions straight to CpuState through fake buses, or exercise a single peripheral's MMIO.
  • Integration tests (tests/integration/) build whole Emulator instances and run real or hand-encoded SPARC binaries end-to-end.
  • RTEMS guest-program suites are the apex: real RTEMS ELFs, scored the way upstream scores them, and validated against the SIS oracle.

Framework and build integration

  • Framework: Catch2 v3 (Catch2WithMain), fetched automatically via CMake FetchContent — the host need not provide it.
  • CMake option: TERO_BUILD_TESTS (default ON).
  • Two test executables (tests/CMakeLists.txt):
    • tero_tests — the main suite: every tests/unit/* and tests/integration/* file, linking the full stack (tero::core, tero::bus, tero::peripherals, tero::runtime, tero::ir, tero::arch_sparc, tero::jit, tero::defaults, plus demo_dma).
    • tero_jit_tests — a separate executable (test_llvm_smoke.cpp, test_tiered_jit.cpp, test_ir_jit.cpp) that links only tero::jit (not tero::core). The split keeps the JIT's background-compilation concurrency reachable under sanitizers without the rest of the stack in the way.
  • Discovery: catch_discover_tests() registers each Catch2 TEST_CASE as an individual CTest test. The main suite is registered with SKIP_REGULAR_EXPRESSION "SKIPPED:|tests? skipped" so SKIP(...) macros are reported as CTest skips, not failures.
  • -Werror and a strict warning set: 0 warnings / 0 errors is the bar.
  • Threads: the suite links Threads::Threads explicitly — the GatedMutex and Phase 13 MultiThread foundation tests spawn host threads.

The full tero_tests + tero_jit_tests discovery currently registers on the order of 700 individual CTest entries (one per TEST_CASE); the exact number drifts as tests are added — ctest --test-dir build -N prints the live count.

Running the suite

# Configure (tests enabled by default)
cmake -S . -B build -G Ninja

# Build
cmake --build build

# Run everything through CTest (one row per TEST_CASE)
ctest --test-dir build --output-on-failure

# List every discovered test without running (live count)
ctest --test-dir build -N

# Run a Catch2 binary directly with a tag filter
./build/tests/tero_tests "[unit]"
./build/tests/tero_tests "[integration]"
./build/tests/tero_jit_tests          # JIT-only executable

CTest understands each TEST_CASE separately, so -R / -E regex filters work at test granularity:

# Only the RTEMS guest suites
ctest --test-dir build -R "sptests|smptests|fptest" --output-on-failure

# Everything EXCEPT the slow SMP suites (minutes faster)
ctest --test-dir build -E "smptests"

Iterating on one RTEMS test

The RTEMS harnesses honour the TERO_ONLY_TEST environment variable: a comma-separated list of test stems restricts a whole-directory run to just those ELFs (see discover_elfs() in rtems_csv_harness.hpp). Unset in CI, so the full suite is always scored.

TERO_ONLY_TEST=sp04,smp08 ctest --test-dir build -R "sptests|smptests"
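
The filtering described above can be sketched in Python. This is a minimal illustration, not the C++ discover_elfs() from rtems_csv_harness.hpp: the non-recursive directory scan, the whitespace trimming around stems, and the `only` parameter are assumptions made for the example.

```python
import os
from pathlib import Path


def discover_elfs(directory, only=None):
    """Return sorted *.elf paths, optionally restricted to a set of test stems.

    `only` is a comma-separated stem list; when None it is read from the
    TERO_ONLY_TEST environment variable (empty or unset means "run all").
    """
    if only is None:
        only = os.environ.get("TERO_ONLY_TEST", "")
    # Assumption: stems are trimmed and empty entries are ignored.
    wanted = {stem.strip() for stem in only.split(",") if stem.strip()}
    # Sorted so that any per-directory report has a stable row order.
    elfs = sorted(Path(directory).glob("*.elf"))
    if wanted:
        elfs = [path for path in elfs if path.stem in wanted]
    return elfs
```

With `only="sp04,smp08"` a directory holding sp01.elf, sp04.elf and smp08.elf yields only the latter two, in sorted order.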

Test-harness environment variables

The library reads no environment variables — configuration is by struct, and the CLI tools take explicit flags (tero-emu --histogram-out, …). The test binary keeps three optional overrides, because a Catch2 test case cannot receive custom command-line arguments and the variables pass transparently through ctest. Each one has a committed default, so CI and plain local runs never set them; they exist for debugging workflows.

| Variable | Effect | Default when unset |
|---|---|---|
| TERO_ONLY_TEST | Comma-separated test stems; restricts the RTEMS sptests/smptests/fptests iterators to those guest ELFs (rtems_csv_harness.hpp, discover_elfs()) | run every discovered ELF |
| TERO_RTEMS_HELLO_ELF | Path to a replacement hello-world guest ELF, e.g. a locally rebuilt one (test_rtems_boot.cpp, test_gdb_stub_rtems.cpp, test_gdb_stub_protocol.cpp, test_jit_run_lockstep.cpp, test_ir_diff_lockstep.cpp) | the committed tests/guest-programs/rtems/hello-world/hello-world.elf |
| TERO_RTEMS_FPTEST01_ELF | Path to a replacement fptest01 guest ELF (test_rtems_fptests.cpp, test_ir_diff_lockstep.cpp) | the committed tests/guest-programs/rtems/fptest01/fptest01.elf |
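
The "override, else committed default" pattern for the two ELF variables can be sketched as follows. The helper name resolve_guest_elf and the choice to treat an empty value as unset are assumptions for illustration; the real lookups live in the C++ test files listed above.

```python
import os
from pathlib import Path

# Committed defaults, taken from the table above.
DEFAULT_ELFS = {
    "TERO_RTEMS_HELLO_ELF": "tests/guest-programs/rtems/hello-world/hello-world.elf",
    "TERO_RTEMS_FPTEST01_ELF": "tests/guest-programs/rtems/fptest01/fptest01.elf",
}


def resolve_guest_elf(variable, environ=None):
    """Return the ELF path for `variable`: the override if set, else the default."""
    env = os.environ if environ is None else environ
    override = env.get(variable, "")
    # Assumption: an empty string counts as "unset".
    return Path(override) if override else Path(DEFAULT_ELFS[variable])
```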

Directory layout

tests/unit/ — Module tests

Fast, deterministic, no I/O. Each file targets one module or one concern. The complete list lives in tests/CMakeLists.txt; grouped by area:

| Area | Files |
|---|---|
| Strong types / utilities | test_types.cpp, test_address_range.cpp, test_breakpoint_set.cpp, test_gated_mutex.cpp, test_module_versions.cpp, test_defaults.cpp |
| Bus / memory | test_ram.cpp, test_system_bus.cpp, test_dma.cpp, test_cpu_bus_bridge.cpp, test_load_ram_image.cpp |
| CPU state / decoder | test_cpu_state.cpp, test_cpu_state_fpu.cpp, test_decoder.cpp, test_fpu_decoder.cpp, test_cpi.cpp |
| Integer handlers | test_handlers_alu.cpp, test_handlers_branch.cpp, test_handlers_loadstore.cpp, test_handlers_regwin.cpp, test_handlers_privileged.cpp |
| FPU | test_softfloat_context.cpp, test_fpu_handlers_loadstore.cpp, test_fpu_handlers_moves.cpp, test_fpu_handlers_arith.cpp, test_fpu_handlers_compare.cpp, test_fpu_handlers_branch.cpp, test_fpu_traps.cpp |
| Traps / step | test_traps.cpp, test_step.cpp |
| Arch-neutral IR / SPARC frontend | test_ir_data_model.cpp, test_ir_interpreter.cpp, test_sparc_layout.cpp, test_sparc_lockstep.cpp, test_sparc_arch.cpp, test_sparc_state_aliasing.cpp |
| Peripherals | test_memctrl.cpp, test_irqmp.cpp, test_irqamp.cpp, test_irq_concurrent.cpp, test_gptimer.cpp, test_grgpio.cpp, test_apbuart.cpp, test_prom.cpp, test_prom_config.cpp |
| Runtime / config / SoC recipes | test_emulator.cpp, test_emulator_from_spec.cpp, test_emulator_observer.cpp, test_emulator_pacing.cpp, test_emulator_smp_irq_wakeup.cpp, test_peripheral_spec_validate.cpp, test_port_system.cpp, test_pnp_table.cpp, test_gr740_config.cpp, test_elf_loader.cpp, test_event_scheduler.cpp |
| GDB stub | test_gdb_stub_codec.cpp (RSP framing/checksum, signal_from_tt map) |

test_opcode_histogram.cpp is compiled only when TERO_OPCODE_HISTOGRAM is set (the Phase 9.4 instrumentation counter — otherwise CpuState::opcode_histogram() does not exist and the file would not compile).

The JIT-only executable tero_jit_tests adds: test_llvm_smoke.cpp (LLVM ORCv2 wiring), test_tiered_jit.cpp (baseline O0 + background O2 promotion, ADR-002), and test_ir_jit.cpp (IR → native lowering).

tests/integration/ — End-to-end tests

Full Emulator instances running real or hand-crafted binaries.

| File | Coverage |
|---|---|
| test_bare_metal.cpp | Hand-encoded SPARC binaries (NOP sled, error-mode detection, APBUart MMIO) |
| test_demo_dma_device.cpp | DemoDmaDevice attached via public add_peripheral, DMA + IRQ exercised |
| test_hello_uart_elf.cpp | Cross-compiled hello_uart.elf; asserts 'A' appears on captured UART |
| test_regwin_roundtrip.cpp | SAVE/RESTORE register-window round-trip against a cross-compiled guest |
| test_rtems_boot.cpp | RTEMS 5 hello-world.elf boot (greeting on UART). Gated behind an operator ELF (TERO_RTEMS_HELLO_ELF); SKIPs if absent |
| test_rtems_sptests.cpp | RTEMS 5 sptests: GR712RC N=1 (JIT + Switch) and GR740 N=1. CSV-emitting harness |
| test_rtems_smptests.cpp | RTEMS 5 smptests: N=2 (GR712RC) and N=4 (GR740), each JIT + Switch, plus N=4 MultiThread |
| test_rtems_fptests.cpp | RTEMS FP test fptest01.elf on GR712RC (must pass) and GR740 (SKIP-if-absent) |
| test_rtems_hello_tero.cpp | mkprom2-built PROM image boots and prints via APBUart through the PROM peripheral |
| test_smp_atomics.cpp | SMP atomic primitives (smp_atomic.S, smp_swap.S, smp_casa.S) on multi-core configs (GR712RC + GR740) |
| test_ir_diff_lockstep.cpp | IR interpreter vs Switch in lockstep on synthetic blocks (translation oracle) |
| test_jit_run_lockstep.cpp | JIT run-loop vs Switch on real RTEMS code (needs the full stack; lives in tero_tests) |
| test_smpschededf02_trace.cpp | Dispatch-frame probe that diagnosed the smpschededf02 stack-overflow root cause |
| test_gdb_stub_protocol.cpp | In-process GDB RSP over TCP (scripted client): codec, late-binding attach, 2nd-client rejection, qSymbol handshake, stop-on-ErrorMode T0b/T0a mapping, RTEMS thread-awareness, [!mayfail] live-guest |
| test_gdb_stub_rtems.cpp | Real sparc-gaisler-rtems5-gdb front-end against the Emulator (conditional: SKIPs if GDB binary missing/broken) |
| test_gdb_stub_dual_core_timer.cpp | GDB stub against the dual-core-timer PROM guest (per-core thread enumeration) |

tests/support/ — Test fixtures

| File | Purpose |
|---|---|
| dummy_peripheral.cpp/.hpp | DummyPeripheral, an IPeripheral with MMIO registers and DMA triggers, used by bus/DMA unit tests |
| capturing_char_device.hpp | Thread-safe ICharacterDevice recording all transmitted bytes into a std::string for assertions (the backbone of every UART-output test) |
| sparc_encoders.hpp | Constexpr SPARC V8 instruction encoders (enc_add_imm, enc_bicc, enc_save_imm, enc_jmpl, enc_rett, …) so step-level tests plant valid words without raw hex |
| test_bus.hpp | FakeBus / MemBus (BE RAM helper) / ErrorBus (always returns BusError) for feeding instructions directly to CpuState |
| test_config.hpp | tero::testing::*_test_config(): EmulatorConfig factories that wrap the production recipes but force PacingMode::Turbo and disable PROM, so a test can never accidentally drag itself out under wall-clock pacing |

The RTEMS guest-program suites

The RTEMS suites are the pass-rate metric. They live under tests/guest-programs/rtems/ and are driven by a single shared harness.

The shared CSV harness

tests/integration/rtems_csv_harness.hpp (namespace tero::test_support::rtems_csv) does the heavy lifting for sptests, smptests and fptests alike:

  1. Discover every *.elf under a directory (discover_elfs()), sorted for stable CSV ordering, filtered by TERO_ONLY_TEST if set.
  2. Run one ELF on a fresh Emulator (run_one_elf()): a new instance per test (no cross-test state), a CapturingCharDevice on the UART, and the logger forced to Error level (so a misbehaving guest can't bury the harness in [WARN] spam). run_for() is then sliced into short simulated chunks (sim_slice) so the loop can break the instant the END marker appears or a wall deadline passes; the common PASS path early-exits well under a second of host time.
  3. Score the captured console with official_outcome() (see Scoring below).
  4. Write one CSV row per ELF, one CSV file per directory, under tests/results/ (write_csv()). An empty directory yields a single _no_binaries_,SKIPPED sentinel row so the file is always well-formed — which is why the integration TEST_CASEs always SUCCEED even with no binaries present: the CSV is the ground truth, not the Catch2 assertion.
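
Step 4 can be sketched as below. This is a hedged illustration of the "always well-formed" property rather than the harness's write_csv(): the two-column layout and the header names `test,outcome` are assumptions, and only the `_no_binaries_,SKIPPED` sentinel comes from the description above.

```python
import csv
import io


def write_csv(rows, stream):
    """Write one (stem, outcome) row per ELF; emit a sentinel row if there are none."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(["test", "outcome"])  # hypothetical header names
    rows = sorted(rows)  # stable ordering, mirroring the sorted discovery
    if not rows:
        # An empty directory still produces a parseable, non-empty file.
        rows = [("_no_binaries_", "SKIPPED")]
    writer.writerows(rows)


# Usage: an empty run still yields a header plus the sentinel row.
buffer = io.StringIO()
write_csv([], buffer)
```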

Budgets are calibrated to the official reference run (DefaultBudgets in the harness):

| Budget | Value | Rationale |
|---|---|---|
| sim_budget | 200 s simulated | Matches the official -tlim 200 s from leon3-sis.ini: a test gets the same simulated-time budget here as on SIS |
| sim_slice | 50 ms simulated | Polling granularity for early-exit on the END marker |
| wall_budget | 240 s host | A Tero-side practicality: the functional core runs below real time, so a 200 s-simulated CPU-bound run needs more host seconds than SIS. PASS runs early-exit long before this |

The slower Switch-interpreter reference runs use a tighter SwitchBudgets (60 s sim / 30 s wall) so the suite finishes in minutes; CPU-heavy tests may slow-timeout under it, which is interpreter speed, not a regression.
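
The interaction of the three budgets can be sketched as a sliced run loop. This is a simplified model, not run_one_elf(): the `run_for` and `console` callables, the returned reason strings, and the injectable `clock` are all hypothetical names introduced for the example.

```python
import time
from dataclasses import dataclass

END_MARKER = "*** END OF TEST"


@dataclass(frozen=True)
class Budgets:
    sim_budget_s: float = 200.0   # total simulated seconds (sim_budget)
    sim_slice_s: float = 0.050    # simulated seconds per slice (sim_slice)
    wall_budget_s: float = 240.0  # host seconds (wall_budget)


def run_sliced(run_for, console, budgets=Budgets(), clock=time.monotonic):
    """Advance the guest slice by slice; return (slices executed, stop reason).

    run_for(seconds) advances simulated time; console() returns the UART
    text captured so far. Both are supplied by the caller.
    """
    deadline = clock() + budgets.wall_budget_s
    total_slices = round(budgets.sim_budget_s / budgets.sim_slice_s)
    for done in range(1, total_slices + 1):
        run_for(budgets.sim_slice_s)
        if END_MARKER in console():
            return done, "end-marker"      # common PASS path: exit early
        if clock() >= deadline:
            return done, "wall-deadline"   # host-time safety net
    return total_slices, "sim-budget"      # simulated budget exhausted
```

With the default budgets the loop polls at most 4000 times; a tighter Switch-style budget would be expressed as `Budgets(sim_budget_s=60.0, wall_budget_s=30.0)`, keeping the slice length unchanged as an assumption.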

fptest01: the one hard REQUIRE

Every RTEMS harness TEST_CASE merely SUCCEEDs and lets the CSV record the truth — except the GR712RC fptest01 case in test_rtems_fptests.cpp, which REQUIREs a PASS. FPU correctness with SoftFloat 3e is a 100 % exit criterion, so it is a hard gate. The GR740 fptest case is SKIP-if-absent (header-only CSV) so CI hosts without the GR740 build stay green.

Building the guest programs (RCC toolchain)

The RTEMS ELFs are not committed (except the legacy hello-world.elf and fptest01.elf). They are produced by three build scripts under tests/guest-programs/rtems/, which compile the RCC-bundled testsuite sources against the matching BSP:

| Script | Output dir | Suite |
|---|---|---|
| build_sptests.sh [gr712rc\|gr740\|both] | sptests/bin/, sptests-gr740/bin/ | sptests |
| build_smptests.sh [n2\|n4\|both] | smptests-leon3-n2/bin/, smptests-gr740-n4/bin/ | smptests |
| build_fptests.sh [gr712rc\|gr740\|both] | fptest01/, fptest01-gr740/ | fptest01 |

All three read the RCC tree at ${RCC_PREFIX:-/opt/rcc-1.3.2-gcc}/src/rcc-1.3.2/testsuites/ and the BSP libraries under ${RCC_PREFIX}/sparc-gaisler-rtems5/<bsp>/lib. RCC 1.3.2 ships RTEMS 5.3 (the aclocal macro says 5.0.0 but the authoritative CHANGELOG.RCC says 5.3). Tests that fail to compile (C++-only sources, missing helpers) are silently skipped — the harness simply never sees those binaries.

Why -msoft-float for sp/smp tests

Both build_sptests.sh and build_smptests.sh compile with -msoft-float. The reason is the "banner-then-die" cluster: newlib's hard-float vfprintf emits an LDDF even for %d/%s formats, and the RTEMS init task is created without RTEMS_FLOATING_POINT, so the FP op traps (TT=0x04) and RTEMS panics with INTERNAL_ERROR_ILLEGAL_USE_OF_FLOATING_POINT_UNIT. Soft-float lowers FP to __muldf3-style calls and avoids the trap. The fix is in the build flags, not in relaxing the emulator's (correct) FP-disabled trap.

Two deliberate exceptions:

  • spfatal30 / spfatal31 are built hard-float — they intend to provoke the fp_disabled trap, so a soft-float lowering would defeat the test (the END marker would never print). For GR712RC that means the plain leon3 BSP (hard-float sibling of leon3_sf); GR740 selects float mode via multilib.
  • smptests are built with -DCONFIGURE_MINIMUM_TASK_STACK_SIZE=16384 (up from the 4 KiB SPARC default). Diagnosed via test_smpschededf02_trace.cpp: the EDF worker threads overflow a 4 KiB stack and corrupt the next thread's saved dispatch frame. The 16 KiB bump took N=2 from 41→42 PASS.

fptest01 is built hard-float because it ships as RTEMS_FLOATING_POINT and genuinely needs the FPU.

The asm guest programs (tests/guest-programs/asm/) are auto-built by CMake when sparc-gaisler-rtems5-gcc is on PATH (-mcpu=leon3 -msoft-float -nostdlib -nostartfiles, .text at 0x40000000); absent toolchain → the dependent integration test SKIPs at runtime. Default toolchain locations: RCC at /opt/rcc-1.3.2-gcc/bin/, mkprom2 at /opt/mkprom2/mkprom2 (overridable with -DSPARC_CC= / -DMKPROM2=).

The per-configuration matrix

Each guest suite is run under several configurations so a single change is checked against every relevant SoC, core count and execution path. The default (JIT, Realtime overridden to Turbo by the harness) is the production path; the Switch runs are the correctness reference the JIT must reproduce; the MultiThread run is the faithful true-concurrency SMP model (ADR-001).

| TEST_CASE | SoC | Cores | Exec path | Output CSV | Tags |
|---|---|---|---|---|---|
| sptests run on GR712RC N=1 | GR712RC | 1 | JIT | sptests.csv | [sptests][gr712rc] |
| sptests … under the Switch interpreter | GR712RC | 1 | Switch | sptests-switch.csv | [sptests][switch] |
| sptests run on GR740 N=1 | GR740 | 1 | JIT | sptests-gr740.csv | [sptests][gr740] |
| smptests run on GR712RC N=2 | GR712RC | 2 | JIT | smptests-N2.csv | [smptests][n2] |
| smptests … N=2 under the Switch interpreter | GR712RC | 2 | Switch | smptests-N2-switch.csv | [smptests][n2][switch] |
| smptests run on GR740 N=4 | GR740 | 4 | JIT | smptests-N4.csv | [smptests][n4] |
| smptests … N=4 under MultiThread | GR740 | 4 | JIT, thread-per-core | smptests-N4-mt.csv | [smptests][n4][mt] |
| smptests … N=4 under the Switch interpreter | GR740 | 4 | Switch | smptests-N4-switch.csv | [smptests][n4][switch] |
| fptest01 on GR712RC | GR712RC | 1 | JIT | fptests.csv | [fpu][gr712rc] |
| fptest01 on GR740 | GR740 | 1 | JIT | fptests-gr740.csv | [fpu][gr740] |

The Switch-vs-JIT siblings exist so the two engines can be diffed. Because the Switch path is the oracle, a row that differs between sptests.csv and sptests-switch.csv points to a translation defect, unless the difference is only a Switch-side slow-timeout under the tighter SwitchBudgets. The N=4 -mt sibling exists to show that the thread-per-core JIT reproduces the SingleThread N=4 pass set under true parallelism.
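
Diffing two sibling CSVs can be sketched in a few lines of Python. The column layout is an assumption (a header row, then test stem in the first column and outcome in the second); adjust the indices if the files under tests/results/ are laid out differently.

```python
import csv
import io


def load_outcomes(stream):
    """Map test stem -> outcome, assuming a header row and (stem, outcome) columns."""
    reader = csv.reader(stream)
    next(reader, None)  # skip the header row (assumed to be present)
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def diff_outcomes(reference, candidate):
    """Return sorted (stem, reference outcome, candidate outcome) for differing rows."""
    stems = sorted(set(reference) | set(candidate))
    return [
        (stem, reference.get(stem, "MISSING"), candidate.get(stem, "MISSING"))
        for stem in stems
        if reference.get(stem) != candidate.get(stem)
    ]


# Usage with in-memory stand-ins for sptests-switch.csv and sptests.csv.
switch = load_outcomes(io.StringIO("test,outcome\nsp01,PASS\nsp04,PASS\n"))
jit = load_outcomes(io.StringIO("test,outcome\nsp01,PASS\nsp04,FAIL\n"))
differences = diff_outcomes(switch, jit)
```

A TIMEOUT that appears only on the Switch side is worth checking against the tighter Switch budgets before treating it as a defect.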

The SMP quantum is configuration-specific

test_rtems_smptests.cpp overrides the production 1000-instruction quantum for SingleThread SMP runs: q=200 for N≤2, q=400 for N=4. The cooperative round-robin can let a core ticket-spin at PIL=15 for a whole quantum waiting on a release that only happens in another core's later quantum — a finer quantum converges those handshakes (matching SIS and MultiThread). The optimum is non-monotonic and per-configuration; each value is a no-regression clean win against its own suite. There is no single quantum that converges everything under cooperative scheduling — that ceiling is exactly what MultiThread removes. MultiThread is unaffected (each core already has its own host thread).
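
The cost of a coarse quantum can be illustrated with a toy model. This is a deliberately simplified sketch, not Tero's scheduler: it assumes two cores, core 0 always runs first and only spins, and core 1 releases the lock after a fixed number of its own instructions. It does not reproduce the non-monotonic optimum described above; it only shows why a whole quantum can be burnt spinning.

```python
def wasted_spin_instructions(quantum, release_at):
    """Instructions core 0 spends spinning under cooperative round-robin.

    Each round, core 0 runs a full quantum (spinning), then core 1 runs a
    full quantum; core 1 releases once it has executed `release_at`
    instructions in total.
    """
    core1_done = 0
    wasted = 0
    released = False
    while not released:
        wasted += quantum      # core 0 cannot observe the release mid-quantum
        core1_done += quantum  # core 1 only progresses in its own quantum
        released = core1_done >= release_at
    return wasted
```

For a release that needs 100 instructions on the other core, the model wastes 1000 spin instructions at q=1000, 400 at q=400 and 200 at q=200.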

Scoring: official RTEMS rules

Tero scores a captured console exactly as the upstream rtems-tools rt/report.py report.end() scores a SIS run. This is the single most important fidelity decision in the test strategy: a Tero result lands in the same category the reference assigns to the identical SIS run, so "Tero passes" means "Tero passes by the official rules", not by a relaxed home-grown criterion.

The scorer is official_outcome() in rtems_csv_harness.hpp (and its Python twin official_score() in scripts/oracle_compare.py). The rules:

| Console contents | *** TEST STATE: | Outcome | Failure? |
|---|---|---|---|
| *** BEGIN OF TEST and *** END OF TEST | any | PASS | |
| BEGIN, no END, ran out the budget | default / EXPECTED_PASS | TIMEOUT | Yes |
| BEGIN, no END, halted on its own | default / EXPECTED_PASS | FAIL | Yes |
| No BEGIN at all | default / EXPECTED_PASS | INVALID | No (excluded) |
| BEGIN, no END | EXPECTED_FAIL | EXPECTED-FAIL | No (excluded) |
| BEGIN, no END | INDETERMINATE | INDETERMINATE | No (excluded) |
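
The table can be read as a small decision function. The sketch below is an illustration of those rules, not the source of official_outcome() or official_score(): the `timed_out` and `stem` parameters are hypothetical, the handling of a missing BEGIN under a non-default test state is an assumption (the table does not cover it), and the special case for the minimum test follows the consequences listed in this section.

```python
import re

BEGIN = "*** BEGIN OF TEST"
END = "*** END OF TEST"
STATE_RE = re.compile(r"\*\*\* TEST STATE: (\w+)")


def official_outcome(console, timed_out, stem=""):
    """Classify a captured console according to the scoring table."""
    has_begin = BEGIN in console
    has_end = END in console
    match = STATE_RE.search(console)
    state = match.group(1) if match else "EXPECTED_PASS"  # default state

    if has_begin and has_end:
        return "PASS"
    if not has_begin:
        # `minimum` prints nothing by design and is special-cased to PASS.
        # Assumption: any other run without BEGIN is INVALID, whatever its state.
        return "PASS" if stem == "minimum" else "INVALID"
    # From here on: BEGIN seen, END missing.
    if state == "EXPECTED_FAIL":
        return "EXPECTED-FAIL"
    if state == "INDETERMINATE":
        return "INDETERMINATE"
    return "TIMEOUT" if timed_out else "FAIL"
```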

Key consequences, mirrored on both sides:

  • A test PASSES only when both the BEGIN and END banners appear; the console markers alone decide the outcome (Decision 36).
  • INVALID / EXPECTED-FAIL / INDETERMINATE are not failures and are excluded from the denominator, exactly like upstream. An INVALID run fataled before the BSP debug UART came up (no BEGIN reaches the console) — correct silicon behaviour, not an emulator defect.
  • The minimum test is special-cased to PASS even with no BEGIN (it prints nothing by design) — matching rtems-tools' TEST_FAIL_EXCLUDES.
  • The official scorer needs to tell a budget TIMEOUT apart from a self-halted FAIL; the harness tracks broke_early to distinguish them.
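
The denominator rule can be made concrete with a short sketch. It assumes the outcome strings from the scoring table and counts only PASS, FAIL and TIMEOUT; how other values (for example a SKIPPED sentinel) should be treated is not specified here, so they are simply ignored.

```python
from collections import Counter

SCORED = ("PASS", "FAIL", "TIMEOUT")  # outcomes that count toward the denominator


def pass_rate(outcomes):
    """Return PASS / (PASS + FAIL + TIMEOUT), or None if nothing was scored.

    INVALID, EXPECTED-FAIL and INDETERMINATE are neither passes nor
    failures and therefore do not enter the denominator.
    """
    counts = Counter(outcomes)
    scored = sum(counts[name] for name in SCORED)
    return counts["PASS"] / scored if scored else None
```

Eight passes, one FAIL, one TIMEOUT and any number of INVALID runs give a rate of 0.8.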

Because both the C++ harness and the SIS-comparison script implement the same state machine, the emitted markers must match the official ones, including the BEGIN banner — see fptest01 emits BEGIN marker and the matching guest commit (656cdba) that added the BEGIN marker for conformance.

The SIS oracle

The Gaisler/RTEMS SIS (sparc-rtems5-sis, vendored at ../rtems-sis) is the authoritative reference for the SPARC ISA, IRQMP/GPTIMER/APBUART/GRETH and the GR712/GR740 configs. The RTEMS testsuite itself is validated on SIS — so the question "is Tero correct?" reduces to "does Tero agree with SIS?"

Two scripts answer that question at two granularities.

flowchart LR
    ELF["RTEMS test ELF"]
    ELF --> S1["SIS<br/>(reference)"]
    ELF --> L1["Tero<br/>(under test)"]

    subgraph oc["oracle_compare.py — per-test agreement"]
        S1 --> SC["official_score(console)"]
        L1 --> LC["official_score(console)"]
        SC --> CMP{"same<br/>outcome?"}
        LC --> CMP
        CMP -->|yes| OK["AGREE — Tero validated as SIS validates"]
        CMP -->|no| DIV["DIVERGE — a concrete, reviewable difference"]
    end

    subgraph ls["lockstep_compare.py — per-instruction divergence"]
        S2["SIS tra (per-core PC stream)"]
        L2["Tero --trace (per-core PC stream)"]
        S2 --> RS["resync_diff() per core<br/>(tolerate benign spin-loop counts)"]
        L2 --> RS
        RS --> FIRST["FIRST non-resyncing PC divergence = the bug"]
    end

scripts/oracle_compare.py — per-test agreement

Runs each test ELF on both SIS and Tero with matched configs and the same 200 s simulated-time limit, applies the identical official scoring to both consoles, and reports per-test agreement plus a divergence list and a by-category breakdown. Agreement = Tero is validated exactly as SIS validates; a divergence is a concrete, reviewable difference — not a harness artefact.

# Compare the GR712RC sptests against SIS (defaults: SIS at ../rtems-sis/sis,
# tero-emu at build/src/app/tero-emu)
scripts/oracle_compare.py sptests-gr712rc

# Only a couple of stems
scripts/oracle_compare.py smptests-n4 smp08 smpmulticast01
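
The agreement report can be pictured as a comparison of two outcome maps. This is a hedged sketch of the idea, not the code in scripts/oracle_compare.py: the function name, the dict inputs and the shape of the by-category breakdown are assumptions.

```python
from collections import Counter


def compare_outcomes(sis, tero):
    """Compare per-test outcomes from SIS and Tero.

    Both arguments map test stem -> official outcome. Returns
    (agreeing stems, diverging stems, Counter of (SIS, Tero) outcome pairs).
    """
    agree, diverge = [], []
    for stem in sorted(sis.keys() & tero.keys()):
        (agree if sis[stem] == tero[stem] else diverge).append(stem)
    by_category = Counter((sis[stem], tero[stem]) for stem in diverge)
    return agree, diverge, by_category


# Usage: one agreement, one divergence where SIS passes and Tero times out.
agree, diverge, by_category = compare_outcomes(
    {"smp08": "PASS", "smpmulticast01": "PASS"},
    {"smp08": "PASS", "smpmulticast01": "TIMEOUT"},
)
```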

Profiles encode the matched SIS args (straight from leon3-sis.ini / a -gr740 equivalent) and the corresponding Tero args, including the SingleThread SMP quantum:

| Profile | ELF dir | SIS args | Tero args |
|---|---|---|---|
| sptests-gr712rc | sptests/bin | -leon3 -m 2 | --soc gr712rc --cores 2 --quantum 50 |
| smptests-n2 | smptests-leon3-n2/bin | -leon3 -m 2 | --soc gr712rc --cores 2 --quantum 50 |
| sptests-gr740 | sptests-gr740/bin | -gr740 | --soc gr740 --quantum 400 |
| smptests-n4 | smptests-gr740-n4/bin | -gr740 | --soc gr740 --quantum 400 |
| smptests-n2-mt | smptests-leon3-n2/bin | -leon3 -m 2 | --soc gr712rc --cores 2 --mt |
| smptests-n4-mt | smptests-gr740-n4/bin | -gr740 | --soc gr740 --mt |

The -mt profiles exist because SIS already models true multi-core concurrency, so the faithful Tero counterpart is --mt (thread-per-core), not the cooperative round-robin — several tests that diverge in the SingleThread profiles converge under --mt.

scripts/lockstep_compare.py — per-instruction divergence

When a test diverges, this comparator finds where. It runs the same GR740 ELF on Tero (--trace, which installs a TraceObserver that emits cpu <N> <pc> per instruction to stderr and forces the deterministic Switch path) and on SIS (tra), parses each into a per-core PC sequence, and reports the first per-core divergence — the first instruction where a core's control flow on Tero differs from the oracle.
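
Turning the Tero trace into per-core PC sequences can be sketched as follows. The `cpu <N> <pc>` line shape comes from the description above; treating `<pc>` as hexadecimal with an optional `0x` prefix, and silently skipping any other stderr line, are assumptions.

```python
import re
from collections import defaultdict

# Assumed line shape: "cpu <core index> <pc in hex, optional 0x prefix>".
TRACE_RE = re.compile(r"^cpu (\d+) (?:0x)?([0-9a-fA-F]+)$")


def parse_trace(lines):
    """Group traced PCs by core, preserving each core's execution order."""
    per_core = defaultdict(list)
    for line in lines:
        match = TRACE_RE.match(line.strip())
        if match:  # ignore anything that is not a trace record
            per_core[int(match.group(1))].append(int(match.group(2), 16))
    return dict(per_core)
```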

scripts/lockstep_compare.py path/to/smpfoo.elf --quantum 50 --insns 2000000

The comparison is per core (cross-core interleaving differs between Tero's round-robin and SIS's 50-clock fine interleave). resync_diff() tolerates benign spin-loop count differences — one side iterating a ticket lock a different number of turns because cross-core release timing differs — by skipping the extra iterations on whichever side looped more, confirmed by a window of consecutive matches. A spin loop resyncs within a few iterations; a real bug never resyncs, and that first non-resyncing PC divergence is the defect to investigate.
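
A resynchronising diff of this kind can be sketched as below. This is one possible implementation under stated assumptions, not the resync_diff() in scripts/lockstep_compare.py: the `max_skip` and `window` defaults are hypothetical, and a mismatch too close to the end of either stream to fill a window is reported as a divergence.

```python
def _window_matches(a, a_start, b, b_start, window):
    """True if `window` consecutive entries match from the given offsets."""
    chunk_a = a[a_start:a_start + window]
    chunk_b = b[b_start:b_start + window]
    return len(chunk_a) == window and chunk_a == chunk_b


def resync_diff(ours, oracle, max_skip=64, window=8):
    """Return the (ours index, oracle index) of the first divergence that
    never resyncs, or None if the two PC streams agree up to benign skips."""
    i = j = 0
    while i < len(ours) and j < len(oracle):
        if ours[i] == oracle[j]:
            i += 1
            j += 1
            continue
        for skip in range(1, max_skip + 1):
            # Our side spun longer: drop `skip` extra entries from `ours`.
            if _window_matches(ours, i + skip, oracle, j, window):
                i += skip
                break
            # The oracle spun longer: drop `skip` extra entries from `oracle`.
            if _window_matches(ours, i, oracle, j + skip, window):
                j += skip
                break
        else:
            return (i, j)  # no alignment found: a real divergence
    return None
```

A stream that iterates a two-instruction spin loop three times resyncs against one that iterates it once, while a stream that branches to a different address is reported at the index of the first differing PC.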

SIS MP-trace gotchas (from the lockstep work)

SIS only engages the MP interleaver when ncpu>1 and an instruction count is given (-m 4 + a clock budget), and the MP boot state is only set up by boot_init(), which tra skips but go runs. The script uses go 0 0 to boot without retiring an instruction, then tra <clocks>, so SIS's first traced PC aligns with Tero's instruction #0. The <clocks> argument is SIS simulated clocks, not instructions — the script sizes it from Tero's longest per-core stream (×6 margin).

The lockstep work has already produced concrete findings — e.g. the leon3_counter_initialize %asr23 timecounter probe picks GPTIMER on Tero vs %asr23 on SIS (a universal, benign difference, documented in known failures).

Recording results

Per-ELF outcomes are written to tests/results/*.csv by the harness on every run (always, even with no binaries). scripts/aggregate_rtems5.sh rolls them into rtems5-aggregate.csv, and the MkDocs evidence hook (scripts/mkdocs_evidence_hook.py) injects both into Test results at build time. The full how-it's-produced / how-to-regenerate story is on that page.

How to add a test

Add a unit test for a handler / module

  1. Create tests/unit/test_<thing>.cpp and add it to the tero_tests source list in tests/CMakeLists.txt.
  2. Write ≥ 3 TEST_CASEs (normal, edge, flags/trap) with behaviour-describing names and the right tags (e.g. [unit][handlers]).
  3. For ISA-level tests, feed instructions to CpuState through a fake bus (test_bus.hpp) and plant words with sparc_encoders.hpp — no magic hex.
  4. For UART-output tests, inject a CapturingCharDevice and assert on captured().
  5. cmake --build build && ctest --test-dir build -R <thing>.

Add an integration test

  1. Create tests/integration/test_<thing>.cpp, add it to tero_tests.
  2. Build a full Emulator from a tero::testing::*_test_config() factory (Turbo pacing, PROM off).
  3. If it needs a cross-compiled payload, add the source under tests/guest-programs/ and wire a tero_test_path_define(...) so the path is a compile definition (empty when the toolchain is absent → the test SKIPs cleanly). Follow the conditional-skip pattern.

Add an RTEMS guest test

Most RTEMS coverage grows by provisioning more ELFs, not new C++:

  1. The build scripts already iterate every sp* / smp* directory in the RCC tree — installing more of the RCC testsuite and re-running build_sptests.sh / build_smptests.sh makes the harness pick them up automatically (the directory scan is dynamic).
  2. To add a new SoC/core/exec-path configuration, add a TEST_CASE to test_rtems_sptests.cpp / test_rtems_smptests.cpp that calls run_sptest_directory / run_smptest_directory with the right config factory and a new CSV name, then register that CSV in scripts/mkdocs_evidence_hook.py SUITES and scripts/aggregate_rtems5.sh if it should appear in the published tables.
  3. For a brand-new hand-written guest (asm or RTEMS C), add a directory under tests/guest-programs/ with a README and a CMake rule following the existing patterns (asm/CMakeLists.txt or tero_add_rtems_app() in rtems/CMakeLists.txt).

Conditional skips

A handful of tests SKIP when an external artifact is absent — this keeps the build hermetic and CI green when toolchains aren't installed. They indicate an environment mismatch, not a library bug:

  • test_rtems_boot — RTEMS hello ELF not present (TERO_RTEMS_HELLO_ELF unset and no committed default).
  • GdbStub interoperates with sparc-gaisler-rtems5-gdb — GDB binary missing or broken.
  • Cross-compiled asm guests (hello_uart, smp_*) — RCC toolchain absent.
  • The [!mayfail] live-guest GDB test — nm missing or the hello ELF's RTEMS layout drifted from the constants in rtems_layout::* (see Debugging with GDB).
  • RTEMS guest suites with no provisioned ELFs — the harness writes a header-only CSV and the TEST_CASE SUCCEEDs.

Tag taxonomy

Catch2 tags split the suite into focused subsets:

./build/tests/tero_tests "[gdb_stub]"
./build/tests/tero_tests "[emulator],[rtems][boot]"
./build/tests/tero_tests "[rtems][smptests][n4][mt]"

| Tag | What it selects |
|---|---|
| [unit] / [integration] | Coarse split between fast no-IO tests and full-emulator tests |
| [emulator] | Emulator public API, lifecycle, pacing, SMP wake-up |
| [rtems] | Any RTEMS test (boot, sptests, fptests, smptests, hello-tero PROM) |
| [rtems][boot] | RTEMS boot test only |
| [rtems][sptests] / [smptests] / [fpu] | The guest-program suites |
| [switch] / [mt] / [n2] / [n4] / [gr712rc] / [gr740] | Per-config selectors on the RTEMS suites |
| [gdb_stub] (+ [codec], [protocol], [late-binding], [qsymbol], [error-mode], [rtems-aware]) | GDB stub coverage |
| [mmio] | MMIO word-path verifications (IRQMP, GPTimer, APBUart, MemCtrl) |
| [!mayfail] | Tests that depend on toolchain/environment alignment. A failure means refresh the constants, not "library bug" |

See also

  • Test results — how results are produced, recorded and regenerated; the live per-suite tables.
  • RTEMS known failures — root-cause analysis of every non-PASS, classified against the SIS oracle.
  • Execution model — Switch vs IR vs JIT paths the matrix exercises.
  • Multicore & timing — round-robin quantum, idle-skip, and the SMP scheduling the smptests stress.
  • Decisions — Decision 36 (end-of-test scoring) and the ADRs the matrix reflects.