Testing¶
Tero's primary goal is to run the RTEMS 5 / 6 leon3 BSP testsuite at
high pass rates across uniprocessor and SMP configurations. Everything in the
test strategy serves that goal: the C++ unit/integration tests guard the ISA
and peripheral models, and the RTEMS guest-program suites measure the metric
that actually matters — does real RTEMS finish its tests on Tero?
Coverage at a glance¶
The machine-generated per-ELF results live in Test results. The table below summarises what configurations are exercised on every run:
| Suite | SoC | Cores | Execution path | What it validates |
|---|---|---|---|---|
| sptests | GR712RC | 1 | JIT + Switch | Uniprocessor SPARC ISA, peripherals, RTEMS scheduling |
| sptests | GR740 | 1 | JIT | Same tests on LEON4 / IRQAMP |
| smptests | GR712RC | 2 | JIT + Switch | SMP IRQ wakeup, locks, atomic primitives |
| smptests | GR740 | 4 | JIT + Switch + MultiThread | True multi-core concurrency |
| fptests | GR712RC | 1 | JIT | FPU correctness (SoftFloat 3e, hard gate) |
| fptests | GR740 | 1 | JIT | FPU on LEON4 |
All results are scored by the official RTEMS algorithm (BEGIN + END markers required) and validated against Gaisler's SIS simulator as the oracle. See RTEMS known failures for the root-cause analysis of every non-PASS.
This page is the map of the test strategy: the test pyramid, the Gaisler/RTEMS SIS oracle, how to build the guest programs, how to run everything through CTest, the per-configuration matrix, and the scoring rules.
Testing philosophy (from CLAUDE.md)
- Correctness over performance, always. The Switch interpreter is the naive correctness oracle; the JIT and IR interpreter must reproduce its behaviour bit-for-bit. Every optimisation phase exits with the testsuite pass rate ≥ the rate at phase entry.
- Every handler needs ≥ 3 tests (normal, edge, flags/trap).
- Test names describe behaviour: `TEST_CASE("ADD sets condition codes on overflow")`, not `test_add_3`.
- One PR = one module or one feature.
- Two `Emulator` objects must coexist in one process: no singletons, no global mutable state. The unit suite enforces this implicitly by constructing many emulators per run.
The test pyramid¶
```mermaid
flowchart TD
subgraph host["C++ test binaries (Catch2, ctest)"]
U["Unit tests — tests/unit/<br/>fast, no I/O, one module each<br/>(decoder, handlers, FPU, peripherals, bus, …)"]
I["Integration tests — tests/integration/<br/>full Emulator + hand-encoded or cross-compiled guests<br/>(bare-metal, DMA, GDB stub, SMP atomics)"]
end
subgraph guests["RTEMS guest-program suites (the pass-rate metric)"]
SP["sptests — uniprocessor (sp*)"]
SMP["smptests — SMP (smp*) N=2 / N=4 / MT"]
FP["fptests — fptest01 (FPU)"]
end
ORACLE[["Gaisler/RTEMS SIS<br/>reference simulator (oracle)"]]
U --> I --> SP & SMP & FP
SP & SMP & FP -. "scored identically &<br/>diffed against" .-> ORACLE
```
- Unit tests (`tests/unit/`) are the wide base: fast, deterministic, no I/O, one module or one concern per file. They feed instructions straight to `CpuState` through fake buses, or exercise a single peripheral's MMIO.
- Integration tests (`tests/integration/`) build whole `Emulator` instances and run real or hand-encoded SPARC binaries end-to-end.
- RTEMS guest-program suites are the apex: real RTEMS ELFs, scored the way upstream scores them, and validated against the SIS oracle.
Framework and build integration¶
- Framework: Catch2 v3 (`Catch2WithMain`), fetched automatically via CMake `FetchContent`; the host need not provide it.
- CMake option: `TERO_BUILD_TESTS` (default `ON`).
- Two test executables (`tests/CMakeLists.txt`):
    - `tero_tests`: the main suite. It contains every `tests/unit/*` and `tests/integration/*` file and links the full stack (`tero::core`, `tero::bus`, `tero::peripherals`, `tero::runtime`, `tero::ir`, `tero::arch_sparc`, `tero::jit`, `tero::defaults`, plus `demo_dma`).
    - `tero_jit_tests`: a separate executable (`test_llvm_smoke.cpp`, `test_tiered_jit.cpp`, `test_ir_jit.cpp`) that links only `tero::jit` (not `tero::core`). The split keeps the JIT's background-compilation concurrency reachable under sanitizers without the rest of the stack in the way.
- Discovery: `catch_discover_tests()` registers each Catch2 `TEST_CASE` as an individual CTest test. The main suite is registered with `SKIP_REGULAR_EXPRESSION "SKIPPED:|tests? skipped"` so `SKIP(...)` macros are reported as CTest skips, not failures.
- `-Werror` and a strict warning set: 0 warnings / 0 errors is the bar.
- Threads: the suite links `Threads::Threads` explicitly, because the `GatedMutex` and Phase 13 MultiThread foundation tests spawn host threads.
The full tero_tests + tero_jit_tests discovery currently registers on
the order of 700 individual CTest entries (one per TEST_CASE); the exact
number drifts as tests are added — ctest --test-dir build -N prints the
live count.
Running the suite¶
```bash
# Configure (tests enabled by default)
cmake -S . -B build -G Ninja

# Build
cmake --build build

# Run everything through CTest (one row per TEST_CASE)
ctest --test-dir build --output-on-failure

# List every discovered test without running (live count)
ctest --test-dir build -N

# Run a Catch2 binary directly with a tag filter
./build/tests/tero_tests "[unit]"
./build/tests/tero_tests "[integration]"
./build/tests/tero_jit_tests   # JIT-only executable
```
CTest understands each TEST_CASE separately, so -R / -E regex filters
work at test granularity:
```bash
# Only the RTEMS guest suites
ctest --test-dir build -R "sptests|smptests|fptest" --output-on-failure

# Everything EXCEPT the slow SMP suites (minutes faster)
ctest --test-dir build -E "smptests"
```
Iterating on one RTEMS test
The RTEMS harnesses honour the TERO_ONLY_TEST environment variable: a
comma-separated list of test stems restricts a whole-directory run to just
those ELFs (see discover_elfs() in rtems_csv_harness.hpp). Unset in CI,
so the full suite is always scored.
Test-harness environment variables¶
The library reads no environment variables — configuration is by struct, and
the CLI tools take explicit flags (tero-emu --histogram-out, …). The test
binary keeps three optional overrides, because a Catch2 test case cannot
receive custom command-line arguments and the variables pass transparently
through ctest. Each one has a committed default, so CI and plain local runs
never set them; they exist for debugging workflows.
| Variable | Effect | Default when unset |
|---|---|---|
| `TERO_ONLY_TEST` | Comma-separated test stems; restricts the RTEMS sptests/smptests/fptests iterators to those guest ELFs (`rtems_csv_harness.hpp`, `discover_elfs()`) | Run every discovered ELF |
| `TERO_RTEMS_HELLO_ELF` | Path to a replacement hello-world guest ELF, e.g. a locally rebuilt one (`test_rtems_boot.cpp`, `test_gdb_stub_rtems.cpp`, `test_gdb_stub_protocol.cpp`, `test_jit_run_lockstep.cpp`, `test_ir_diff_lockstep.cpp`) | The committed `tests/guest-programs/rtems/hello-world/hello-world.elf` |
| `TERO_RTEMS_FPTEST01_ELF` | Path to a replacement fptest01 guest ELF (`test_rtems_fptests.cpp`, `test_ir_diff_lockstep.cpp`) | The committed `tests/guest-programs/rtems/fptest01/fptest01.elf` |
Directory layout¶
tests/unit/ — Module tests¶
Fast, deterministic, no I/O. Each file targets one module or one concern.
The complete list lives in tests/CMakeLists.txt; grouped by area:
| Area | Files |
|---|---|
| Strong types / utilities | test_types.cpp, test_address_range.cpp, test_breakpoint_set.cpp, test_gated_mutex.cpp, test_module_versions.cpp, test_defaults.cpp |
| Bus / memory | test_ram.cpp, test_system_bus.cpp, test_dma.cpp, test_cpu_bus_bridge.cpp, test_load_ram_image.cpp |
| CPU state / decoder | test_cpu_state.cpp, test_cpu_state_fpu.cpp, test_decoder.cpp, test_fpu_decoder.cpp, test_cpi.cpp |
| Integer handlers | test_handlers_alu.cpp, test_handlers_branch.cpp, test_handlers_loadstore.cpp, test_handlers_regwin.cpp, test_handlers_privileged.cpp |
| FPU | test_softfloat_context.cpp, test_fpu_handlers_loadstore.cpp, test_fpu_handlers_moves.cpp, test_fpu_handlers_arith.cpp, test_fpu_handlers_compare.cpp, test_fpu_handlers_branch.cpp, test_fpu_traps.cpp |
| Traps / step | test_traps.cpp, test_step.cpp |
| Arch-neutral IR / SPARC frontend | test_ir_data_model.cpp, test_ir_interpreter.cpp, test_sparc_layout.cpp, test_sparc_lockstep.cpp, test_sparc_arch.cpp, test_sparc_state_aliasing.cpp |
| Peripherals | test_memctrl.cpp, test_irqmp.cpp, test_irqamp.cpp, test_irq_concurrent.cpp, test_gptimer.cpp, test_grgpio.cpp, test_apbuart.cpp, test_prom.cpp, test_prom_config.cpp |
| Runtime / config / SoC recipes | test_emulator.cpp, test_emulator_from_spec.cpp, test_emulator_observer.cpp, test_emulator_pacing.cpp, test_emulator_smp_irq_wakeup.cpp, test_peripheral_spec_validate.cpp, test_port_system.cpp, test_pnp_table.cpp, test_gr740_config.cpp, test_elf_loader.cpp, test_event_scheduler.cpp |
| GDB stub | test_gdb_stub_codec.cpp (RSP framing/checksum, signal_from_tt map) |
test_opcode_histogram.cpp is compiled only when TERO_OPCODE_HISTOGRAM
is set (the Phase 9.4 instrumentation counter — otherwise
CpuState::opcode_histogram() does not exist and the file would not compile).
The JIT-only executable tero_jit_tests adds: test_llvm_smoke.cpp (LLVM
ORCv2 wiring), test_tiered_jit.cpp (baseline O0 + background O2 promotion,
ADR-002), and test_ir_jit.cpp (IR → native lowering).
tests/integration/ — End-to-end tests¶
Full Emulator instances running real or hand-crafted binaries.
| File | Coverage |
|---|---|
| `test_bare_metal.cpp` | Hand-encoded SPARC binaries (NOP sled, error-mode detection, APBUart MMIO) |
| `test_demo_dma_device.cpp` | `DemoDmaDevice` attached via public `add_peripheral`, DMA + IRQ exercised |
| `test_hello_uart_elf.cpp` | Cross-compiled `hello_uart.elf`; asserts `'A'` appears on captured UART |
| `test_regwin_roundtrip.cpp` | SAVE/RESTORE register-window round-trip against a cross-compiled guest |
| `test_rtems_boot.cpp` | RTEMS 5 `hello-world.elf` boot (greeting on UART). Gated behind an operator ELF (`TERO_RTEMS_HELLO_ELF`); SKIPs if absent |
| `test_rtems_sptests.cpp` | RTEMS 5 sptests: GR712RC N=1 (JIT + Switch) and GR740 N=1. CSV-emitting harness |
| `test_rtems_smptests.cpp` | RTEMS 5 smptests: N=2 (GR712RC) and N=4 (GR740), each JIT + Switch, plus N=4 MultiThread |
| `test_rtems_fptests.cpp` | RTEMS FP test `fptest01.elf` on GR712RC (must pass) and GR740 (SKIP-if-absent) |
| `test_rtems_hello_tero.cpp` | mkprom2-built PROM image boots and prints via APBUart through the PROM peripheral |
| `test_smp_atomics.cpp` | SMP atomic primitives (`smp_atomic.S`, `smp_swap.S`, `smp_casa.S`) on multi-core configs (GR712RC + GR740) |
| `test_ir_diff_lockstep.cpp` | IR interpreter vs Switch in lockstep on synthetic blocks (translation oracle) |
| `test_jit_run_lockstep.cpp` | JIT run-loop vs Switch on real RTEMS code (needs the full stack, so it lives in `tero_tests`) |
| `test_smpschededf02_trace.cpp` | Dispatch-frame probe that diagnosed the smpschededf02 stack-overflow root cause |
| `test_gdb_stub_protocol.cpp` | In-process GDB RSP over TCP (scripted client): codec, late-binding attach, 2nd-client rejection, qSymbol handshake, stop-on-ErrorMode T0b/T0a mapping, RTEMS thread-awareness, `[!mayfail]` live-guest |
| `test_gdb_stub_rtems.cpp` | Real `sparc-gaisler-rtems5-gdb` front-end against the Emulator (conditional: SKIPs if the GDB binary is missing or broken) |
| `test_gdb_stub_dual_core_timer.cpp` | GDB stub against the dual-core-timer PROM guest (per-core thread enumeration) |
tests/support/ — Test fixtures¶
| File | Purpose |
|---|---|
| `dummy_peripheral.cpp/.hpp` | `DummyPeripheral`: an `IPeripheral` with MMIO registers and DMA triggers, used by bus/DMA unit tests |
| `capturing_char_device.hpp` | Thread-safe `ICharacterDevice` recording all transmitted bytes into a `std::string` for assertions (the backbone of every UART-output test) |
| `sparc_encoders.hpp` | Constexpr SPARC V8 instruction encoders (`enc_add_imm`, `enc_bicc`, `enc_save_imm`, `enc_jmpl`, `enc_rett`, …) so step-level tests plant valid words without raw hex |
| `test_bus.hpp` | `FakeBus` / `MemBus` (BE RAM helper) / `ErrorBus` (always returns `BusError`) for feeding instructions directly to `CpuState` |
| `test_config.hpp` | `tero::testing::*_test_config()`: `EmulatorConfig` factories that wrap the production recipes but force `PacingMode::Turbo` and disable PROM, so a test can never accidentally drag itself out under wall-clock pacing |
The RTEMS guest-program suites¶
The RTEMS suites are the pass-rate metric. They live under
tests/guest-programs/rtems/ and are driven by a single shared harness.
The shared CSV harness¶
tests/integration/rtems_csv_harness.hpp (namespace
tero::test_support::rtems_csv) does the heavy lifting for sptests, smptests
and fptests alike:
- Discover every `*.elf` under a directory (`discover_elfs()`), sorted for stable CSV ordering and filtered by `TERO_ONLY_TEST` if set.
- Run one ELF on a fresh `Emulator` (`run_one_elf()`). Each test gets a new instance (no cross-test state) and a `CapturingCharDevice` on the UART, and the logger is forced to `Error` level so a misbehaving guest cannot bury the harness in `[WARN]` spam. `run_for()` is then sliced into short simulated chunks (`sim_slice`) so the loop can break the instant the END marker appears or a wall deadline passes; the common PASS path early-exits in well under a second of host time.
- Score the captured console with `official_outcome()` (see Scoring below).
- Write one CSV row per ELF, one CSV file per directory, under `tests/results/` (`write_csv()`). An empty directory yields a single `_no_binaries_,SKIPPED` sentinel row so the file is always well-formed. This is why the integration TEST_CASEs always SUCCEED even with no binaries present: the CSV is the ground truth, not the Catch2 assertion.
Budgets are calibrated to the official reference run
(DefaultBudgets in the harness):
| Budget | Value | Rationale |
|---|---|---|
| `sim_budget` | 200 s simulated | Matches the official `-tlim 200 s` from `leon3-sis.ini`: a test gets the same simulated-time budget here as on SIS |
| `sim_slice` | 50 ms simulated | Polling granularity for early-exit on the END marker |
| `wall_budget` | 240 s host | A Tero-side practicality: the functional core runs slower than real time, so a 200 s-simulated CPU-bound run needs more host seconds than SIS. PASS runs early-exit long before this |
The slower Switch-interpreter reference runs use a tighter
SwitchBudgets (60 s sim / 30 s wall) so the suite finishes in minutes;
CPU-heavy tests may slow-timeout under it, which is interpreter speed, not a
regression.
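The slice-and-poll loop and the two budget sets can be sketched in Python. This is a simplified model, not the C++ `run_one_elf()`. The `run_for` callable is a hypothetical stand-in for the emulator: it advances simulated time and returns newly captured console text plus a flag saying whether the guest halted on its own. The source gives only totals for the Switch runs, so reusing the 50 ms slice there is an assumption.

```python
import time
from dataclasses import dataclass
from typing import Callable, Tuple

END_MARKER = "*** END OF TEST"


@dataclass(frozen=True)
class Budgets:
    sim_budget_s: float   # simulated-time cap (matches SIS -tlim 200)
    sim_slice_s: float    # simulated time per run_for() call
    wall_budget_s: float  # host-time safety net


DEFAULT_BUDGETS = Budgets(sim_budget_s=200.0, sim_slice_s=0.050, wall_budget_s=240.0)
# Assumption: Switch runs reuse the 50 ms slice; only their totals are documented.
SWITCH_BUDGETS = Budgets(sim_budget_s=60.0, sim_slice_s=0.050, wall_budget_s=30.0)

# Hypothetical emulator hook: advance `dt` simulated seconds, return
# (newly captured console text, whether the guest halted on its own).
RunFor = Callable[[float], Tuple[str, bool]]


def run_sliced(run_for: RunFor, budgets: Budgets = DEFAULT_BUDGETS,
               clock: Callable[[], float] = time.monotonic) -> Tuple[str, bool]:
    """Drive the guest slice by slice; return (console, broke_early)."""
    console = ""
    deadline = clock() + budgets.wall_budget_s
    # Integer slice count avoids float drift: 200 s / 50 ms = 4000 slices.
    max_slices = round(budgets.sim_budget_s / budgets.sim_slice_s)
    for _ in range(max_slices):
        chunk, halted = run_for(budgets.sim_slice_s)
        console += chunk
        if END_MARKER in console or halted:
            return console, True   # stopped on its own: PASS or self-halted FAIL
        if clock() >= deadline:
            break                  # wall deadline hit: scored like a budget TIMEOUT
    return console, False
```

Returning `broke_early` alongside the console is what later lets the scorer separate a self-halted FAIL from a budget TIMEOUT.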
fptest01: the one hard REQUIRE
Every RTEMS harness TEST_CASE merely SUCCEEDs and lets the CSV record the
truth — except the GR712RC fptest01 case in test_rtems_fptests.cpp,
which REQUIREs a PASS. FPU correctness with SoftFloat 3e is a 100 %
exit criterion, so it is a hard gate. The GR740 fptest case is
SKIP-if-absent (header-only CSV) so CI hosts without the GR740 build stay
green.
Building the guest programs (RCC toolchain)¶
The RTEMS ELFs are not committed (except the legacy hello-world.elf and
fptest01.elf). They are produced by three build scripts under
tests/guest-programs/rtems/, which compile the RCC-bundled testsuite sources
against the matching BSP:
| Script | Output dir | Suite |
|---|---|---|
| `build_sptests.sh [gr712rc\|gr740\|both]` | `sptests/bin/`, `sptests-gr740/bin/` | sptests |
| `build_smptests.sh [n2\|n4\|both]` | `smptests-leon3-n2/bin/`, `smptests-gr740-n4/bin/` | smptests |
| `build_fptests.sh [gr712rc\|gr740\|both]` | `fptest01/`, `fptest01-gr740/` | fptest01 |
All three read the RCC tree at
${RCC_PREFIX:-/opt/rcc-1.3.2-gcc}/src/rcc-1.3.2/testsuites/ and the BSP
libraries under ${RCC_PREFIX}/sparc-gaisler-rtems5/<bsp>/lib. RCC 1.3.2
ships RTEMS 5.3 (the aclocal macro says 5.0.0 but the authoritative
CHANGELOG.RCC says 5.3). Tests that fail to compile (C++-only sources, missing
helpers) are silently skipped — the harness simply never sees those binaries.
Why -msoft-float for sp/smp tests
Both build_sptests.sh and build_smptests.sh compile with
-msoft-float. The reason is the "banner-then-die" cluster: newlib's
hard-float vfprintf emits an LDDF even for %d/%s formats, and the
RTEMS init task is created without RTEMS_FLOATING_POINT, so the FP op
traps (TT=0x04) and RTEMS panics with
INTERNAL_ERROR_ILLEGAL_USE_OF_FLOATING_POINT_UNIT. Soft-float lowers FP
to __muldf3-style calls and avoids the trap. The fix is in the build
flags, not in relaxing the emulator's (correct) FP-disabled trap.
Two deliberate exceptions:
- `spfatal30` / `spfatal31` are built hard-float. They intend to provoke the `fp_disabled` trap, so a soft-float lowering would defeat the test (the END marker would never print). For GR712RC that means the plain `leon3` BSP (the hard-float sibling of `leon3_sf`); GR740 selects float mode via multilib.
- smptests are built with `-DCONFIGURE_MINIMUM_TASK_STACK_SIZE=16384` (up from the 4 KiB SPARC default). This was diagnosed via `test_smpschededf02_trace.cpp`: the EDF worker threads overflow a 4 KiB stack and corrupt the next thread's saved dispatch frame. The 16 KiB bump took N=2 from 41→42 PASS.
fptest01 is built hard-float because it ships as
RTEMS_FLOATING_POINT and genuinely needs the FPU.
The asm guest programs (tests/guest-programs/asm/) are auto-built by CMake
when sparc-gaisler-rtems5-gcc is on PATH
(-mcpu=leon3 -msoft-float -nostdlib -nostartfiles, .text at 0x40000000);
absent toolchain → the dependent integration test SKIPs at runtime. Default
toolchain locations: RCC at /opt/rcc-1.3.2-gcc/bin/, mkprom2 at
/opt/mkprom2/mkprom2 (overridable with -DSPARC_CC= / -DMKPROM2=).
The per-configuration matrix¶
Each guest suite is run under several configurations so a single change is checked against every relevant SoC, core count and execution path. The default (JIT, Realtime overridden to Turbo by the harness) is the production path; the Switch runs are the correctness reference the JIT must reproduce; the MultiThread run is the faithful true-concurrency SMP model (ADR-001).
| TEST_CASE | SoC | Cores | Exec path | Output CSV | Tags |
|---|---|---|---|---|---|
| sptests run on GR712RC N=1 | GR712RC | 1 | JIT | `sptests.csv` | `[sptests][gr712rc]` |
| sptests … under the Switch interpreter | GR712RC | 1 | Switch | `sptests-switch.csv` | `[sptests][switch]` |
| sptests run on GR740 N=1 | GR740 | 1 | JIT | `sptests-gr740.csv` | `[sptests][gr740]` |
| smptests run on GR712RC N=2 | GR712RC | 2 | JIT | `smptests-N2.csv` | `[smptests][n2]` |
| smptests … N=2 under the Switch interpreter | GR712RC | 2 | Switch | `smptests-N2-switch.csv` | `[smptests][n2][switch]` |
| smptests run on GR740 N=4 | GR740 | 4 | JIT | `smptests-N4.csv` | `[smptests][n4]` |
| smptests … N=4 under MultiThread | GR740 | 4 | JIT, thread-per-core | `smptests-N4-mt.csv` | `[smptests][n4][mt]` |
| smptests … N=4 under the Switch interpreter | GR740 | 4 | Switch | `smptests-N4-switch.csv` | `[smptests][n4][switch]` |
| fptest01 on GR712RC | GR712RC | 1 | JIT | `fptests.csv` | `[fpu][gr712rc]` |
| fptest01 on GR740 | GR740 | 1 | JIT | `fptests-gr740.csv` | `[fpu][gr740]` |
The Switch-vs-JIT siblings exist so the two engines can be diffed: any row
that differs between sptests.csv and sptests-switch.csv is a translation
defect, since the Switch path is the oracle. The N=4 -mt sibling exists to
prove the thread-per-core JIT reproduces the SingleThread N=4 pass set under
true parallelism.
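Taken literally, that rule can be sketched in Python. The sketch is hypothetical and assumes each CSV has already been loaded into a stem → outcome mapping. It skips Switch TIMEOUT rows because, as noted above, the tighter `SwitchBudgets` can time out CPU-heavy tests on interpreter speed alone.

```python
from typing import Dict, List


def translation_suspects(jit: Dict[str, str], switch: Dict[str, str]) -> List[str]:
    """Stems whose JIT outcome differs from the Switch (oracle) outcome.

    A Switch TIMEOUT is skipped: it can reflect interpreter speed under the
    tighter SwitchBudgets rather than a translation defect.
    """
    suspects = []
    for stem in sorted(jit.keys() & switch.keys()):
        if switch[stem] == "TIMEOUT":
            continue
        if jit[stem] != switch[stem]:
            suspects.append(stem)
    return suspects
```

A JIT TIMEOUT against a Switch PASS still counts as a suspect here, since the JIT has the looser budget and should not time out where the interpreter finished.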
The SMP quantum is configuration-specific
test_rtems_smptests.cpp overrides the production 1000-instruction quantum
for SingleThread SMP runs: q=200 for N≤2, q=400 for N=4. The
cooperative round-robin can let a core ticket-spin at PIL=15 for a whole
quantum waiting on a release that only happens in another core's later
quantum — a finer quantum converges those handshakes (matching SIS and
MultiThread). The optimum is non-monotonic and per-configuration; each
value is a no-regression clean win against its own suite. There is no
single quantum that converges everything under cooperative scheduling —
that ceiling is exactly what MultiThread removes. MultiThread is unaffected
(each core already has its own host thread).
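A toy model makes the quantum trade-off concrete. The Python below is illustrative only. `smp_quantum` simply encodes the values quoted above, with the 1000-instruction production quantum assumed to apply when no SMP override is in play. `spin_waste` models a single ticket-lock handshake between two cores under cooperative round-robin. It shows only why a finer quantum wastes fewer spin instructions; the round-robin overheads that make the real optimum non-monotonic are deliberately left out.

```python
from typing import Optional

PRODUCTION_QUANTUM = 1000  # instructions per core turn outside the SMP overrides


def smp_quantum(cores: int, multithread: bool) -> Optional[int]:
    """Round-robin quantum per the quoted overrides (None under MultiThread)."""
    if multithread:
        return None                # thread-per-core: no cooperative quantum at all
    if cores == 1:
        return PRODUCTION_QUANTUM  # assumption: no override for uniprocessor runs
    if cores == 2:
        return 200
    if cores == 4:
        return 400
    raise ValueError(f"no documented quantum for {cores} cores")


def spin_waste(quantum: int, release_after: int) -> int:
    """Toy model: instructions core 0 burns before core 1 releases the lock.

    Turns alternate core 0, core 1, core 0, ... Core 0 ticket-spins at PIL=15
    for its whole turn while the lock is held; core 1 needs `release_after`
    instructions of its own work before it releases.
    """
    wasted = 0
    progress = 0
    while progress < release_after:
        wasted += quantum    # core 0's turn: the whole quantum is spin
        progress += quantum  # core 1's turn: releases once progress is enough
    return wasted
```

With `release_after=300`, the toy's spin waste drops from 1000 instructions at the production quantum to 400 at q=200.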
Scoring: official RTEMS rules¶
Tero scores a captured console exactly as the upstream
rtems-tools rt/report.py report.end() scores a SIS run. This is the single
most important fidelity decision in the test strategy: a Tero result lands in
the same category the reference assigns to the identical SIS run, so
"Tero passes" means "Tero passes by the official rules", not by a relaxed
home-grown criterion.
The scorer is official_outcome() in rtems_csv_harness.hpp (and its Python
twin official_score() in scripts/oracle_compare.py). The rules:
| Console contents | `*** TEST STATE:` | Outcome | Failure? |
|---|---|---|---|
| `*** BEGIN OF TEST` and `*** END OF TEST` | any | PASS | No |
| BEGIN, no END, ran out the budget | default / `EXPECTED_PASS` | TIMEOUT | Yes |
| BEGIN, no END, halted on its own | default / `EXPECTED_PASS` | FAIL | Yes |
| No BEGIN at all | default / `EXPECTED_PASS` | INVALID | No (excluded) |
| BEGIN, no END | `EXPECTED_FAIL` | EXPECTED-FAIL | No (excluded) |
| BEGIN, no END | `INDETERMINATE` | INDETERMINATE | No (excluded) |
Key consequences, mirrored on both sides:
- A test PASSES only when both the BEGIN and the END banners appear (Decision 36); once both are present, the declared TEST STATE does not change the verdict.
- `INVALID` / `EXPECTED-FAIL` / `INDETERMINATE` are not failures and are excluded from the denominator, exactly like upstream. An INVALID run hit a fatal error before the BSP debug UART came up (no BEGIN reaches the console). That is correct silicon behaviour, not an emulator defect.
- The `minimum` test is special-cased to PASS even with no BEGIN (it prints nothing by design), matching `rtems-tools`' `TEST_FAIL_EXCLUDES`.
- The official scorer must tell a budget TIMEOUT apart from a self-halted FAIL, so the harness tracks `broke_early` to distinguish them.
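The rules above reduce to a small decision procedure. The following is a hypothetical Python rendering of the table, not a copy of `official_outcome()` or `official_score()`. The banner prefixes come from the table. Normalising `-` to `_` in the TEST STATE value is an assumption. States the table does not list (for example benchmark states) are not modelled.

```python
from typing import Optional

BEGIN = "*** BEGIN OF TEST"
END = "*** END OF TEST"
STATE = "*** TEST STATE:"


def declared_state(console: str) -> Optional[str]:
    """TEST STATE value, with '-' normalised to '_' (assumed spelling variance)."""
    for line in console.splitlines():
        line = line.strip()
        if line.startswith(STATE):
            return line[len(STATE):].strip().replace("-", "_")
    return None


def official_outcome(name: str, console: str, broke_early: bool) -> str:
    """Classify one captured console following the scoring table."""
    begin, end = BEGIN in console, END in console
    if begin and end:
        return "PASS"            # any declared state
    if not begin:
        # `minimum` prints nothing by design, so a silent run still passes
        return "PASS" if name == "minimum" else "INVALID"
    state = declared_state(console)
    if state == "EXPECTED_FAIL":
        return "EXPECTED-FAIL"
    if state == "INDETERMINATE":
        return "INDETERMINATE"
    # default / EXPECTED_PASS: broke_early separates a self-halt from the budget
    return "FAIL" if broke_early else "TIMEOUT"


FAILURES = {"FAIL", "TIMEOUT"}  # everything else is a pass or is excluded
```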
Because both the C++ harness and the SIS-comparison script implement the same
state machine, the emitted markers must match the official ones, including
the BEGIN banner — see
fptest01 emits BEGIN marker and the
matching guest commit (656cdba) that added the BEGIN marker for conformance.
The SIS oracle¶
The Gaisler/RTEMS SIS (sparc-rtems5-sis, vendored at ../rtems-sis) is
the authoritative reference for the SPARC ISA, IRQMP/GPTIMER/APBUART/GRETH and
the GR712/GR740 configs. The RTEMS testsuite itself is validated on SIS — so
the question "is Tero correct?" reduces to "does Tero agree with SIS?"
Two scripts answer that question at two granularities.
```mermaid
flowchart LR
ELF["RTEMS test ELF"]
ELF --> S1["SIS<br/>(reference)"]
ELF --> L1["Tero<br/>(under test)"]
subgraph oc["oracle_compare.py — per-test agreement"]
S1 --> SC["official_score(console)"]
L1 --> LC["official_score(console)"]
SC --> CMP{"same<br/>outcome?"}
LC --> CMP
CMP -->|yes| OK["AGREE — Tero validated as SIS validates"]
CMP -->|no| DIV["DIVERGE — a concrete, reviewable difference"]
end
subgraph ls["lockstep_compare.py — per-instruction divergence"]
S2["SIS tra (per-core PC stream)"]
L2["Tero --trace (per-core PC stream)"]
S2 --> RS["resync_diff() per core<br/>(tolerate benign spin-loop counts)"]
L2 --> RS
RS --> FIRST["FIRST non-resyncing PC divergence = the bug"]
end
```
scripts/oracle_compare.py — per-test agreement¶
Runs each test ELF on both SIS and Tero with matched configs and the same 200 s simulated-time limit, applies the identical official scoring to both consoles, and reports per-test agreement plus a divergence list and a by-category breakdown. Agreement = Tero is validated exactly as SIS validates; a divergence is a concrete, reviewable difference — not a harness artefact.
```bash
# Compare the GR712RC sptests against SIS (defaults: SIS at ../rtems-sis/sis,
# tero-emu at build/src/app/tero-emu)
scripts/oracle_compare.py sptests-gr712rc

# Only a couple of stems
scripts/oracle_compare.py smptests-n4 smp08 smpmulticast01
```
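At its core, the per-test comparison is a join over two scored result sets. Below is a minimal sketch, assuming each side has already been reduced to a stem → outcome mapping by the same official scoring. The function name and the bucketing of divergences by `(SIS outcome, Tero outcome)` pair are illustrative, not the script's actual report format.

```python
from collections import Counter
from typing import Dict, List, Tuple


def compare_outcomes(sis: Dict[str, str], tero: Dict[str, str]
                     ) -> Tuple[float, List[str], Counter]:
    """Per-test agreement between two officially scored runs.

    Returns (agreement ratio, diverging stems, Counter of (sis, tero) outcome
    pairs over the diverging stems). Only stems scored on both sides count.
    """
    common = sorted(sis.keys() & tero.keys())
    diverging = [stem for stem in common if sis[stem] != tero[stem]]
    by_category = Counter((sis[stem], tero[stem]) for stem in diverging)
    agreement = (len(common) - len(diverging)) / len(common) if common else 1.0
    return agreement, diverging, by_category
```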
Profiles encode the matched SIS args (straight from leon3-sis.ini / a -gr740
equivalent) and the corresponding Tero args, including the SingleThread SMP
quantum:
| Profile | ELF dir | SIS args | Tero args |
|---|---|---|---|
| `sptests-gr712rc` | `sptests/bin` | `-leon3 -m 2` | `--soc gr712rc --cores 2 --quantum 50` |
| `smptests-n2` | `smptests-leon3-n2/bin` | `-leon3 -m 2` | `--soc gr712rc --cores 2 --quantum 50` |
| `sptests-gr740` | `sptests-gr740/bin` | `-gr740` | `--soc gr740 --quantum 400` |
| `smptests-n4` | `smptests-gr740-n4/bin` | `-gr740` | `--soc gr740 --quantum 400` |
| `smptests-n2-mt` | `smptests-leon3-n2/bin` | `-leon3 -m 2` | `--soc gr712rc --cores 2 --mt` |
| `smptests-n4-mt` | `smptests-gr740-n4/bin` | `-gr740` | `--soc gr740 --mt` |
The -mt profiles exist because SIS already models true multi-core
concurrency, so the faithful Tero counterpart is --mt (thread-per-core),
not the cooperative round-robin — several tests that diverge in the
SingleThread profiles converge under --mt.
scripts/lockstep_compare.py — per-instruction divergence¶
When a test diverges, this comparator finds where. It runs the same GR740 ELF
on Tero (--trace, which installs a TraceObserver that emits cpu <N> <pc>
per instruction to stderr and forces the deterministic Switch path) and on SIS
(tra), parses each into a per-core PC sequence, and reports the first
per-core divergence — the first instruction where a core's control flow on
Tero differs from the oracle.
The comparison is per core (cross-core interleaving differs between
Tero's round-robin and SIS's 50-clock fine interleave). resync_diff()
tolerates benign spin-loop count differences — one side iterating a ticket
lock a different number of turns because cross-core release timing differs —
by skipping the extra iterations on whichever side looped more, confirmed by a
window of consecutive matches. A spin loop resyncs within a few iterations; a
real bug never resyncs, and that first non-resyncing PC divergence is the
defect to investigate.
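The resync idea can be sketched in Python. This is a hypothetical illustration, not the code of `lockstep_compare.py`. `parse_tero_trace` assumes exactly the `cpu <N> <pc>` line shape described above and keeps each PC as text. Three further details are assumptions: the `max_skip` / `window` defaults, the rule that comparison simply stops at the end of the shorter stream, and the extra check that skipped instructions only revisit recently seen PCs (a cheap proxy for "these were extra loop iterations").

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Sequence, Tuple


def parse_tero_trace(lines: Iterable[str]) -> Dict[int, List[str]]:
    """Split `cpu <N> <pc>` lines into one PC stream per core; ignore anything else."""
    streams: Dict[int, List[str]] = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) == 3 and parts[0] == "cpu" and parts[1].isdigit():
            streams[int(parts[1])].append(parts[2])  # PC kept as text, format untouched
    return dict(streams)


def _matches(a: Sequence[str], ai: int, b: Sequence[str], bi: int, window: int) -> bool:
    """True if a[ai:] and b[bi:] agree on up to `window` PCs (at least one)."""
    n = min(window, len(a) - ai, len(b) - bi)
    return n > 0 and list(a[ai:ai + n]) == list(b[bi:bi + n])


def _revisits(seq: Sequence[str], start: int, k: int, history: int) -> bool:
    """True if seq[start:start+k] only repeats PCs seen in the preceding `history`."""
    seen = set(seq[max(0, start - history):start])
    return all(pc in seen for pc in seq[start:start + k])


def resync_diff(ours: Sequence[str], ref: Sequence[str],
                max_skip: int = 64, window: int = 16) -> Optional[Tuple[int, int]]:
    """(i, j) of the first divergence that never resyncs, or None if all resync."""
    i = j = 0
    while i < len(ours) and j < len(ref):
        if ours[i] == ref[j]:
            i, j = i + 1, j + 1
            continue
        for k in range(1, max_skip + 1):
            if _revisits(ours, i, k, max_skip) and _matches(ours, i + k, ref, j, window):
                i += k   # Tero spun k more instructions than the oracle
                break
            if _revisits(ref, j, k, max_skip) and _matches(ours, i, ref, j + k, window):
                j += k   # the oracle spun k more instructions than Tero
                break
        else:
            return (i, j)  # nothing resyncs: the real divergence
    return None
```

Call it once per core, pairing Tero's stream for core N with the same core's SIS stream. A `None` result means every difference resynced.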
SIS MP-trace gotchas (from the lockstep work)
SIS only engages the MP interleaver when ncpu>1 and an instruction
count is given (-m 4 + a clock budget), and the MP boot state is only set
up by boot_init(), which tra skips but go runs. The script uses
go 0 0 to boot without retiring an instruction, then tra <clocks>, so
SIS's first traced PC aligns with Tero's instruction #0. The <clocks>
argument is SIS simulated clocks, not instructions — the script sizes it
from Tero's longest per-core stream (×6 margin).
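The sizing rule fits in a few lines. The hypothetical sketch below builds the two SIS commands named above from Tero's per-core PC streams (for example, the output of a trace parser). How the real script feeds these commands to SIS is not shown here and not assumed.

```python
from typing import Dict, List

MARGIN = 6  # the documented x6 margin on Tero's longest per-core stream


def sis_trace_commands(tero_streams: Dict[int, List[str]]) -> List[str]:
    """SIS commands: boot without retiring an instruction, then trace a clock budget."""
    longest = max((len(pcs) for pcs in tero_streams.values()), default=0)
    clocks = longest * MARGIN  # SIS `tra` takes simulated clocks, not instructions
    return ["go 0 0", f"tra {clocks}"]
```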
The lockstep work has already produced concrete findings — e.g. the
leon3_counter_initialize %asr23 timecounter probe picks GPTIMER on Tero
vs %asr23 on SIS (a universal, benign difference, documented in
known failures).
Recording results¶
Per-ELF outcomes are written to tests/results/*.csv by the harness on every
run (always, even with no binaries). scripts/aggregate_rtems5.sh rolls them
into rtems5-aggregate.csv, and the MkDocs evidence hook
(scripts/mkdocs_evidence_hook.py) injects both into
Test results at build time. The full
how-it's-produced / how-to-regenerate story is on that page.
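To show how the rolled-up numbers relate to the scoring rules, the sketch below computes a pass count and a denominator from `(test, outcome)` rows. It is a hypothetical Python illustration, not `aggregate_rtems5.sh`. The two-column row shape is inferred from the `_no_binaries_,SKIPPED` sentinel. Excluding `SKIPPED` is an assumption; the INVALID / EXPECTED-FAIL / INDETERMINATE exclusions come from the scoring rules.

```python
from typing import Iterable, Tuple

# Not failures and not in the denominator; SKIPPED is an assumed addition.
EXCLUDED = {"INVALID", "EXPECTED-FAIL", "INDETERMINATE", "SKIPPED"}


def pass_rate(rows: Iterable[Tuple[str, str]]) -> Tuple[int, int]:
    """(passes, denominator) over (test, outcome) rows."""
    passes = denominator = 0
    for test, outcome in rows:
        if test == "_no_binaries_" or outcome in EXCLUDED:
            continue  # sentinel row or excluded category
        denominator += 1
        if outcome == "PASS":
            passes += 1
    return passes, denominator
```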
How to add a test¶
Add a unit test for a handler / module¶
- Create `tests/unit/test_<thing>.cpp` and add it to the `tero_tests` source list in `tests/CMakeLists.txt`.
- Write ≥ 3 `TEST_CASE`s (normal, edge, flags/trap) with behaviour-describing names and the right tags (e.g. `[unit][handlers]`).
- For ISA-level tests, feed instructions to `CpuState` through a fake bus (`test_bus.hpp`) and plant words with `sparc_encoders.hpp`; no magic hex.
- For UART-output tests, inject a `CapturingCharDevice` and assert on `captured()`.
- `cmake --build build && ctest --test-dir build -R <thing>`.
Add an integration test¶
- Create `tests/integration/test_<thing>.cpp` and add it to `tero_tests`.
- Build a full `Emulator` from a `tero::testing::*_test_config()` factory (Turbo pacing, PROM off).
- If it needs a cross-compiled payload, add the source under `tests/guest-programs/` and wire a `tero_test_path_define(...)` so the path is a compile definition (empty when the toolchain is absent, so the test SKIPs cleanly). Follow the conditional-skip pattern.
Add an RTEMS guest test¶
Most RTEMS coverage grows by provisioning more ELFs, not new C++:
- The build scripts already iterate every `sp*` / `smp*` directory in the RCC tree, so installing more of the RCC testsuite and re-running `build_sptests.sh` / `build_smptests.sh` makes the harness pick them up automatically (the directory scan is dynamic).
- To add a new SoC/core/exec-path configuration, add a `TEST_CASE` to `test_rtems_sptests.cpp` / `test_rtems_smptests.cpp` that calls `run_sptest_directory` / `run_smptest_directory` with the right config factory and a new CSV name. Then register that CSV in `scripts/mkdocs_evidence_hook.py` `SUITES` and in `scripts/aggregate_rtems5.sh` if it should appear in the published tables.
- For a brand-new hand-written guest (asm or RTEMS C), add a directory under `tests/guest-programs/` with a README and a CMake rule following the existing patterns (`asm/CMakeLists.txt` or `tero_add_rtems_app()` in `rtems/CMakeLists.txt`).
Conditional skips¶
A handful of tests SKIP when an external artifact is absent — this keeps the build hermetic and CI green when toolchains aren't installed. They indicate an environment mismatch, not a library bug:
- `test_rtems_boot`: RTEMS hello ELF not present (`TERO_RTEMS_HELLO_ELF` unset and no committed default).
- `GdbStub interoperates with sparc-gaisler-rtems5-gdb`: GDB binary missing or broken.
- Cross-compiled asm guests (`hello_uart`, `smp_*`): RCC toolchain absent.
- The `[!mayfail]` live-guest GDB test: `nm` missing, or the hello ELF's RTEMS layout drifted from the constants in `rtems_layout::*` (see Debugging with GDB).
- RTEMS guest suites with no provisioned ELFs: the harness writes a header-only CSV and the TEST_CASE SUCCEEDs.
Tag taxonomy¶
Catch2 tags split the suite into focused subsets:
```bash
./build/tests/tero_tests "[gdb_stub]"
./build/tests/tero_tests "[emulator],[rtems][boot]"
./build/tests/tero_tests "[rtems][smptests][n4][mt]"
```
| Tag | What it selects |
|---|---|
| `[unit]` / `[integration]` | Coarse split between fast no-IO tests and full-emulator tests |
| `[emulator]` | `Emulator` public API, lifecycle, pacing, SMP wake-up |
| `[rtems]` | Any RTEMS test (boot, sptests, fptests, smptests, hello-tero PROM) |
| `[rtems][boot]` | RTEMS boot test only |
| `[rtems][sptests]` / `[smptests]` / `[fpu]` | The guest-program suites |
| `[switch]` / `[mt]` / `[n2]` / `[n4]` / `[gr712rc]` / `[gr740]` | Per-config selectors on the RTEMS suites |
| `[gdb_stub]` (+ `[codec]`, `[protocol]`, `[late-binding]`, `[qsymbol]`, `[error-mode]`, `[rtems-aware]`) | GDB stub coverage |
| `[mmio]` | MMIO word-path verifications (IRQMP, GPTimer, APBUart, MemCtrl) |
| `[!mayfail]` | Tests that depend on toolchain/environment alignment. A failure means "refresh the constants", not "library bug" |
See also¶
- Test results — how results are produced, recorded and regenerated; the live per-suite tables.
- RTEMS known failures — root-cause analysis of every non-PASS, classified against the SIS oracle.
- Execution model — Switch vs IR vs JIT paths the matrix exercises.
- Multicore & timing — round-robin quantum, idle-skip, and the SMP scheduling the smptests stress.
- Decisions — Decision 36 (end-of-test scoring) and the ADRs the matrix reflects.