Traps and interrupts
This page is the reference for how Tero models the SPARC V8 trap mechanism:
trap types and tt encoding, the trap base register (TBR), the
enter_trap / leave_trap (RETT) entry-exit sequence, the PSR
ET/S/PS/CWP pipeline, register windows (CWP/WIM, SAVE/RESTORE
over/underflow), WRPSR semantics, and how external interrupts from the
IRQ(A)MP are sampled and delivered. Every claim below is anchored to the
code that implements it.
It belongs to the Developer Manual. For the broader fetch-decode-execute loop see Execution model; for the scheduler and multi-core delivery see Multicore and timing; for the interrupt controllers themselves see IRQMP.
SPARC V8 trap model
A trap on SPARC V8 is a control transfer caused by an exceptional condition. Tero classifies traps the way the architecture manual (§7.1) does:
- Synchronous (precise) traps: caused by the instruction the core is
  executing: illegal opcode, privileged instruction in user mode, FP
  disabled, window overflow/underflow, misaligned access, divide-by-zero,
  tag overflow, Ticc software traps. These are produced inside core::step
  from the ExecStatus an instruction handler returns.
- Asynchronous (interrupt) traps: external interrupt requests routed
  through the IRQ(A)MP. These are sampled between instructions by the run
  loop (Emulator::sample_interrupts), never from inside a handler.
Both kinds funnel through the single architectural entry point
CpuState::enter_trap (src/core/src/cpu_state.cpp:107), which performs the
SPARC V8 §7.3 entry sequence. There is exactly one place that builds the trap
frame, so synchronous and asynchronous traps are byte-identical at the
window/PSR level.
Scope of the model
Tero models the integer-unit trap set the RTEMS leon3 testsuite
exercises plus the FP traps. The MMU-related traps
(data_access_MMU_miss, instruction_access_MMU_miss, etc.) are not
generated — the SRMMU is deferred (see the
roadmap). Watchpoint and r_register_access
traps are likewise out of scope. If a guest needs one of these, it does
not exist yet — do not assume it does.
Trap types (tt) and the dispatch mapping
The architectural tt constants live in
src/core/include/tero/core/trap.hpp as enum class TrapType. Only the
subset Tero can actually raise is enumerated:
| TrapType | tt | SPARC V8 trap | Raised by |
|---|---|---|---|
| InstructionAccessException | 0x01 | instruction_access_exception | bus error on I-fetch (step.cpp:55) |
| IllegalInstruction | 0x02 | illegal_instruction | unknown/UNIMP encoding, bad WRPSR.CWP, RETT with ET==1 |
| PrivilegedInstruction | 0x03 | privileged_instruction | privileged op while S==0 |
| FpDisabled | 0x04 | fp_disabled | FPop while PSR.EF==0 |
| WindowOverflow | 0x05 | window_overflow | SAVE into a WIM-invalid window |
| WindowUnderflow | 0x06 | window_underflow | RESTORE/RETT into a WIM-invalid window |
| MemAddressNotAligned | 0x07 | mem_address_not_aligned | misaligned load/store/PC/RETT target |
| FpException | 0x08 | fp_exception | SoftFloat IEEE exception or invalid FP register |
| DataAccessException | 0x09 | data_access_exception | bus error on a load/store |
| TagOverflow | 0x0A | tag_overflow | TADDccTV/TSUBccTV overflow |
| DivisionByZero | 0x2A | division_by_zero | UDIV/SDIV with zero divisor |
| InterruptLevelBase + L | 0x10 + L | interrupt_level_L | IRQ at level L ∈ 1..15 |
| SoftwareTrapBase + n | 0x80 + n | trap_instruction | Ticc with n = (rs1 + op2) & 0x7F |
There are two ways a handler signals a trap, and core::step
(src/core/src/step.cpp:87-93) reconciles them after execute() returns:
- By status. Most handlers return an ExecStatus (e.g. WinOverflow,
  PrivInsn). detail::status_to_tt
  (src/core/src/handlers_internal.hpp:245) is the single mapping from
  ExecStatus to tt. It returns std::nullopt for non-trap statuses (Ok,
  Unsupported, ErrorMode, TrapInsn).
- By explicit request. Handlers that must choose the tt byte themselves,
  namely Ticc (tt = 0x80 + imm7) and the FP handlers (tt = 0x08 with a
  stamped FSR.ftt), call CpuState::raise_trap(tt)
  (src/core/include/tero/core/cpu_state.hpp:614), which latches
  pending_trap/pending_tt. The step loop checks has_pending_trap() first
  and uses that tt verbatim.
// src/core/src/step.cpp:87 — reconcile the two trap-signalling paths
std::optional<std::uint8_t> tt;
if (state.has_pending_trap()) {
tt = state.pending_tt(); // Ticc / FP: handler chose tt
state.clear_pending_trap();
} else {
tt = detail::status_to_tt(status); // status-derived tt
}
Software-trap composition
Ticc does not have a fixed tt. exec_ticc
(src/core/src/handlers_special.cpp:122) evaluates the condition, computes
tt_num = (rs1 + op2) & 0x7F, and calls
raise_trap(SoftwareTrapBase | tt_num). RTEMS uses these for system calls
and for the window-flush/ta 0 paths.
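The composition described above can be sketched in a few lines. This is only an illustration: `SoftwareTrapBase` here is a local stand-in for the TrapType constant and `software_trap_tt` is a hypothetical helper, not a function in Tero.

```cpp
#include <cstdint>

// Local stand-in for the SoftwareTrapBase value (0x80) from the table above.
constexpr std::uint32_t SoftwareTrapBase = 0x80U;

// Compose the tt for a Ticc: keep the low 7 bits of rs1 + op2 and set bit 7.
// Unsigned addition wraps, so an overflowing sum is still well defined.
constexpr std::uint8_t software_trap_tt(std::uint32_t rs1, std::uint32_t op2) {
    return static_cast<std::uint8_t>(SoftwareTrapBase | ((rs1 + op2) & 0x7FU));
}
```

Under these assumptions a `ta 0` yields tt 0x80 and a `ta 3` yields 0x83; anything above bit 6 of the sum is discarded before the base is applied.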
The trap base register (TBR)
TBR is laid out per SPARC V8 §7.4 (tbr:: constants in
src/core/include/tero/core/cpu_state.hpp:201):
31 12 11 4 3 0
+------------------------+-----------+--------+
| TBA | tt | 0000 |
+------------------------+-----------+--------+
trap base (writable) trap type always 0
- TBA [31:12]: the trap-table base, written by software with WRTBR.
  CpuState::write_tba (:427) writes only this field; it preserves tt and
  the zero field, which WRTBR must not disturb.
- tt [11:4]: written only by trap dispatch (enter_trap step 1) and is
  read-only to software.
- [3:0]: hardwired zero (each trap vector is 16 bytes = 4 instructions).
The full vector address is the TBR value itself: the TBA bits kept in place
at [31:12], ORed with tt << 4. This gives each trap type a 4-instruction
slot in the trap table.
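As a small worked example of the TBR layout, the sketch below computes the vector address for a given trap type. The mask value is derived from the bit ranges in the diagram; the constant and function names are illustrative and may not match the `tbr::` constants in the source.

```cpp
#include <cstdint>

// Derived from the layout above: TBA is bits 31:12, tt is bits 11:4.
constexpr std::uint32_t TbaMask = 0xFFFFF000U;
constexpr std::uint32_t TtShift = 4U;

// Address of the 16-byte vector slot for trap type `tt`, given any TBR value.
constexpr std::uint32_t vector_address(std::uint32_t tbr, std::uint8_t tt) {
    return (tbr & TbaMask) | (static_cast<std::uint32_t>(tt) << TtShift);
}
```

With a trap table at 0x40000000, window_overflow (tt 0x05) lands at 0x40000050 and software trap 3 (tt 0x83) at 0x40000830.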
enter_trap: the entry sequence
CpuState::enter_trap(saved_pc, saved_npc, tt)
(src/core/src/cpu_state.cpp:87) is the canonical SPARC V8 §7.3 trap-entry
sequence. Order matters and is commented step-by-step in the source:
1. TBR.tt ← tt: install the trap type, preserving TBA and the zero field
   (:91).
2. CWP ← (CWP − 1) mod NWINDOWS (:97, NWINDOWS = 8). This must happen
   before writing the locals so that r[17]/r[18] land in the handler's
   window, not the interrupted context's.
3. Save the return pair: new-window %l1 (r17) ← saved_pc, %l2 (r18) ←
   saved_npc (:101, RegL1/RegL2). The handler reads these back and feeds
   them to RETT/JMPL.
4. PS ← S, S ← 1, ET ← 0 (:104-108). Copying S into PS before forcing S=1
   is the architectural order; ET=0 disables further trap delivery so a
   nested fault becomes a hard error (see Error mode).
5. Jump to the vector: PC ← entry, nPC ← entry + 4 (:116). The vector is
   TBR normally, but TBA alone when single-vector trapping is on; see
   Single-vector trapping.
6. Clear the pending annul flag (exec_.annul_next = false, :122). SPARC V8
   §5.1.2.2: the delay-slot annul mechanism is per-CTI and does not survive
   a trap, so the first handler instruction is never silently dropped.
7. Drop the pending-trap marker and wake the core (:126-127) so the step
   loop does not re-dispatch and a powered-down core resumes.
enter_trap does not check ET
enter_trap assumes the caller has already verified ET == 1. The
precondition is enforced by core::step (step.cpp:96) and
Emulator::sample_interrupts (emulator.cpp:1324) — they enter error
mode or skip delivery when ET == 0. Leaving the check out of
enter_trap lets unit tests exercise the entry path in isolation
(src/core/include/tero/core/cpu_state.hpp:633).
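To make the ordering concrete, here is a minimal sketch of the entry sequence on a toy state struct. `MiniCpu` and its fields are hypothetical stand-ins for the parts of CpuState that trap entry touches; single-vector trapping, the pending-trap marker, and the wake-up step are left out, and the caller is assumed to have checked ET already.

```cpp
#include <array>
#include <cstdint>

constexpr std::uint32_t NumWindows = 8;

// Hypothetical, reduced view of the CPU state (not Tero's CpuState).
struct MiniCpu {
    std::uint32_t tbr = 0;
    std::uint32_t cwp = 0;
    bool s = false, ps = false, et = true;
    std::uint32_t pc = 0, npc = 4;
    bool annul_next = false;
    // Only %l1 and %l2 of each window are modelled.
    std::array<std::uint32_t, NumWindows> l1{}, l2{};
};

// Precondition: c.et is true (the caller enforces this, as noted above).
inline void enter_trap(MiniCpu& c, std::uint32_t saved_pc,
                       std::uint32_t saved_npc, std::uint8_t tt) {
    c.tbr = (c.tbr & 0xFFFFF000U) | (std::uint32_t{tt} << 4);  // 1. TBR.tt
    c.cwp = (c.cwp + NumWindows - 1U) % NumWindows;            // 2. CWP - 1
    c.l1[c.cwp] = saved_pc;                                    // 3. return pair,
    c.l2[c.cwp] = saved_npc;                                   //    new window
    c.ps = c.s;                                                // 4. PS <- S first,
    c.s = true;                                                //    then S <- 1,
    c.et = false;                                              //    ET <- 0
    c.pc = c.tbr;                                              // 5. vector (no SVT)
    c.npc = c.tbr + 4U;
    c.annul_next = false;                                      // 6. drop annul
}
```

Starting from CWP 0 with TBR 0x40000000, a window_overflow (tt 0x05) in user mode leaves the toy state at CWP 7 with PS=0, S=1, ET=0 and PC 0x40000050, and the return pair sits in window 7's %l1/%l2.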
Single-vector trapping (SVT)
LEON3/LEON4 support single-vector trapping via ASR17.SVT (bit 13,
Asr17SvtBit in cpu_state.hpp:470). When svt_enabled() is true
(:497), enter_trap enters at TBA (the tt offset is dropped) so a
single handler dispatches every trap:
// src/core/src/cpu_state.cpp:138 — vector selection
const std::uint32_t tbr_v = tbr();
const std::uint32_t entry = svt_enabled() ? (tbr_v & tbr::TbaMask) : tbr_v;
set_pc(entry);
set_npc(entry + 4U);
This matches the Gaisler SIS oracle and the LEON3 manual. tt is still
written to TBR so the single handler can read it back and branch.
RETT and leave_trap: the exit sequence
RETT is the only way out of a trap. The handler is
exec_rett (src/core/src/handlers_special.cpp:140); the state mutation
lives in CpuState::leave_trap (src/core/src/cpu_state.cpp:154).
Pre-checks (order matters)
exec_rett validates conditions in this exact order before touching state:
// src/core/src/handlers_special.cpp:145
if (s.et()) return ExecStatus::IllegalInsn; // (1) ET==1 → illegal
if (!s.s()) return ExecStatus::PrivInsn; // (2) S==0 → privileged
const std::uint32_t new_cwp = (s.cwp() + 1U) % NumWindows;
if (s.window_invalid(new_cwp)) return ExecStatus::WinUnderflow; // (3) WIM
const std::uint32_t target = s.read_r(insn.rs1) + alu_op2(s, insn);
if ((target & 0x3U) != 0U) return ExecStatus::AlignError; // (4) align
s.leave_trap(target);
Deliberate SIS-matching deviation from the manual
The SPARC V8 manual orders the checks as privileged-first when ET==1 &&
S==0. The Gaisler SIS reference simulator checks ET==1 → illegal
first, regardless of S. Tero follows SIS bit-for-bit
(handlers_special.cpp:141-144). The only state where the two differ —
RETT with traps already enabled and in user mode — is unreachable for a
real guest, so matching the oracle costs nothing and keeps the lockstep
comparator green.
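The difference between the two orderings can be checked exhaustively over the four (ET, S) combinations. The sketch below encodes both orderings as described in this note; the enum and function names are hypothetical, and the WIM and alignment checks are omitted because they do not take part in the disagreement.

```cpp
enum class RettFault { None, Illegal, Privileged };

// SIS ordering, which the note says Tero follows: ET is tested first.
constexpr RettFault rett_fault_sis(bool et, bool s) {
    return et ? RettFault::Illegal
              : (!s ? RettFault::Privileged : RettFault::None);
}

// Manual ordering as described above: privileged wins when ET==1 && S==0.
constexpr RettFault rett_fault_manual(bool et, bool s) {
    return (et && !s) ? RettFault::Privileged
                      : (et ? RettFault::Illegal
                            : (!s ? RettFault::Privileged : RettFault::None));
}
```

The two functions agree on every input except ET=1, S=0, which is the state the note calls unreachable for a real guest.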
leave_trap state mutation
leave_trap(target) (cpu_state.cpp:130) performs the SPARC V8 §B.26 exit:
1. CWP ← (CWP + 1) mod NWINDOWS: the inverse of enter_trap's decrement.
2. S ← PS (re-arm the prior supervisor bit; PS itself is preserved so a
   nested trap below this RETT still has the right snapshot) and ET ← 1
   (re-enable traps).
3. request_branch(target): the step loop applies target as the new nPC
   (PC ← old nPC), giving the architectural PC ← nPC; nPC ← target update
   through the same branch mechanism CALL/Bicc/JMPL use.
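The exit sequence is short enough to sketch on a toy state struct. `MiniCpu` is a hypothetical stand-in, the RETT pre-checks are assumed to have passed, and the PC/nPC update is written inline here rather than going through a branch-request mechanism.

```cpp
#include <cstdint>

constexpr std::uint32_t NumWindows = 8;

// Hypothetical, reduced view of the CPU state (not Tero's CpuState).
struct MiniCpu {
    std::uint32_t cwp = 0;
    bool s = true, ps = false, et = false;
    std::uint32_t pc = 0, npc = 4;
};

// Exit half of the trap sequence, mirroring the three steps listed above.
inline void leave_trap(MiniCpu& c, std::uint32_t target) {
    c.cwp = (c.cwp + 1U) % NumWindows;  // undo enter_trap's decrement
    c.s = c.ps;                         // restore S; PS is left untouched
    c.et = true;                        // re-enable traps
    c.pc = c.npc;                       // stand-in for request_branch(target):
    c.npc = target;                     // PC <- nPC, nPC <- target
}
```

In the usual `jmp %l1; rett %l2` pair, RETT executes in the delay slot of the JMPL, so nPC already holds the %l1 target when `leave_trap` runs and `target` is the %l2 value.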
Trap-entry / RETT control flow
flowchart TD
A["core::step: execute() returns ExecStatus"] --> B{has_pending_trap?}
B -- yes --> C["tt = pending_tt (Ticc / FP)"]
B -- no --> D["tt = status_to_tt(status)"]
C --> E{tt has value?}
D --> E
E -- no --> N["advance PC/nPC<br/>(normal delay-slot rule)"]
E -- yes --> F{PSR.ET == 1?}
F -- "no" --> G["set_error_mode(true)<br/>halt, preserve PC/nPC"]
F -- "yes" --> H["enter_trap(pc, npc, tt)"]
H --> H1["TBR.tt = tt"]
H1 --> H2["CWP = CWP-1 mod 8"]
H2 --> H3["l1=pc, l2=npc"]
H3 --> H4["PS=S; S=1; ET=0"]
H4 --> H5["PC = SVT ? TBA : TBR<br/>nPC = entry+4"]
H5 --> K["handler runs (ET=0)"]
K --> L["RETT %l1/%l2-derived target"]
L --> M{"ET==1? S==0?<br/>WIM(new_cwp)? aligned?"}
M -- "any fails" --> P["raise illegal / priv /<br/>window_underflow / mem_not_aligned"]
M -- "all pass" --> Q["leave_trap(target)"]
Q --> Q1["CWP = CWP+1 mod 8"]
Q1 --> Q2["S=PS; ET=1"]
Q2 --> Q3["request_branch(target)"]
Q3 --> R["resume interrupted code"]
WRPSR is applied immediately
SPARC V8 §5.1.2.3 permits WRPSR's effect on S, ET, PS, and CWP to
be deferred up to three instructions (the ICC and PIL fields are always
immediate). That deferral is implementation latitude, and Tero does not
take it — it applies every writable PSR field at once, matching the Gaisler
SIS oracle. Trap entry and exit set the PSR the same direct way; there is
no pending-write buffer anywhere.
// src/core/src/cpu_state.cpp:69 — write_psr_writable (immediate)
void CpuState::write_psr_writable(std::uint32_t value) noexcept {
const std::uint32_t new_val = (psr() & psr::ReadOnlyMask)
| (value & psr::WritableMask);
set_psr(new_val);
}
Why WRPSR does not model the 3-instruction delay
Real SPARC software (RTEMS and every trap handler) pads WRPSR with three
NOPs, so the observable result is identical whether the write lands
immediately or after three instructions. The Gaisler SIS oracle
(sparc.c WRPSR) applies it immediately, and Tero must match the oracle:
an earlier pending_psr_ / commit_psr_pipeline() delay model diverged
from SIS whenever a trap was taken inside the three-instruction window —
trap entry dropped the still-pending CWP change, desyncing the
register-window state on trap-dense SMP paths (e.g. smpschededf03's
ISR-exit window reload read a stale frame). The verbatim rationale is in
the source comment at cpu_state.cpp:69-81.
write_psr_writable only touches the writable fields. psr::ReadOnlyMask
(cpu_state.hpp:128) protects IMPL, VER, the reserved field, and EC
(no coprocessor on LEON). EF is writable — Tero has an FPU and software
toggles PSR.EF to enable/disable it. The WRPSR handler
(exec_write_special, handlers_special.cpp:73) also rejects a CWP field
≥ NWINDOWS with illegal_instruction and requires S==1.
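For a concrete feel of the masking, the sketch below builds the read-only mask from the SPARC V8 PSR bit positions (impl 31:28, ver 27:24, reserved 19:14, EC 13) and applies the same merge. The constant names and numeric values are reconstructed from the architecture layout and the field list above, not copied from cpu_state.hpp, so treat them as assumptions.

```cpp
#include <cstdint>

// SPARC V8 PSR field positions; names are illustrative.
constexpr std::uint32_t ImplVerMask  = 0xFF000000U;  // impl[31:28], ver[27:24]
constexpr std::uint32_t ReservedMask = 0x000FC000U;  // bits 19:14
constexpr std::uint32_t EcMask       = 0x00002000U;  // bit 13
constexpr std::uint32_t ReadOnlyMask = ImplVerMask | ReservedMask | EcMask;
constexpr std::uint32_t WritableMask = ~ReadOnlyMask;
constexpr std::uint32_t CwpMask      = 0x1FU;        // bits 4:0
constexpr std::uint32_t NumWindows   = 8;

// Merge a WRPSR value into the current PSR, keeping the read-only fields.
constexpr std::uint32_t merge_psr(std::uint32_t current, std::uint32_t value) {
    return (current & ReadOnlyMask) | (value & WritableMask);
}

// A CWP field that names a non-existent window is rejected.
constexpr bool cwp_field_valid(std::uint32_t value) {
    return (value & CwpMask) < NumWindows;
}
```

Writing all ones into a PSR whose impl/ver byte is 0xF3 leaves that byte, the reserved bits, and EC unchanged while setting every writable bit, including EF; a handler would call `cwp_field_valid` before committing the merge.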
Register windows
Tero models 8 windows (NumWindows = 8, cpu_state.hpp:31). The register
file is one host-order blob (int_state_, an ir::GuestState) with the
windowed slots laid out per layout:: so the reference interpreter and the
IR/JIT share one representation (see Layers and modules).
CWP, the window file, and overlap
reg_offset(cwp, r) / window_slot(cwp, r) (cpu_state.hpp:63-80) map an
architectural register number to a byte offset:
- r0..r7 (globals) → the globals region, window-independent.
- r8..r23 (outs + locals) → the current window.
- r24..r31 (ins) → alias the outs of window (cwp + 1) mod 8. SAVE
  decrements CWP, so the caller sits at CWP+1; its outs are the callee's
  ins. This overlap is what makes SAVE/RESTORE cheap and is why window
  overflow/underflow detection is needed.
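One way to realise this mapping is a flat array indexed by a computed slot. The sketch below assumes a layout of eight globals followed by eight [outs, locals] groups; the real `layout::` offsets may be arranged differently, and this `window_slot` is a simplified stand-in that returns an index rather than a byte offset.

```cpp
#include <cstdint>

constexpr std::uint32_t NumWindows = 8;
constexpr std::uint32_t NumGlobals = 8;
constexpr std::uint32_t RegsPerWindow = 16;  // 8 outs + 8 locals

// Map architectural register r (0..31) in window cwp to a flat slot index.
inline std::uint32_t window_slot(std::uint32_t cwp, std::uint32_t r) {
    if (r < NumGlobals) return r;  // %g0..%g7 ignore cwp
    // r - 8 is 0..15 for outs/locals and 16..23 for ins. The ins therefore
    // run past the end of this window's group into the outs of the next
    // one, which is the overlap described above.
    const std::uint32_t windowed = cwp * RegsPerWindow + (r - NumGlobals);
    return NumGlobals + windowed % (NumWindows * RegsPerWindow);
}
```

With this layout `window_slot(0, 24)` and `window_slot(1, 8)` name the same slot, and the modulo makes window 7's ins wrap around to window 0's outs.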
WIM and window-invalid detection
WIM (window invalid mask) has one bit per window.
CpuState::window_invalid(cwp) (cpu_state.hpp:414) is
(wim() & (1 << (cwp % 8))) != 0. set_wim (:405) masks the value to the
valid NWINDOWS bits. Software marks one window invalid to act as the
overflow/underflow tripwire.
SAVE / RESTORE over/underflow
SAVE (exec_save, handlers_regwin.cpp:11) and RESTORE
(exec_restore, :26) both compute the destination window, check WIM
before committing, and only then move CWP and write the result:
// src/core/src/handlers_regwin.cpp:11 — SAVE
const std::uint32_t new_cwp = (s.cwp() + NumWindows - 1U) % NumWindows;
if (s.window_invalid(new_cwp)) return ExecStatus::WinOverflow; // trap, no commit
const std::uint32_t sum = s.read_r(insn.rs1) + alu_op2(s, insn);
s.set_cwp(new_cwp);
s.write_r(insn.rd, sum);
RESTORE is the mirror image (+1 window, WinUnderflow). Because the WIM
check precedes the CWP move, a faulting SAVE/RESTORE leaves CWP
untouched: the trap handler sees the pre-instruction window, spills/refills,
and re-executes the instruction (the saved PC in %l1 points back at it).
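The check-before-commit rule for both directions can be sketched on a toy window state. `Windows` and `Status` are hypothetical types; the register write that accompanies a real SAVE/RESTORE is omitted so the window bookkeeping stands out.

```cpp
#include <cstdint>

constexpr std::uint32_t NumWindows = 8;

enum class Status { Ok, WinOverflow, WinUnderflow };

// Hypothetical, reduced window state (not Tero's CpuState).
struct Windows {
    std::uint32_t cwp = 0;
    std::uint32_t wim = 0;
    bool invalid(std::uint32_t w) const {
        return (wim & (1U << (w % NumWindows))) != 0U;
    }
};

// SAVE targets window cwp-1; cwp is left untouched if that window is invalid.
inline Status save(Windows& w) {
    const std::uint32_t next = (w.cwp + NumWindows - 1U) % NumWindows;
    if (w.invalid(next)) return Status::WinOverflow;
    w.cwp = next;
    return Status::Ok;
}

// RESTORE targets window cwp+1, with the same check-before-commit rule.
inline Status restore(Windows& w) {
    const std::uint32_t next = (w.cwp + 1U) % NumWindows;
    if (w.invalid(next)) return Status::WinUnderflow;
    w.cwp = next;
    return Status::Ok;
}
```

With window 0 marked invalid and CWP at 1, `save` reports an overflow and CWP stays at 1, so retrying after the handler has moved the WIM bit succeeds from the same starting window.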
End-to-end window roundtrip test
tests/integration/test_regwin_roundtrip.cpp loads a bare-metal SPARC
program (tests/guest-programs/asm/regwin-roundtrip/regwin_roundtrip.S)
that chains 7 SAVEs through one overflow trap and 7 RESTOREs through
one underflow trap, verifying every window's %l0 marker survives the
trap → handler-spill → RETT → retry loop. The asm uses RTEMS-style WIM
rotation (right for overflow, left for underflow), adapted from
bsps/sparc/leon3/start/win_ovflow.S.
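The WIM rotation mentioned above is a plain rotate within the low NWINDOWS bits. The guest does this in assembly; the sketch below expresses the same arithmetic in C++ for reference, with hypothetical function names.

```cpp
#include <cstdint>

constexpr std::uint32_t NumWindows = 8;
constexpr std::uint32_t WimMask = (1U << NumWindows) - 1U;

// Overflow handler: the invalid window moves one step toward lower numbers,
// wrapping from window 0 to window NumWindows-1.
constexpr std::uint32_t rotate_wim_right(std::uint32_t wim) {
    return ((wim >> 1) | (wim << (NumWindows - 1U))) & WimMask;
}

// Underflow handler: the invalid window moves one step toward higher numbers.
constexpr std::uint32_t rotate_wim_left(std::uint32_t wim) {
    return ((wim << 1) | (wim >> (NumWindows - 1U))) & WimMask;
}
```

Rotating right follows the direction SAVE moves CWP, so after an overflow the tripwire sits one window further along the call chain; rotating left undoes it on the way back.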
Hardware interrupt sampling and delivery
External interrupts are not raised inside instruction handlers. The run
loop samples them between instructions through
Emulator::sample_interrupts(core_idx)
(src/runtime/src/emulator.cpp:1285). The flow is query → wake → gate →
ack → enter_trap:
- Query the controller: pending_mask(core_idx) (emulator.cpp:1292) returns
  the per-core bitmap of asserted lines from the IRQ(A)MP. Empty mask →
  return.
- Find the level: scan bits MaxIrqLevel (15) down to 1; the highest set bit
  is the candidate interrupt level (:1302-1308).
- Wake a powered-down core (:1320). A pending interrupt wakes the core
  regardless of PSR.ET. RTEMS SMP boot relies on this, parking the
  secondary CPU with ET=0 and waking it with an IPI.
- Gate by ET (:1324): if ET==0, leave the IRQ pending and return; the core
  resumes at its current PC.
- Gate by PIL (:1330): per SPARC V8 §7.1, the interrupt is taken only when
  level > PSR.PIL. Otherwise it stays pending in the controller.
- Acknowledge: acknowledge(core, decision.ack_mask) auto-clears the pending
  bit (or the force bit if forced) per GR712RC §8. The engine passes the
  controller bits through opaquely: the ack_mask is formed by the
  architecture in evaluate_interrupt, where the GRLIB level == bit identity
  (1u << level) now lives, so the engine never reconstructs it. See the
  force-precedence rule (Decision 39) in IRQMP.
- Enter the trap (:1347): enter_trap(pc, npc, 0x10 + level).
A lower-level interrupt stays pending until either the PIL drops or the
handler clears the source (writing ICR/the force register).
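The query, level scan, and the two gates reduce to a small pure function. The sketch below is a simplified model of that decision only; `interrupt_tt` is a hypothetical name, the wake-up and acknowledge steps are left out, and the PIL rule is implemented exactly as stated above (level > PIL).

```cpp
#include <cstdint>
#include <optional>

constexpr std::uint32_t MaxIrqLevel = 15;
constexpr std::uint32_t InterruptLevelBase = 0x10;

// Return the tt to deliver, or nothing if the interrupt must stay pending.
inline std::optional<std::uint8_t> interrupt_tt(std::uint32_t pending_mask,
                                                bool et, std::uint32_t pil) {
    std::uint32_t level = 0;
    for (std::uint32_t l = MaxIrqLevel; l >= 1U; --l) {  // highest level wins
        if ((pending_mask & (1U << l)) != 0U) {
            level = l;
            break;
        }
    }
    if (level == 0U) return std::nullopt;   // nothing asserted on lines 1..15
    if (!et) return std::nullopt;           // gate by ET
    if (level <= pil) return std::nullopt;  // gate by PIL
    return static_cast<std::uint8_t>(InterruptLevelBase + level);
}
```

With lines 3 and 9 both pending, traps enabled, and PIL 0, the function picks level 9 and returns tt 0x19; raising PIL to 9 or clearing ET makes it return nothing, which models the interrupt staying pending in the controller.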
Extended interrupts (EIRQ, lines 16..31)
The SPARC tt field only encodes 16 levels, so IRQs 16..31 are redirected.
IrqMP::pending_mask (src/peripherals/src/irqmp.cpp:154) folds any
masked extended-IRQ bit up to the MPSTAT.EIRQ redirection level so
sample_interrupts (which scans only 1..15) sees it:
// src/peripherals/src/irqmp.cpp:161 — fold extended IRQs to the EIRQ level
const std::uint32_t eirq_level = (aload(mpstat_) >> MpstatEirqShift) & MpstatFieldMask;
if (eirq_level != 0 && (mask & ExtendedIrqMask) != 0U) {
mask |= (1U << eirq_level);
}
On acknowledge of the EIRQ level, the controller pops the actual extended
index into EID[cpu] so the handler can read which line fired
(irqmp.cpp:98). The GR740 sibling IrqAMP
(src/peripherals/src/irqamp.cpp:136) implements the same redirection — they
are distinct GRLIB IP cores; do not bolt GR740 features onto IrqMP.
Self-directed IPIs (instruction-boundary latency)
A core can interrupt itself (RTEMS uses this for some scheduler paths). A
self-IPI is just the core writing its own IFORCE/force bit, which
pending_mask then reports. Round-boundary sampling would delay it a full
quantum; instead Emulator::poll_self_interrupt(core_idx)
(emulator.cpp:1371) re-samples at each instruction boundary on both the
Switch path (emulator.cpp:648) and the JIT path (emulator.cpp:1007):
// src/runtime/src/emulator.cpp:1371
void Emulator::poll_self_interrupt(std::size_t core_idx) {
auto& state = cores_[core_idx];
// enter_trap saves PC/nPC as the return pair, well-defined only at an
// instruction boundary; in a delay slot (npc != pc+4) the IRQ waits one
// more instruction, exactly as on hardware.
if (state.npc() != state.pc() + 4U) return;
sample_interrupts(core_idx);
}
The delay-slot guard is load-bearing: taking a trap mid-delay-slot would save
a return pair that does not reconstruct the branch. This fixed
smpmulticast01 across all six SMP configs (see the campaign notes referenced
from MEMORY.md).
Error mode (ET=0)
If a trap condition arises while PSR.ET == 0, SPARC V8 §7.3 says the
processor enters error_mode and halts until reset. core::step detects
this before calling enter_trap:
// src/core/src/step.cpp:95
if (tt.has_value()) {
if (!state.et()) {
state.set_error_mode(true); // preserve PC/nPC for post-mortem
return {ExecStatus::ErrorMode};
}
state.enter_trap(pc, npc, *tt);
return {status};
}
error_mode() is a one-way latch (cpu_state.hpp:663). On the next step
the loop short-circuits at step.cpp:15 and returns ErrorMode without
fetching, so the caller sees a stable status. The run loop scans every core
for error mode each round and returns HaltReason::HaltedMode (or, if a GDB
client is attached, redirects through the stub with the appropriate stop
signal — SIGSEGV for an unclassified access fault — at the offending PC).
How the stop signal is chosen (architecture-neutral)
The GDB transport never reads the SPARC tt. The halted CPU entity reports
a coarse, ISA-neutral GdbFaultClass
(src/interfaces/include/tero/igdb_registers.hpp) through the
IGdbRegisters::gdb_fault_class() capability; the stub turns that into an
RSP stop signal via stop_signal_from_fault_class
(src/runtime/src/gdb_stub_transport.cpp). The SPARC tt → class decode
(gdb_fault_class_from_tt) lives in the SPARC CPU entity
(src/runtime/src/cpu.cpp), so a non-SPARC core supplies its own mapping
and the transport stays architecture-agnostic.
The CLI prints a full core-0 post-mortem (PC/nPC/TBR/tt/PSR/WIM and the
register file) when this happens. Library users can call emu->core(0) and
inspect any field directly.
Special cases and policies
Tagged-arithmetic trap commits its result
TADDccTV/TSUBccTV (SPARC V8 §B.14 and §B.16) compute the result and update icc
first, then trap if V is set (Decision 10). The handler writes rd and
icc before returning ExecStatus::TagOverflow. The manual leaves the result
"unpredictable" when the trap fires; deterministically committing it makes
test expectations reproducible.
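A minimal sketch of the tagged-add overflow rule helps show what "commits its result" means. It follows the SPARC V8 definition of tag overflow (signed overflow, or a non-zero tag in bits 1:0 of either operand); `TaddResult` and `tagged_add` are hypothetical names, and the remaining icc bits are not modelled.

```cpp
#include <cstdint>

struct TaddResult {
    std::uint32_t value;  // the sum, written to rd in this policy
    bool v;               // icc.V: tag overflow or arithmetic overflow
};

// Tagged add: the sum is always produced; V decides whether the TV variant
// would then trap.
inline TaddResult tagged_add(std::uint32_t a, std::uint32_t b) {
    const std::uint32_t sum = a + b;
    // Signed overflow: operands share a sign that the result does not.
    const bool arith = ((~(a ^ b) & (a ^ sum)) >> 31) != 0U;
    // Tag overflow: either operand has a non-zero tag in bits 1:0.
    const bool tag = ((a | b) & 0x3U) != 0U;
    return TaddResult{sum, arith || tag};
}
```

Under the policy described above, a handler built on this would write `value` to rd and V to icc first, and only then return the tag-overflow status when `v` is set.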
FP-disabled vs FP-exception
- op=10, op3 ∈ {0x34, 0x35} (FPop1/FPop2) decode as InsnKind::FpOp. When
  PSR.EF == 0 the dispatch raises FpDisabled (tt=0x04); when enabled, the
  SoftFloat-backed handler runs.
- A SoftFloat IEEE exception or invalid FP-register access raises
  fp_exception (tt=0x08) via raise_fp_exception
  (src/core/src/fpu_handlers.cpp:30), which stamps FSR.ftt before calling
  raise_trap.
- op=10, op3 ∈ {0x36, 0x37} (coprocessor) decode as InsnKind::Unknown and
  trap with IllegalInstruction, because there is no coprocessor on LEON
  (PSR.EC stays 0).
Misaligned PC
A misaligned PC synthesizes mem_address_not_aligned against the fetch
(step.cpp:37) and flows through the same trap path as any other synchronous
trap — there is no separate fetch-alignment trap type.
How to add or debug a trap path
Adding a new synchronous trap:
- Add the tt to TrapType in src/core/include/tero/core/trap.hpp if it is
  not already there.
- Either return a new ExecStatus from the handler and extend
  detail::status_to_tt (handlers_internal.hpp:245), or call raise_trap(tt)
  directly if the handler must choose the tt byte.
- Per project rule, write at least three tests (normal, edge, trap) under
  tests/unit/; tests/unit/test_traps.cpp is the home for trap-entry tests.
Adding a new interrupt source: drive the IRQ(A)MP through the
IInterruptSource bridge handed to the peripheral (PeripheralContext::irqs).
Sampling, gating, and tt = 0x10 + level delivery are already handled by
sample_interrupts — you do not touch enter_trap.
Debugging: the --trace observer (IEmulatorObserver) fires
on_trap_taken(core, level, tt, pc) from sample_interrupts
(emulator.cpp:1342) for interrupt traps. For synchronous traps, run under
the lockstep comparator (scripts/lockstep_compare.py) against SIS — a
per-core PC divergence at a trap boundary localises a wrong tt, a wrong
window move, or a PSR-pipeline mismatch. The error-mode post-mortem dumps the
faulting frame directly.
When in doubt, read the manual
Trap semantics, register-window corner cases, and RETT ordering are
exactly the areas where guessing creates multi-day bugs. The authoritative
sources are the SPARC V8 Architecture Manual (§4.2 PSR, §7 traps, §B.26
RETT) and the SIS oracle. If a behaviour is not in the code or the
manuals, stop and ask — do not invent it.