tero_core
The SPARC V8 integer + floating-point unit. It owns the architectural
state of one core (CpuState), the pure decoder, the per-category
instruction handlers, the single-cycle step() driver, and the
SoftFloat-backed FPU. It is the correctness oracle — the naive,
verify-by-inspection reference path against which the IR/JIT is validated.
# src/core/CMakeLists.txt
target_link_libraries(tero_core
    PUBLIC  tero::interfaces tero::ir
    PRIVATE tero::warnings tero::softfloat3e)
Responsibility
Execute one SPARC V8 instruction correctly against CpuState,
reaching the world only through the injected ICpuBus. The core owns no
bus, no peripheral, and no I/O path, not even logging.
Why core PUBLIC-links tero_ir
tero_core depends on tero::ir — surprising, since the core is
"below" the translation stack. The reason is state unification:
CpuState's integer register file is an ir::GuestState byte blob
(cpu_state.hpp:690). The Switch interpreter and the IR engine read
and write the same bytes, so there is no per-fallback sync. The core
knows the blob layout, not the IR ops or LLVM — it never links
tero_arch_sparc or tero_jit.
Source layout
src/core/
├── include/tero/core/
│ ├── cpu_state.hpp ← per-core architectural state (the blob)
│ ├── execution_flags.hpp ← step-loop scratchpad (branch/annul/trap/power-down/code-flush)
│ ├── decoder.hpp ← pure decode(uint32_t) → DecodedInsn
│ ├── decoded_insn.hpp ← InsnKind + DecodedInsn POD + sub-op enums
│ ├── handlers.hpp ← execute() dispatcher + ExecStatus
│ ├── step.hpp ← step() driver + StepResult
│ ├── trap.hpp ← TrapType (tt) constants + tt_of()
│ ├── fpu.hpp ← FpOp + FP decode helpers
│ ├── fpu_handlers.hpp ← FPop1/FPop2/FP load-store entry points
│ └── softfloat_context.hpp ← Berkeley SoftFloat 3e wrapper
└── src/
├── cpu_state.cpp decoder.cpp decoder_fpu.cpp
├── handlers.cpp ← execute() dispatch
├── handlers_alu.cpp handlers_branch.cpp handlers_loadstore.cpp
├── handlers_regwin.cpp handlers_special.cpp
├── handlers_internal.hpp ← shared bodies (exec_alu_op<Op>, status_to_tt, flags_*)
├── fpu_handlers.cpp softfloat_context.cpp
└── step.cpp
CpuState: the architectural state
CpuState (cpu_state.hpp:250) holds everything one SPARC V8 core
remembers between instructions. Its integer state lives in a single
host-order ir::GuestState blob laid out by namespace layout:
| Region | Offset | Size | Contents |
|---|---|---|---|
| Globals | GlobalsBase = 0 | 8 × 4 B | %g0..%g7 (%g0 hardwired zero) |
| Windowed | WindowedBase = 32 | NumWindows*16 slots = 128 × 4 B | for window w: 8 outs (0..7) + 8 locals (8..15); ins alias the outs of (w+1) mod 8 |
| Special | SpecialBase = 544 | 7 × 4 B | Y, PSR, WIM, TBR, PC, nPC, ASR17 |

StateSize = 572 bytes (cpu_state.hpp:55). The register-window overlap
(ins ≡ next window's outs) is what gives SAVE/RESTORE their zero-copy
parameter passing — window_slot() / reg_offset() (cpu_state.hpp:63,74)
encode it. NumWindows = 8 (fixed for LEON3/LEON4; SPARC V8 allows 2..32).
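
The offset arithmetic in the table can be checked with a small sketch. The constants are taken from the table; the bodies and signatures of window_slot() and reg_offset() below are assumptions reconstructed from the description (the real ones in cpu_state.hpp may differ), and the layout_sketch namespace is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical reconstruction of the blob layout; not the real tero header.
namespace layout_sketch {
constexpr std::size_t RegBytes      = 4;
constexpr std::size_t NumWindows    = 8;
constexpr std::size_t SlotsPerWin   = 16;  // 8 outs + 8 locals per window
constexpr std::size_t GlobalsBase   = 0;
constexpr std::size_t WindowedBase  = GlobalsBase + 8 * RegBytes;
constexpr std::size_t SpecialBase   = WindowedBase + NumWindows * SlotsPerWin * RegBytes;
constexpr std::size_t StateSize     = SpecialBase + 7 * RegBytes;

// Slot index inside the windowed region for register r (8..31) as seen
// from window cwp. Assumed shape; the real helper may take other arguments.
constexpr std::size_t window_slot(std::size_t cwp, std::size_t r) {
    if (r < 16) return cwp * SlotsPerWin + (r - 8);            // %o0..%o7: slots 0..7
    if (r < 24) return cwp * SlotsPerWin + 8 + (r - 16);       // %l0..%l7: slots 8..15
    return ((cwp + 1) % NumWindows) * SlotsPerWin + (r - 24);  // %i0..%i7: next window's outs
}

// Byte offset of register r (0..31) in the blob for the given window.
constexpr std::size_t reg_offset(std::size_t cwp, std::size_t r) {
    return r < 8 ? GlobalsBase + r * RegBytes
                 : WindowedBase + window_slot(cwp, r) * RegBytes;
}

static_assert(WindowedBase == 32 && SpecialBase == 544 && StateSize == 572,
              "constants must match the table");
// SAVE decrements CWP, so the caller's %o0 in window 3 is the callee's %i0 in window 2.
static_assert(reg_offset(3, 8) == reg_offset(2, 24), "ins alias the next window's outs");
}  // namespace layout_sketch
```

Because the ins resolve to the same bytes as the neighbouring window's outs, a SAVE only has to change CWP; no register values move.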
State unification: one representation, no sync
guest_state() (cpu_state.hpp:302) returns the same blob the
integer accessors read/write. The SPARC frontend emits IR
LdState/StState against the same layout:: offsets, so the
runtime hands this blob straight to the IR engine — no
CpuState↔GuestState copy. FP state is not in the blob: the IR
never lowers FP, so f_[] and fsr_ stay canonical in CpuState
(cpu_state.hpp:710-711).
Register and special-register accessors
- Windowed file: read_r(rs) / write_r(rd, v) (window-aware; r0 reads zero, writes discarded). globals_view() / windowed_view() / *_storage() give raw spans for bulk copy.
- PSR: psr() / set_psr(), plus typed views cwp, s, ps, et, ef, ec, pil, icc(). PSR bit layout is in namespace psr (cpu_state.hpp:92); WRPSR semantics go through write_psr_writable(), which respects the read-only mask (impl/ver/reserved/EC) and applies the write immediately (see "WRPSR is immediate" below).
- WIM (set_wim masks to the low NumWindows bits), TBR (write_tba preserves the tt field), Y, ASR17 (LEON3 config register: write_asr17 honours the 0x0FFFE000 writable mask; set_core_index writes the per-core index field; svt_enabled() reports single-vector trapping).
- Cache control (ASI 0x2): cache_control_reg, icache_config, dcache_config. Register state only, no real cache. Reset advertises CCR.DS=1 so RTEMS SMP boot's snoop check passes (cpu_state.hpp:162).
- FPU: f(i) / set_f, fsr() / fsr_mut() / set_fsr, pack_fsr() / unpack_fsr() (cpu_state.hpp:517-540).
%asr22:%asr23 global up-counter
up_counter() / set_up_counter_base() (cpu_state.hpp:394-401) model
the LEON3/LEON4 free-running cycle counter read via %asr22:%asr23. Per
GR740-UM §6.10.4 it is a single SoC-wide counter — the run loop sets
it from global simulated time once per scheduling round
(Emulator::sync_global_up_counter), so all cores read one monotone value
that advances through idle-skip and power-down. It is round-granular, so
leon3_up_counter_is_available() reports false and RTEMS falls back to
the (also round-granular) GPTIMER timecounter — keeping the counter and
the periodic clock tick consistent.
Step-loop micro-state (ExecutionFlags)
exec_ (cpu_state.hpp:716, type in execution_flags.hpp) carries the
transient state the fetch/execute loop needs but that is not
architectural:
| Field | Meaning |
|---|---|
| branch_taken / branch_target | CTI requested a branch; the loop applies it via the delay-slot rule |
| annul_next | Bicc,a not-taken: skip the delay slot's side effects |
| pending_trap / pending_tt | a handler synthesised a software trap (e.g. Ticc) |
| error_mode | trap taken while ET=0; SPARC halts (one-way latch) |
| is_powered_down | %asr19 write (idle loop); cleared by IRQ/trap |
| code_flush_pending | a FLUSH retired; the runtime drains it to clear shared caches |

The transfer/trap helpers on CpuState operate on these:
request_branch, clear_branch_request, set_annul_next,
raise_trap(tt), enter_trap(saved_pc, saved_npc, tt) (the §7.3 entry
sequence), leave_trap(target) (the §B.26 RETT exit),
and request_code_flush / consume_code_flush.
WRPSR is immediate
SPARC V8 §5.1.2.3 permits WRPSR's effect on S/ET/PS/CWP to be
deferred up to three instructions, but that is implementation latitude.
Tero applies every writable field at once (write_psr_writable,
cpu_state.cpp:69), matching the SIS oracle. The earlier pending_psr_
/ commit_psr_pipeline() delay model was removed because it desynced the
register windows when a trap fired inside the three-instruction delay window (smpschededf03).
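
A minimal sketch of an immediate, masked PSR write follows. The field masks come from the SPARC V8 PSR layout; the free-function signature of write_psr_writable here is an assumption (the real one is a CpuState member), and the psr_sketch namespace is hypothetical. The sketch does not model the illegal-instruction check for a CWP value at or above NumWindows.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for namespace psr; masks follow the SPARC V8 PSR layout.
namespace psr_sketch {
constexpr std::uint32_t Icc = 0x00F00000;  // bits 23:20, integer condition codes
constexpr std::uint32_t Ef  = 0x00001000;  // bit 12, FPU enable
constexpr std::uint32_t Pil = 0x00000F00;  // bits 11:8, processor interrupt level
constexpr std::uint32_t S   = 0x00000080;  // bit 7, supervisor
constexpr std::uint32_t Ps  = 0x00000040;  // bit 6, previous supervisor
constexpr std::uint32_t Et  = 0x00000020;  // bit 5, enable traps
constexpr std::uint32_t Cwp = 0x0000001F;  // bits 4:0, current window pointer

// impl, ver, the reserved bits, and EC are left out, so a write cannot change them.
constexpr std::uint32_t Writable = Icc | Ef | Pil | S | Ps | Et | Cwp;

// Assumed shape: returns the new PSR value, with every writable field
// taking effect at once (no deferred commit).
constexpr std::uint32_t write_psr_writable(std::uint32_t old_psr, std::uint32_t value) {
    return (old_psr & ~Writable) | (value & Writable);
}

static_assert(Writable == 0x00F01FFF, "mask excludes impl/ver/reserved/EC");
}  // namespace psr_sketch
```

With no pending state to commit later, a trap taken on the very next instruction already sees the new S, ET, PS, and CWP values.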
The per-core decode cache
CpuState carries a 1024-entry, direct-mapped decode cache (DecodeCacheEntry
= {pc_tag, DecodedInsn}, cpu_state.hpp:746). On each step, step()
indexes decode_cache_slot(pc) ((pc >> 2) & DecodeCacheMask) and reuses
the cached DecodedInsn when pc_tag == pc, skipping both the bus fetch
and the decoder. Hit rate is > 99 % on warm RTEMS workloads.
- pc_tag = 0xFFFFFFFF marks an empty slot (can never collide with a real 4-byte-aligned PC).
- Cleared in full by reset() and invalidate_decode_cache(). A FLUSH drops this core's cache immediately via request_code_flush().
- It stores nothing architectural: copy_arch_state_from() (cpu_state.hpp:312) deliberately copies the blob, FP, cache-control, and step micro-state but not the decode cache (the oracle reuses one warm scratch core across millions of blocks).
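
The lookup described above is an ordinary direct-mapped cache. The sketch below assumes DecodeCacheMask is 1023 (implied by the 1024-entry size) and uses hypothetical *Sketch types in place of the real DecodedInsn and DecodeCacheEntry.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

struct DecodedInsnSketch { std::uint32_t raw = 0; };  // stand-in for DecodedInsn

struct DecodeCacheEntrySketch {
    std::uint32_t pc_tag = 0xFFFFFFFFu;  // empty marker: not 4-byte aligned
    DecodedInsnSketch insn{};
};

constexpr std::size_t DecodeCacheSize = 1024;
constexpr std::size_t DecodeCacheMask = DecodeCacheSize - 1;  // assumed value

struct DecodeCacheSketch {
    std::array<DecodeCacheEntrySketch, DecodeCacheSize> slots{};

    // Word-index the PC, then keep the low 10 bits.
    static constexpr std::size_t slot(std::uint32_t pc) {
        return (pc >> 2) & DecodeCacheMask;
    }
    // Hit only when the full PC matches the tag; otherwise report a miss.
    const DecodedInsnSketch* lookup(std::uint32_t pc) const {
        const DecodeCacheEntrySketch& e = slots[slot(pc)];
        return e.pc_tag == pc ? &e.insn : nullptr;
    }
    // On a miss the caller fetches, decodes, and overwrites the slot.
    void fill(std::uint32_t pc, DecodedInsnSketch insn) {
        slots[slot(pc)] = DecodeCacheEntrySketch{pc, insn};
    }
    // Reset every slot to the empty marker.
    void invalidate() { slots.fill(DecodeCacheEntrySketch{}); }
};
```

Two PCs that are 4 KiB apart share a slot, so the full-PC tag comparison is what keeps an aliased entry from being reused.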
Decoder: pure, stateless
decode(uint32_t word) → DecodedInsn (decoder.hpp:22) covers Format 1
(CALL), Format 2 (SETHI/Bicc/FBfcc), and the Format 3 opcode
space. The decoder is pure: no state, no allocation, no side effects;
it never touches CpuState. Unknown / reserved encodings decode to
InsnKind::Unknown (the handler maps that to the illegal-instruction
trap) — the decoder never throws.
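
As an illustration of what "pure" means here, the first decode step can be written as constexpr functions of the instruction word alone. The field positions follow the SPARC V8 instruction formats; the names FormatSketch, classify, and sign_extend are hypothetical, and the real decode() fills a full DecodedInsn rather than returning these pieces.

```cpp
#include <cassert>
#include <cstdint>

enum class FormatSketch { Call, Format2, Format3Arith, Format3Mem };

// op (bits 31:30) selects the instruction format.
constexpr FormatSketch classify(std::uint32_t word) {
    switch (word >> 30) {
        case 1:  return FormatSketch::Call;          // Format 1
        case 0:  return FormatSketch::Format2;       // SETHI / Bicc / FBfcc
        case 2:  return FormatSketch::Format3Arith;  // ALU, control, FPop
        default: return FormatSketch::Format3Mem;    // loads and stores
    }
}

// Sign-extend the low `bits` bits of value without relying on
// implementation-defined unsigned-to-signed conversion.
constexpr std::int32_t sign_extend(std::uint32_t value, unsigned bits) {
    const std::int32_t v = static_cast<std::int32_t>(value & ((1u << bits) - 1));
    const std::int32_t sign = std::int32_t{1} << (bits - 1);
    return (v ^ sign) - sign;
}

// CALL: 30-bit word displacement, pre-shifted to bytes (wraps modulo 2^32).
constexpr std::uint32_t disp_call(std::uint32_t word) { return word << 2; }

// Bicc / FBfcc: 22-bit signed word displacement, pre-shifted to bytes.
constexpr std::int32_t disp_branch(std::uint32_t word) {
    return sign_extend(word, 22) * 4;
}

// Format 3 immediate: 13-bit signed value.
constexpr std::int32_t simm13(std::uint32_t word) { return sign_extend(word, 13); }

static_assert(classify(0x01000000u) == FormatSketch::Format2, "NOP is SETHI 0, %g0");
```

Since every function depends only on its argument, the same word always decodes to the same result, which is what makes caching decoded instructions by PC safe until the code is flushed.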
DecodedInsn (decoded_insn.hpp:127) is a flat trivially-copyable POD:
raw, kind (an InsnKind discriminator — not the SPARC encoding),
register operands (rd/rs1/rs2), has_imm + simm13 + imm22,
pre-shifted displacements (disp_call, disp_branch), cond + annul,
and a sub-op union by kind (alu_op, shift_op, mul_op, div_op,
mem_op, sreg/asr_index, fp_op), plus raw op/op2/op3/asi
bits for diagnostics. The sub-op enums (AluOp, ShiftOp, MulOp,
DivOp, MemOp, CondCode, SpecialReg) let handlers switch on one
value without re-parsing op3.
is_fp_kind(InsnKind) (decoded_insn.hpp:64) and is_fp_class(raw)
(fpu.hpp:81) gate the fp_disabled trap (PSR.EF == 0).
Handlers and execute()
execute(CpuState&, const DecodedInsn&, ICpuBus* = nullptr)
(handlers.hpp:48) is the dispatcher. It mutates state and returns an
ExecStatus; it does not advance PC/nPC (that is step()'s job). A
null bus is legal for callers that never run memory instructions
(most unit tests); a Load/Store with null bus returns BusError.
| File | Covers |
|---|---|
| handlers_alu.cpp | ADD/SUB/AND/OR/XOR/ANDN/ORN/XNOR + cc/X variants, tagged-add/sub (TADDcc/TSUBcc(TV)), SLL/SRL/SRA, UMUL/SMUL, UDIV/SDIV, MULScc |
| handlers_branch.cpp | Bicc/BA, CALL, JMPL, Ticc, SETHI |
| handlers_loadstore.cpp | LD/LDUB/LDUH/LDSB/LDSH/LDD + ST/STB/STH/STD, LDA/STA (ASI; incl. cache-control ASI 0x2), LDSTUB, SWAP, CASA |
| handlers_regwin.cpp | SAVE, RESTORE, RETT (WIM-checked per §B.26) |
| handlers_special.cpp | RDY/WRY, RDPSR/WRPSR, RDWIM/WRWIM, RDTBR/WRTBR, RDASR/WRASR, FLUSH, STBAR |
| fpu_handlers.cpp | FPop1/FPop2, FP loads/stores, FBfcc |

Shared bodies — exec_alu_op<Op>, status_to_tt, the flags helpers —
live in handlers_internal.hpp, included only by the handler .cpp files
so they stay off the public surface.
ExecStatus (handlers.hpp:24) is the side-band result; status_to_tt
maps it to a SPARC tt:
| ExecStatus | → tt | Meaning |
|---|---|---|
| Ok | n/a | completed normally |
| IllegalInsn | 0x02 | unknown / UNIMP / reserved |
| PrivInsn | 0x03 | privileged op in user mode |
| FpDisabled | 0x04 | FP op with PSR.EF == 0 |
| WinOverflow / WinUnderflow | 0x05 / 0x06 | SAVE / RESTORE window check |
| AlignError | 0x07 | unaligned load/store or PC |
| BusError | 0x09 | bus fault on data access |
| TagOverflow | 0x0A | tagged-arithmetic trap |
| DivZero | 0x2A | UDIV/SDIV by zero |
| TrapInsn | 0x80 + n | Ticc software trap |
| InsnFetchError | 0x01 | bus fault on instruction fetch |
| ErrorMode | n/a | trap taken while ET=0 (bubbled up) |

TrapType (trap.hpp:26) names the architectural tt bytes; tt_of()
converts. Async interrupts use tt = 0x10 + irl (InterruptLevelBase),
software traps 0x80 + imm7 (SoftwareTrapBase); MaxIrqLevel = 15.
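
The mapping in the table can be sketched as a single switch. The tt values are the ones listed above; the enum ExecStatusSketch, the optional return type, and the extra sw_trap parameter are assumptions made for illustration, since the real status_to_tt signature is not shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical mirror of ExecStatus.
enum class ExecStatusSketch {
    Ok, IllegalInsn, PrivInsn, FpDisabled, WinOverflow, WinUnderflow,
    AlignError, BusError, TagOverflow, DivZero, TrapInsn, InsnFetchError, ErrorMode
};

constexpr std::uint8_t InterruptLevelBase = 0x10;
constexpr std::uint8_t SoftwareTrapBase   = 0x80;

// Returns the tt byte, or nullopt when the status raises no trap.
// sw_trap is only consulted for TrapInsn (assumed to carry the Ticc number).
constexpr std::optional<std::uint8_t> status_to_tt(ExecStatusSketch s,
                                                   std::uint8_t sw_trap = 0) {
    switch (s) {
        case ExecStatusSketch::InsnFetchError: return std::uint8_t{0x01};
        case ExecStatusSketch::IllegalInsn:    return std::uint8_t{0x02};
        case ExecStatusSketch::PrivInsn:       return std::uint8_t{0x03};
        case ExecStatusSketch::FpDisabled:     return std::uint8_t{0x04};
        case ExecStatusSketch::WinOverflow:    return std::uint8_t{0x05};
        case ExecStatusSketch::WinUnderflow:   return std::uint8_t{0x06};
        case ExecStatusSketch::AlignError:     return std::uint8_t{0x07};
        case ExecStatusSketch::BusError:       return std::uint8_t{0x09};
        case ExecStatusSketch::TagOverflow:    return std::uint8_t{0x0A};
        case ExecStatusSketch::DivZero:        return std::uint8_t{0x2A};
        case ExecStatusSketch::TrapInsn:
            return static_cast<std::uint8_t>(SoftwareTrapBase + (sw_trap & 0x7F));
        case ExecStatusSketch::Ok:
        case ExecStatusSketch::ErrorMode:
            return std::nullopt;
    }
    return std::nullopt;
}

// Asynchronous interrupt at request level irl (1..15).
constexpr std::uint8_t interrupt_tt(std::uint8_t irl) {
    return static_cast<std::uint8_t>(InterruptLevelBase + (irl & 0x0F));
}
```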
step(): the single-cycle driver
step(CpuState&, ICpuBus&) → StepResult (step.hpp:41) advances exactly
one instruction. Per call, in order (step.cpp):
- If error_mode(), short-circuit with ExecStatus::ErrorMode (SPARC stays in error mode until reset).
- clear_branch_request(): drop last cycle's CTI state.
- Check PC alignment (misaligned PC → AlignError against the fetch).
- Decode-cache lookup: on hit, reuse the cached DecodedInsn; on miss, bus.read_u32(VirtAddr{pc}) then decode() and fill the slot. A fetch bus error → InsnFetchError.
- If annul_next() is set, drop the instruction's side effects (still advance PC/nPC); else execute(state, insn, &bus) (the FP-disabled gate is charged inside execute, only for FP kinds).
- Resolve the trap: a handler-raised pending_trap wins, else status_to_tt(status). If a tt is pending and !et(), latch error_mode and return; otherwise enter_trap(pc, npc, tt).
- Non-trap path: apply the SPARC delay-slot update: new_pc = npc, new_npc = branch_taken ? branch_target : npc + 4.
The function is ~110 LOC, deliberately verifiable by inspection.
Delay-slot model
CTIs (CALL, taken Bicc, JMPL, RETT) only record a
branch_target; they never touch PC/nPC. step() applies the two-PC
update after execute() returns, so SPARC's branch-delay semantics
fall out of one place. See
execution model.
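
The two-PC update is small enough to show in full. This sketch uses hypothetical PcPairSketch and CtiSketch types; the rule itself (new_pc = npc, new_npc = branch target or npc + 4) is the one stated in the step list above.

```cpp
#include <cassert>
#include <cstdint>

struct PcPairSketch {
    std::uint32_t pc;
    std::uint32_t npc;
};

// What a CTI handler records; it never writes PC or nPC itself.
struct CtiSketch {
    bool branch_taken = false;
    std::uint32_t branch_target = 0;
};

// Applied once per instruction, after execute() has returned.
constexpr PcPairSketch advance(PcPairSketch cur, CtiSketch cti) {
    return PcPairSketch{cur.npc, cti.branch_taken ? cti.branch_target : cur.npc + 4};
}

// A branch at 0x1000 to 0x2000: the delay slot at 0x1004 still executes next,
// and the target is reached one instruction later.
constexpr PcPairSketch after_branch = advance(PcPairSketch{0x1000, 0x1004},
                                              CtiSketch{true, 0x2000});
static_assert(after_branch.pc == 0x1004 && after_branch.npc == 0x2000,
              "delay slot runs before the target");
```

Because the branch only changes nPC, the instruction already sitting at the old nPC (the delay slot) is fetched before control reaches the target.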
FPU: Berkeley SoftFloat 3e
fpu.hpp / fpu_handlers.hpp / softfloat_context.hpp wrap the vendored
third-party/softfloat3e build (BSD-3-Clause). FpOp (fpu.hpp:25)
names the concrete operation inside an FP InsnKind; decode_fpop1 /
decode_fpop2 parse the 9-bit opf field. The implemented set covers FP
loads/stores (LDF/LDDF/LDFSR, STF/STDF/STFSR), the trivial moves
(FMOVs/FNEGs/FABSs), single- and double-precision add/sub, all four FCMP
variants, and FBfcc; anything outside it decodes to FpOp::Unimplemented
(→ fp_exception, with FTT = 3 — the unimplemented FPop code in the
floating-point status register FSR). The FSR's TEM (trap-enable mask)
selects which FP faults trap, and drives fp_exception (tt = 0x08); the
FP-disabled trap path (PSR.EF == 0) is retained for guests that ship without
FP enabled.
The wrapper is per-CpuState — each core has its own SoftFloat state
so rounding mode / exception flags do not leak between cores. The
specialization is 8086-SSE (upstream default; IEEE-754 default-NaN
handling), not the original plan's 8086. RTEMS fptest01 and the
FP-using sptests exercise the path end-to-end.
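
The TEM gate and the FTT field mentioned above can be illustrated with plain bit arithmetic. Field positions follow the SPARC V8 FSR layout (TEM in bits 27:23, ftt in bits 16:14, flag order nv/of/uf/dz/nx from bit 4 down to bit 0); the function names and the fsr_sketch namespace are hypothetical, and the sketch deliberately leaves out the cexc/aexc bookkeeping the real handlers have to do.

```cpp
#include <cassert>
#include <cstdint>

namespace fsr_sketch {
constexpr unsigned TemShift = 23;                 // TEM: bits 27:23
constexpr unsigned FttShift = 14;                 // ftt: bits 16:14
constexpr std::uint32_t FlagMask = 0x1F;          // nv=0x10 of=0x08 uf=0x04 dz=0x02 nx=0x01
constexpr std::uint32_t FttMask  = 0x7u << FttShift;
constexpr std::uint32_t FttIeee754Exception = 1;  // IEEE exception with its TEM bit set
constexpr std::uint32_t FttUnimplementedFpop = 3; // operation outside the implemented set

// True when at least one raised IEEE flag has its trap-enable bit set in TEM.
constexpr bool fp_trap_enabled(std::uint32_t fsr, std::uint32_t raised_flags) {
    return (((fsr >> TemShift) & FlagMask) & (raised_flags & FlagMask)) != 0;
}

// Returns fsr with the ftt field replaced by the given trap-type code.
constexpr std::uint32_t with_ftt(std::uint32_t fsr, std::uint32_t ftt) {
    return (fsr & ~FttMask) | ((ftt & 0x7u) << FttShift);
}
}  // namespace fsr_sketch
```

An unimplemented operation traps regardless of TEM: the handler sets ftt to 3 and raises fp_exception without consulting the mask.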
What is deliberately out of tero_core
- No direct bus access: fetch, load, and store all go through the injected ICpuBus. The core never names SystemBus.
- No peripherals, no interrupt controller, no scheduler.
- No I/O of any kind, not even logging: handlers return statuses, not messages.
- No hardware cache model, no MMU (both deferred). The decode cache holds nothing architectural.
- No binary translation. The arch-neutral IR and the LLVM JIT live in tero_ir / tero_arch_sparc / tero_jit; tero_core stays the single-instruction reference and oracle.
See also
- Interfaces: ICpuBus, IInterruptController, traps.
- Runtime: what drives step() and run_ir_quantum.
- Architecture: execution model (the run loop, delay slots, immediate PSR writes).
- Architecture: traps and interrupts (trap entry/exit, interrupt sampling, error mode).
- IR and LLVM JIT: the alternative execution path validated against this oracle.