Execution model

Lince is a fetch-decode-execute interpreter. Each core executes one SPARC instruction at a time; there is no decode cache, no block translator, no JIT. The cost of this simplicity is performance — the gain is determinism, debuggability, and a small attack surface for hardware-modelling bugs.

The step() cycle

A single iteration of lince::core::step() does the following:

flowchart TD
    A[Sample interrupts] --> B{Pending trap?}
    B -- yes --> H[enter_trap]
    B -- no  --> C[Fetch insn at PC]
    C --> D[Decode → DecodedInsn]
    D --> E[execute → ExecStatus]
    E --> F{ExecStatus}
    F -- Ok / Branch --> G[Advance PC/nPC]
    F -- Trap statuses --> H
    F -- ErrorMode --> X[Halt core]
    G --> Y[commit_psr_pipeline]
    H --> Y
    Y --> Z[Done]

The function returns a step count and an optional HaltReason so the outer round-robin loop can decide whether to continue, swap cores, or exit early.
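The cycle above can be sketched in a few lines of C++. This is an illustrative stand-in, not Lince's real `step()`: the `Core`, `StepResult`, and `execute_one` names are invented here, and interrupt sampling, decode, and the PSR commit are stubbed out.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-ins mirroring the names used in this page.
enum class ExecStatus { Ok, Branch, Trap, ErrorMode };
enum class HaltReason { ErrorMode };

struct StepResult {
    uint64_t steps;
    std::optional<HaltReason> halt;
};

struct Core {
    uint32_t pc = 0, npc = 4;
    bool error_mode = false;

    // Stand-in for fetch + decode + execute; always succeeds here.
    ExecStatus execute_one() { return ExecStatus::Ok; }

    StepResult step() {
        // 1. Sample interrupts / check pending traps (omitted in sketch).
        // 2. Fetch, decode, and execute the instruction at pc.
        ExecStatus st = execute_one();
        switch (st) {
        case ExecStatus::Ok:
        case ExecStatus::Branch:
            // 3. Advance the pc/npc pair (delay-slot semantics).
            pc = npc;
            npc += 4;
            break;
        case ExecStatus::Trap:
            // enter_trap() would run here (omitted in sketch).
            break;
        case ExecStatus::ErrorMode:
            error_mode = true;
            return {1, HaltReason::ErrorMode};
        }
        // 4. commit_psr_pipeline() runs once per cycle (omitted).
        return {1, std::nullopt};
    }
};
```

The outer loop inspects the returned `StepResult` to decide whether to keep running, rotate to the next core, or stop.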

Branch delay slots and annul

SPARC V8 has architectural delay slots: the instruction immediately following a control-transfer instruction (CTI) is always fetched and optionally executed before the branch takes effect.

Lince models this without a pipeline:

  1. JMPL, CALL, Bicc, RETT, etc. do not mutate PC or nPC directly. They set branch_taken_ and compute branch_target_ on CpuState.
  2. The next step() fetches the instruction at the current nPC (the delay slot) and executes it normally.
  3. After the delay slot, the loop adopts branch_target_ as the new PC instead of the post-incremented nPC.
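The three steps above can be sketched as a pair of methods on a minimal stand-in for CpuState (the `delay_pending` flag and method names are invented for this sketch):

```cpp
#include <cstdint>

// Sketch of the branch_taken_/branch_target_ bookkeeping; this
// CpuState is a stand-in, not Lince's real class.
struct CpuState {
    uint32_t pc = 0, npc = 4;
    bool branch_taken = false;   // set by a CTI handler this cycle
    bool delay_pending = false;  // the delay slot runs next cycle
    uint32_t branch_target = 0;

    // A CTI handler records the target instead of touching pc/npc.
    void take_branch(uint32_t target) {
        branch_taken = true;
        branch_target = target;
    }

    // End-of-step pc/npc update, mirroring steps 1-3 above.
    void advance() {
        if (delay_pending) {
            // The delay slot just executed: adopt the recorded target
            // instead of the post-incremented npc.
            pc = branch_target;
            npc = branch_target + 4;
            delay_pending = false;
        } else {
            pc = npc;
            npc += 4;
            if (branch_taken) {
                branch_taken = false;
                delay_pending = true;  // next insn is the delay slot
            }
        }
    }
};
```

After the branch's own step, `pc` points at the old `nPC` (the delay slot); one more `advance()` lands on the branch target.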

For annulled branches (Bicc,a):

  • If the branch is taken, the delay slot is executed normally.
  • If the branch is not taken, CpuState::annul_next_ is set; the next step() skips execution but still advances PC and nPC.

A hardware interrupt clears annul_next_ on trap entry (SPARC V8 §5.1.2.2). This is implemented in CpuState::enter_trap().
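A minimal sketch of the annul rule, with an `executed` counter standing in for fetch/decode/execute (again an illustrative class, not Lince's real CpuState):

```cpp
#include <cstdint>

// Sketch of annul handling; names mirror this page but the class is
// a stand-in, not Lince's real CpuState.
struct CpuState {
    uint32_t pc = 0, npc = 4;
    bool annul_next = false;
    int executed = 0;        // counts instructions actually executed

    void step() {
        if (annul_next) {
            annul_next = false;  // skip execution entirely...
        } else {
            ++executed;          // ...stand-in for fetch/decode/execute
        }
        pc = npc;                // ...but still advance pc/npc
        npc += 4;
    }

    void enter_trap() {
        annul_next = false;      // V8 Section 5.1.2.2: trap entry clears annul
    }
};
```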

PSR write pipeline

SPARC V8 §5.1.2.3 distinguishes between immediate and delayed fields of the Processor Status Register:

Field                          Semantics              Implementation
ICC (n, z, v, c)               Immediate              Written directly.
PIL                            Immediate              Written directly.
S (supervisor)                 Delayed by 3 cycles    Buffered in pending_psr_.
ET (trap enable)               Delayed by 3 cycles    Buffered in pending_psr_.
PS (previous S)                Delayed by 3 cycles    Buffered in pending_psr_.
CWP (current window pointer)   Delayed by 3 cycles    Buffered in pending_psr_.

step() calls commit_psr_pipeline() once per cycle. Three back-to-back cycles must pass before a WRPSR to a delayed field becomes architecturally visible. The window-overflow / underflow logic relies on this: SAVE sees the committed CWP, not whatever the previous instruction may have stashed in pending_psr_.
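One way to get this three-cycle delay is a three-slot pending-write buffer that shifts once per commit. The sketch below models only CWP; the `PsrPipeline` class and its members are illustrative, not Lince's actual layout of pending_psr_:

```cpp
#include <array>
#include <cstdint>
#include <optional>

// Sketch of a three-slot pending-write buffer for the delayed PSR
// fields; only CWP is modelled here.
struct PsrPipeline {
    uint32_t cwp = 0;  // architecturally visible (committed) value
    std::array<std::optional<uint32_t>, 3> pending{};

    // WRPSR to a delayed field lands in the farthest slot.
    void write_cwp(uint32_t v) { pending[2] = v; }

    // Called once per step(): shift the buffer one slot toward commit.
    void commit_psr_pipeline() {
        if (pending[0]) cwp = *pending[0];
        pending[0] = pending[1];
        pending[1] = pending[2];
        pending[2].reset();
    }
};
```

Two commits after the write, the old value is still visible; the third commit makes the new value architectural. This is also why tests that bypass step() have to call the commit themselves three times.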

Unit-test pitfall

Tests that drive execute() directly, without going through step(), must manually call commit_psr_pipeline() three times before asserting on delayed PSR fields. See tests/unit/test_handlers_special.cpp for the canonical pattern.

Trap dispatch

When a handler returns an ExecStatus other than Ok or Branch, the step loop:

  1. Clears annul_next_.
  2. Maps the ExecStatus to a SPARC tt (trap type) via status_to_tt().
  3. Calls enter_trap(tt):
    • Decrements CWP and saves the trapped PC/nPC into the new window's locals (l1, l2).
    • Sets S=1, ET=0, PS=old_S.
    • Computes TBR = (TBA & 0xFFFFF000) | (tt << 4).
    • Sets PC = TBR, nPC = TBR + 4.
  4. If ET was already 0 at the moment the trap occurred, the core enters ErrorMode and the outer loop returns HaltReason::ErrorMode.

RETT undoes step 3: it restores S from PS, sets ET=1, increments CWP, and asks the loop to branch to the return target.
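The entry/return pair can be sketched as follows. This is a simplified model under stated assumptions: only the PSR bits and TBR are tracked, eight register windows are assumed, no PC/nPC save into window locals is performed, and the `TrapState`/`rett` names are invented for the sketch (the real RETT asks the step loop to branch rather than writing PC directly).

```cpp
#include <cstdint>

// Simplified sketch of the enter_trap/RETT sequence described above.
struct TrapState {
    uint32_t pc = 0, npc = 4;
    uint32_t tba = 0;   // TBR.TBA, programmed via WRTBR
    uint32_t tbr = 0;
    uint32_t cwp = 3;
    bool s = false, ps = false, et = true;
    bool error_mode = false;

    void enter_trap(uint32_t tt) {
        if (!et) {             // trap while ET == 0: error mode
            error_mode = true;
            return;
        }
        cwp = (cwp - 1) & 7;   // new window for the handler
        ps = s;                // PS := old S
        s = true;
        et = false;
        tbr = (tba & 0xFFFFF000u) | ((tt & 0xFFu) << 4);
        pc = tbr;
        npc = tbr + 4;
    }

    void rett(uint32_t target) {
        s = ps;                // undo trap entry
        et = true;
        cwp = (cwp + 1) & 7;
        pc = target;           // real loop branches via the delay slot
        npc = target + 4;
    }
};
```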

Full trap reference

ErrorMode

SPARC V8 §7.1: a trap that fires while PSR.ET == 0 causes the processor to halt and signal an exception to the outside world. Lince models this by setting error_mode_ = true on the offending core and returning HaltReason::ErrorMode from the next run_for / run_until boundary.

The CLI reacts by dumping a post-mortem of every register on core 0. Library users can call emu->core(idx) and inspect pc(), psr(), tbr(), wim(), the global registers, and the active window.

What the interpreter intentionally does not do

  • No decode caching (every instruction is re-decoded on every fetch).
  • No instruction-translation cache (no JIT, no IR).
  • No batch execution (no quantum-internal optimisation across instructions).
  • No speculative or out-of-order execution.

These are deliberate choices: see Design principles for the rationale.