Multicore and timing

Lince models multi-core execution deterministically on a single host thread. The design is intentionally simple: a cooperative round-robin scheduler with a configurable instruction quantum.

Why single-threaded round-robin?

The decision flows directly from the design principles:

  • TSO is satisfied trivially. Only one core executes at any host instant, so the SPARC Total Store Order memory model holds without any fences or atomics.
  • Atomics (CASA, LDSTUB, SWAP) are correct by construction. They run to completion within a quantum; no interleaving is possible.
  • Fully deterministic. Two runs of the same image produce the same trace, regardless of host load. This is essential for regression testing and trace comparison against reference outputs.
  • Trivially debuggable. No host-level race conditions can corrupt guest state.
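To make the "correct by construction" point concrete, here is a minimal sketch of the three SPARC atomics written as plain read-modify-write functions, assuming guest memory is only ever touched from the single host thread. The function names and signatures are hypothetical and are not taken from Lince; only the instruction semantics follow the SPARC definitions.

```cpp
#include <cstdint>

// Hypothetical helpers, not Lince's API. Because only one core runs at any
// host instant, none of these need std::atomic or host fences.

// CASA: compare the memory word with `compare`; if equal, store `new_value`.
// The old memory word is always returned (it ends up in rd).
inline std::uint32_t casa(std::uint32_t& mem_word, std::uint32_t compare,
                          std::uint32_t new_value) {
    const std::uint32_t old = mem_word;
    if (old == compare) {
        mem_word = new_value;
    }
    return old;
}

// LDSTUB: load a byte, then set it to all ones.
inline std::uint8_t ldstub(std::uint8_t& mem_byte) {
    const std::uint8_t old = mem_byte;
    mem_byte = 0xFF;
    return old;
}

// SWAP: exchange a register value with a memory word; returns the old word.
inline std::uint32_t swap_word(std::uint32_t& mem_word, std::uint32_t reg_value) {
    const std::uint32_t old = mem_word;
    mem_word = reg_value;
    return old;
}
```

Each function is an ordinary sequence of loads and stores. It is atomic from the guest's point of view only because the scheduler never switches cores in the middle of an instruction.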

The cost is performance: with N cores, the scheduler delivers at best 1/N of the throughput of a parallel implementation. For the MVP target (~50 MIPS aggregate, enough for RTEMS) this is acceptable.

The scheduling loop

sequenceDiagram
    participant E as Emulator
    participant S as Scheduler
    participant C as CpuState
    participant P as Peripherals

    loop while sim_time < deadline
        E->>S: fire_pending(sim_time)
        loop for each Core
            alt is_powered_down()
                E-->>E: skip quantum
            else active
                E->>E: sample_interrupts()
                loop quantum instructions
                    E->>C: step()
                end
            end
        end
        E->>P: tick(sim_time)
        alt all cores powered down
            E->>S: next_event_time()
            E->>E: sim_time = min(next, now + 1 ms)
        else
            E->>E: sim_time += quantum * ns_per_insn
        end
    end
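The diagram can be approximated in code roughly as follows. This is a simplified sketch, not the Lince source: `SketchCore`, `SketchConfig` and `run_round_robin` are hypothetical names, and event firing, interrupt sampling and peripheral ticks are reduced to comments so that only the control flow and time accounting remain.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using SimTimeNs = std::uint64_t;

// Hypothetical stand-in for a core: it only counts executed instructions.
struct SketchCore {
    bool powered_down = false;
    std::uint64_t executed = 0;
    void step() { ++executed; }
};

struct SketchConfig {
    std::uint32_t quantum = 1000;       // instructions per core per pass
    SimTimeNs ns_per_insn = 20;         // simulated cost of one instruction
    SimTimeNs max_idle_ns = 1'000'000;  // 1 ms cap on an idle jump
};

// Runs round-robin passes until sim_time reaches deadline and returns the
// final simulated time. next_event stands in for next_event_time().
SimTimeNs run_round_robin(std::vector<SketchCore>& cores, const SketchConfig& cfg,
                          SimTimeNs sim_time, SimTimeNs deadline,
                          SimTimeNs next_event) {
    assert(cfg.quantum > 0 && cfg.ns_per_insn > 0 && cfg.max_idle_ns > 0);
    while (sim_time < deadline) {
        // (fire_pending(sim_time) would run here.)
        bool all_down = true;
        for (auto& core : cores) {
            if (core.powered_down) continue;  // skip this core's quantum
            all_down = false;
            // (sample_interrupts() would run here, once per quantum.)
            for (std::uint32_t i = 0; i < cfg.quantum; ++i) core.step();
        }
        // (Peripheral tick(sim_time) would run here.)
        if (all_down) {
            // Idle: jump to the next event, but never further than the cap.
            SimTimeNs target = sim_time + cfg.max_idle_ns;
            if (next_event > sim_time) target = std::min(target, next_event);
            sim_time = target;
        } else {
            sim_time += static_cast<SimTimeNs>(cfg.quantum) * cfg.ns_per_insn;
        }
    }
    return sim_time;
}
```

With the default values, one pass with at least one active core advances simulated time by 1000 × 20 ns = 20 µs, regardless of how many cores are active.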

The quantum

EmulatorConfig::quantum (default 1000) is the number of instructions each core executes before the scheduler hands the host thread to the next core. The choice trades off:

Quantum size     | Pros                     | Cons
Small (≤ 100)    | Finer interrupt latency  | More context-switching overhead
Default (1000)   | Balanced                 | Reasonable for RTEMS workloads
Large (≥ 10000)  | Lower scheduler overhead | Worse cross-core latency

You almost never need to change this; the default keeps RTEMS sptests within a few percent of their reference baselines.
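The trade-off can be put in numbers. The sketch below computes how much simulated time one quantum covers, which bounds how long a core runs before the next core gets the host thread. The helper names are hypothetical; the defaults (quantum 1000, 20 ns per instruction) come from the text.

```cpp
#include <cassert>
#include <cstdint>

using SimTimeNs = std::uint64_t;

// Simulated time covered by one quantum on one core.
constexpr SimTimeNs quantum_span_ns(std::uint32_t quantum, SimTimeNs ns_per_insn) {
    return static_cast<SimTimeNs>(quantum) * ns_per_insn;
}

// Number of round-robin passes needed to cover one simulated millisecond.
inline std::uint64_t passes_per_ms(std::uint32_t quantum, SimTimeNs ns_per_insn) {
    const SimTimeNs span = quantum_span_ns(quantum, ns_per_insn);
    assert(span > 0);
    return 1'000'000 / span;
}

// The three rows of the table at 20 ns per instruction.
static_assert(quantum_span_ns(100, 20) == 2'000, "small quantum: 2 us");
static_assert(quantum_span_ns(1000, 20) == 20'000, "default quantum: 20 us");
static_assert(quantum_span_ns(10000, 20) == 200'000, "large quantum: 200 us");
```

At the default settings a pass covers 20 µs of simulated time, so 50 passes make up one simulated millisecond. A quantum of 10000 cuts that to 5 passes but stretches each core's uninterrupted run to 200 µs.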

Idle-time skipping

The most performance-sensitive piece of the loop is what happens when every core is idle. Naïvely running 1000 instructions per core per quantum on cores that are halted via asr19 would waste host cycles.

Lince detects this case explicitly:

  1. After a round-robin pass, the loop checks all_cores_powered_down().
  2. If true, simulated time jumps to min(scheduler.next_event_time(), now + kMaxIdleNs).
  3. The 1 ms cap (kMaxIdleNs) bounds the jump because the GPTimer raises interrupts via IInterruptSource::raise() rather than the EventScheduler, so its next tick is invisible to next_event_time(). Without the cap, time would jump straight to the deadline and the OS clock would never advance.
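The jump in step 2 can be sketched as a small pure function. This is an illustration under assumptions: `idle_jump_target` is a hypothetical name, and modelling "no pending event" with `std::optional` is a choice made for the example, since the text does not say what `next_event_time()` returns when the queue is empty.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

using SimTimeNs = std::uint64_t;

constexpr SimTimeNs kMaxIdleNs = 1'000'000;  // 1 ms cap, as described in the text

// Returns the simulated time to jump to when every core is powered down.
// next_event is empty when the scheduler has nothing queued (assumption).
inline SimTimeNs idle_jump_target(SimTimeNs now, std::optional<SimTimeNs> next_event) {
    const SimTimeNs capped = now + kMaxIdleNs;
    if (!next_event) {
        return capped;  // nothing scheduled: still advance by at most 1 ms
    }
    assert(*next_event >= now);  // events in the past should already have fired
    return std::min(*next_event, capped);
}
```

Because the result never exceeds `now + kMaxIdleNs`, a timer whose ticks the scheduler cannot see still gets a chance to run at least once per simulated millisecond.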

This was Decision 25 (see Design decisions) and is what allowed RTEMS sp04 (an explicit idle-loop test) to pass in finite real time.

Powering cores up and down

A SPARC V8 LEON3 core enters power-down when software writes to asr19. The write itself triggers the halt, which is why the conventional idiom uses %g0 as the source register:

wr %g0, %asr19   ! halt this CPU

The handler sets CpuState::is_powered_down_ = true. The core remains parked until either of the following occurs:

  • An IRQ is asserted on its interrupt input (via the IRQMP).
  • A debug command from the GDB stub forces a wake.

On the GR712RC, only CPU 0 starts running at reset. CPUs 1–N are parked at asr19 and woken by CPU 0 once the OS is ready (Decision 28 in Design decisions). ElfLoader mirrors this: it sets is_powered_down = true for every core except core 0.
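The power-down life cycle can be modelled with a single flag per core. The following is a minimal sketch with hypothetical names (`SketchCpu`, `make_cores_at_reset`); it mirrors the behaviour described above and is not the actual CpuState or ElfLoader code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-core state: just the power-down flag.
struct SketchCpu {
    bool powered_down = false;

    void write_asr19() { powered_down = true; }  // guest executed wr ..., %asr19
    void on_irq() { powered_down = false; }      // IRQ asserted on this core's input
    void debug_wake() { powered_down = false; }  // forced wake from a debugger
};

// Reset state described in the text: core 0 runs, every other core is parked.
inline std::vector<SketchCpu> make_cores_at_reset(std::size_t count) {
    assert(count >= 1);
    std::vector<SketchCpu> cores(count);
    for (std::size_t i = 1; i < count; ++i) {
        cores[i].powered_down = true;
    }
    return cores;
}

// True when no core has anything to execute, the trigger for idle skipping.
inline bool all_cores_powered_down(const std::vector<SketchCpu>& cores) {
    return std::all_of(cores.begin(), cores.end(),
                       [](const SketchCpu& c) { return c.powered_down; });
}
```

In this model, CPU 0 waking a secondary core amounts to raising an interrupt on that core's input, which clears the flag so the next round-robin pass executes it again.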

Time as a parameter

The emulator never reads the host clock. All time flows from outside:

  • EmulatorConfig::ns_per_insn (default 20) sets the simulated speed.
  • Emulator::run_for(SimTimeNs duration) runs until current_sim_time() + duration is reached.
  • Emulator::run_until(SimTimeNs deadline) runs until exactly deadline.
  • Emulator::current_sim_time() is a snapshot of the simulated clock.

If your wrapper wants wall-clock alignment (e.g. SMP2 real-time scheduler), it adjusts how often it calls run_for — Lince itself never sleeps on the host.
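A wrapper that wants wall-clock alignment could pace its calls along these lines. This is a sketch under assumptions: `run_paced` and `FakeEmulator` are hypothetical, and the only emulator calls used are `run_for` and `current_sim_time`, with the shapes listed above. All sleeping happens in the wrapper, never in the emulator.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

using SimTimeNs = std::uint64_t;

// Hypothetical stand-in exposing the two calls the wrapper needs.
struct FakeEmulator {
    SimTimeNs now = 0;
    void run_for(SimTimeNs duration) { now += duration; }
    SimTimeNs current_sim_time() const { return now; }
};

// Runs `total` ns of simulated time in slices of `slice` ns, sleeping after
// each slice so that simulated time does not run ahead of the host clock.
template <typename Emu>
void run_paced(Emu& emu, SimTimeNs total, SimTimeNs slice) {
    assert(slice > 0);
    using Clock = std::chrono::steady_clock;
    const auto wall_start = Clock::now();
    const SimTimeNs sim_start = emu.current_sim_time();
    const SimTimeNs sim_end = sim_start + total;

    while (emu.current_sim_time() < sim_end) {
        const SimTimeNs remaining = sim_end - emu.current_sim_time();
        emu.run_for(std::min(slice, remaining));

        // Wait until the host clock catches up with the simulated clock.
        const auto sim_elapsed = std::chrono::nanoseconds(
            static_cast<std::chrono::nanoseconds::rep>(emu.current_sim_time() - sim_start));
        std::this_thread::sleep_until(wall_start + sim_elapsed);
    }
}
```

If the host is too slow to keep up, `sleep_until` returns immediately and the loop simply runs as fast as it can. The slice size sets how closely the two clocks track each other.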