# Multicore and timing
Lince models multi-core execution deterministically on a single host thread. The design is intentionally simple: a cooperative round-robin scheduler with a configurable instruction quantum.
## Why single-threaded round-robin?
The decision flows directly from the design principles:
- TSO is satisfied trivially. Only one core executes at any host instant, so the SPARC Total Store Order memory model holds without any fences or atomics.
- Atomics (CASA, LDSTUB, SWAP) are correct by construction. They run to completion within a quantum; no interleaving is possible.
- Fully deterministic. Two runs of the same image produce the same trace, regardless of host load. This is essential for regression testing and trace comparison against reference outputs.
- Trivially debuggable. No host-level race conditions can corrupt guest state.
The cost is performance: with N cores the scheduler is at best
1/N the throughput of a parallel implementation. For the MVP target
(~50 MIPS aggregate, enough for RTEMS) this is acceptable.
## The scheduling loop
```mermaid
sequenceDiagram
    participant E as Emulator
    participant S as Scheduler
    participant C as CpuState
    participant P as Peripherals
    loop while sim_time < deadline
        E->>S: fire_pending(sim_time)
        loop for each Core
            alt is_powered_down()
                E-->>E: skip quantum
            else active
                E->>E: sample_interrupts()
                loop quantum instructions
                    E->>C: step()
                end
            end
        end
        E->>P: tick(sim_time)
        alt all cores powered down
            E->>S: next_event_time()
            E->>E: sim_time = min(next, now + 1 ms)
        else
            E->>E: sim_time += quantum * ns_per_insn
        end
    end
```
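The pass structure in the diagram can be sketched in C++. `Core`, `Scheduler`, and the constants below are illustrative stand-ins for Lince's real types, not its actual API:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

using SimTimeNs = uint64_t;

// Stand-in for CpuState: just enough for the loop structure.
struct Core {
    bool powered_down = false;
    uint64_t executed = 0;
    void sample_interrupts() {}
    void step() { ++executed; }  // execute one guest instruction
};

// Stand-in for the EventScheduler: one pending event that fires once.
struct Scheduler {
    SimTimeNs next_event = 500'000;
    void fire_pending(SimTimeNs now) {
        if (now >= next_event) next_event = UINT64_MAX;  // event consumed
    }
    SimTimeNs next_event_time() const { return next_event; }
};

constexpr uint64_t kQuantum    = 1000;       // instructions per core per pass
constexpr uint64_t kNsPerInsn  = 20;         // EmulatorConfig::ns_per_insn default
constexpr SimTimeNs kMaxIdleNs = 1'000'000;  // 1 ms idle-jump cap

SimTimeNs run_until(std::vector<Core>& cores, Scheduler& sched,
                    SimTimeNs sim_time, SimTimeNs deadline) {
    while (sim_time < deadline) {
        sched.fire_pending(sim_time);
        bool all_down = true;
        for (Core& c : cores) {
            if (c.powered_down) continue;  // skip the quantum entirely
            all_down = false;
            c.sample_interrupts();
            for (uint64_t i = 0; i < kQuantum; ++i) c.step();
        }
        // peripherals.tick(sim_time) would run here
        if (all_down) {
            // Idle-time skipping: jump ahead, capped at 1 ms
            sim_time = std::min(sched.next_event_time(), sim_time + kMaxIdleNs);
        } else {
            sim_time += kQuantum * kNsPerInsn;
        }
    }
    return sim_time;
}
```

With two active cores, each pass costs 2 × 1000 host steps but advances simulated time by only 1000 × 20 ns, which is the 1/N throughput cost noted above.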
## The quantum
`EmulatorConfig::quantum` (default 1000) is the number of
instructions each core executes before the scheduler hands the host
thread to the next core. The choice trades off:
| Quantum size | Pros | Cons |
|---|---|---|
| Small (≤ 100) | Finer interrupt latency | More context-switching overhead |
| Default (1000) | Balanced; reasonable for RTEMS workloads | — |
| Large (≥ 10000) | Lower scheduler overhead | Worse cross-core latency |
You almost never need to touch this; the default keeps RTEMS sptests within a few-percent variance of reference baselines.
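To make the latency trade-off concrete, here is a back-of-the-envelope bound. The formula is an assumption derived from the round-robin structure described above, not a measured figure:

```cpp
#include <cstdint>

// An IRQ raised just after a core samples interrupts waits out that core's
// remaining quantum plus one quantum per other core before being sampled
// again. Illustrative upper bound only; real latency also depends on when
// peripherals raise the line.
uint64_t worst_case_irq_latency_ns(uint64_t quantum, uint64_t ns_per_insn,
                                   unsigned n_cores) {
    return static_cast<uint64_t>(n_cores) * quantum * ns_per_insn;
}
```

With the defaults (quantum 1000, 20 ns per instruction) on a dual-core target, this bound is 40 µs; shrinking the quantum to 100 cuts it to 4 µs at the cost of ten times as many scheduler passes.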
## Idle-time skipping
The most performance-sensitive piece of the loop is what happens when
every core is idle. Naïvely running 1000 instructions per core per
quantum on cores that are halted in asr19 would waste host cycles.
Lince detects this case explicitly:
- After a round-robin pass, the loop checks `all_cores_powered_down()`.
- If true, simulated time jumps to `min(scheduler.next_event_time(), now + kMaxIdleNs)`.
- The 1 ms cap (`kMaxIdleNs`) bounds the jump because the GPTimer raises interrupts via `IInterruptSource::raise()` rather than the `EventScheduler`, so its next tick is invisible to `next_event_time()`. Without the cap, time would jump straight to the deadline and the OS clock would never advance.
This was Decision 25 (see Design decisions) and is what
allowed RTEMS sp04 (an explicit idle-loop test) to pass in finite
real time.
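The clamp itself is a single expression; a minimal sketch, with `idle_jump` as an illustrative name and only `kMaxIdleNs` taken from the text above:

```cpp
#include <algorithm>
#include <cstdint>

using SimTimeNs = uint64_t;
constexpr SimTimeNs kMaxIdleNs = 1'000'000;  // 1 ms cap (Decision 25)

// When every core is parked, jump to the next scheduler event, but never
// more than 1 ms ahead: the GPTimer's ticks are invisible to
// next_event_time(), so an uncapped jump could overshoot them entirely.
SimTimeNs idle_jump(SimTimeNs now, SimTimeNs next_event_time) {
    return std::min(next_event_time, now + kMaxIdleNs);
}
```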
## Powering cores up and down
A SPARC V8 LEON3 core enters power-down by writing a non-zero value to
`asr19`. The handler sets `CpuState::is_powered_down_ = true`. The core
remains parked until one of the following occurs:
- An IRQ is asserted on its interrupt input (via the IRQMP).
- A debug command from the GDB stub forces a wake.
On the GR712RC, only CPU 0 starts running at reset. CPUs 1–N are
parked at `asr19` and woken by CPU 0 once the OS is ready (Decision
28 in Design decisions). `ElfLoader` mirrors this: it
sets `is_powered_down = true` for every core except core 0.
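The life cycle can be sketched with a stand-in type. Only `is_powered_down_` mirrors the documented member; the handler names and `make_cores` helper are illustrative:

```cpp
#include <vector>

// Stand-in for CpuState; only is_powered_down_ is the documented field.
struct CpuState {
    bool is_powered_down_ = false;
    // Guest executed a write of a non-zero value to asr19: park the core.
    void on_wr_asr19(unsigned value) {
        if (value != 0) is_powered_down_ = true;
    }
    // IRQMP asserted this core's interrupt input, or the GDB stub forced a wake.
    void wake() { is_powered_down_ = false; }
};

// GR712RC reset state as ElfLoader mirrors it: only core 0 runs.
std::vector<CpuState> make_cores(unsigned n) {
    std::vector<CpuState> cores(n);
    for (unsigned i = 1; i < n; ++i) cores[i].is_powered_down_ = true;
    return cores;
}
```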
## Time as a parameter
The emulator never reads the host clock. All time flows from outside:
- `EmulatorConfig::ns_per_insn` (default 20) sets the simulated speed.
- `Emulator::run_for(SimTimeNs duration)` runs until `current_sim_time() + duration` (measured at the call) is reached.
- `Emulator::run_until(SimTimeNs deadline)` runs until exactly `deadline`.
- `Emulator::current_sim_time()` is a snapshot of the simulated clock.
If your wrapper wants wall-clock alignment (e.g. SMP2 real-time
scheduler), it adjusts how often it calls run_for — Lince itself never
sleeps on the host.
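A wrapper doing that pacing might look like the sketch below. The `Emulator` stub models only the two documented calls; the 1 ms step and the pacing policy are the wrapper's own choices, not Lince's:

```cpp
#include <chrono>
#include <cstdint>
#include <thread>

using SimTimeNs = uint64_t;

// Stub with the run_for / current_sim_time semantics documented above.
struct Emulator {
    SimTimeNs now_ = 0;
    void run_for(SimTimeNs d) { now_ += d; }  // real version executes instructions
    SimTimeNs current_sim_time() const { return now_; }
};

// Keep simulated time at or behind wall-clock time by sleeping host-side
// between run_for calls. Lince itself never touches the host clock.
void run_realtime(Emulator& emu, SimTimeNs duration_ns) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    const SimTimeNs end = emu.current_sim_time() + duration_ns;
    while (emu.current_sim_time() < end) {
        emu.run_for(1'000'000);  // advance one simulated millisecond
        std::this_thread::sleep_until(
            start + std::chrono::nanoseconds(emu.current_sim_time()));
    }
}
```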