
Memory and bus

This page is the reference for Tero's memory fabric: the SystemBus router, Ram backing store, address-range MMIO dispatch, parameterised byte handling (default big-endian), the CPU-side ICpuBus bridge, the public physical-memory API on Emulator, the per-core bus bridge under MultiThread, the GRLIB Plug & Play table, and how to map a new region or peripheral.

It belongs to the Developer Manual. See Peripheral system for the IPeripheral interface MMIO dispatches into, Multicore and timing for the per-core bridge under MultiThread, and IR and LLVM JIT for how the JIT inlines RAM access.


Architecture at a glance

flowchart LR
    subgraph core["Core (per core)"]
      ST["core::step / JIT block"]
    end
    ST -->|"VirtAddr"| BR["CpuBusBridge (ICpuBus)<br/>+ RamFastCache"]
    BR -->|"PhysAddr"| BUS["SystemBus (IBusMaster)"]
    DMA["Peripheral DMA"] -->|"PhysAddr"| BUS
    API["Emulator public API<br/>read/write_physical"] --> BUS
    BUS -->|"contains?"| RAM["Ram blocks<br/>(raw bytes)"]
    BUS -->|"contains?"| MMIO["MMIO regions<br/>IPeripheral*"]
    MMIO --> P1["APBUart / IRQMP / GPTimer / ..."]

The bus has three classes of client — the CPU (via a per-core bridge), peripheral DMA masters, and the external public API — and they all share one SystemBus and one physical address map. There is no separate DMA or debug address space.


SystemBus

tero::bus::SystemBus (src/bus/include/tero/bus/system_bus.hpp) is the central physical-address router. It owns:

  • Zero or more Ram regions — owning (map_ram, via unique_ptr<Ram>) or non-owning (map_ram_backing): a memory entity (peripherals::Ram / Rom) owns the storage and the bus mounts it as a direct fast region through the bus::IRamBacking capability. A backing mounted with writable = false gives ROM semantics at the bus layer: typed, bulk, atomic and DMA writes fail with BusError, and the region is excluded from the u32 fast cache and from ram_view_at (both are shared with write paths that do not re-check writability).
  • Zero or more MMIO regions, each a non-owning IPeripheral* plus a per-region GatedMutex (map_peripheral).

It is an IBusMaster, so peripheral DMA goes through the same instance.

Non-copyable, non-movable

SystemBus owns RAM via unique_ptr<Ram>, and other objects cache raw pointers into its regions (the per-core fast cache caches a Ram*; the JIT caches a RamView.host pointer). Moving the bus would invalidate those, so the copy and move constructors and assignment operators are deleted (= delete; system_bus.hpp:39-42, Decision 1). The cached pointers stay valid for the bus's lifetime: RAM regions are mapped once during setup and are never moved or reallocated (system_bus.hpp:130). If you need a fresh bus, construct a new one.
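A minimal sketch of the ownership pattern described above, using a hypothetical BusLike class rather than Tero's SystemBus. Regions are held through unique_ptr, so a raw Ram* handed out by map_ram stays valid even when the owning vector grows, and the deleted copy/move operations stop the owner itself from being relocated.

```cpp
#include <cstddef>
#include <memory>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins: only the copy/move policy mirrors the text.
struct Ram {
    std::vector<unsigned char> bytes;
};

class BusLike {
public:
    BusLike() = default;
    // Other objects hold raw pointers/references into this object, so it is pinned.
    BusLike(const BusLike&) = delete;
    BusLike& operator=(const BusLike&) = delete;
    BusLike(BusLike&&) = delete;
    BusLike& operator=(BusLike&&) = delete;

    // Returns a pointer that survives later map_ram calls: the vector may
    // reallocate its unique_ptrs, but the heap-allocated Ram never moves.
    Ram* map_ram(std::size_t size) {
        regions_.push_back(std::make_unique<Ram>(Ram{std::vector<unsigned char>(size)}));
        return regions_.back().get();
    }

private:
    std::vector<std::unique_ptr<Ram>> regions_;
};

static_assert(!std::is_copy_constructible_v<BusLike>);
static_assert(!std::is_move_constructible_v<BusLike>);
```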

Routing rules

Every access resolves a target by walking the region tables (find_ram_region / find_mmio_region, system_bus.cpp:138,147):

  1. RAM regions are checked first, then MMIO regions, each in registration order; the first region whose half-open [base, base+size) contains the address wins (AddressRange::contains, src/interfaces/include/tero/address_range.hpp:16).
  2. No region matches → ErrorCode::BusError (system_bus.cpp:201,233).
  3. The access straddles two regions → ErrorCode::BusError: the tail bounds check (!range.contains(addr + size - 1), system_bus.cpp:174) fails when the access runs past the region it started in. Real hardware latches one transaction against one target; the bus does not silently split (Decision 3).
  4. Overlapping regions are rejected at map time, not access time: map_ram / map_peripheral return InvalidConfig if the new range overlaps any existing region (any_region_overlaps, system_bus.cpp:80). So routing never has to disambiguate two matching regions — there can be at most one.
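The four routing rules can be modelled in a few lines. This is a hypothetical simplification: Router, Target and the local AddressRange are invented for the example, and the real bus returns Result/ErrorCode values and dispatches the access instead of returning a tag. It assumes 32-bit physical addresses and access sizes of at least one byte.

```cpp
#include <cstdint>
#include <initializer_list>
#include <vector>

struct AddressRange {
    std::uint32_t base;
    std::uint32_t size;
    // Half-open [base, base + size); the subtraction form cannot overflow near 0xFFFFFFFF.
    bool contains(std::uint32_t addr) const { return addr >= base && addr - base < size; }
    bool overlaps(const AddressRange& o) const {
        // 64-bit ends so a region ending exactly at 2^32 does not wrap to 0.
        const std::uint64_t end = std::uint64_t{base} + size;
        const std::uint64_t o_end = std::uint64_t{o.base} + o.size;
        return base < o_end && o.base < end;
    }
};

enum class Target { Ram, Mmio, BusError };

struct Region {
    AddressRange range;
    Target kind;
};

class Router {
public:
    // Rule 4: overlaps are rejected at map time, so at most one region can ever match.
    bool map(AddressRange r, Target kind) {
        for (const auto* table : {&ram_, &mmio_})
            for (const auto& reg : *table)
                if (reg.range.overlaps(r)) return false;
        (kind == Target::Ram ? ram_ : mmio_).push_back({r, kind});
        return true;
    }

    // Rules 1-3: RAM first, then MMIO, in registration order; misses and straddles fail.
    Target route(std::uint32_t addr, std::uint32_t size) const {
        for (const auto* table : {&ram_, &mmio_})
            for (const auto& reg : *table)
                if (reg.range.contains(addr))
                    return reg.range.contains(addr + size - 1) ? reg.kind : Target::BusError;
        return Target::BusError;  // unmapped
    }

private:
    std::vector<Region> ram_;
    std::vector<Region> mmio_;
};
```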

Access surfaces

| Surface | Methods | Used by |
|---|---|---|
| Untyped byte span | read_physical / write_physical | external API, DMA fallback |
| Typed (bus byte order) | read_physical_u8/u16/u32, write_physical_* | CPU bridge, callers |
| Atomic RMW | atomic_swap_u32, atomic_cas_u32, atomic_ldstub | SPARC SWAP / CASA / LDSTUB |
| DMA (IBusMaster) | dma_read / dma_write | peripherals |
| Introspection | ram_region_count, mmio_region_count, ram_view_at | tests, JIT |

RAM

tero::bus::Ram (src/bus/include/tero/bus/ram.hpp) is a pure raw-byte store over std::vector<std::byte>: it holds the bytes exactly as they sit on the bus and never decides a byte order on its own. The byte order is a parameter it carries — a MemEndian that defaults to Big (SPARC V8), so the typed accessors are byte-identical to the previous big-endian-only code. The swap is applied at Ram's own typed/atomic access boundary (via swap_for), not in SystemBus. This keeps Ram trivially snapshot-able (Save/Restore) and mirrors a real memory controller, which sees raw bus bytes, not architectural ints. Ram exposes read_u32 / write_u32 (in its configured order) and the atomic helpers for the typed fast paths; the _be-suffixed names (read_u32_be, atomic_swap_u32_be, …) are retained as thin aliases for callers that name the SPARC big-endian order directly — identical to the unsuffixed methods on a default-Big RAM.


Endianness: a bus parameter (default big-endian) with byte-swap at the typed boundary

The bus stores raw bytes and applies a byte order only at the typed-access / MMIO-marshalling boundary. That order is a parameter, not a hardcoded fact: both SystemBus and every Ram it allocates carry a MemEndian (src/bus/include/tero/bus/endian.hpp) that defaults to Big. The bus itself therefore no longer hard-codes SPARC big-endianness. Wiring the order from the architecture at region setup is a small future follow-up; until then, the Big default is what makes the bus SPARC-correct. (MemEndian is a small local mirror of ir::MemEndian / ir::swap_for, kept local because tero_bus does not link tero_ir.)

SPARC V8 is big-endian: a uint32_t at address A has its MSB at A. With the default Big order Tero stores RAM in wire order and converts at the typed access points (system_bus.cpp:56,66):

// encode_native + swap_for: serialize the low n bytes of `value` for `endian`;
// for Big this is MSB-first, host-endianness-independent.
encode_native(swap_for(value, out.size(), endian), out);

// decode_native + swap_for: fold n bytes back into a uint32 for `endian`.
return swap_for(decode_native(in), in.size(), endian);

swap_for is the identity for Big and a byte-reverse for Little. Any host-specific swap (the __builtin_bswap used on little-endian hosts) sits underneath these helpers, so the same code is correct on LE and BE hosts. The integer register file is the opposite: it is stored in host order because it is state, not memory (see Layers and modules). Only guest memory carries the bus byte order.

Typed u32 RAM access skips the byte buffer

read_physical_u32 / write_physical_u32 have a RAM fast path that calls Ram::read_u32 directly (system_bus.cpp:320-334), bypassing the std::array<std::byte,4> + decode round trip the byte-span path takes. Ram applies its own byte order there. MMIO and the PROM bulk path still go through the byte-shaped slow path so their side effects (UART RX FIFO pop, IRQMP clear-on-read) are preserved (system_bus.cpp:336).


MMIO dispatch

When an access lands in an MMIO region, read_physical/write_physical classifies it (system_bus.cpp:180-198):

  • CPU-shaped (1/2/4 bytes, naturally aligned) → mmio_read_bytes / mmio_write_bytes, which call IPeripheral::mmio_read / mmio_write (system_bus.cpp:240,266). Every register side effect happens here.
  • Bulk, side-effect-free banks → if the peripheral exposes an IMemoryRegion (memory_region()), the bus bounds-checks and copies bytes directly; the peripheral only has to expose the bytes. This path is intended for future descriptor RAM and packet buffers. (The boot PROM no longer uses it: as a Rom entity it is mounted read-only via IRamBacking, so its fetches and reads take the direct fast-region path.)
  • Anything else → mmio_read_bytes / mmio_write_bytes, which validate size and alignment and reject the access (see MMIO access constraints below).

flowchart TD
    A["read/write_physical(addr, span)"] --> B{find_ram_region?}
    B -- yes --> C{"tail in region?<br/>(no straddle)"}
    C -- no --> CE["BusError"]
    C -- yes --> RW["Ram::read / write (wire bytes)"]
    B -- no --> D{find_mmio_region?}
    D -- no --> DE["BusError (unmapped)"]
    D -- yes --> E{is_cpu_shaped?<br/>1/2/4B, aligned}
    E -- yes --> F["lock region.lock (GatedMutex)<br/>IPeripheral::mmio_read/write"]
    E -- no --> G{memory_region?}
    G -- yes --> H["bounds-check + bulk copy"]
    G -- no --> I["mmio_read/write_bytes<br/>(validates size/alignment)"]

MMIO access constraints

Two stricter-than-hardware rules apply at the bus boundary:

  • Power-of-two 1/2/4-byte size only. A non-power-of-two MMIO access is BusError; a power-of-two access that is not naturally aligned is AlignmentError (mmio_read_bytes, system_bus.cpp:244-249). CPU alignment traps live in the instruction handlers, not the bus (Decision 4).
  • Peripheral MMIO is effectively word-only. Byte/half-word accesses to APBUart/IRQMP/GPTimer/MemCtrl return AlignmentError in practice — those peripherals only accept word accesses (see the APBUart word-only note in MEMORY.md). Use st/ld, not stb/ldub. This is stricter than GR712RC (which allows byte writes to the UART data register) but matches the MVP approach: defer narrowing until RTEMS demands it (Decision 20).
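The two constraints translate into a short classification. classify_mmio and classify_word_only are hypothetical helpers, and treating every size other than 1, 2 or 4 as BusError is an assumption of the sketch. In Tero the word-only rejection comes from the peripheral itself, not from the bus, which is why it is a separate function here.

```cpp
#include <cstddef>
#include <cstdint>

enum class MmioCheck { Ok, BusError, AlignmentError };

// Bus-level rule: only 1-, 2- or 4-byte accesses, naturally aligned.
MmioCheck classify_mmio(std::uint32_t addr, std::size_t size) {
    if (size != 1 && size != 2 && size != 4) return MmioCheck::BusError;  // assumed for any other size
    if (addr % size != 0) return MmioCheck::AlignmentError;                // misaligned 2/4-byte access
    return MmioCheck::Ok;
}

// Peripheral-level rule for word-only devices (APBUart, IRQMP, ...): sub-word accesses are refused.
MmioCheck classify_word_only(std::uint32_t addr, std::size_t size) {
    const MmioCheck bus = classify_mmio(addr, size);
    if (bus != MmioCheck::Ok) return bus;
    return size == 4 ? MmioCheck::Ok : MmioCheck::AlignmentError;
}
```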

Per-peripheral MMIO lock

Each MmioRegion carries a std::unique_ptr<GatedMutex> lock (system_bus.hpp:150). mmio_read_bytes/mmio_write_bytes take std::scoped_lock guard{*region.lock} around the peripheral call (system_bus.cpp:257,282). The gate is inactive in SingleThread (a no-op branch) and engaged by set_thread_safe(true) under MultiThread (system_bus.cpp:129), which serialises concurrent MMIO to the same peripheral. Regions mapped after set_thread_safe inherit the current state.
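One plausible shape for a runtime-gated mutex, offered as a sketch rather than Tero's GatedMutex. It satisfies BasicLockable, so std::scoped_lock accepts it, and its lock/unlock are no-ops while the gate is off. It assumes the gate is flipped only while no thread holds or waits on the lock (for example, before core threads start); toggling it inside a critical section would unbalance lock and unlock.

```cpp
#include <atomic>
#include <mutex>

class GatedMutex {
public:
    // Call only while the mutex is not held by any thread.
    void set_enabled(bool on) { enabled_.store(on); }

    // With the gate off these are cheap branches; with it on they serialise callers.
    void lock() {
        if (enabled_.load()) m_.lock();
    }
    void unlock() {
        if (enabled_.load()) m_.unlock();
    }

private:
    std::atomic<bool> enabled_{false};
    std::mutex m_;
};
```

Usage mirrors the bus: `std::scoped_lock guard{*region.lock};` around the peripheral call, with the gate engaged once when switching to MultiThread.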


Atomic read-modify-write

SPARC SWAP, CASA, and LDSTUB map to SystemBus::atomic_swap_u32, atomic_cas_u32, and atomic_ldstub (system_bus.cpp:390+):

  • RAM targets use a true atomic on the backing store (Ram::atomic_*, in the RAM's byte order), correct under MultiThread.
  • MMIO / unmapped targets fall back to a non-atomic read+write — atomics on MMIO are not meaningful and RTEMS never issues them; the per-region lock still serialises each access.

In SingleThread the round-robin model makes these correct by construction (only one core runs at a time); the real atomics matter only under MultiThread (see Multicore and timing).


The CPU-side bridge (ICpuBus)

Core handlers do not talk to SystemBus directly — they go through ICpuBus (src/interfaces/include/tero/icpu_bus.hpp), which speaks VirtAddr (anticipating an MMU). The implementation is tero::runtime::CpuBusBridge (src/runtime/src/cpu_bus_bridge.cpp). Today it is a pass-through: VirtAddr is reinterpreted as PhysAddr (no MMU yet), and each typed accessor forwards to the matching SystemBus method:

// src/runtime/src/cpu_bus_bridge.cpp:15 — u32 read forwards with the fast cache
Result<std::uint32_t> CpuBusBridge::read_u32(VirtAddr addr) {
    return bus_.read_physical_u32(PhysAddr{to_underlying(addr)}, &fast_cache_);
}

The bridge owns a SystemBus::RamFastCache fast_cache_ — a one-entry cache of the last RAM region that satisfied a u32 access. On a sequential .text / .data stream it hits >99% of the time, skipping the region walk. A stale entry is self-correcting: the range check misses and refreshes (system_bus.cpp:310-334).
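A sketch of the one-entry cache under simplifying assumptions: MiniBus, RamRegion and FastCache are hypothetical, only RAM is modelled, and a miss that finds no region returns std::nullopt where the real bus would continue to the MMIO path. The walks counter makes the hit and refresh behaviour observable.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

struct RamRegion {
    std::uint32_t base;
    std::uint32_t size;
    std::vector<std::uint8_t> bytes;  // bus bytes, big-endian
    bool covers(std::uint32_t addr, std::uint32_t n) const {
        return addr >= base && addr - base < size && n <= size - (addr - base);
    }
};

// One entry: the last region that satisfied a u32 access.
struct FastCache {
    const RamRegion* last = nullptr;
};

class MiniBus {
public:
    explicit MiniBus(std::vector<RamRegion> regions) : regions_(std::move(regions)) {}

    std::optional<std::uint32_t> read_u32(std::uint32_t addr, FastCache* cache) {
        const RamRegion* r =
            (cache && cache->last && cache->last->covers(addr, 4)) ? cache->last : nullptr;
        if (!r) {  // miss (or stale entry): walk the table and refresh the cache
            ++walks;
            for (const auto& reg : regions_)
                if (reg.covers(addr, 4)) { r = &reg; break; }
            if (!r) return std::nullopt;
            if (cache) cache->last = r;
        }
        const std::size_t o = addr - r->base;
        return (std::uint32_t{r->bytes[o]} << 24) | (std::uint32_t{r->bytes[o + 1]} << 16) |
               (std::uint32_t{r->bytes[o + 2]} << 8) | std::uint32_t{r->bytes[o + 3]};
    }

    std::size_t walks = 0;  // how many times the region table was scanned

private:
    std::vector<RamRegion> regions_;  // never resized after construction, so cached pointers stay valid
};
```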

Per-core bus bridge

Emulator holds a std::vector<CpuBusBridge> bus_bridge_, one per core, created in initialize() before any RAM is mapped (emulator.cpp:288-291):

// src/runtime/src/emulator.cpp:288
bus_bridge_.reserve(config_.num_cores);
for (std::uint32_t i = 0; i < config_.num_cores; ++i)
    bus_bridge_.emplace_back(bus_);   // each owns its own RamFastCache

This is required under MultiThread (Hazard A): a single shared RamFastCache would tear when two core threads update it concurrently. Each bridge wraps the same SystemBus& — the address map is shared; only the per-core fast cache is private. core::step(state, bus_bridge_[core_idx]) passes the right bridge to each core (emulator.cpp:667).


Public physical-memory API

Emulator republishes a curated subset of bus operations for external callers — debuggers, the SMP2 wrapper, test harnesses (src/runtime/include/tero/runtime/emulator.hpp:217):

[[nodiscard]] Result<void>     read_physical (PhysAddr, std::span<std::byte>);
[[nodiscard]] Result<void>     write_physical(PhysAddr, std::span<const std::byte>);
[[nodiscard]] Result<std::uint32_t> read_physical_u32 (PhysAddr);
[[nodiscard]] Result<void>     write_physical_u32(PhysAddr, std::uint32_t);

Reads forward straight to the bus (emulator.cpp:1711,1726). Writes also flush the code caches (flush_code_caches(), emulator.cpp:1721,1733): an external writer issues no guest FLUSH, so the emulator must assume the write may have overwritten translated code and drop the IR/JIT blocks and every core's decode cache. This is coarse by design.

No internal lock — callers drive from one thread

The public API carries no internal mutex. The execution model is single-threaded (round-robin cooperative), so callers must drive run_* and the memory accessors from one thread. Cross-thread synchronisation (a runtime-gated mutex over the public API) lands with MultiThread mode and is not implemented for the public API yet. (An earlier version of this page claimed a mutex was held for a quantum's duration — that was never true; CLAUDE.md's "Thread safety" note is authoritative.)

read_virtual / write_virtual are not implemented

The frozen API sketch in CLAUDE.md lists read_virtual(CoreId, VirtAddr, ...) / write_virtual(...), but they do not exist on Emulator today — there is no MMU, so VirtAddr == PhysAddr and the physical accessors suffice. They will be added with the same access pattern, routed through the selected core's MMU context, if and when the SRMMU is brought in-tree (currently deferred — see the roadmap).


DMA via IBusMaster

A peripheral DMAs through the same SystemBus it received in its PeripheralContext::bus:

void DemoDmaDevice::do_dma() {
    std::array<std::byte, 16> buf{};
    ctx_.bus->dma_read(PhysAddr{0x40000100}, buf);
    /* …mutate buf… */
    ctx_.bus->dma_write(PhysAddr{0x40000200}, buf);
}

Because it is the same bus the CPU uses, DMA shares the exact same memory map — including any custom peripherals you mapped. There is no separate DMA address space (GR712RC/GR740 have a single AHB fabric).

Endianness in DMA payloads

dma_read/dma_write move raw std::byte. Since SPARC is BE, re-composing a uint32_t from a DMA buffer needs the standard shift-and-or pattern (or use the typed read_physical_u32 when the target is aligned):

uint32_t word =
      (uint32_t(std::to_integer<uint8_t>(buf[0])) << 24)
    | (uint32_t(std::to_integer<uint8_t>(buf[1])) << 16)
    | (uint32_t(std::to_integer<uint8_t>(buf[2])) <<  8)
    |  uint32_t(std::to_integer<uint8_t>(buf[3]));

The reference implementation in examples/demo-dma/demo_dma_device.cpp shows both styles. See Demo DMA peripheral.


GRLIB Plug & Play table

GRLIB SoCs publish a Plug & Play (PnP) area that the BSP scans at boot to discover peripherals, their addresses, and IRQ lines. RTEMS leon3 reads it, so Tero must build a faithful one. src/runtime/src/pnp_table.cpp constructs the AHB and APB PnP records from the live PeripheralSpec list:

  • AHB PnP at 0xFFFFF000, slaves at +0x800, 32 bytes/slot, BAR0 at slot+0x10 (pnp_table.cpp:24-27).
  • APB PnP records, 8 bytes/slot, BAR at slot+0x04 (pnp_table.cpp:28-29).
  • Vendor/device IDs from the GRLIB IP manual §3.4 — e.g. Gaisler vendor 0x01, APBUART 0x00C, IRQMP 0x00D, GPTIMER 0x011, LEON3 0x003, LEON4 0x048 (pnp_table.cpp:34-41).

The BAR encoders (ahb_bar, apb_bar) pack address/mask/type per GRLIB §4.3. The frequency RTEMS reads for tc_frequency comes from the same PnP path and matches cpu_clock_hz, which is why direct-ELF guests need no rebuild on a clock change (see Multicore and timing).
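For orientation, the identification word and BAR packing can be sketched as below. The field positions follow the GRLIB AMBA Plug&Play layout as commonly documented (ID: vendor[31:24], device[23:12], version[9:5], irq[4:0]; BAR: addr[31:20], P[17], C[16], mask[15:4], type[3:0], with type 1 = APB I/O and 2 = AHB memory); check them against the GRLIB manual before relying on them. The *_sketch functions are hypothetical and are not Tero's ahb_bar / apb_bar, and the static_asserts only check the sketch's own arithmetic.

```cpp
#include <cstdint>

// Identification word: vendor[31:24] device[23:12] version[9:5] irq[4:0].
constexpr std::uint32_t pnp_id_sketch(std::uint32_t vendor, std::uint32_t device,
                                      std::uint32_t version, std::uint32_t irq) {
    return ((vendor & 0xFFu) << 24) | ((device & 0xFFFu) << 12) |
           ((version & 0x1Fu) << 5) | (irq & 0x1Fu);
}

// AHB BAR: the 12-bit address field holds physical address bits [31:20].
constexpr std::uint32_t ahb_bar_sketch(std::uint32_t addr, std::uint32_t mask12, std::uint32_t type,
                                       bool prefetchable = false, bool cacheable = false) {
    return (((addr >> 20) & 0xFFFu) << 20) | (prefetchable ? (1u << 17) : 0u) |
           (cacheable ? (1u << 16) : 0u) | ((mask12 & 0xFFFu) << 4) | (type & 0xFu);
}

// APB BAR: the 12-bit address field holds APB address bits [19:8]; type 1 = APB I/O.
constexpr std::uint32_t apb_bar_sketch(std::uint32_t addr, std::uint32_t mask12) {
    return (((addr >> 8) & 0xFFFu) << 20) | ((mask12 & 0xFFFu) << 4) | 0x1u;
}

// Example inputs: Gaisler vendor 0x01, APBUART 0x00C, version 1, IRQ 2.
static_assert(pnp_id_sketch(0x01, 0x00C, 1, 2) == 0x0100C022u);
static_assert(ahb_bar_sketch(0x80000000u, 0xFFF, 2) == 0x8000FFF2u);
static_assert(apb_bar_sketch(0x80000100u, 0xFFF) == 0x0010FFF1u);
```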


How to map a new region or peripheral

A new RAM region (e.g. a second SRAM bank):

auto r = bus.map_ram(PhysAddr{0x60000000}, 0x00100000);  // 1 MiB
// returns InvalidConfig if it overlaps an existing region

A new peripheral — the preferred, declarative way is a PeripheralSpec in EmulatorConfig::peripherals (the recipes do this); the bus then calls map_peripheral(p) for you. The peripheral advertises its own MMIO window via IPeripheral::mmio_range(), so you only set the base/size there. For REPL/test usage after initialize(), Emulator::add_peripheral(p, irq) is the sugar form. See Peripheral system and Custom peripherals.

A memory bank (RAM/ROM) — instantiate the generic memory entities peripherals::Ram / peripherals::Rom (src/peripherals/.../ram.hpp, rom.hpp): they own the storage and expose bus::IRamBacking, which the Soc mounts as a direct fast region (map_ram_backing). In a machine script that is just new Ram name=x + map .... For a peripheral with an internal side-effect-free bank, implement IPeripheral::memory_region() returning an IMemoryRegion and the bus uses the bulk copy path for it.

Validate before you run

validate_emulator_config(cfg) runs at Emulator::create and catches duplicate names, null factories, out-of-range IRQs, and bad chardev indices. Address-overlap is caught later at map_* time (returns InvalidConfig). A BusError at runtime almost always means an unmapped address or a straddling access — check your mmio_range() base/size first.