Memory and bus¶
This page is the reference for Tero's memory fabric: the SystemBus router,
Ram backing store, address-range MMIO dispatch, parameterised byte handling
(default big-endian), the CPU-side ICpuBus bridge, the public physical-memory
API on Emulator,
the per-core bus bridge under MultiThread, the GRLIB Plug & Play table, and
how to map a new region or peripheral.
It belongs to the Developer Manual. See Peripheral system
for the IPeripheral interface MMIO dispatches into,
Multicore and timing for the per-core bridge under
MultiThread, and IR and LLVM JIT for how the JIT inlines RAM access.
Architecture at a glance¶
flowchart LR
subgraph core["Core (per core)"]
ST["core::step / JIT block"]
end
ST -->|"VirtAddr"| BR["CpuBusBridge (ICpuBus)<br/>+ RamFastCache"]
BR -->|"PhysAddr"| BUS["SystemBus (IBusMaster)"]
DMA["Peripheral DMA"] -->|"PhysAddr"| BUS
API["Emulator public API<br/>read/write_physical"] --> BUS
BUS -->|"contains?"| RAM["Ram blocks<br/>(raw bytes)"]
BUS -->|"contains?"| MMIO["MMIO regions<br/>IPeripheral*"]
MMIO --> P1["APBUart / IRQMP / GPTimer / ..."]
The bus has three classes of client — the CPU (via a per-core bridge),
peripheral DMA masters, and the external public API — and they all share one
SystemBus and one physical address map. There is no separate DMA or
debug address space.
SystemBus¶
tero::bus::SystemBus (src/bus/include/tero/bus/system_bus.hpp) is the
central physical-address router. It owns:
- Zero or more Ram regions, either owning (map_ram, via unique_ptr<Ram>) or non-owning (map_ram_backing): a memory entity (peripherals::Ram/Rom) owns the storage and the bus mounts it as a direct fast region through the bus::IRamBacking capability. A backing mounted with writable = false gives ROM semantics at the bus layer: typed, bulk, atomic and DMA writes fail with BusError, and the region is excluded from the u32 fast cache and from ram_view_at (both are shared with write paths that do not re-check writability).
- Zero or more MMIO regions, each a non-owning IPeripheral* plus a per-region GatedMutex (map_peripheral).
It is an IBusMaster, so peripheral DMA goes through the same instance.
Non-copyable, non-movable¶
SystemBus owns RAM via unique_ptr<Ram> and other objects cache raw
pointers into its regions (the per-core fast cache caches a Ram*; the JIT
caches a RamView.host pointer). Moving the bus would invalidate those, so
copy and move are = deleted (system_bus.hpp:39-42, Decision 1). The cached
pointers are stable for the bus's lifetime: RAM regions are mapped once
during setup and never moved or reallocated (system_bus.hpp:130). If you
need a fresh bus, construct a new one.
Routing rules¶
Every access resolves a target by walking the region tables
(find_ram_region / find_mmio_region, system_bus.cpp:138,147):
- RAM regions are checked first, then MMIO regions, each in registration order; the first region whose half-open [base, base+size) contains the address wins (AddressRange::contains, src/interfaces/include/tero/address_range.hpp:16).
- No region matches → ErrorCode::BusError (system_bus.cpp:201,233).
- The access straddles two regions → ErrorCode::BusError: the tail bounds check (!range.contains(addr + size - 1), system_bus.cpp:174) fails when the access runs past the region it started in. Real hardware latches one transaction against one target; the bus does not silently split (Decision 3).
- Overlapping regions are rejected at map time, not access time: map_ram/map_peripheral return InvalidConfig if the new range overlaps any existing region (any_region_overlaps, system_bus.cpp:80). So routing never has to disambiguate two matching regions: there can be at most one.
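The routing rules above can be sketched in a few lines. This is a minimal, self-contained model, not the SystemBus code: Range, RoutingBus and Route are hypothetical stand-ins for AddressRange, the region tables and ErrorCode, and a single table replaces the separate RAM and MMIO tables. It assumes every access has size >= 1.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for AddressRange: half-open [base, base + size).
struct Range {
    std::uint64_t base;
    std::uint64_t size;
    bool contains(std::uint64_t addr) const { return addr >= base && addr - base < size; }
    bool overlaps(const Range& o) const {
        return base < o.base + o.size && o.base < base + size;
    }
};

enum class Route { Ok, BusError, InvalidConfig };

struct RoutingBus {
    std::vector<Range> regions;  // kept in registration order

    // Overlaps are rejected at map time, so at most one region matches any address.
    Route map(Range r) {
        for (const Range& e : regions)
            if (e.overlaps(r)) return Route::InvalidConfig;
        regions.push_back(r);
        return Route::Ok;
    }

    // The first region containing addr wins; the last byte must lie in that same region.
    Route access(std::uint64_t addr, std::uint64_t size) const {
        for (const Range& r : regions) {
            if (!r.contains(addr)) continue;
            return r.contains(addr + size - 1) ? Route::Ok : Route::BusError;
        }
        return Route::BusError;  // unmapped
    }
};

inline void routing_demo() {
    RoutingBus bus;
    assert(bus.map({0x40000000, 0x1000}) == Route::Ok);
    assert(bus.map({0x40001000, 0x1000}) == Route::Ok);        // adjacent, not overlapping
    assert(bus.access(0x40000FFE, 4) == Route::BusError);      // straddles two regions
}
```

Note that the straddling access fails even though the second region is mapped right behind the first: the tail is checked against the region the access started in, never against its neighbour.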
Access surfaces¶
| Surface | Methods | Used by |
|---|---|---|
| Untyped byte span | read_physical / write_physical | external API, DMA fallback |
| Typed (bus byte order) | read_physical_u8/u16/u32, write_physical_* | CPU bridge, callers |
| Atomic RMW | atomic_swap_u32, atomic_cas_u32, atomic_ldstub | SPARC SWAP/CASA/LDSTUB |
| DMA (IBusMaster) | dma_read / dma_write | peripherals |
| Introspection | ram_region_count, mmio_region_count, ram_view_at | tests, JIT |
RAM¶
tero::bus::Ram (src/bus/include/tero/bus/ram.hpp) is a pure raw-byte
store over std::vector<std::byte>: it holds the bytes exactly as they sit on
the bus and never decides a byte order on its own. The byte order is a
parameter it carries — a MemEndian that defaults to Big (SPARC V8), so
the typed accessors are byte-identical to the previous big-endian-only code. The
swap is applied at Ram's own typed/atomic access boundary (via swap_for),
not in SystemBus. This keeps Ram trivially snapshot-able (Save/Restore) and
mirrors a real memory controller, which sees raw bus bytes, not architectural
ints. Ram exposes read_u32 / write_u32 (in its configured order) and the
atomic helpers for the typed fast paths; the _be-suffixed names
(read_u32_be, atomic_swap_u32_be, …) are retained as thin aliases for
callers that name the SPARC big-endian order directly — identical to the
unsuffixed methods on a default-Big RAM.
Endianness: a bus parameter (default big-endian) with byte-swap at the typed boundary¶
The bus stores raw bytes and applies a byte order only at the typed-access /
MMIO-marshalling boundary. That order is a parameter, not a hardcoded fact:
both SystemBus and every Ram it allocates carry a MemEndian
(src/bus/include/tero/bus/endian.hpp) that defaults to Big. The neutral
bus therefore no longer encodes SPARC big-endianness as a fact — wiring the
order from the architecture at region setup is a small future follow-up; today
the default makes it SPARC. (MemEndian is a small local mirror of
ir::MemEndian / ir::swap_for, kept local because tero_bus does not link
tero_ir.)
SPARC V8 is big-endian: a uint32_t at address A has its MSB at A. With the
default Big order Tero stores RAM in wire order and converts at the typed
access points (system_bus.cpp:56,66):
// encode_native + swap_for: serialize the low n bytes of `value` for `endian`;
// for Big this is MSB-first, host-endianness-independent.
encode_native(swap_for(value, out.size(), endian), out);
// decode_native + swap_for: fold n bytes back into a uint32 for `endian`.
return swap_for(decode_native(in), in.size(), endian);
swap_for is the identity for Big and a byte-reverse for Little; the
host-LE __builtin_bswap lives underneath. So the same code is correct on LE
and BE hosts. The integer register file is the opposite: it is stored
host-order, because it is state, not memory (see
Layers and modules). Only guest memory carries the bus byte
order on the wire.
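To make "byte order as a parameter" concrete, here is a small host-independent sketch. It is not Tero's encode_native / swap_for pair: Endian, encode and decode are hypothetical names, and the conversion uses plain shifts instead of a byte-swap intrinsic. It assumes n is between 1 and 4.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class Endian { Big, Little };  // hypothetical mirror of MemEndian

// Serialize the low n bytes of value in the requested order.
// Shifts only, so the result does not depend on the host's endianness.
inline void encode(std::uint32_t value, std::byte* out, std::size_t n, Endian e) {
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t shift = 8 * (e == Endian::Big ? n - 1 - i : i);
        out[i] = static_cast<std::byte>((value >> shift) & 0xFFu);
    }
}

// Fold n bytes back into an integer, interpreting them in the requested order.
inline std::uint32_t decode(const std::byte* in, std::size_t n, Endian e) {
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t shift = 8 * (e == Endian::Big ? n - 1 - i : i);
        value |= std::to_integer<std::uint32_t>(in[i]) << shift;
    }
    return value;
}

inline void endian_demo() {
    std::array<std::byte, 4> wire{};
    encode(0x11223344u, wire.data(), wire.size(), Endian::Big);
    assert(std::to_integer<int>(wire[0]) == 0x11);  // MSB at the lowest address
    assert(decode(wire.data(), wire.size(), Endian::Big) == 0x11223344u);
}
```

Reading the same four wire bytes with Endian::Little yields 0x44332211, which is why the order has to travel with the region rather than with the caller.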
Typed u32 RAM access skips the byte buffer
read_physical_u32 / write_physical_u32 have a RAM fast path that calls
Ram::read_u32 directly (system_bus.cpp:320-334), bypassing the
std::array<std::byte,4> + decode round trip the byte-span path
takes. Ram applies its own byte order there. MMIO and the PROM bulk path
still go through the byte-shaped slow path so their side effects (UART RX
FIFO pop, IRQMP clear-on-read) are preserved (system_bus.cpp:336).
MMIO dispatch¶
When an access lands in an MMIO region, read_physical/write_physical
classifies it (system_bus.cpp:180-198):
- CPU-shaped (1/2/4 bytes, naturally aligned) → mmio_read_bytes/mmio_write_bytes, which call IPeripheral::mmio_read/mmio_write (system_bus.cpp:240,266). Every register side effect happens here.
- Bulk, side-effect-free banks → if the peripheral exposes an IMemoryRegion (memory_region()), the bus bounds-checks and copies bytes directly (future descriptor RAM, packet buffers). The peripheral only has to copy. (The boot PROM no longer uses this path: as a Rom entity it is mounted read-only via IRamBacking, so its fetches and reads take the direct fast-region path.)
flowchart TD
A["read/write_physical(addr, span)"] --> B{find_ram_region?}
B -- yes --> C{"tail in region?<br/>(no straddle)"}
C -- no --> CE["BusError"]
C -- yes --> RW["Ram::read / write (wire bytes)"]
B -- no --> D{find_mmio_region?}
D -- no --> DE["BusError (unmapped)"]
D -- yes --> E{is_cpu_shaped?<br/>1/2/4B, aligned}
E -- yes --> F["lock region.lock (GatedMutex)<br/>IPeripheral::mmio_read/write"]
E -- no --> G{memory_region?}
G -- yes --> H["bounds-check + bulk copy"]
G -- no --> I["mmio_read/write_bytes<br/>(validates size/alignment)"]
MMIO access constraints¶
Two stricter-than-hardware rules apply at the bus boundary:
- Power-of-two 1/2/4-byte size only. A non-power-of-two MMIO access is BusError; a power-of-two access not naturally aligned is AlignmentError (mmio_read_bytes, system_bus.cpp:244-249). CPU alignment traps live in the instruction handlers, not the bus (Decision 4).
- Peripheral MMIO is effectively word-only. Byte/half-word accesses to APBUart/IRQMP/GPTimer/MemCtrl return AlignmentError in practice, because those peripherals only accept word accesses (see the APBUart word-only note in MEMORY.md). Use st/ld, not stb/ldub. This is stricter than GR712RC (which allows byte writes to the UART data register) but matches the MVP approach: defer narrowing until RTEMS demands it (Decision 20).
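The first rule amounts to a two-step check, sketched below. Check and check_mmio_access are hypothetical names, and treating every size other than 1, 2 or 4 (including 8) as BusError is an assumption of this sketch; the text above only specifies the non-power-of-two case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class Check { Ok, BusError, AlignmentError };

// CPU-shaped means 1, 2 or 4 bytes at a naturally aligned address.
// Size is validated first, so a misaligned 3-byte access reports BusError.
constexpr Check check_mmio_access(std::uint32_t addr, std::size_t size) {
    if (size != 1 && size != 2 && size != 4) return Check::BusError;
    if (addr % size != 0) return Check::AlignmentError;
    return Check::Ok;
}

static_assert(check_mmio_access(0x80000100u, 4) == Check::Ok);
static_assert(check_mmio_access(0x80000102u, 4) == Check::AlignmentError);
static_assert(check_mmio_access(0x80000100u, 3) == Check::BusError);

inline void mmio_check_demo() {
    // A half-word access passes the bus check; a word-only peripheral may still reject it.
    assert(check_mmio_access(0x80000102u, 2) == Check::Ok);
}
```

The second rule is enforced by the peripherals themselves, so passing this check does not guarantee the access succeeds.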
Per-peripheral MMIO lock¶
Each MmioRegion carries a std::unique_ptr<GatedMutex> lock
(system_bus.hpp:150). mmio_read_bytes/mmio_write_bytes take
std::scoped_lock guard{*region.lock} around the peripheral call
(system_bus.cpp:257,282). The gate is inactive in SingleThread (a no-op
branch) and engaged by set_thread_safe(true) under MultiThread
(system_bus.cpp:129), which serialises concurrent MMIO to the same
peripheral. Regions mapped after set_thread_safe inherit the current state.
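A runtime-gated mutex of this kind can be sketched as follows. GatedMutexSketch is a hypothetical illustration, not the Tero class: it assumes the gate is flipped only during setup, while no thread is inside a guarded section, and the real_locks counter exists purely so the effect can be observed.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>

// Satisfies BasicLockable, so std::scoped_lock can guard it directly.
class GatedMutexSketch {
public:
    // Assumption: called only while no thread holds or is taking the lock.
    void set_active(bool on) { active_.store(on, std::memory_order_relaxed); }

    void lock() {
        if (!active_.load(std::memory_order_relaxed)) return;  // gate off: no-op branch
        mtx_.lock();
        ++real_locks_;  // modified only while mtx_ is held
    }
    void unlock() {
        if (active_.load(std::memory_order_relaxed)) mtx_.unlock();
    }

    std::size_t real_locks() const { return real_locks_; }  // demo-only counter

private:
    std::atomic<bool> active_{false};
    std::mutex mtx_;
    std::size_t real_locks_ = 0;
};

inline void gated_mutex_demo() {
    GatedMutexSketch m;
    { std::scoped_lock guard{m}; }  // single-threaded mode: nothing is taken
    assert(m.real_locks() == 0);
    m.set_active(true);
    { std::scoped_lock guard{m}; }  // multi-threaded mode: the real mutex is taken
    assert(m.real_locks() == 1);
}
```

The point of the design is that the single-threaded path pays only for one predictable branch per MMIO access rather than for an uncontended lock.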
Atomic read-modify-write¶
SPARC SWAP, CASA, and LDSTUB map to SystemBus::atomic_swap_u32,
atomic_cas_u32, and atomic_ldstub (system_bus.cpp:390+):
- RAM targets use a true atomic on the backing store (Ram::atomic_*, in the RAM's byte order), correct under MultiThread.
- MMIO / unmapped targets fall back to a non-atomic read+write: atomics on MMIO are not meaningful and RTEMS never issues them; the per-region lock still serialises each access.
In SingleThread the round-robin model makes these correct by construction (only one core runs at a time); the real atomics matter only under MultiThread (see Multicore and timing).
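For readers unfamiliar with the three SPARC primitives, the sketch below shows their semantics on std::atomic cells. It is an illustration only: swap_u32, cas_u32 and ldstub are hypothetical free functions, the cells hold host-order values (the byte-order handling that Ram::atomic_* performs is deliberately left out), and nothing here reflects how Ram lays out its storage.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// SWAP: store the new value and return the old one in a single step.
inline std::uint32_t swap_u32(std::atomic<std::uint32_t>& cell, std::uint32_t value) {
    return cell.exchange(value);
}

// CASA: store desired only if the cell equals expected; always return the old value.
inline std::uint32_t cas_u32(std::atomic<std::uint32_t>& cell,
                             std::uint32_t expected, std::uint32_t desired) {
    // On failure compare_exchange_strong writes the observed value into expected;
    // on success expected already equals the old value.
    cell.compare_exchange_strong(expected, desired);
    return expected;
}

// LDSTUB: read a byte and set it to 0xFF, the classic test-and-set lock.
inline std::uint8_t ldstub(std::atomic<std::uint8_t>& cell) {
    return cell.exchange(0xFF);
}

inline void atomics_demo() {
    std::atomic<std::uint8_t> lock{0};
    assert(ldstub(lock) == 0x00);  // first caller sees "free" and now owns the lock
    assert(ldstub(lock) == 0xFF);  // second caller sees "taken"
}
```

A non-atomic read followed by a write would let two cores both observe 0x00 in the LDSTUB case, which is exactly the failure the RAM path avoids under MultiThread.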
The CPU-side bridge (ICpuBus)¶
Core handlers do not talk to SystemBus directly — they go through
ICpuBus (src/interfaces/include/tero/icpu_bus.hpp), which speaks
VirtAddr (anticipating an MMU). The implementation is
tero::runtime::CpuBusBridge (src/runtime/src/cpu_bus_bridge.cpp). Today
it is a pass-through: VirtAddr is reinterpreted as PhysAddr (no MMU yet),
and each typed accessor forwards to the matching SystemBus method:
// src/runtime/src/cpu_bus_bridge.cpp:15 — u32 read forwards with the fast cache
Result<std::uint32_t> CpuBusBridge::read_u32(VirtAddr addr) {
return bus_.read_physical_u32(PhysAddr{to_underlying(addr)}, &fast_cache_);
}
The bridge owns a SystemBus::RamFastCache fast_cache_ — a one-entry cache of
the last RAM region that satisfied a u32 access. On a sequential .text /
.data stream it hits >99% of the time, skipping the region walk. A stale
entry is self-correcting: the range check misses and refreshes
(system_bus.cpp:310-334).
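The one-entry cache idea can be illustrated with a toy model. Region, FastCacheSketch and CachedBus are hypothetical names, not the SystemBus::RamFastCache layout, and the walks counter is there only to make hits and misses visible. Like the real bus, the sketch relies on regions never moving after setup, so the cached pointer stays valid.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Region { std::uint32_t base; std::uint32_t size; };

struct FastCacheSketch { const Region* last = nullptr; };  // one entry per core

// True when the 4-byte access [addr, addr + 4) lies entirely inside r.
inline bool covers_u32(const Region& r, std::uint32_t addr) {
    return addr >= r.base && std::uint64_t{addr} - r.base + 4 <= r.size;
}

struct CachedBus {
    std::vector<Region> ram;  // mapped once at setup, never reallocated afterwards
    int walks = 0;            // demo-only: counts slow-path region walks

    const Region* find(std::uint32_t addr, FastCacheSketch& cache) {
        if (cache.last && covers_u32(*cache.last, addr)) return cache.last;  // hit
        ++walks;  // miss or stale entry: walk the table and refresh the cache
        for (const Region& r : ram)
            if (covers_u32(r, addr)) { cache.last = &r; return &r; }
        return nullptr;  // unmapped
    }
};

inline void fast_cache_demo() {
    CachedBus bus;
    bus.ram = {{0x40000000u, 0x1000u}, {0x60000000u, 0x1000u}};
    FastCacheSketch cache;
    bus.find(0x40000000u, cache);  // cold miss
    bus.find(0x40000004u, cache);  // sequential access hits
    assert(bus.walks == 1);
}
```

Because the hit test is the same range check the slow path uses, a stale entry can never produce a wrong answer; it only costs one extra walk.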
Per-core bus bridge¶
Emulator holds a std::vector<CpuBusBridge> bus_bridge_, one per core,
created in initialize() before any RAM is mapped (emulator.cpp:288-291):
// src/runtime/src/emulator.cpp:288
bus_bridge_.reserve(config_.num_cores);
for (std::uint32_t i = 0; i < config_.num_cores; ++i)
bus_bridge_.emplace_back(bus_); // each owns its own RamFastCache
This is required under MultiThread (Hazard A): a single shared RamFastCache
would tear when two core threads update it concurrently. Each bridge wraps the
same SystemBus& — the address map is shared; only the per-core fast cache
is private. core::step(state, bus_bridge_[core_idx]) passes the right bridge
to each core (emulator.cpp:667).
Public physical-memory API¶
Emulator republishes a curated subset of bus operations for external callers
— debuggers, the SMP2 wrapper, test harnesses
(src/runtime/include/tero/runtime/emulator.hpp:217):
[[nodiscard]] Result<void> read_physical (PhysAddr, std::span<std::byte>);
[[nodiscard]] Result<void> write_physical(PhysAddr, std::span<const std::byte>);
[[nodiscard]] Result<std::uint32_t> read_physical_u32 (PhysAddr);
[[nodiscard]] Result<void> write_physical_u32(PhysAddr, std::uint32_t);
Reads forward straight to the bus (emulator.cpp:1711,1726). Writes also
flush the code caches (flush_code_caches(), emulator.cpp:1721,1733): an
external writer issues no guest FLUSH, so the emulator must assume the write
may have overwritten translated code and drop the IR/JIT blocks and every
core's decode cache. This is coarse by design.
No internal lock — callers drive from one thread
The public API carries no internal mutex. The execution model is
single-threaded (round-robin cooperative), so callers must drive run_*
and the memory accessors from one thread. Cross-thread synchronisation (a
runtime-gated mutex over the public API) lands with MultiThread mode and is
not implemented for the public API yet. (An earlier version of this page
claimed a mutex was held for a quantum's duration — that was never true;
CLAUDE.md's "Thread safety" note is authoritative.)
read_virtual / write_virtual are not implemented
The frozen API sketch in CLAUDE.md lists read_virtual(CoreId, VirtAddr,
...) / write_virtual(...), but they do not exist on Emulator
today — there is no MMU, so VirtAddr == PhysAddr and the physical
accessors suffice. They will be added with the same access pattern, routed
through the selected core's MMU context, if and when the SRMMU is brought
in-tree (currently deferred — see the roadmap).
DMA via IBusMaster¶
A peripheral DMAs through the same SystemBus it received in its
PeripheralContext::bus:
void DemoDmaDevice::do_dma() {
std::array<std::byte, 16> buf{};
ctx_.bus->dma_read(PhysAddr{0x40000100}, buf);
/* …mutate buf… */
ctx_.bus->dma_write(PhysAddr{0x40000200}, buf);
}
Because it is the same bus the CPU uses, DMA shares the exact same memory map — including any custom peripherals you mapped. There is no separate DMA address space (GR712RC/GR740 have a single AHB fabric).
Endianness in DMA payloads¶
dma_read/dma_write move raw std::byte. Since SPARC is BE, re-composing a
uint32_t from a DMA buffer needs the standard shift-and-or pattern (or use
the typed read_physical_u32 when the target is aligned):
uint32_t word =
(uint32_t(std::to_integer<uint8_t>(buf[0])) << 24)
| (uint32_t(std::to_integer<uint8_t>(buf[1])) << 16)
| (uint32_t(std::to_integer<uint8_t>(buf[2])) << 8)
| uint32_t(std::to_integer<uint8_t>(buf[3]));
The reference implementation in examples/demo-dma/demo_dma_device.cpp shows
both styles. See Demo DMA peripheral.
GRLIB Plug & Play table¶
GRLIB SoCs publish a Plug & Play (PnP) area that the BSP scans at boot to
discover peripherals, their addresses, and IRQ lines. RTEMS leon3 reads it,
so Tero must build a faithful one. src/runtime/src/pnp_table.cpp constructs
the AHB and APB PnP records from the live PeripheralSpec list:
- AHB PnP at 0xFFFFF000, slaves at +0x800, 32 bytes/slot, BAR0 at slot+0x10 (pnp_table.cpp:24-27).
- APB PnP records, 8 bytes/slot, BAR at slot+0x04 (pnp_table.cpp:28-29).
- Vendor/device IDs from the GRLIB IP manual §3.4, e.g. Gaisler vendor 0x01, APBUART 0x00C, IRQMP 0x00D, GPTIMER 0x011, LEON3 0x003, LEON4 0x048 (pnp_table.cpp:34-41).
The BAR encoders (ahb_bar, apb_bar) pack address/mask/type per GRLIB §4.3.
The frequency RTEMS reads for tc_frequency comes from the same PnP path and
matches cpu_clock_hz, which is why direct-ELF guests need no rebuild on a
clock change (see Multicore and timing).
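As a rough illustration of what an 8-byte APB record holds, the sketch below packs an identification word and a BAR. The bit layout is an assumption taken from the commonly documented GRLIB Plug & Play format (vendor in bits 31:24, device in 23:12, version in 9:5, IRQ in 4:0; BAR address field in 31:20, mask in 15:4, type in 3:0), and pnp_id / apb_bar_sketch are hypothetical helpers, not the encoders in pnp_table.cpp. Verify against the GRLIB manual and the source before relying on it.

```cpp
#include <cassert>
#include <cstdint>

// Identification word (assumed layout): vendor | device | version | irq.
constexpr std::uint32_t pnp_id(std::uint32_t vendor, std::uint32_t device,
                               std::uint32_t version, std::uint32_t irq) {
    return ((vendor & 0xFFu) << 24) | ((device & 0xFFFu) << 12) |
           ((version & 0x1Fu) << 5) | (irq & 0x1Fu);
}

// APB BAR (assumed layout): bits 19:8 of the offset inside the APB bridge window,
// a 12-bit mask, and type 0x1 for APB I/O space.
constexpr std::uint32_t apb_bar_sketch(std::uint32_t apb_offset, std::uint32_t mask) {
    return (((apb_offset >> 8) & 0xFFFu) << 20) | ((mask & 0xFFFu) << 4) | 0x1u;
}

// Gaisler (0x01) APBUART (0x00C); version 1 and IRQ 2 are example values.
static_assert(pnp_id(0x01, 0x00C, 1, 2) == 0x0100C022u);
// A 256-byte window at offset 0x100 within the bridge.
static_assert(apb_bar_sketch(0x100, 0xFFF) == 0x0010FFF1u);

inline void pnp_demo() {
    assert((pnp_id(0x01, 0x00D, 0, 0) >> 12 & 0xFFFu) == 0x00Du);  // IRQMP device id
}
```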
How to map a new region or peripheral¶
A new RAM region (e.g. a second SRAM bank):
auto r = bus.map_ram(PhysAddr{0x60000000}, 0x00100000); // 1 MiB
// returns InvalidConfig if it overlaps an existing region
A new peripheral — the preferred, declarative way is a PeripheralSpec in
EmulatorConfig::peripherals (the recipes do this); the bus then calls
map_peripheral(p) for you. The peripheral advertises its own MMIO window via
IPeripheral::mmio_range(), so you only set the base/size there. For
REPL/test usage after initialize(), Emulator::add_peripheral(p, irq) is the
sugar form. See Peripheral system and
Custom peripherals.
A memory bank (RAM/ROM) — instantiate the generic memory entities
peripherals::Ram / peripherals::Rom (src/peripherals/.../ram.hpp,
rom.hpp): they own the storage and expose bus::IRamBacking, which the
Soc mounts as a direct fast region (map_ram_backing). In a machine script
that is just new Ram name=x + map .... For a peripheral with an internal
side-effect-free bank, implement IPeripheral::memory_region() returning an
IMemoryRegion and the bus uses the bulk copy path for it.
Validate before you run
validate_emulator_config(cfg) runs at Emulator::create and catches
duplicate names, null factories, out-of-range IRQs, and bad chardev
indices. Address-overlap is caught later at map_* time (returns
InvalidConfig). A BusError at runtime almost always means an unmapped
address or a straddling access — check your mmio_range() base/size first.