EmuGen — declarative frontend generation¶
Status: designed and approved, not implemented
EmuGen is the build-time generator planned by ADR-006
(plans/multiarch-emugen-frontends.md, approved 2026-06-10). Of its
staged plan, only E0 is implemented (the IR interpreter as a full
reference path — Decision 79). Stages E1–E3 are gated on the project
committing to two new guest architectures. Nothing on this page
beyond §"E0" describes committed code.
The problem it solves¶
After the arch-neutral IR (primer §3), the only
per-ISA artifact in the execution stack is the frontend: a decoder
plus a translate_block that emits TERO IR, plus a state-layout header.
Writing one by hand has two cost centres:
- Opcode tables and decoders — mechanical, voluminous, and transcription-error-prone. A bit-field typo produces a wrong instruction that may execute plausibly for millions of cycles.
- Instruction semantics: intellectually irreducible (someone must state what ADD does), but currently expressed as C++ builder calls, which are harder to audit against the ISA manual's pseudocode than a declarative form would be.
EmuGen targets both: a single declarative description per ISA, from which the frontend artifacts are generated at build time. Prior art exists in table-driven emulator generators (TableGen-style pipelines used by industrial full-system simulators); ADR-006 adopts the generation idea while rejecting the part that conflicts with TERO's validation architecture (§"The hard rule" below).
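To make the first cost centre concrete, here is a minimal sketch of a mask/match encoding table together with the kind of consistency check a generator could run at build time. Everything in it is an assumption made for illustration: the `Encoding` struct, the helper names, and the toy 32-bit encodings are hypothetical and are not taken from TERO or from ADR-006.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical encoding row: a word matches when (word & mask) == match.
struct Encoding {
    std::string_view name;
    std::uint32_t mask;
    std::uint32_t match;
};

// Two rows are ambiguous if some word satisfies both, which happens exactly
// when they agree on every bit that both masks constrain.
inline bool overlaps(const Encoding& a, const Encoding& b) {
    const std::uint32_t common = a.mask & b.mask;
    return (a.match & common) == (b.match & common);
}

// Sanity check over the whole table: match bits must lie inside the mask,
// and no two rows may be able to claim the same instruction word.
inline bool table_is_consistent(const std::vector<Encoding>& table) {
    for (std::size_t i = 0; i < table.size(); ++i) {
        if ((table[i].match & ~table[i].mask) != 0) return false;
        for (std::size_t j = i + 1; j < table.size(); ++j) {
            if (overlaps(table[i], table[j])) return false;
        }
    }
    return true;
}

// Linear-scan decoder; returns the first matching row's name, if any.
inline std::optional<std::string_view> decode(const std::vector<Encoding>& table,
                                              std::uint32_t word) {
    for (const Encoding& e : table) {
        if ((word & e.mask) == e.match) return e.name;
    }
    return std::nullopt;
}
```

With a toy table such as `{"ADD", 0xFF000000, 0x01000000}` and `{"SUB", 0xFF000000, 0x02000000}`, the check passes. Adding a row whose mask has lost two bits, for example `{"NOP", 0xFC000000, 0x00000000}`, makes it fail, because that row can also claim ADD's words. A hand-written decoder has no natural place for this kind of whole-table check.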
What it will consume and emit¶
Input — one ISA description with three sections:
| Section | Content | Generated output |
|---|---|---|
| State | registers, widths, layout | the GuestState offset header (the per-arch *_layout.hpp) |
| Encodings | opcode tables, bit fields | the decoder |
| Semantics | per-instruction compositions of TERO IR ops | the IArchFrontend::translate_block bodies |
Output is C++ source, generated as a CMake build step, reproducible and reviewable — to every downstream consumer (block cache, IR interpreter, tiered JIT, GDB integration) a generated frontend is indistinguishable from a hand-written one.
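As an illustration of the State row in the table above, the following sketch turns a list of register declarations into the text of an offset header. It assumes natural alignment and a flat list of registers. `RegDecl`, `Field`, `lay_out`, `emit_layout_header` and the shape of the emitted header are hypothetical; the real description format and the real *_layout.hpp contents are not specified on this page.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical "State" entry: a register (or register file) in the guest.
struct RegDecl {
    std::string name;
    std::size_t width_bytes;  // size of one element
    std::size_t count;        // 1 for a scalar, N for a register file
};

// Computed placement of one entry inside the guest-state struct.
struct Field {
    std::string name;
    std::size_t offset;
    std::size_t size;
};

// Assign offsets in declaration order, aligning each entry to its element
// width (an assumption; a real layout may impose other constraints).
inline std::vector<Field> lay_out(const std::vector<RegDecl>& regs) {
    std::vector<Field> out;
    std::size_t cursor = 0;
    for (const RegDecl& r : regs) {
        const std::size_t align = r.width_bytes ? r.width_bytes : 1;
        cursor = (cursor + align - 1) / align * align;
        const std::size_t size = r.width_bytes * r.count;
        out.push_back({r.name, cursor, size});
        cursor += size;
    }
    return out;
}

// Emit header text. The output depends only on the input, so repeated
// builds produce byte-identical files that can be reviewed like any source.
inline std::string emit_layout_header(const std::string& arch,
                                      const std::vector<RegDecl>& regs) {
    std::string text = "// Generated file, do not edit.\n#pragma once\n"
                       "#include <cstddef>\n\nnamespace " + arch + "_layout {\n";
    for (const Field& f : lay_out(regs)) {
        text += "inline constexpr std::size_t " + f.name + "_offset = " +
                std::to_string(f.offset) + ";\n";
    }
    text += "}  // namespace " + arch + "_layout\n";
    return text;
}
```

For the declarations `pc` (4 bytes), `flags` (1 byte) and `gpr` (four 8-byte registers), this sketch places them at offsets 0, 4 and 8. A CMake build step could write the returned string to a file in the build tree.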
What it will NOT generate¶
- Interpreters or JITs. TERO has one IR interpreter and one LLVM backend, already written and validated; the generator only fabricates their per-ISA feeder. (This is the rejected half of the prior art, which generates whole interpreter cores per ISA.)
- The switch interpreter. Frozen as the SPARC oracle (Decision 80); hand-written permanently.
- Peripherals or SoC composition. Owned by the entity model and the
.teromachine files (Peripheral system).
The hard rule: the generator emits TERO IR, never LLVM IR¶
Generating LLVM IR directly would bypass the project's own IR and orphan
everything keyed on it: the IR interpreter (the reference path every
IR-only architecture depends on for trace, single-step, and lockstep —
Decision 79), the (PC, mode) block cache, the tiered compilation
pipeline, and the per-op guest metadata GDB consumes. The full argument
is the primer's §6;
ADR-006 freezes the conclusion: LLVM remains exactly what it is today —
the runtime backend, fed by TERO IR.
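The rule can be pictured with a toy model. The sketch below is not TERO IR: the op set, the stack-style evaluation, and all names (`Op`, `IrOp`, `emit_addi`, `interpret`) are invented for illustration. It shows only the shape of the argument: a frontend helper emits project-level IR ops, each stamped with the guest PC, and a reference interpreter consumes that same sequence.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Toy op set (hypothetical; the real IR is defined elsewhere in the project).
enum class Op { LoadReg, Const, Add, StoreReg };

struct IrOp {
    Op op;
    std::uint32_t arg;       // register index or immediate, depending on op
    std::uint32_t guest_pc;  // per-op guest metadata, e.g. for a debugger
};

struct Guest {
    std::array<std::uint32_t, 8> regs{};
};

// What a frontend, generated or hand-written, would do for "rd = rs1 + imm":
// compose IR ops and stamp each with the guest instruction's address.
inline void emit_addi(std::vector<IrOp>& block, std::uint32_t pc,
                      std::uint32_t rd, std::uint32_t rs1, std::uint32_t imm) {
    block.push_back({Op::LoadReg, rs1, pc});
    block.push_back({Op::Const, imm, pc});
    block.push_back({Op::Add, 0u, pc});
    block.push_back({Op::StoreReg, rd, pc});
}

// Reference interpreter for the toy IR. Assumes a well-formed block and
// register indices below 8; a real interpreter would validate both.
inline void interpret(const std::vector<IrOp>& block, Guest& g) {
    std::vector<std::uint32_t> stack;
    for (const IrOp& op : block) {
        switch (op.op) {
        case Op::LoadReg: stack.push_back(g.regs[op.arg]); break;
        case Op::Const:   stack.push_back(op.arg); break;
        case Op::Add: {
            const std::uint32_t rhs = stack.back(); stack.pop_back();
            const std::uint32_t lhs = stack.back(); stack.pop_back();
            stack.push_back(lhs + rhs);
            break;
        }
        case Op::StoreReg: g.regs[op.arg] = stack.back(); stack.pop_back(); break;
        }
    }
}
```

In this picture, a generator that emitted backend IR directly would leave `interpret` with nothing to run and would drop the `guest_pc` stamps.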
ADR-006 in brief¶
Recorded as Decision 80; full text in
plans/multiarch-emugen-frontends.md.
- The switch interpreter is frozen — SPARC-only, no new features, kept indefinitely as oracle and trace path.
- A new guest architecture is IR-only: it implements IArchFrontend + IArchitecture and nothing else; its reference path is the IR interpreter, its fast path the tiered JIT.
- EmuGen generates frontends, nothing deeper, and emits TERO IR.
- Each new architecture gets an external oracle — lockstep against an independent emulator plays the role SIS plays for SPARC.
- Candidates (revisited when the gate opens): RISC-V RV32 (NOEL-V class) as architecture #2, ARM AArch32 as #3.
Staged plan and status¶
| Stage | Content | Status |
|---|---|---|
| E0 | IR interpreter as full reference path: per-instruction step hook, exact single-step, interior breakpoints, instruction-level lockstep | Done 2026-06-10 (Decision 79, plans/e0-ir-reference-path.md) |
| E1 | Architecture #2 with a hand-written frontend + the decoder-table generator (the mechanical 80% of the win, at 20% of the risk) | Deferred — gated |
| E2 | Full EmuGen: the semantics DSL, designed only once two hand-written frontends (SPARC + arch #2) exist as its corpus | Deferred — gated |
| E3 | SPARC retrofit: describe SPARC V8 in the DSL, compare generated IR block-by-block against the hand-written frontend over the RTEMS corpus; once validated, the generated frontend becomes production (amended 2026-06-10 from optional to mandatory) | Deferred — gated |
The gate ("rule of three"): the DSL is not designed until two real frontends exist to inform it — a generator designed from one example (SPARC) would encode abstractions no second architecture has verified. E0 ran early because it is the capability everything else assumes, and it was provable on SPARC against the existing oracle.
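The block-by-block comparison planned for E3 could be as simple as finding the first position where two IR sequences differ. The sketch below assumes an IR op can be compared for equality field by field. `IrOp` here is a stand-in with invented fields, and `first_divergence` is a hypothetical helper, not an existing TERO function.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Stand-in for an IR op; the real op carries more fields.
struct IrOp {
    std::uint16_t opcode;
    std::uint32_t operand;
    bool operator==(const IrOp& other) const {
        return opcode == other.opcode && operand == other.operand;
    }
};

// Index of the first op where the hand-written and generated translations of
// the same guest block differ, or nullopt when the two are identical.
// A length mismatch counts as a divergence at the end of the shorter block.
inline std::optional<std::size_t> first_divergence(const std::vector<IrOp>& hand,
                                                   const std::vector<IrOp>& generated) {
    const std::size_t n = std::min(hand.size(), generated.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (!(hand[i] == generated[i])) return i;
    }
    if (hand.size() != generated.size()) return n;
    return std::nullopt;
}
```

Strict equality is the simplest criterion. Whether E3 would need a looser one, for example one that tolerates a different but equivalent op ordering, is a design question this sketch does not answer.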
Validation per architecture¶
| Layer | SPARC | New ISAs (E1+) |
|---|---|---|
| Semantic redundancy | Switch oracle vs JIT, bit-exact | — (single semantic source; accepted by ADR-006) |
| Lowering cross-check | IR interpreter vs JIT | IR interpreter vs JIT |
| Instruction-level lockstep | run_ir_diff(per_insn) vs oracle (E0) | same harness, vs the external oracle |
| External oracle | SIS | per-ISA independent emulator |
Cross-cutting exit criterion for every stage: SPARC RTEMS pass rate ≥ the rate at stage entry, and no new mandatory runtime dependencies (the generator is a build-time tool; external oracles are test-time tools).
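Instruction-level lockstep, as used in the table above, means stepping the implementation under test and the oracle one instruction at a time and comparing architectural state after each step. The sketch below shows that loop in isolation. `CpuState`, `StepFn` and `lockstep` are hypothetical names, and the sketch makes no claim about how run_ir_diff is actually implemented.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>

// Minimal architectural state for the comparison (illustrative only).
struct CpuState {
    std::uint32_t pc;
    std::array<std::uint32_t, 4> regs;
};

inline bool same_state(const CpuState& a, const CpuState& b) {
    return a.pc == b.pc && a.regs == b.regs;
}

// One guest instruction executed against the given state.
using StepFn = std::function<void(CpuState&)>;

// Step both models from the same initial state. Returns the index of the
// first instruction after which the states differ, or nullopt if they agree
// for max_insns instructions.
inline std::optional<std::size_t> lockstep(CpuState initial,
                                           const StepFn& step_under_test,
                                           const StepFn& step_oracle,
                                           std::size_t max_insns) {
    CpuState dut = initial;
    CpuState oracle = initial;
    for (std::size_t i = 0; i < max_insns; ++i) {
        step_under_test(dut);
        step_oracle(oracle);
        if (!same_state(dut, oracle)) return i;
    }
    return std::nullopt;
}
```

Because the comparison runs after every instruction, a divergence is reported at the instruction that caused it rather than millions of cycles later. The oracle appears only as a `StepFn` passed in by the test, which matches the criterion that external oracles are test-time tools.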
Pointers¶
- Binary translation — a primer — the stack this generator feeds.
- Adding a frontend: the manual procedure EmuGen automates; its contract (builder calls, set_cur_pc stamping, no_stop_tail) is exactly what generated code must satisfy.
- plans/multiarch-emugen-frontends.md: ADR-006 full text and stage details.