# zisp

**Repository Path**: mirrors_floatdrop/zisp

## Basic Information

- **Project Name**: zisp
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-10-07
- **Last Updated**: 2026-05-09

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# zisp: Compile-Time PEG Experiments in Zig

`zisp` is a proof of concept that asks how far Zig's compile-time machinery and the new labeled `switch` `continue` syntax can push parser generation. The project starts from high-level PEG (Parsing Expression Grammar) declarations and lowers them, at compile time, into tightly-specialized VM loops that read more like hand-written interpreters than generic parser combinators.

## Why this exists

The repository doubles as a playground for a few ideas:

- **`comptime`-driven codegen** – Grammar rules are analysed and expanded during compilation, producing concrete bytecode tables and AST layouts before the program ever runs.
- **Switch-label `continue`** – The VM core relies on Zig 0.15's ability to `continue :vm next_ip` directly from inside nested control flow, giving a threaded-interpreter style loop without manual `goto`s.
- **Runtime that still feels ergonomic** – Even with all the specialization, the public API stays close to "declare a grammar, parse a buffer, walk a typed AST".
- **Transparency of the generated code** – We want to be able to inspect the lowered form easily (LLVM IR, assembly, AST dumps) and reason about the cost model.

## Repo layout

- `src/peg.zig` – Grammar DSL, compile-time compilation of PEG rules, and AST helpers.
- `src/vm.zig` – The bytecode interpreter/VM with loop-mode execution using labeled `switch` `continue`.
- `src/main.zig` – CLI harness that exercises the parser and prints traces/ASTs.
- `docs/vm-loop-llvm.md` – Walkthrough of how to force Zig/LLVM to emit the specialized loop for `demoGrammar`.
- `vm_loop_demo.zig` – Minimal driver used by the docs to instantiate the VM in isolation.

## Getting started

You need Zig 0.15.1 or newer (the build script uses the labeled-`continue` feature). The usual workflow:

```bash
zig build run   # build the CLI and run it
zig build test  # run the grammar + VM unit tests
```

The CLI parses a miniature Zig subset (`src/zigmini`). Today that grammar still rides on the older `pegvm.zig` backend simply because it hasn't been ported over yet, but the shape mirrors the new `peg.zig` + `vm.zig` pipeline. For a quick feel of the existing system, run `zig run src/peg.vm`; that's the main entry point that prints the bytecode, step trace, and AST using the original VM. Try passing `--dump-pegcode` for a readable dump of the generated bytecode.

#### Sample `zig run src/peg.zig`

Running the grammar module directly prints the compiled bytecode, a step-by-step trace for a demo input, and the resulting typed forest:

```
$ zig run src/peg.zig

&Value:
   0 push ->3
   1 call ->5
   2 drop ->4
   3 call ->15
   4 done

&Integer:
   5 open
   6 read 1..9
   7 next
   8 open
   9 read 0..9*
  10 shut
...

Parsing: "[[1] [2]]"

[ | 0000 push ->3
  | 0001 call ->5
  |-| 0005 open
...

✓ (156 steps)

Array [0..16) "[[1] [2] [4096]]"
└─values: 3 items
  ├─[0] Value: .array -> Integer d='1'
  ├─[1] Value: .array -> Integer d='2'
  └─[2] Value: .array -> Integer d='4', ds="096"
```

### Forest shape

The VM builds a "typed forest": every grammar rule owns a dedicated growable array, and siblings for a rule end up stored contiguously. That layout makes it cheap to gather a rule's results and to reinterpret slices as strongly-typed structs/unions when you walk the AST later.
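
As a rough illustration of that layout, here is a minimal sketch of per-rule stores and a walk over them. It is not the code in `src/peg.zig` or `src/vm.zig`: the `Store` helper, the `Forest` struct, the field names, and `sumArray` are all hypothetical, and a fixed-capacity buffer stands in for the growable per-rule arrays so the example does not depend on any particular `std` container API.

```zig
const std = @import("std");

// Hypothetical node layouts; the real ones are derived from the grammar at comptime.
const Integer = struct { start: u32, end: u32 }; // byte span in the input
const Value = union(enum) { integer: u32, array: u32 }; // index into the matching store
const Array = struct { first_value: u32, value_count: u32 }; // contiguous run of siblings

// Fixed-capacity stand-in for "a dedicated growable array per rule".
fn Store(comptime T: type, comptime capacity: u32) type {
    return struct {
        items: [capacity]T = undefined,
        len: u32 = 0,

        // Appends a node and returns its index in this store.
        fn push(self: *@This(), item: T) error{OutOfSpace}!u32 {
            if (self.len == capacity) return error.OutOfSpace;
            self.items[self.len] = item;
            self.len += 1;
            return self.len - 1;
        }
    };
}

const Forest = struct {
    integers: Store(Integer, 16) = .{},
    values: Store(Value, 16) = .{},
    arrays: Store(Array, 16) = .{},
};

// Walks one Array node: its children are a single slice of the Value store.
fn sumArray(forest: *const Forest, text: []const u8, array_index: u32) std.fmt.ParseIntError!u64 {
    const array = forest.arrays.items[array_index];
    const siblings = forest.values.items[array.first_value..][0..array.value_count];
    var total: u64 = 0;
    for (siblings) |value| {
        total += switch (value) {
            .integer => |i| blk: {
                const node = forest.integers.items[i];
                break :blk try std.fmt.parseInt(u64, text[node.start..node.end], 10);
            },
            .array => |a| try sumArray(forest, text, a),
        };
    }
    return total;
}

test "siblings of one rule form a contiguous slice" {
    const text = "[[1] [2]]";
    var forest: Forest = .{};
    const one = try forest.integers.push(.{ .start = 2, .end = 3 });
    const two = try forest.integers.push(.{ .start = 6, .end = 7 });
    const v0 = try forest.values.push(.{ .integer = one });
    const v1 = try forest.values.push(.{ .integer = two });
    const inner0 = try forest.arrays.push(.{ .first_value = v0, .value_count = 1 });
    const inner1 = try forest.arrays.push(.{ .first_value = v1, .value_count = 1 });
    // The two children of the outer array sit next to each other in the Value store.
    const outer_first = try forest.values.push(.{ .array = inner0 });
    _ = try forest.values.push(.{ .array = inner1 });
    const root = try forest.arrays.push(.{ .first_value = outer_first, .value_count = 2 });
    try std.testing.expectEqual(@as(u64, 3), try sumArray(&forest, text, root));
}
```

Saving the sketch to a file and running `zig test` on it should exercise the walk. Storing indices instead of pointers is a choice made for this sketch so the stores could be reallocated without invalidating nodes; the actual VM may well do this differently.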

In the demo run the root rule is `Array`, whose `values` field is emitted as a `Kleene` list of `Value` nodes; each `Value` lowers to either an `Integer` or another `Array`, and you can see the nesting clearly in the forest dump:

```
Array:
└─values: 3 items
  ├─[0] Value: .array
  │   └─Array:
  │     └─values: 1 items
  │       └─[0] Value: .integer
  │         └─Integer:
  │           ├─d: '1' [2]
  │           └─ds: (empty)
  ├─[1] Value: .array
  │   └─Array:
  │     └─values: 1 items
  │       └─[0] Value: .integer
  │         └─Integer:
  │           ├─d: '2' [6]
  │           └─ds: (empty)
  └─[2] Value: .array
      └─Array:
        └─values: 1 items
          └─[0] Value: .integer
            └─Integer:
              ├─d: '4' [10]
              └─ds: "096" [11..14)
```

The full trace (with detailed stack annotations and AST layout) is available any time you want to sanity-check how a grammar runs.

### Inspecting the generated code

To look directly at the loop-mode codegen for the included `demoGrammar`, follow the steps in `docs/vm-loop-llvm.md`. The short version:

```bash
zig build-exe vm_loop_demo.zig \
  -O ReleaseFast -fllvm \
  -femit-llvm-ir=zig-out/vm_loop_demo.ll \
  -femit-asm=zig-out/vm_loop_demo.s
```

The emitted `.ll` and `.s` highlight how the interpreter turns into a computed-goto state machine with literal bitsets for character classes.

### How specialization actually looks

Because the VM bytecode is baked during `comptime`, the "interpreter" that ships in the binary already knows the exact instruction stream. `VM(G).next` gets monomorphized for the grammar, the opcode array becomes a constant, and the main loop lowers to one giant `switch`/jump-table keyed on the instruction pointer. In other words, we don't even switch on an opcode enum at runtime; we switch on the literal IP and jump straight to the inlined code for that specific instruction. A toy sketch of the shape you get looks like this:

```zig
// Pseudocode, but this is the flavour LLVM ends up with.
vm: switch (ip) {
    0 => { // read '['
        if (self.text[self.sp] != '[') return error.ParseFailed;
        self.sp += 1;
        continue :vm 1;
    },
    1 => { // call Skip rule
        try self.calls.append(.{ .return_ip = 2, .target_ip = 31, ... });
        continue :vm 31;
    },
    2 => { // next field, etc.
        ...;
        continue :vm 3;
    },
    else => return,
}
```

Every case carries the rule metadata, call targets, character sets, and struct bookkeeping as compile-time constants. In release builds the control flow resembles a threaded interpreter hand-written in assembly for a program that was known when you built the binary. The deep dive in `docs/vm-loop-llvm.md` shows the LLVM view, but even at the Zig level you can reason about the VM as a tightly unrolled state machine specialized to the grammar you compiled.

## Project status

This is intentionally exploratory code. Expect breakage, rapid refactors, and plenty of TODOs around:

- Enriching the grammar DSL with more PEG operators.
- Experimenting with alternative backends (direct threaded code vs VM bytecode).
- Measuring performance against other PEG implementations.
- Refining the AST representation to reduce allocations.

If you're curious about a specific angle (memoization strategies, labelled-switch ergonomics, or further `comptime` tricks), open an issue or hack on a branch. The more weird experiments, the better.

## License

MIT. See `LICENSE` for details.