Kore has ~140 tokens and 142 opcodes. Programs are sequences of stack operations — push values, apply functions, compose by concatenation. There's a proof checker that statically verifies stack effects and a capability system that gates side effects.
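To make the stack-effect idea concrete, here is a toy sketch in Python (not Kore) of how effect checking for a concatenative language can work by composing per-opcode effects left to right. The opcode table and effect pairs are illustrative, not Kore's actual ones.

```python
# Toy effect table: (values consumed, values produced) per opcode.
# Illustrative only -- not Kore's real opcodes or effects.
EFFECTS = {
    "push": (0, 1),   # ( -- n )
    "dup":  (1, 2),   # ( a -- a a )
    "drop": (1, 0),   # ( a -- )
    "swap": (2, 2),   # ( a b -- b a )
    "add":  (2, 1),   # ( a b -- a+b )
}

def infer_effect(program: list[str]) -> tuple[int, int]:
    """Compose per-opcode effects left to right and return the program's net
    effect as (inputs required, outputs produced). Because effects compose,
    the concatenation of two checked programs is itself checkable."""
    need, depth = 0, 0            # inputs demanded from the caller; current surplus
    for op in program:
        ins, outs = EFFECTS[op]
        if ins > depth:           # opcode needs more values than produced so far:
            need += ins - depth   # demand the shortfall from the caller's stack
            depth = 0
        else:
            depth -= ins
        depth += outs
    return need, depth

print(infer_effect(["push", "push", "add"]))  # (0, 1): needs nothing, leaves one value
print(infer_effect(["dup", "add"]))           # (1, 1): needs one input, doubles it
print(infer_effect(["add"]))                  # (2, 1): the missing inputs are visible statically
```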
How I got here: I wanted to explore whether a compiler could serve as the reward function for training an LLM to write code. Most code-gen benchmarks use test suites or human eval. But if the compiler can verify correctness before execution — accept or reject in microseconds — you don't need a test harness at all.
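Roughly, the reward is just a compile check. Here is a minimal sketch; the korec command and its flags are placeholders, not Kore's real CLI.

```python
import subprocess

def compile_reward(program_text: str) -> float:
    """Binary reward for a code-gen training loop: 1.0 if the compiler's static
    checks accept the program, 0.0 if they reject it. Nothing is executed.
    NOTE: "korec --check -" is a placeholder invocation, not Kore's real CLI."""
    result = subprocess.run(
        ["korec", "--check", "-"],   # placeholder: check a program read from stdin
        input=program_text,
        capture_output=True,
        text=True,
    )
    return 1.0 if result.returncode == 0 else 0.0
```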
That meant I needed two things: a language small enough for an LLM to learn the full vocabulary, and a security model that makes generated programs safe by construction.
The security model (P4: capabilities attenuate): Every side effect in Kore is gated by a capability — io for print/rand, fs for file access, exec for shell commands. The compiler auto-detects what a program needs and tags the binary. At runtime, you grant permissions explicitly (--allow io) or it won't run. When you spawn a sandboxed child, it gets parent_caps & requested_caps — it can never escalate. Pure programs (no capabilities) are deterministic and run on any backend.
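In Python terms, the attenuation rule is just a bitwise intersection. A minimal sketch follows; the flag names mirror the prose, not Kore's actual encoding.

```python
from enum import Flag, auto

class Cap(Flag):
    """Capability bits; names mirror the prose, not Kore's real encoding."""
    NONE = 0
    IO   = auto()   # print / rand
    FS   = auto()   # file access
    EXEC = auto()   # shell commands

def spawn_caps(parent_caps: Cap, requested_caps: Cap) -> Cap:
    """A sandboxed child receives the intersection of what the parent holds
    and what the child requests, so it can never escalate."""
    return parent_caps & requested_caps

parent = Cap.IO | Cap.FS
print(spawn_caps(parent, Cap.IO | Cap.EXEC))  # Cap.IO   -- EXEC is dropped, not granted
print(spawn_caps(parent, Cap.NONE))           # Cap.NONE -- a pure, deterministic child
```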
This matters because an AI agent writing Kore tools gets compile-time proof that the tool can't touch the filesystem or exec shell commands. Not sandboxing — the proof checker rejects it statically.
I fine-tuned DeepSeek-R1-14B (QLoRA via Unsloth; a rough sketch of the setup is below) on 478 curriculum tasks, and it worked pretty well. Next I'll probably try an MCTS+RL experiment to see whether it can learn to write programs that solve games and optimization problems. The ultimate goal would be to have it write its own compiler and self-host.
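A sketch of what that kind of QLoRA-via-Unsloth run looks like (the hub id, hyperparameters, and dataset path are placeholders, and the exact kwargs shift a bit between unsloth/trl versions):

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# 4-bit base model + LoRA adapters (QLoRA). Hub id is an assumption
# (DeepSeek-R1-Distill-Qwen-14B is the 14B R1 distill).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16, lora_alpha=16, lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Placeholder path/format: one JSON line per curriculum task, rendered into a
# single "text" field (prompt + Kore solution).
dataset = load_dataset("json", data_files="curriculum.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=3,
        learning_rate=2e-4,
        output_dir="kore-qlora",
    ),
)
trainer.train()
```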