That looks like quite a lot of complexity for such a gain. 30-40% is roughly what context threading would buy you [1]. It takes relatively little code to implement: only generate real assembly for jumps and conditional branches, and for every other opcode just emit a call to the interpreter's handler. Reportedly, it took Apple just 4k LOC to ship their first JIT like that in JavaScriptCore [2].
Also, if you haven't seen it, musttail + preserve_none is a cool new dispatch technique to get more mileage out of plain C/C++ before turning to hand-coded assembly/JIT [3]. A step up from computed goto.
[1] https://webdocs.cs.ualberta.ca/~amaral/cascon/CDP05/slides/C...
[2] https://webkit.org/blog/214/introducing-squirrelfish-extreme...
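To make the musttail + preserve_none idea above concrete, here is a minimal sketch in C. The opcode set, handler names and the `run` helper are all made up for illustration and come from no real VM. It assumes a recent Clang; on compilers without these attributes the macros expand to nothing and the handlers fall back to ordinary calls, so it still runs but without the guaranteed tail calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Assumption: only Clang is probed here; elsewhere both macros are empty. */
#if defined(__clang__) && __has_attribute(musttail)
#define MUSTTAIL __attribute__((musttail))
#else
#define MUSTTAIL /* plain call: works, but each opcode adds a C stack frame */
#endif

#if defined(__clang__) && __has_attribute(preserve_none)
#define PRESERVE_NONE __attribute__((preserve_none))
#else
#define PRESERVE_NONE
#endif

/* Hypothetical toy bytecode: PUSH takes one signed 8-bit immediate. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT, OP_COUNT };

/* Every handler shares one signature, so VM state travels in argument
   registers; musttail requires caller and callee prototypes to match. */
PRESERVE_NONE typedef int64_t (*handler_fn)(const uint8_t *pc, int64_t *sp);

#define HANDLER(name) \
    PRESERVE_NONE static int64_t name(const uint8_t *pc, int64_t *sp)

/* Jump to the handler of the next opcode without growing the C stack. */
#define DISPATCH() MUSTTAIL return dispatch_table[*pc](pc, sp)

HANDLER(op_push);
HANDLER(op_add);
HANDLER(op_mul);
HANDLER(op_halt);

static const handler_fn dispatch_table[OP_COUNT] = {
    op_push, op_add, op_mul, op_halt
};

HANDLER(op_push) { *sp++ = (int8_t)pc[1]; pc += 2; DISPATCH(); }
HANDLER(op_add)  { sp[-2] += sp[-1]; sp--; pc += 1; DISPATCH(); }
HANDLER(op_mul)  { sp[-2] *= sp[-1]; sp--; pc += 1; DISPATCH(); }
HANDLER(op_halt) { (void)pc; return sp[-1]; } /* result is the top of stack */

/* Entry point: an ordinary call into the first handler.
   No bounds checking on the 16-slot stack or on opcodes; sketch only. */
static int64_t run(const uint8_t *code) {
    int64_t stack[16];
    assert(code != NULL);
    return dispatch_table[code[0]](code, stack);
}

/* (2 + 3) * 4 */
static const uint8_t demo_program[] = {
    OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT
};
```

Calling `run(demo_program)` evaluates (2 + 3) * 4 and returns 20. The point of the shape is that each handler ends in a tail call through the table, so the compiler can keep `pc` and `sp` in registers across opcodes, and preserve_none removes the callee-saved register spills a normal calling convention would force on every handler.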
https://github.com/dan4thewin/FreeForth2/tree/master
This is a Forth with a few tricks, chiefly using flow control instead of a compilation-state flag. Combined with always compiling into an eval buffer before execution, and with the use of macros, this lets you unroll a function/word/expression before it runs, which makes it fast.
Macros can also be used for stack caching (though that is not done here), cross-compilation, etc.
Lastly, FreeForth caches the top two stack items in registers, so at compile time it can avoid emitting code for swap by renaming the registers instead.
All of this is quite a different approach and somewhat language specific. I just wanted to highlight the variety, since uxn is not actually that far from Forth and yet takes such a different approach.
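A rough sketch of what compile-time register renaming for swap can look like, in C. This is not FreeForth's actual code generator: the `Compiler` struct, the register names `r0`/`r1`, and the pseudo-assembly it emits are all assumptions for illustration. The compiler only tracks which physical register currently holds the top of stack (TOS) and the next item (NOS); `swap` flips that mapping and emits nothing.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical compiler state: which register holds which stack slot. */
typedef struct {
    int tos;         /* index of the register caching the top of stack */
    int nos;         /* index of the register caching the next item    */
    char out[256];   /* emitted pseudo-assembly, one instruction/line  */
    size_t len;      /* bytes used in out                              */
} Compiler;

static const char *const reg_name[2] = { "r0", "r1" }; /* made-up names */

static void compiler_init(Compiler *c) {
    c->tos = 0;
    c->nos = 1;
    c->out[0] = '\0';
    c->len = 0;
}

/* Append one formatted line to the output buffer. */
static void emit(Compiler *c, const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(c->out + c->len, sizeof c->out - c->len, fmt, ap);
    va_end(ap);
    assert(n >= 0 && (size_t)n < sizeof c->out - c->len);
    c->len += (size_t)n;
}

/* swap ( a b -- b a ): no instruction is emitted, only the mapping flips. */
static void compile_swap(Compiler *c) {
    int t = c->tos;
    c->tos = c->nos;
    c->nos = t;
}

/* - ( a b -- a-b ): subtract TOS from NOS in place, then refill the freed
   register from the in-memory part of the stack. The register that held
   NOS now holds the result, so it becomes the new TOS. */
static void compile_sub(Compiler *c) {
    emit(c, "sub %s, %s\n", reg_name[c->nos], reg_name[c->tos]);
    emit(c, "pop %s\n", reg_name[c->tos]);
    compile_swap(c); /* same renaming: old NOS register is the new TOS */
}
```

After `compiler_init`, compiling `swap` followed by `-` leaves `"sub r0, r1\npop r1\n"` in `out`, while compiling `-` alone yields `"sub r1, r0\npop r0\n"`. The swap itself costs no instructions; it only changes which operands the following subtraction uses.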
Rochus•3d ago