I recently came across this style of interpreter, which V8 and HotSpot use. It works by writing the bytecode op handlers in assembly (or some other low-level language), generating machine code from them, and keeping a dispatch table that maps each bytecode opcode to the machine code that executes it.
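To make the dispatch-table idea concrete, here is a minimal C sketch. It is not how V8 or HotSpot do it: in a real template interpreter the table entries point at machine code generated at startup, while here plain C function pointers stand in for that code. The opcode set, the `VM` struct, and `run_table` are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical three-opcode bytecode, for illustration only. */
enum { OP_PUSH, OP_ADD, OP_HALT, OP_COUNT };

typedef struct VM {
    const uint8_t *pc;   /* next bytecode byte to read */
    int64_t stack[64];   /* operand stack */
    size_t sp;           /* number of values on the stack */
    int running;
} VM;

typedef void (*Handler)(VM *vm);

/* PUSH reads a one-byte immediate and pushes it. */
static void op_push(VM *vm) {
    assert(vm->sp < 64);
    vm->stack[vm->sp++] = *vm->pc++;
}

/* ADD pops two values and pushes their sum. */
static void op_add(VM *vm) {
    assert(vm->sp >= 2);
    int64_t rhs = vm->stack[--vm->sp];
    vm->stack[vm->sp - 1] += rhs;
}

static void op_halt(VM *vm) {
    vm->running = 0;
}

/* The dispatch table: opcode -> code that executes it.
   A template interpreter would fill this with addresses of generated
   machine code instead of compiled C functions. */
static const Handler dispatch_table[OP_COUNT] = {
    [OP_PUSH] = op_push,
    [OP_ADD]  = op_add,
    [OP_HALT] = op_halt,
};

/* Runs the bytecode and returns the value left on top of the stack. */
int64_t run_table(const uint8_t *code) {
    VM vm = { .pc = code, .sp = 0, .running = 1 };
    while (vm.running) {
        uint8_t opcode = *vm.pc++;
        assert(opcode < OP_COUNT);
        dispatch_table[opcode](&vm);
    }
    assert(vm.sp >= 1);
    return vm.stack[vm.sp - 1];
}
```

Running the program `{ OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_HALT }` through `run_table` leaves 5 on top of the stack.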
The main reason seemed to be that both V8 and HotSpot have an optimizing JIT compiler, and having low-level control over the interpreter's machine code means it can be designed to hop in and out of JIT'ed code efficiently (aka on-stack replacement). For example, V8's template interpreter intentionally shares the same ABI as its JIT'ed code, so hopping into JIT'ed code is a single jmp instruction.
Anyway, I go into more implementation detail, and I also built a template interpreter based on HotSpot's design and benchmarked it against other techniques.
zackoverflow•1h ago
I was pretty intrigued. How does it compare to techniques that take far less engineering effort, like switch+loop, direct threading, and using only tail calls (https://blog.reverberate.org/2021/04/21/musttail-efficient-i...)?
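For reference, the switch+loop baseline looks roughly like the C sketch below. The opcodes and the `run_switch` name are hypothetical, chosen only for illustration. Every instruction goes through the single `switch` at the top of the loop; direct threading and tail calls instead let each handler jump straight to the next one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical three-opcode bytecode, for illustration only. */
enum { BC_PUSH, BC_ADD, BC_HALT };

/* Classic switch+loop interpreter: one loop, one switch,
   every opcode handled inline as a case. */
int64_t run_switch(const uint8_t *code) {
    int64_t stack[64];
    size_t sp = 0;              /* number of values on the stack */
    const uint8_t *pc = code;   /* next bytecode byte to read */

    for (;;) {
        switch (*pc++) {
        case BC_PUSH:           /* push a one-byte immediate */
            assert(sp < 64);
            stack[sp++] = *pc++;
            break;
        case BC_ADD:            /* pop two values, push their sum */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case BC_HALT:           /* return the top of the stack */
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(0 && "unknown opcode");
            return 0;
        }
    }
}
```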