On function-calling determinism: you're right that small-model JSON adherence drifts even at temp 0. localaik does not fully solve this. Two paths that help:
1. Constrained decoding. llama.cpp supports GBNF grammars for shape-strict generation. Passing a grammar via the upstream `grammar` field gives you JSON that parses and matches the schema by construction. It does not guarantee field VALUES (the model can still pick "wrong" entity names), but it eliminates the parsing-flake class of failures. Adding first-class support for this in the translation layer is on the roadmap.
2. Assertion strategy. Test the SHAPE of the tool call (was the right tool invoked? did it receive an arg named X?), not exact arg values. This is good practice for any LLM CI regardless of localaik. "Rerun on flake" is the anti-pattern I'm trying to avoid: the determinism comes from the proxy layer, not the model layer.
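A rough sketch of what a shape-only assertion could look like in a Python test. It assumes the response carries OpenAI-style tool calls, where `function.arguments` is a JSON string. The helper name `assert_tool_call_shape` and the `get_weather` sample are made up for illustration and are not part of localaik.

```python
import json


def assert_tool_call_shape(tool_call, expected_name, required_args):
    """Check which tool was invoked and which arg names it received.

    Argument VALUES are deliberately not compared, since those are the
    part that drifts between runs.
    """
    fn = tool_call["function"]
    assert fn["name"] == expected_name, (
        f"expected tool {expected_name!r}, got {fn['name']!r}"
    )
    # Assumption: arguments arrive as a JSON-encoded string (OpenAI-style).
    args = json.loads(fn["arguments"])
    missing = set(required_args) - set(args)
    assert not missing, f"missing args: {sorted(missing)}"
    return args


# Hypothetical tool call, shaped like one entry of an OpenAI-style
# `tool_calls` list.
sample_call = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "arguments": '{"city": "Oslo", "unit": "celsius"}',
    },
}

# Passes for any city the model picks, as long as the arg is present.
parsed_args = assert_tool_call_shape(sample_call, "get_weather", ["city"])
```

The test still fails if the model calls the wrong tool or drops a required arg, but a different city name or unit across runs does not flake the build.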
GHA service container: glad it's useful. The 5-min cold-start budget (--health-retries 30) is the one tuning knob most folks miss. Steal away, that's why it's MIT.
jeremyfelps•17m ago
Curious about the function-calling determinism. Even at temp 0, small-model JSON schema adherence drifts, especially on rare function shapes. Are you stabilizing it somehow for CI assertions, or just 'rerun on flake'?
The PDF-to-images path is interesting. Multimodal agent tests are where I keep hitting walls. Out of scope for v1, or queued?
GHA service container pattern is the right call. Stealing this idea.