I’m an academic working on reliability for high-stakes LLM use (coding + scientific/medical workflows).
This repo proposes a “fail-closed” certification gate: an output ships only if it passes a set of published checks; otherwise the gate rejects it. The benchmark emphasis is on false-ship rate (shipped-but-wrong outputs), not just accuracy.
Looking for critique and real failure cases: where do LLMs most often produce plausible outputs that are silently wrong (C#/.NET, SQL, Python notebooks, data extraction, etc.)? What validation checks would you consider non-negotiable?
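To make the fail-closed semantics concrete, here is a minimal sketch (names like `ship_gate` and `GateResult` are illustrative, not the repo's actual API). The key property is that an unknown or crashing check counts as a rejection, never a ship:

```python
# Hypothetical fail-closed ship gate: run every validator and ship
# only if ALL pass. A validator that raises also rejects -- the gate
# must never fail open on an error.
from dataclasses import dataclass, field

@dataclass
class GateResult:
    shipped: bool
    failures: list = field(default_factory=list)

def ship_gate(output: str, validators) -> GateResult:
    """validators: list of (name, check) where check(output) -> bool."""
    failures = []
    for name, check in validators:
        try:
            if not check(output):
                failures.append(name)
        except Exception as exc:  # crashing check => reject, not ship
            failures.append(f"{name} (error: {exc})")
    return GateResult(shipped=not failures, failures=failures)

# Example: gate a model-generated SQL snippet on two cheap checks.
validators = [
    ("non_empty", lambda s: bool(s.strip())),
    ("no_select_star", lambda s: "select *" not in s.lower()),
]
print(ship_gate("SELECT id FROM users", validators).shipped)  # True
print(ship_gate("SELECT * FROM users", validators).shipped)   # False
```

The design choice worth debating is the error branch: treating a validator crash as a failure is what distinguishes fail-closed from fail-open, and it directly trades throughput for a lower false-ship rate.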
mahmood726•53m ago
Just open-sourced Burhan (TruthCert): a fail-closed “ship gate” for LLM outputs.
Goal: cut false-ships (shipped-but-wrong outputs) in coding + research workflows using multi-witness verification, validators, and provenance tracking.
Repo: https://github.com/mahmood726-cyber/Burhan
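One way to read "multi-witness verification" is quorum agreement: require k independent witnesses (e.g. separate model runs or checkers) to agree on a canonicalized answer before shipping, otherwise reject. A minimal sketch under that assumption (the function and quorum parameter are hypothetical, not Burhan's actual interface):

```python
# Hypothetical multi-witness quorum check: ship an answer only if at
# least `quorum` witnesses agree on it after canonicalization;
# otherwise return None, i.e. reject (fail-closed).
from collections import Counter

def multi_witness(answers, quorum):
    counts = Counter(a.strip().lower() for a in answers)
    answer, votes = counts.most_common(1)[0]
    return answer if votes >= quorum else None

print(multi_witness(["42", "42 ", "41"], quorum=2))  # "42"
print(multi_witness(["42", "41", "40"], quorum=2))   # None -> reject
```

Note this only catches uncorrelated errors: if all witnesses share the same failure mode (same model family, same prompt bias), agreement is not evidence of correctness, which is presumably why the repo pairs it with validators and provenance.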