frontpage.

Show HN: ZTGI Safety Gateway for LLM Safety

https://github.com/capterr/ztgi-safety-gateway

1•capter•2h ago

I built a small runtime safety layer for LLM outputs called ZTGI Safety Gateway.

This is not a new foundation model and not an AGI claim. It is a post-generation control layer that sits between candidate outputs and final response selection.

What it does: - Scores each candidate with two risk tracks: - legacy risk (`p_break`) - hybrid risk (`z_next`: instruction breach + sycophancy + divergence signals) - Enforces hard blocks for: - security abuse prompts - contradiction-actionable prompts - high-risk finance-actionable prompts - Returns SAFE/WARN/BREAK with telemetry.

Current repo: https://github.com/capterr/ztgi-safety-gateway

Quick run: 1) Set API key: export GEMINI_API_KEY=YOUR_KEY 2) Build evidence pack: python ztgi_build_submission_pack.py --model "gemini-2.0-flash" --out "ztgi_submission_pack" 3) Inspect: - ztgi_submission_pack/evidence/ztgi_evidence_live.json - ztgi_submission_pack/evidence/ztgi_evidence_live.csv - ztgi_submission_pack/assets/ztgi_manifund_evidence.png

What I’d like feedback on: - failure modes I’m missing - overblocking vs underblocking tradeoff - better eval set design for independent validation

I’m happy to share raw outputs and discuss limitations directly.

FIRST COMMENT (pin this under your post): Technical notes + limitations

- This project is a runtime guard, not model-level alignment. - Some safety behavior can still come from base-model policy itself. - I’m trying to measure where the gateway actually adds value via hard-block reasons + telemetry. - Current stress set is small and intentionally adversarial. - Next step is broader independent eval (including false-positive tracking).

If you want to reproduce quickly: - Python 3.10+ - GEMINI_API_KEY set - matplotlib installed - run: python ztgi_build_submission_pack.py --model "gemini-2.0-flash" --out "ztgi_submission_pack"

Happy to add your suggested test prompts to the regression suite and report back with results.

The Democrats Again Risk Losing Voters They Take for Granted

Sparklines

Dwegretryt

Earth's youngest desert: Satellites show the disappearance of the Aral Sea

End of the Line for Video Essays

Reflections on Section 230's Past, Present, and Future on Its 30th Anniversary

Someone did Bitcoin superbowl squares

Another Confusing Internet Jurisdiction Opinion-Stokinger v. Armslist

Chance the Rapper Is Now Chance the AI Company Spokesman

Single-capillary endothelial dysfunction resolved by optoacoustic mesoscopy

The Ownership Class and the Working Class

Bonobo able to imagine a scene, act it as if was real while knowing it's not

Evolving the Agent Enviornment

Buccal Pumping

Every book recommended on the Odd Lots discord

Show HN: WhatsApp Chat Viewer – exported chats as HTML

Throne Wars: When Claude Opus 4.6 Clashes with GPT-5.3 Codex

400k Iranians abroad share Internet access with users at home

Setting Up an IRC Server

I hacked my own computer using OpenClaw and it was terrifyingly easy

PRD-driven, dependency-aware agent workflow for Claude Code and Vibe Kanban

Sandwich Bill of Materials

Pi Is All You Need

AI Makes the Easy Part Easier and the Hard Part Harder

Show HN: Emergent – Artificial life simulation in a single HTML file

Show HN: ParaGopher v1.3.0 – A retro Paratrooper (1982) clone written in Go

What Will Happen to Code?

Show HN: NoFaceClips automatic Reddit to TikTok faceless video generator

What does 'remastering' an album mean?

Quantum Twins simulator unveils 15,000 controllable quantum dots