
Show HN: ZTGI Safety Gateway for LLM Safety

https://github.com/capterr/ztgi-safety-gateway
1•capter•2h ago
I built a small runtime safety layer for LLM outputs called ZTGI Safety Gateway.

This is not a new foundation model and not an AGI claim. It is a post-generation control layer that sits between candidate outputs and final response selection.

What it does:
- Scores each candidate with two risk tracks:
  - legacy risk (`p_break`)
  - hybrid risk (`z_next`: instruction breach + sycophancy + divergence signals)
- Enforces hard blocks for:
  - security abuse prompts
  - contradiction-actionable prompts
  - high-risk finance-actionable prompts
- Returns SAFE/WARN/BREAK with telemetry.
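The decision flow described above might be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: the `Candidate` fields, the threshold values, and the choice to combine the two tracks by taking their maximum are all assumptions. Only the names `p_break` and `z_next` and the SAFE/WARN/BREAK labels come from the post.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds; the post does not state the real values.
WARN_THRESHOLD = 0.4
BREAK_THRESHOLD = 0.7


@dataclass
class Candidate:
    text: str
    p_break: float  # legacy risk score, assumed to lie in [0, 1]
    z_next: float   # hybrid risk score, assumed to lie in [0, 1]
    hard_block_reasons: list = field(default_factory=list)


def classify(candidate):
    """Return (verdict, telemetry) for one candidate output."""
    telemetry = {
        "p_break": candidate.p_break,
        "z_next": candidate.z_next,
        "hard_block_reasons": list(candidate.hard_block_reasons),
    }
    # Any hard-block reason overrides the scores entirely.
    if candidate.hard_block_reasons:
        return "BREAK", telemetry
    # Assumption: the riskier of the two tracks decides the verdict.
    risk = max(candidate.p_break, candidate.z_next)
    if risk >= BREAK_THRESHOLD:
        return "BREAK", telemetry
    if risk >= WARN_THRESHOLD:
        return "WARN", telemetry
    return "SAFE", telemetry
```

Returning the telemetry alongside every verdict, including SAFE ones, keeps it possible to audit later which track or hard-block reason drove each decision.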

Current repo: https://github.com/capterr/ztgi-safety-gateway

Quick run:
1) Set API key: export GEMINI_API_KEY=YOUR_KEY
2) Build evidence pack: python ztgi_build_submission_pack.py --model "gemini-2.0-flash" --out "ztgi_submission_pack"
3) Inspect:
   - ztgi_submission_pack/evidence/ztgi_evidence_live.json
   - ztgi_submission_pack/evidence/ztgi_evidence_live.csv
   - ztgi_submission_pack/assets/ztgi_manifund_evidence.png
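For the inspection step, a small helper like the one below could give a first look at the evidence files. It is only a sketch: the post does not document the JSON or CSV schema, so the hypothetical `summarize_evidence` function reports just the shape of each file (top-level JSON type, CSV column names, row count) and assumes nothing about field names. The file paths are the ones listed in the quick-run steps.

```python
import csv
import json
from pathlib import Path


def summarize_evidence(pack_dir):
    """Report the basic shape of the evidence files without assuming a schema."""
    evidence = Path(pack_dir) / "evidence"
    with open(evidence / "ztgi_evidence_live.json", encoding="utf-8") as f:
        data = json.load(f)
    with open(evidence / "ztgi_evidence_live.csv", newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        columns = reader.fieldnames or []
    return {
        "json_type": type(data).__name__,
        "json_size": len(data) if isinstance(data, (list, dict)) else None,
        "csv_columns": list(columns),
        "csv_rows": len(rows),
    }
```

Calling `summarize_evidence("ztgi_submission_pack")` after the build step would show which columns are available before writing any analysis against them.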

What I’d like feedback on:
- failure modes I’m missing
- overblocking vs underblocking tradeoff
- better eval set design for independent validation
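One way to put numbers on the overblocking vs underblocking tradeoff is to label each eval prompt with whether it should have been blocked and compare that against what the gateway did. The sketch below assumes such labels exist; the `block_rates` function and the pair format are hypothetical and not part of the repo.

```python
def block_rates(results):
    """Compute over/underblocking rates.

    results: iterable of (should_block, was_blocked) boolean pairs,
    one pair per evaluated prompt.
    """
    benign = harmful = overblocked = underblocked = 0
    for should_block, was_blocked in results:
        if should_block:
            harmful += 1
            if not was_blocked:
                underblocked += 1  # harmful prompt slipped through
        else:
            benign += 1
            if was_blocked:
                overblocked += 1   # benign prompt was blocked
    return {
        # Share of benign prompts that were blocked (false-positive rate).
        "overblock_rate": overblocked / benign if benign else None,
        # Share of harmful prompts that were not blocked (false-negative rate).
        "underblock_rate": underblocked / harmful if harmful else None,
    }
```

Because a purely adversarial set contains no benign prompts, the overblock rate stays undefined (`None` here) until benign examples are added to the eval set.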

I’m happy to share raw outputs and discuss limitations directly.

Technical notes + limitations

- This project is a runtime guard, not model-level alignment.
- Some safety behavior can still come from base-model policy itself.
- I’m trying to measure where the gateway actually adds value via hard-block reasons + telemetry.
- Current stress set is small and intentionally adversarial.
- Next step is broader independent eval (including false-positive tracking).

If you want to reproduce quickly:
- Python 3.10+
- GEMINI_API_KEY set
- matplotlib installed
- run: python ztgi_build_submission_pack.py --model "gemini-2.0-flash" --out "ztgi_submission_pack"

Happy to add your suggested test prompts to the regression suite and report back with results.

The Democrats Again Risk Losing Voters They Take for Granted

https://www.nytimes.com/2026/02/08/opinion/ai-democrats-jobs-economy.html
1•doener•1m ago•0 comments

Sparklines

https://indieweb.org/sparkline
1•spacebuffer•3m ago•0 comments

Dwegretryt

https://gist.github.com/ebaerakhan
1•faresfa•4m ago•0 comments

Earth's youngest desert: Satellites show the disappearance of the Aral Sea

https://universemagazine.com/en/earths-youngest-desert-satellites-show-the-disappearance-of-the-a...
3•stared•9m ago•0 comments

End of the Line for Video Essays

https://pluralistic.net/2026/02/07/aimsters-revenge/
1•hn_acker•10m ago•0 comments

Reflections on Section 230's Past, Present, and Future on Its 30th Anniversary

https://blog.ericgoldman.org/archives/2026/02/reflections-on-section-230s-past-present-and-future...
1•hn_acker•17m ago•0 comments

Someone did Bitcoin superbowl squares

https://sbsqr.vercel.app/
1•sfffs•18m ago•0 comments

Another Confusing Internet Jurisdiction Opinion-Stokinger v. Armslist

https://blog.ericgoldman.org/archives/2026/02/another-confusing-internet-jurisdiction-opinion-thi...
1•hn_acker•19m ago•1 comments

Chance the Rapper Is Now Chance the AI Company Spokesman

https://stereogum.com/2488412/chance-the-rapper-is-now-an-ai-company-spokesman/news
1•throwoutway•24m ago•0 comments

Single-capillary endothelial dysfunction resolved by optoacoustic mesoscopy

https://www.nature.com/articles/s41377-025-02103-6
2•PaulHoule•33m ago•0 comments

The Ownership Class and the Working Class

https://satisologie.substack.com/p/the-ownership-class-and-working-class
3•sova•34m ago•1 comments

Bonobo able to imagine a scene and act it out as if it were real while knowing it's not

https://arstechnica.com/science/2026/02/watch-kanzi-the-bonobo-pretend-to-have-a-tea-party/
1•i-blis•35m ago•2 comments

Evolving the Agent Environment

https://github.com/harivansh-afk/agentikube
1•rathiharivansh•39m ago•1 comments

Buccal Pumping

https://en.wikipedia.org/wiki/Buccal_pumping
2•thunderbong•41m ago•0 comments

Every book recommended on the Odd Lots discord

https://odd-lots-books.netlify.app/
1•muggermuch•41m ago•0 comments

Show HN: WhatsApp Chat Viewer – exported chats as HTML

https://github.com/rodrigodesalvobraz/whatsapp-chat-viewer
1•rodrigobraz•42m ago•0 comments

Throne Wars: When Claude Opus 4.6 Clashes with GPT-5.3 Codex

http://yeasy.blogspot.com/2026/02/throne-wars-when-claude-opus-46-clashes.html
1•yeasy•43m ago•0 comments

400k Iranians abroad share Internet access with users at home

https://www.iranintl.com/en/202602084487
1•ukblewis•46m ago•0 comments

Setting Up an IRC Server

https://www.neatnik.net/setting-up-an-irc-server/
3•rickcarlino•47m ago•0 comments

I hacked my own computer using OpenClaw and it was terrifyingly easy

https://www.androidauthority.com/openclaw-ai-prompt-injection-3636904/
2•jrmg•48m ago•1 comments

PRD-driven, dependency-aware agent workflow for Claude Code and Vibe Kanban

https://github.com/ericblue/claude-vibekanban
2•ericblue•52m ago•1 comments

Sandwich Bill of Materials

https://nesbitt.io/2026/02/08/sandwich-bill-of-materials.html
2•zdw•53m ago•0 comments

Pi Is All You Need

https://p10q.com/pi_is_all_you_need/
2•tmsh•53m ago•0 comments

AI Makes the Easy Part Easier and the Hard Part Harder

https://www.blundergoat.com/articles/ai-makes-the-easy-part-easier-and-the-hard-part-harder
26•weaksauce•55m ago•5 comments

Show HN: Emergent – Artificial life simulation in a single HTML file

https://emergent-ivory.vercel.app/
2•usernameis42•55m ago•0 comments

Show HN: ParaGopher v1.3.0 – A retro Paratrooper (1982) clone written in Go

https://github.com/ystepanoff/ParaGopher
1•ystepanoff•56m ago•0 comments

What Will Happen to Code?

https://registerspill.thorstenball.com/p/joy-and-curiosity-70-d85
1•kristianp•58m ago•0 comments

Show HN: NoFaceClips automatic Reddit to TikTok faceless video generator

https://nofaceclips.com
1•TallSession9532•59m ago•0 comments

What does 'remastering' an album mean?

https://www.popsci.com/science/what-does-remastering-an-album-mean/
3•wjb3•59m ago•0 comments

Quantum Twins simulator unveils 15,000 controllable quantum dots

https://phys.org/news/2026-02-quantum-twins-simulator-unveils-dots.html
1•rbanffy•1h ago•0 comments