
Energy-based Model (EBM) for enterprise AI security: ship it or keep tuning?

2•ALMOIZ_MOHMED•2h ago
I've been building Energy-Guard OS for the past several months, and I want an honest opinion from people who actually understand the tradeoffs, because I'm stuck at a decision point.

What is it?

It's not a fine-tuned LLM. It's a production application of Energy-based Models (EBMs), an architecture that assigns an energy score to inputs rather than predicting tokens. Low energy = normal. High energy = threat or anomaly.

The core use case: a real-time data gateway that sits between your organization and any AI service, blocking sensitive data from leaking out (PII, financials, strategic documents) while still allowing legitimate AI use. Think of it as a firewall, but one that understands semantic context, not just regex patterns.

More about EBMs:

- No hallucination (it scores, not generates)
- Calibrated risk score, not binary block/allow
- Runs on modest hardware: currently 192.8 req/s on a single 4 vCPU / 16GB RAM machine
- 411MB model size, under 700MB memory usage
- Built from scratch on 7 production data sources

The honest test results (10,000+ cases, independent test suite):

Total Tests: 13,000
Valid Responses: 13,000
Success Rate: 100.0%
Overall Accuracy: 88.74%

Duration: 18.4s
Throughput: 704.5 req/s
Avg Latency: 17.6ms
P50 Latency: 17.9ms
P95 Latency: 32.0ms
P99 Latency: 33.8ms

Accuracy by category:

Financial Leak Detection: 100%
PII / Private Data: 100%
Strategic Data: 100%
Malicious Code: 95%
OWASP LLM Top 10: 87%
Multi-Turn Attacks: 67%
General Benign (False Positives): 66%
Overall: 88.7%

F1: 0.927 | Precision: 0.922 | Recall: 0.932 | Specificity: 0.740

The problem I'm facing:

After 2 months of tuning, I've gone from 74% → 88.7% overall accuracy. But I've hit a wall where improving one category hurts another. Specifically:

- The false positive rate is too high for general/technical content (the system over-blocks benign code and text)
- Multi-turn conversation attacks are at 67%; the model doesn't fully leverage conversation context yet
- Every time I push one metric up, something else drops

My actual question:

Do I ship a limited Beta now, restricted to the use cases where it performs at 95-100% (financial data, PII, strategic leaks), or do I keep tuning before any real-world exposure?

Why I want to ship:

- Real-world data will teach me more than synthetic test cases
- The high-value use cases already work extremely well
- I've been optimizing against synthetic benchmarks for 2 months

Why I want to wait:

- 34% false positive rate on general content will frustrate users
- Multi-turn is a known attack vector that's currently weak
- First impressions matter

Website if you want to see more details: https://ebmsovereign.com/

All forms on the website are currently disabled except for emails, which will be available for testing within 24 hours.

I genuinely want to hear from people who've shipped security products or ML systems in production. What would you do?
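For anyone who wants to sanity-check how the reported numbers relate to each other, here is a minimal Python sketch. It only recomputes derived quantities from the figures quoted in the post: F1 from precision and recall, false positive rates from the benign accuracy and the specificity, and throughput from test count and duration. The variable names are illustrative, and the concurrency estimate assumes a steady-state load where Little's law applies, which the post does not state.

```python
# Figures copied from the post above; nothing here is measured independently.
precision, recall, specificity = 0.922, 0.932, 0.740
total_tests, duration_s = 13_000, 18.4
reported_throughput = 704.5   # req/s
avg_latency_s = 0.0176        # 17.6 ms
benign_accuracy = 0.66        # "General Benign" category


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


f1 = f1_score(precision, recall)

# On benign-only inputs, every error is a false positive.
benign_fpr = 1 - benign_accuracy

# Across the whole suite, FPR is the complement of specificity.
overall_fpr = 1 - specificity

# Throughput implied by test count and wall-clock duration.
derived_throughput = total_tests / duration_s

# Assumption: steady state, so Little's law gives requests in flight.
in_flight = reported_throughput * avg_latency_s

print(f"F1 from P/R:          {f1:.3f}")                        # 0.927
print(f"Benign FPR:           {benign_fpr:.2f}")                # 0.34
print(f"Overall FPR:          {overall_fpr:.2f}")               # 0.26
print(f"Derived throughput:   {derived_throughput:.1f} req/s")  # 706.5
print(f"Implied in-flight:    {in_flight:.1f} requests")        # 12.4
```

The recomputed F1 matches the reported 0.927, and the 34% false positive figure is the complement of the 66% benign accuracy. The derived throughput (about 706.5 req/s) is within rounding of the reported 704.5 req/s. Under the steady-state assumption, the benchmark implies roughly 12 concurrent requests in flight.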

rag not lag: rl for fast agentic retrieval

https://cgft.io/blog/rag-not-lag/
1•kumama•1m ago•0 comments

Show HN: Manual code review and feedback loop for agents

https://twitter.com/backnotprop/status/2031145299738263567
1•ramoz•2m ago•0 comments

Red Alert 2 for Mac using porting kit [video]

https://www.youtube.com/watch?v=7tN-yRUtZjE
1•nomilk•5m ago•0 comments

Claude Code Starter CLI

https://github.com/cassmtnr/claude-code-starter
1•cassmtnr•5m ago•0 comments

No, it doesn't cost Anthropic $5k per Claude Code user

https://martinalderson.com/posts/no-it-doesnt-cost-anthropic-5k-per-claude-code-user/
2•jnord•8m ago•0 comments

Love in the Time of A.I. Companions

https://www.newyorker.com/magazine/2026/03/16/love-in-the-time-of-ai-companions
1•petethomas•9m ago•0 comments

Helios: Real Real-Time Long Video Generation Model

https://www.alphaxiv.org/abs/2603.04379
2•tzury•11m ago•0 comments

PRX Part 3 – Training a Text-to-Image Model in 24h

https://huggingface.co/blog/Photoroom/prx-part3
1•gsky•12m ago•0 comments

Open-source software could be excluded from Colorado age verification bill

https://twitter.com/carlrichell/status/2031125624711164182
1•flaburgan•17m ago•0 comments

Show HN: Hacker News Focus Comments Reader

https://chromewebstore.google.com/detail/hn-focus-reader/ibhipggecnholemnbahigagpgifkphac
1•betimd•20m ago•0 comments

The emerging role of SRAM-centric chips in AI inference

https://gimletlabs.ai/blog/sram-centric-chips
1•gmays•20m ago•0 comments

Simradar21

https://simradar21.com/
1•sssilver•21m ago•0 comments

Amid wave of kids' online safety laws, age-checking tech comes of age

https://www.reuters.com/legal/litigation/amid-wave-kids-online-safety-laws-age-checking-tech-come...
1•petethomas•21m ago•0 comments

M5 Max: Chiplets, Thermals, and Performance per Watt

https://creativestrategies.com/research/m5-max-chiplets-thermals-and-performance-per-watt/
3•zdw•21m ago•0 comments

Agentis – An AI-native programming language where the LLM is the stdlib

https://github.com/Replikanti/agentis
1•ylohnitram•22m ago•1 comment

iOS 26.4's new setting lets you disable another Liquid Glass effect

https://9to5mac.com/2026/03/09/ios-26-4s-new-setting-lets-you-disable-another-liquid-glass-effect/
2•latexr•23m ago•1 comment

Show HN: Free AI resume tailor I built after a recent layoff (300+ users so far)

https://jobbi.app/
1•djrnz•23m ago•0 comments

Closing the verification loop, Part 2: autonomous optimization

https://www.datadoghq.com/blog/ai/fully-autonomous-optimization/
1•chrisra•25m ago•1 comment

From Tool to Employee: What Claude Code's /Loop Means

https://aieatingsoftware.substack.com/p/from-tool-to-employee-what-claude
1•sidsarasvati•26m ago•0 comments

Reversing Russian spyware I installed on my iPhone [video]

https://www.youtube.com/watch?v=XQvZ2mLnZVI
1•todsacerdoti•26m ago•0 comments

Agentic development environment extension taxonomy

https://droctothorpe.github.io/adeet/
1•droctothorpe•27m ago•1 comment

Worldwide Sidewalk Joy: Adding whimsy to neighborhoods

https://worldwidesidewalkjoy.com
3•NaOH•28m ago•1 comment

10K Curl Downloads per Year

https://daniel.haxx.se/blog/2026/03/09/10k-curl-downloads-per-year/
1•donutshop•28m ago•0 comments

Superpowers 5

https://blog.fsck.com/2026/03/09/superpowers-5/
2•arittr•31m ago•0 comments

Show HN: Git Trophy – 3D print your GitHub contribution graph

https://git-trophy.com/
1•Lukabuz•32m ago•0 comments

Trump is heading for a hard reckoning over Iran

https://spectator.com/article/trump-is-heading-for-a-hard-reckoning-over-iran/
6•leiftw•32m ago•1 comment

Reinforcement fine-tuning use cases

https://developers.openai.com/api/docs/guides/rft-use-cases/
1•teleforce•33m ago•0 comments

Bromure: An ephemeral browser that runs in a disposable virtual machine on macOS

https://github.com/rderaison/bromure
1•felineflock•33m ago•0 comments

QuickTERMINAL – A 10k-line single-file terminal emulator for macOS

https://github.com/LEVOGNE/quickTerminal
1•LEVOGNE•34m ago•1 comment

Sir Tony Hoare has died

http://lefenetrou.blogspot.com/2026/03/in-memoriam-tony-hoare.html
68•nextos•35m ago•20 comments