Show HN: Perstack – Containerized harness, 5 tests with full logs and API cost

https://github.com/perstack-ai/perstack
1•FL4TLiN3•1h ago
I burned out after 2 years of building agentic apps for clients. I'd become the single point of failure with no backup. Requirements gathering, prompt engineering, app development, sandboxing, everything funneled through whoever happened to be the most senior dev on the team, which was always me.

The root cause wasn't the team or clients. It was how we designed the agent: there were no clear boundaries unless you adopted a well-known agent framework.

I started this project to draw those boundaries in terms developers are already familiar with, which felt like the right thing to do.

To dogfood it, I defined a game-dev expert with a simple topology (plan → build → verify + coordinator) and ran the same task across 5 models.
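A minimal sketch of that plan → build → verify + coordinator shape, in plain Python. This is not Perstack's actual definition format: the function names, the `EXPERTS` table, and the string payloads are all hypothetical stand-ins, and in a real harness each stage would be a delegated model call.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for delegated experts. None of these names
# come from Perstack; each would be a model call in a real harness.
Step = Callable[[str], str]


def plan(task: str) -> str:
    return f"plan for: {task}"


def build(plan_text: str) -> str:
    return f"artifact built from ({plan_text})"


def verify(artifact: str) -> str:
    return f"verified: {artifact}"


# The topology is declared once, as an ordered list of stage names.
PIPELINE: List[str] = ["plan", "build", "verify"]
EXPERTS: Dict[str, Step] = {"plan": plan, "build": build, "verify": verify}


def coordinate(task: str) -> Dict[str, object]:
    """Run each stage in order, feeding each output into the next stage."""
    trace: List[str] = []
    payload = task
    for name in PIPELINE:
        payload = EXPERTS[name](payload)
        trace.append(name)  # record which stage ran, in order
    return {"trace": trace, "result": payload}


if __name__ == "__main__":
    print(coordinate("Create a Wizardry-like dungeon crawler..."))
```

Keeping the stage order in one list is the point of the sketch: the coordinator stays the same no matter which model sits behind each stage.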

Here are the results: https://github.com/perstack-ai/demo-catalog

The query was simple: "Create a Wizardry-like dungeon crawler..."

For evaluation, I focused on just three things. (1) Does the expert adhere to my instructions? (2) Is the outcome verified and actually working? (3) Is the API cost affordable?

Why these three? Because even if the harness architecture is solid, an agent needs to be evaluated on instruction adherence, minimum quality assurance, and cost efficiency. That's what I learned from working with clients.
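Those three questions can be sketched as a small record per run. The `RunReport` class, its field names, and the budget threshold are assumptions made for illustration; they are not part of the demo catalog, and the numbers in the usage lines are invented.

```python
from dataclasses import dataclass


@dataclass
class RunReport:
    """One run, scored on the three questions above (hypothetical schema)."""
    followed_instructions: bool  # (1) instruction adherence
    verified_working: bool       # (2) outcome verified and working
    cost_usd: float              # (3) total API cost for the run

    def acceptable(self, budget_usd: float) -> bool:
        # A run passes only if all three criteria hold at once.
        return (
            self.followed_instructions
            and self.verified_working
            and self.cost_usd <= budget_usd
        )


if __name__ == "__main__":
    # Illustrative values only, not taken from any real run.
    report = RunReport(True, True, cost_usd=2.50)
    print(report.acceptable(budget_usd=5.00))
```

What counts as "affordable" depends on the client, so the budget is passed in rather than fixed.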

What I noticed:

- 3 out of 5 models followed the full plan → build → verify pipeline and produced verified working output, with no provider-specific tuning. The topology was defined once and ran as-is.

- Claude (4.6 Opus + 4.6 Sonnet) produced the richest output with flawless instruction adherence. It also achieved the highest cache hit rate (96%) among all providers, but pricing still pushed the total to 8× the nearest competitor.

- Kimi K2.5 produced excellent output at $3.43 and was the most faithful to delegation. In this test, it outperformed GPT and Gemini in both instruction adherence and quality.

- Gemini (3.1 Pro + 3.0 Flash) followed the full pipeline and produced a verified working game, but its output was buggier than GPT's and almost unplayable.

- GPT (5.4 + 5-mini) was the fastest and cheapest, but skipped the verify step entirely. It called build three times instead of following the pipeline.

- MiniMax M2.5 ignored instructions entirely and made a browser-based HTML game. Instruction adherence is clearly a challenge for it, but the newest version, M2.7, was recently announced with adherence improvements, so I'm looking forward to trying it.

It's one task from a demo catalog. But the full execution logs for every run are in the repo, so you can see exactly what each model did and reproduce it yourself.
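If you do reproduce a run, pipeline adherence can be checked mechanically once the delegated calls are pulled out of a log. A minimal sketch, assuming the calls have already been extracted into a list of stage names; the log format in the repo is not modeled here, and `follows_pipeline` and `missing_stages` are hypothetical helpers.

```python
from typing import List

EXPECTED: List[str] = ["plan", "build", "verify"]


def follows_pipeline(calls: List[str], expected: List[str] = EXPECTED) -> bool:
    """True if the expected stages appear, in order, somewhere in calls.

    Repeated stages (e.g. a rebuild) are tolerated; a skipped stage is not.
    """
    remaining = iter(calls)
    # `stage in remaining` advances the iterator, so order is enforced.
    return all(stage in remaining for stage in expected)


def missing_stages(calls: List[str], expected: List[str] = EXPECTED) -> List[str]:
    """Expected stages that never appear in the call list."""
    seen = set(calls)
    return [stage for stage in expected if stage not in seen]


if __name__ == "__main__":
    full_run = ["plan", "build", "verify"]
    build_only = ["build", "build", "build"]  # the pattern described for GPT
    print(follows_pipeline(full_run), missing_stages(full_run))
    print(follows_pipeline(build_only), missing_stages(build_only))
```

The check treats the pipeline as an ordered subsequence, so a run that rebuilds before verifying still passes, while one that only calls build three times fails.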

NYC ends criminal summonses for cyclists, e-bike riders

https://gothamist.com/news/nyc-ends-criminal-summonses-for-cyclists-e-bike-riders-in-policy-shift
1•geox•37s ago•0 comments

We Spoke to Game Devs and All of Them Hate DLSS 5

https://kotaku.com/we-spoke-to-game-devs-and-all-of-them-hate-dlss-5-what-the-f-nvidia-2000680059
1•tastyface•1m ago•0 comments

Rethinking open source mentorship in the AI era

https://github.blog/open-source/maintainers/rethinking-open-source-mentorship-in-the-ai-era/
1•mikece•1m ago•0 comments

Bifrost CLI and Codex CLI: One Command to Set Up OpenAI Agent with Any Model

https://github.com/maximhq/bifrost
1•aanthonymax•1m ago•0 comments

Artifact Production Just Got Cheap – What remains when code costs nothing

https://dekodiert.de/en/articles/artefaktproduktion
1•sdoering•2m ago•0 comments

Apple Urges iPhone Users Running Outdated iOS Versions to Update Immediately

https://www.macrumors.com/2026/03/19/apple-outdated-ios-update-warning/
1•tosh•2m ago•0 comments

New technology will help satellites avoid collisions in space

https://www.lanl.gov/media/news/0220-satellites-avoid-collisions
1•LAsteNERD•2m ago•0 comments

Country Budget Allocation Simulator – EconoSIM

https://econosim.burduja.me/
1•FrozenSynapse•3m ago•0 comments

NanoGPT Slowrun: 10x Data Efficiency with Infinite Compute

https://qlabs.sh/10x
1•sdpmas•3m ago•0 comments

Europe sleepwalked into yet another energy crisis

https://www.bbc.com/news/articles/c24de9e97vno
3•asplake•3m ago•0 comments

I found a hidden library of podcasts (and it's brilliant)

https://coleslaw.bearblog.dev/i-found-a-hidden-library-of-podcasts-and-its-brilliant/
2•speckx•4m ago•0 comments

What Is Swarm Intelligence? An Explainer

https://www.aptiv.com/en/insights/article/what-is-swarm-intelligence
3•ohjeez•4m ago•0 comments

Instagram to remove end-to-end encryption for private messages in May

https://www.theguardian.com/technology/2026/mar/18/instagram-to-remove-end-to-end-encryption-for-...
2•finnlab•4m ago•1 comments

New Version of the Elixir Language Tour

https://elixir-language-tour.swmansion.com/introduction
2•mirzap•5m ago•0 comments

Study pinpoints when bow and arrow came to North America

https://arstechnica.com/science/2026/03/study-pinpoints-when-bow-and-arrow-came-to-north-america/
3•pseudolus•6m ago•0 comments

Affordable Passive Income Course

https://millionairepartnership.com/webclass-d24-new
2•Longknocker•6m ago•0 comments

Show HN: Monte Carlo simulator for March Madness bracket pools

https://madness.imranh.org/
2•imran3740•7m ago•0 comments

Sovereign V4: A Cleaner, Stronger Approach to Cryptography

https://bastion-enclave.vercel.app
2•KevinChasse•7m ago•0 comments

Show HN: Run Claude Code with --dangerously-skip-permissions in a Docker sandbox

https://github.com/sayil/dangerously
2•sayil•8m ago•0 comments

European municipalities leak citizen data to US companies

3•sam_lowry_•9m ago•0 comments

Ask HN: How to Find a Job in the UK

3•0x3444ac53•10m ago•0 comments

Iran revives 'Zionist sorcery' claims in propaganda against Israel

https://www.jpost.com/middle-east/iran-news/article-890527
3•pinewurst•11m ago•0 comments

Don't Call Me Francis

https://www.persuasion.community/p/dont-call-me-francis
3•colonCapitalDee•11m ago•1 comments

Introducing DoorDash Tasks

https://about.doordash.com/en-us/news/introducing-doordash-tasks
2•ChrisArchitect•12m ago•0 comments

Freedom in the World 2026 [pdf]

https://freedomhouse.org/sites/default/files/2026-03/FIW2026_final_digital%20%281%29.pdf
2•doener•13m ago•0 comments

The quadratic problem nobody fixed

https://iev.ee/blog/the-quadratic-problem-nobody-fixed/
2•lalitmaganti•14m ago•0 comments

Steer your Waymo with your phone

https://waymo.fraud.llc/
2•bradleybuda•18m ago•1 comments

Equality Saturation and Symbolic Regression

https://egraphs.org/meeting/2026-03-19-symbolic-regression
2•matt_d•18m ago•0 comments

Clipping Parts of Videos from Plex

https://git.sr.ht/~jacky/plexclip
2•jackyalcine•18m ago•0 comments

Breaking Signals, Breaking Systems

https://guille.site/posts/breaking-signals/
3•LolWolf•21m ago•0 comments