frontpage.

In Defense of Pen and Paper

https://blog.calebjay.com/posts/in-defense-of-pen-and-paper/
1•speckx•1m ago•0 comments

Study: Human brain processes language similarly to AI models

https://www.afhu.org/2025/12/22/how-the-human-brain-understands-language-may-be-more-like-ai-than...
1•giuliomagnifico•1m ago•0 comments

Is that allowed? Authentication and authorization in Model Context Protocol

https://stackoverflow.blog/2026/01/21/is-that-allowed-authentication-and-authorization-in-model-c...
1•mooreds•2m ago•0 comments

Everyone Has a Boss

https://jonpauluritis.com/articles/everyone-has-a-boss/
1•jppope•2m ago•0 comments

A Professional Proposal

https://thismightnotmatter.com/a-professional-proposal/
1•ozzyphantom•2m ago•1 comment

Show HN: What unicorns have in common – Lessons from a VC

https://drive.google.com/file/d/1E_Kbwm0-gvu9lxMxNO6BtLALFzN7Jww8/view?usp=sharing
1•igor_ryabenkiy•2m ago•0 comments

Hypergrowth Isn't Always Easy

https://tailscale.com/blog/hypergrowth-isnt-always-easy
1•SteveHawk27•5m ago•0 comments

A dead fish moves upstream when its body resonates with vortices in water [pdf]

https://liaolab.com/wp-content/uploads/2020/10/2006Beal_etal.pdf
2•rdgthree•6m ago•0 comments

Patrick Collison: "You shouldn't compete in markets you can't win."

https://twitter.com/ahmetbuilds/status/2013962872708112648
1•ahmetd•6m ago•0 comments

PicoPCMCIA – a PCMCIA development board for retro-computing enthusiasts

https://www.yyzkevin.com/picopcmcia/
5•rbanffy•7m ago•0 comments

Show HN: Guava Range Parser – Parse "[0..100)" strings into Guava Range object

https://github.com/neewrobert/guava-range-parser
1•neewrobert•9m ago•0 comments

Show HN: Patchli.st – Bug bounties for indie SaaS founders

https://www.patchli.st/
1•massi24•9m ago•0 comments

Share Your Website at Events

https://jamesg.blog/2026/01/21/share-your-website-at-events
1•speckx•9m ago•0 comments

Show HN: Analyze binary capabilities in-browser with capa and Pyodide

https://surfactant.readthedocs.io/en/latest/capa/
1•rmast•9m ago•0 comments

European lawmakers suspend U.S. trade deal

https://www.cnbc.com/2026/01/21/european-lawmakers-suspend-us-trade-deal-amid-greenland-tariff-te...
4•belter•11m ago•2 comments

Show HN: S2-lite, an open source Stream Store

https://github.com/s2-streamstore/s2
1•shikhar•11m ago•0 comments

Your prod code should have bugs

https://lucaspauker.com/articles/your-prod-code-should-have-bugs/
1•lucaspauker•11m ago•0 comments

JPEG XL Demo Page

https://tildeweb.nl/~michiel/jxl/
5•roywashere•13m ago•0 comments

Dressing Blade Runner: Interview with Set Decorator Linda DeScenna (2001)

https://media.bladezone.com/contents/film/production/Linda-DeScenna/index.html
2•exvi•13m ago•0 comments

Citigroup to boost Japan investment banking team on deal boom

https://www.japantimes.co.jp/business/2025/12/23/companies/citigroup-investment-banking-boost/
3•PaulHoule•14m ago•0 comments

ZScript

https://github.com/zscriptlang/zscript
3•ziyaadsaqlain•14m ago•1 comment

The long painful history of (re)using login to log people in

https://utcc.utoronto.ca/~cks/space/blog/unix/LoginProgramReuseFootgun
3•rkta•14m ago•0 comments

Self-hosted AI data workflow: DB and Ollama and SQL

https://exasol.github.io/developer-documentation/main/gen_ai/ai_text_summary/index.html
2•exasol_nerd•15m ago•2 comments

Show HN: Prometheus – Give LLMs memory, dreams, and contradiction detection

https://github.com/panosbee/PROMETHEUS
1•panossk•16m ago•0 comments

Xgotop: Realtime Go Runtime Visualizer

https://devpost.com/software/xgotop-go-runtime-observer
2•tanelpoder•16m ago•0 comments

Show HN: MedDiscovery – Autonomous hypothesis generator for dead-end diseases

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5973574
1•panossk•19m ago•0 comments

Show HN: Reproduce and benchmark ML papers in your terminal before implementing

https://github.com/danishm07/tomea
1•_danish•20m ago•1 comment

Avoid Cerebras if you are a founder

4•remusomega•22m ago•1 comment

Find U.S. Manufacturers in Seconds – CNC, sheet metal, molding, etc.

https://build.link/
4•donewithfuess•23m ago•0 comments

AI future will be nothing like present

https://distantprovince.by/posts/ai-future-will-be-nothing-like-present/
1•distantprovince•24m ago•2 comments

Show HN: yolo-cage – AI coding agents that can't exfiltrate secrets

https://github.com/borenstein/yolo-cage
21•borenstein•1h ago
I made this for myself, and it seemed like it might be useful to others. I'd love some feedback, both on the threat model and the tool itself. I hope you find it useful!

Backstory: I've been using many agents in parallel as I work on a somewhat ambitious financial analysis tool. I was juggling agents working on epics for the linear solver, the persistence layer, the front-end, and planning for the second-generation solver. I was losing my mind playing whack-a-mole with the permission prompts. YOLO mode felt so tempting. And yet.

Then it occurred to me: what if YOLO mode isn't so bad? Decision fatigue is a thing. If I could cap the blast radius of a confused agent, maybe I could just review once. Wouldn't that be safer?

So that day, while my kids were taking a nap, I decided to see if I could put YOLO-mode Claude inside a sandbox that blocks exfiltration and regulates git access. The result is yolo-cage.

Also: the AI wrote its own containment system from inside the system's own prototype. Which is either very aligned or very meta, depending on how you look at it.

Comments

dfajgljsldkjag•57m ago
Seeing "Fix security vulnerabilities found during escape testing" as a commit message is not reassuring. Of course testing is good but it hints that the architecture hasn't been properly hardened from the start.
borenstein•49m ago
Hi, thanks for your feedback! I can see this from a couple of different perspectives.

On the one hand, you're right: those commit messages are proof positive that the security is not perfect. On the other hand, the threat model is that most threats from AI agents stem from human inattention, and that agents powered by hyperscaler models are unlikely to be overtly malicious without an outside attacker.

There are some known limitations of the security model, and they are limitations I can accept. But I do believe that yolo-cage provides defense in depth, and that the security it provides is greater than what is achieved through the permission prompts that pop up during agent turns in Claude Code.

catlifeonmars•36m ago
I don’t think that’s quite fair. What would you infer from the absence of such a commit message?
seg_lol•31m ago
Vibe with it, it is YOLO all the way down.
fnoef•55m ago
I wonder why everyone seems to go with Vagrant VMs rather than simple Docker containers.
ajb•47m ago
Theoretically, they have a smaller attack surface. The programs inside the VM can't interact directly with the host kernel.
borenstein•45m ago
Thank you, good question! My original implementation was actually a bunch of manifests on my own microk8s cluster. I was finding that this meant a lot of ad-hoc adjustments with every little tweak. (Ironic, given the whole "pets vs cattle" thing.) So I started testing the changes in a VM.

Then I was talking to a security engineer at my company, who pointed out that a VM would make him feel better about the whole thing anyway. And it occurred to me: if I packaged it as a VM, then I'd get both isolation and determinism. It would be easier to install and easier to debug.

So that's why I decided to go with a Vagrant-based installation. The obvious downside is that it's harder now to integrate it with external systems or to use the full power of whatever environment you deploy it in.

m-hodges•41m ago
See: A field guide to sandboxes for AI¹ on the threat models.

> I want to be direct: containers are not a sufficient security boundary for hostile code. They can be hardened, and that matters. But they still share the host kernel. The failure modes I see most often are misconfiguration and kernel/runtime bugs — plus a third one that shows up in AI systems: policy leakage.

¹ https://www.luiscardoso.dev/blog/sandboxes-for-ai

snowmobile•52m ago
Wait, so you don't trust the AI to execute code (shell commands) on your own computer, and therefore need a safety guardrail, in order to facilitate it writing code that you'll execute on your customers' computers (the financial analysis tool)?

And adding the fact that you used AI to write the supposed containment system, I'm really not seeing the safety benefits here.

The docs also seem very AI-generated (see below). What part did you yourself play in actually putting this together? How can you be sure that filtering a few specific (listed) commands will actually give any sort of safety guarantees?

https://github.com/borenstein/yolo-cage/blob/main/docs/archi...

borenstein•33m ago
You are correct that the AI wrote 100% of the code (and 90% of the raw text). You are also correct that I want a safety guardrail for the process by which I build software that I believe to be safe and reliable. Let's take a look at each of these, because they're issues I also wrestled with throughout 2025.

What's my role here? Over the past year, it's become clear to me that there are really two distinct activities to the business of software development. The first is the articulation of a process by which an intent gets actualized into an automation. The second is the translation of that intent into instructions that a machine can follow. I'm pretty sure only the first one is actually engineering. The second is, in some sense, mechanical. It reminds me of the relationship between an architect and a draftsperson.

I have been much freer to think about engineering and objectives since handing off the coding to the machine. There was an Ars Technica article on this the other day that really nails the way I've been experiencing this: https://arstechnica.com/information-technology/2026/01/10-th...

Why do I trust the finished product if I don't trust the environment? This one feels a little more straightforward: it's for the same reason that construction workers wear hard hats in environments that will eventually be safe for children. The process of building things involves dangerous tools and exposed surfaces. I need the guardrails while I'm building, even though I'm confident in what I've built.

snowmobile•24m ago
This comment is also AI-generated and contains nonsensical metaphors (the last paragraph).
borenstein•20m ago
This was 100% not AI generated! Could you explain what's nonsensical about the metaphor?
asragab•15m ago
At least we can be confident your comments aren't AI-generated.
hmokiguess•11m ago
Thank you so much for this analogy. It reminded me how I've always biked without a helmet; even though I've been in crashes before, it just isn't in my nature to worry about safety the way others do, I guess? People do be different, and it's all about your relationship with managing and tolerating risk.

(I am not saying one way is better than the other; they're just different modes of engaging with risk. I obviously understand that a helmet can and would save my life should an accident occur. The keyword there is "should/would/can," where some people instead believe in "shall/will/does" and prefer to live that way. Call it different faith or belief systems, I guess.)

asragab•17m ago
Wait, so you think the principal danger of vibe coding is the code it produces, and not the actual process the agents take to get there? Given that autonomous agents require extraordinary permissions, whereas running code on a production website presumably often does not.

Wait, do you also think there are no guardrails in a production system? Those fundamentally alter the security posture in a way that a single laptop with a single developer does not.

Wait, do you believe the AI companies aren't also trying to develop (or haven't already developed) their own sandboxing solutions?

Wait, did you even bother to think about what you were writing before you decided to write it?

snowmobile•11m ago
You seem upset. I'm simply saying that if I didn't trust a human developer to run shell commands on the webserver (or the much lower bar of my own laptop), I wouldn't trust them to push code that's supposed to run on that webserver, even after "auditing" the code. Would you let an agent run freely, SSH'd into your webserver?
asragab•6m ago
You seem inexperienced. Lots of orgs do not allow their devs to arbitrarily SSH into their webservers without requesting elevation, which is fundamentally the difference between autonomous agent development with `--dangerously-skip-permissions` and the agent asking every time it wants to run a command. Which is the point of a sandbox.
kjok•45m ago
Genuine question: why is everyone rolling out their own sandbox wrappers around VMs/Docker for agents?
borenstein•40m ago
I know, right? The day I initially thought about posting this, there was another one called `yolo-box`. (That attempt--my very first post--got me instantly shadow-banned due to being on a VPN, which led to an unexpected conversation with @dang, which led to some improvements, which led to it being a week later.)

I think it's the convergence of two things. First, the agents themselves make it easier to get exactly what you want; and second, the OEM solutions to these things really, really aren't good enough. CC Cloud and Codex are sort of like this, except they're opaque and locked down, and they work for you or they don't.

It reminds me a fair bit of 3D printer modding, but with higher stakes.

m-hodges•39m ago
It all feels like temporary workflow fixes until The Agent Companies just ship their own opinionated, good-enough way to do it.
borenstein•29m ago
It probably is. Some of this stuff will hang around because power users want control. Some of it will evolve into more sophisticated solutions that get turned into products and become easier to acquihire than to build in house. A lot of it will become obsolete when the OEMs crib the concept. But IMO all of those are acceptable outcomes if what you really want is the thing itself.
derpsteb•38m ago
My experience is that neither has a good UX for what I usually try to do with coding agents. The main problem I see is setup/teardown of the boxes and managing tools inside them.
catlifeonmars•32m ago
Because of findings like this:

https://www.anthropic.com/research/small-samples-poison

(To save a click: a small number of samples can poison LLMs of any size.)

The way I think of it is, coding agents are power tools. They can be incredibly useful, but they can also wreak a lot of havoc. Anthropic (et al.) is marketing them to beginners, and inevitably someone is going to lose their fingers.

vivzkestrel•27m ago
- I am not interested in running Claude or any of the agents so much as I am interested in running untrusted user code in the cloud inside a sandbox.

- Think CodeSandbox: how long does it take for a VM here to boot?

- How safe do you think this solution would be for letting users execute untrusted code inside, while being able to pip install and npm install all sorts of libraries?

- How do you deploy this inside AWS Lambda/Fargate for the same use case?

p410n3•17m ago
This whole issue is why I stopped using in-editor LLMs and won't use agents for "real" work. I can't be sure what context it wants to grab. With the good ol' copy-paste into the web UI, I can be 100% sure what $TECHCORP sees, and I can integrate whatever it spits out by hand, acting as the first round of "code review." (Much like you would read over Stack Overflow code back in the day.)

If you want to build some greenfield auxiliary tools, fine, agents make sense, but I find that even Gemini's web UI has gotten good enough to create multiple files instead of putting everything in one file.

This way I also don't get locked in to any provider.

borenstein•2m ago
The leakage issue is real. Before there was a way to use "GPT Pro" models on enterprise accounts, I had a separate work-sponsored Pro-tier account. The first thing I did was disable "improve models for everyone." One day I looked and, wouldn't you know it, it had somehow been enabled again. I had to report the situation to security.

As far as lock-in, though, that's been much less of a problem. It's insanely easy to switch because these tools are largely interchangeable. Yes, this project is currently built around Claude Code, but that's probably a one-hour spike away from flexibility.

I actually think the _lack_ of lock-in is the single biggest threat to the hyperscalers. The technology can be perfectly transformative and still not profitable, especially given the current business model. I have Qwen models running on my Mac Studio that give frontier models a run for their money on many tasks. And I literally bought this hardware in a shopping mall.

kxbnb•2m ago
Really cool approach to the containment problem. The insight about "capping the blast radius of a confused agent" resonates - decision fatigue is real when you're constantly approving agent actions.

The exfiltration controls are interesting. Have you thought about extending this to rate limiting and cost controls as well? We've been working on similar problems at keypost.ai - deterministic policy enforcement for MCP tool calls (rate limits, access control, cost caps).

One thing we've found is that the enforcement layer needs to be in-path rather than advisory - agents can be creative about working around soft limits. Curious how you're handling the boundary between "blocked" and "allowed but logged"?

Great work shipping this - the agent security space needs more practical tools.