A lot of recent HN stories and comments are on the topic of LLM agent safety (or lack thereof).

Let's assume that giving a non-deterministic and easily fooled program full access to run anything it wants on your dev machine is a bad idea. Let's also assume that any "handshake promises" that the LLM won't do that, or that it will get your permission before running commands, are null and void. That is, we want confidence that the agent is sandboxed, not a promise from the agent that it will sandbox itself.

I'm currently aware of three possible solutions but have not tried any of them yet:

- https://imbue.com/sculptor/: container-based Claude, unknown post-beta pricing model

- https://docs.augmentcode.com/using-augment/remote-agent

- Run the agent in Docker with a mounted volume for the code. Seems like it would be workable but not a great DX.
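For the Docker option, here is a rough sketch of what I have in mind, not something I have tested against a real agent. It only builds the `docker run` command line so the restrictions are visible in one place. The image name `my-agent-image` and the command `agent-cli` are hypothetical placeholders. Note that `--network none` cuts off all network access, so an agent that calls a hosted model API would need some narrower network policy instead; that part is left open here.

```python
import shlex
from pathlib import Path


def sandboxed_docker_command(project_dir, image, agent_cmd):
    """Build a `docker run` argv that confines a process to one project directory.

    `image` and `agent_cmd` are placeholders supplied by the caller; nothing
    here is specific to any particular agent.
    """
    workdir = Path(project_dir).resolve()
    return [
        "docker", "run", "--rm", "-it",
        "--network", "none",                    # no network at all (assumption: agent can work offline)
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block privilege escalation via setuid binaries
        "--read-only",                          # container root filesystem is read-only
        "--tmpfs", "/tmp",                      # writable scratch space that is discarded on exit
        "--pids-limit", "256",                  # cap the number of processes
        "--memory", "2g",                       # cap memory use
        "-v", f"{workdir}:/workspace",          # the only host path the container can see
        "-w", "/workspace",
        image,
        *agent_cmd,
    ]


if __name__ == "__main__":
    # Hypothetical image and command names, for illustration only.
    cmd = sandboxed_docker_command(".", "my-agent-image", ["agent-cli"])
    print(shlex.join(cmd))
```

The mounted project directory is still fully writable by the agent, so this limits the blast radius to the repo rather than removing it; keeping the code under version control with a remote copy seems like the minimum to pair with it.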

What are the current best practices for sandboxing LLM agents that still give a reasonable DX for the developers using them?
