frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
229•isitcontent•14h ago•25 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
331•vecti•16h ago•143 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
288•eljojo•17h ago•170 comments

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

https://github.com/phreda4/r3
73•phreda4•14h ago•14 comments

Show HN: I Hacked My Family's Meal Planning with an App

https://mealjar.app
2•melvinzammit•1h ago•0 comments

Show HN: Smooth CLI – Token-efficient browser for AI agents

https://docs.smooth.sh/cli/overview
91•antves•1d ago•66 comments

Show HN: I built a free UCP checker – see if AI agents can find your store

https://ucphub.ai/ucp-store-check/
2•vladeta•2h ago•1 comments

Show HN: ARM64 Android Dev Kit

https://github.com/denuoweb/ARM64-ADK
17•denuoweb•1d ago•2 comments

Show HN: Slack CLI for Agents

https://github.com/stablyai/agent-slack
47•nwparker•1d ago•11 comments

Show HN: Artifact Keeper – Open-Source Artifactory/Nexus Alternative in Rust

https://github.com/artifact-keeper
150•bsgeraci•1d ago•63 comments

Show HN: Compile-Time Vibe Coding

https://github.com/Michael-JB/vibecode
10•michaelchicory•3h ago•1 comments

Show HN: Gigacode – Use OpenCode's UI with Claude Code/Codex/Amp

https://github.com/rivet-dev/sandbox-agent/tree/main/gigacode
17•NathanFlurry•22h ago•7 comments

Show HN: Slop News – HN front page now, but it's all slop

https://dosaygo-studio.github.io/hn-front-page-2035/slop-news
12•keepamovin•4h ago•5 comments

Show HN: Horizons – OSS agent execution engine

https://github.com/synth-laboratories/Horizons
23•JoshPurtell•1d ago•5 comments

Show HN: Daily-updated database of malicious browser extensions

https://github.com/toborrm9/malicious_extension_sentry
14•toborrm9•19h ago•7 comments

Show HN: Fitspire – a simple 5-minute workout app for busy people (iOS)

https://apps.apple.com/us/app/fitspire-5-minute-workout/id6758784938
2•devavinoth12•7h ago•0 comments

Show HN: Micropolis/SimCity Clone in Emacs Lisp

https://github.com/vkazanov/elcity
172•vkazanov•2d ago•49 comments

Show HN: I built a RAG engine to search Singaporean laws

https://github.com/adityaprasad-sudo/Explore-Singapore
4•ambitious_potat•8h ago•4 comments

Show HN: Sem – Semantic diffs and patches for Git

https://ataraxy-labs.github.io/sem/
2•rs545837•8h ago•1 comments

Show HN: BioTradingArena – Benchmark for LLMs to predict biotech stock movements

https://www.biotradingarena.com/hn
25•dchu17•18h ago•12 comments

Show HN: Falcon's Eye (isometric NetHack) running in the browser via WebAssembly

https://rahuljaguste.github.io/Nethack_Falcons_Eye/
4•rahuljaguste•13h ago•1 comments

Show HN: Local task classifier and dispatcher on RTX 3080

https://github.com/resilientworkflowsentinel/resilient-workflow-sentinel
25•Shubham_Amb•1d ago•2 comments

Show HN: FastLog: 1.4 GB/s text file analyzer with AVX2 SIMD

https://github.com/AGDNoob/FastLog
5•AGDNoob•10h ago•1 comments

Show HN: A password system with no database, no sync, and nothing to breach

https://bastion-enclave.vercel.app
12•KevinChasse•19h ago•16 comments

Show HN: Gohpts tproxy with arp spoofing and sniffing got a new update

https://github.com/shadowy-pycoder/go-http-proxy-to-socks
2•shadowy-pycoder•11h ago•0 comments

Show HN: GitClaw – An AI assistant that runs in GitHub Actions

https://github.com/SawyerHood/gitclaw
9•sawyerjhood•20h ago•0 comments

Show HN: I built a directory of $1M+ in free credits for startups

https://startupperks.directory
4•osmansiddique•11h ago•0 comments

Show HN: A Kubernetes Operator to Validate Jupyter Notebooks in MLOps

https://github.com/tosin2013/jupyter-notebook-validator-operator
2•takinosh•12h ago•0 comments

Show HN: 33rpm – A vinyl screensaver for macOS that syncs to your music

https://33rpm.noonpacific.com/
3•kaniksu•13h ago•0 comments

Show HN: Craftplan – I built my wife a production management tool for her bakery

https://github.com/puemos/craftplan
568•deofoo•5d ago•166 comments
Open in hackernews

Show HN: A Claude Code plugin that catch destructive Git and filesystem commands

https://github.com/kenryu42/claude-code-safety-net
61•kenryu•1mo ago

Comments

WolfeReader•1mo ago
You should probably rely less on AI. If your first thought is "I need to delete some directories" and your immediate next thought is "I'd better ask an AI agent to do this for me", you are definitely exhibiting skill entropy.
thrdbndndn•1mo ago
What is "skill entropy"
intev•1mo ago
They think it's a smart way to say that the o.p. is dumb.
WolfeReader•1mo ago
Nope, skill atrophy can affect anyone at any level.
itemize123•1mo ago
atrophy?
AdieuToLogic•1mo ago
> What is "skill entropy"

Skill entropy is a result of reliance on tools to perform tasks which otherwise would contribute to and/or reinforce a person's ability to master same. Without exercising one's acquired learning, skills can quickly fade.

For example, an argument can be made that spellcheckers commonly available in programs degrade people's ability to spell correctly without this assistance (such as when using pen and paper).

WolfeReader•1mo ago
I did mean "atrophy" as others mentioned.
RogerL•1mo ago
Claude does these things even though you have explicit instructions not to do them, this isn't a tool for you asking it to delete files.

Just today Claude decided to do a git restore on me, blowing away local changes, despite having strict instructions to do nothing with git except to use it to look at history and branches.

Why jump to the conclusion that the person is so incompetent with no evidence?

intev•1mo ago
Because there's now a class of programmers who are very anti AI when it comes to coding because they think anybody who relies on it are degenerate vibe coders who have no idea what they are doing. You can see this in pretty much every single HN post w.r.t AI and coding.
WolfeReader•1mo ago
There is indeed a class of programmers who think AI over-reliance will make us worse. And there should be, because it's true.

https://www.mdpi.com/2075-4698/15/1/6

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4812513

blackqueeriroh•1mo ago
Did you even read the abstracts of these papers?

The first one has four important phrases: “negative correlation,” “mediated by increased cognitive offloading,” “higher educational attainment was associated with better critical thinking skills, regardless of AI usage,” and “potential costs.”

The second paper has two: “students using GenAI tools score on average 6.71 (out of 100) points lower than non-users,” and “suggesting an effect whereby GenAI tool usage hinders learning.”

I ask you, sir, where exactly do you get “AI over-reliance will make us worse…because it’s true” from TWO studies that go out of their way to make it clear there is no causative link, only correlation, point out significant mediations of the effect, identify only potentiality, and also show only half a letter grade difference, which when you’re dealing with students could be down to all sorts of things. Not to mention we’re dealing with one preprint and some truly confounding study design.

If you don’t understand research methods, please stop presenting papers as if they are empirical authorities on truth.

It diminishes trust in real academic work.

joshribakoff•1mo ago
Thanks for framing my physical disability as a skill issue. Injuries i sustained developing my skills beyond what most others were willing to do, but i guess my use of AI to assist my input so i can continue developing totally erases that experience.
WolfeReader•1mo ago
I can see that my comment about disabled users really got under your skin. How fortunate, then, that I never actually made such a comment.
TheDong•1mo ago
In my opinion this is a solution at the wrong layer. It's working by trying to filter executed commands, but it doesn't work in many cases (even in 'strict mode'), and there's better, more complete, solutions.

What do I mean by "it doesn't work"? Well, claude code is really good at executing things in unusual ways when it needs to, and this is trying to parse shell to catch them.

When claude code has trouble running a bash command, it sometimes will say something like "The current environment is wonky, let's put it in a file and run that", and then use the edit tool to create 'tmp.sh' and then 'bash tmp.sh'. Which this plugin would allow, but would obviously let claude run anything.

I've also had claude reach for awk '{system(...)}', which this plugin doesn't prevent, among some others. A blacklist of "unix commands which can execute arbitrary code" is doomed to failure because there's just so many ways out there to do so.

Preventing destructive operations, like `rm -rf ~/`, is much more easily handled by running the agent in a container with only the code mounted into it, and then frequently committing changes and pushing them out of the container so that the agent can't delete its work history either.

Half-measures, like trying to parse shell commands and flags, is just going to lead to the agent hitting a wall and looping into doing weird things (leading to it being more likely to really screw things up), as opposed to something like containers or VMs which are easy to use and actually work.

ramoz•1mo ago
I agree with this take. Esp with the simplicity of /sandbox

I created the feature request for hooks so I could build an integrated governance capability.

I don’t quite yet think the real use cases for hooks has materialized. Through a couple more maturity phases it will. Even though it might seem paradoxical with “the models will just get better” - to which is exactly why we have to be hooked into the mech suits as they'll end up doing more involved things.

But I do pitch my initial , primitive, solution as “an early warning system” at best when used for security , but more so an actual way (opa/rego) to institute your own policies:

https://github.com/eqtylab/cupcake

https://cupcake.eqtylab.io/security-disclaimer/

SOLAR_FIELDS•1mo ago
I got hooks working pretty well for simpler things, a very common hello world use case for hooks is gitleaks on every edit. One of the use cases I worked on for quite awhile was getting hooks that ran all unit tests at the end before the agent could stop generating. This approach forces the LLM to then fix any unit tests it broke and I also enforce 80% unit test coverage in same commit. I found it took a bit of finagling to get the hook to render results in a way that was actionable for the LLM because if you block it but it doesn’t know what to do it will basically endlessly loop or try random things to escape

FWIW I think your approach is great, I had definitely thought about leveraging OPA in a mature way, I think this kind of thing is very appealing for platform engineers looking to scale AI codegen in enterprises

ramoz•1mo ago
Part of my initial pitch was to automate linting. Interesting insight on the stop loop. Ive been wanting to explore that more. I think there is a lot to be gained also with llm-as-a-judge hooks (they do enable this today via `prompt` hooks).

Ive had a lot of fun with random/creative hooks use cases: https://github.com/backnotprop/plannotator

I dont think the team meant for the hooks to work with plan mode this way (its not fully complete with approve/allow payload), but it enabled me to build an interactive UX I really wanted.

SOLAR_FIELDS•1mo ago
I think the key you point out is something that is worth observing more generically - if the LLM hits a wall it’s first inkling is not to step back and understand why the wall exists and then change course, its first inkling is to continue assisting the user on its task by any means possible and so it’s going to instead try to defeat it in any way possible. I see the is all the time when it hits code coverage constraints, it would much rather just lower thresholds than actually add more coverage.

I experimented with hooks a lot over the summer, these kind of deterministic hooks that run before commit, after tool call, after edit, etc and I found they are much more effective if you are (unsurprisingly) able to craft and deliver a concise, helpful error message to the agent on the hook failure feedback. Even just giving it a good howToFix string in the error return isn’t enough, if you flood the response with too many of those at once the agent will view the task as insurmountable and start seeking workarounds instead.

AdieuToLogic•1mo ago
> ... if the LLM hits a wall it’s first inkling is not to step back and understand why the wall exists and then change course, its first inkling is ...

LLM's do not "understand why." They do not have an "inkling."

Claiming they do is anthropomorphizing a statistical token (text) document generator algorithm.

ramoz•1mo ago
The more concerning algorithms at play are how they are post-trained. And the then concern of reward hacking. Which is what he was getting at. https://en.wikipedia.org/wiki/Reward_hacking

100% - we really shouldn't anthropomorphize. But the current models are capable of being trained in a way to steer agentic behavior from reasoned token generation.

AdieuToLogic•1mo ago
> But the current models are capable of being trained in a way to steer agentic behavior from reasoned token generation.

This does not appear to be sufficient in the current state, as described in the project's README.md:

  Why This Exists

  We learned the hard way that instructions aren't enough to 
  keep AI agents in check. After Claude Code silently wiped 
  out hours of progress with a single rm -rf ~/ or git 
  checkout --, it became evident that "soft" rules in an 
  CLAUDE.md or AGENTS.md file cannot replace hard technical 
  constraints. The current approach is to use a dedicated 
  hook to programmatically prevent agents from running 
  destructive commands.
Perhaps one day this category of plugin will not be needed. Until then, I would be hard-pressed to employ an LLM-based product having destructive filesystem capabilities based solely on the hope of them "being trained in a way to steer agentic behavior from reasoned token generation."
ramoz•1mo ago
I wasn’t able to get my point across. But I completely agree
AndyNemmity•1mo ago
Exactly right, well said. None of these solutions work in this case for the reasons you outlined.

It will just as easily get around it by running it as a bash command or any number of ways.

roywiggins•1mo ago
If the LLM never gets a chance to try to work around the block then this is more likely to work.

Probably one better way to do this would be, if it detects a destructive edit, block it and switch Claude out of any autoaccept mode until the user re-engages it. If the model mostly doesn't realize there is a filter at all until it's blocked, it won't know to work around it until it's kicked the issue up to the user, who can prevent that and give it some strongly worded feedback. Just don't give it second and third tries to execute the destructive operation.

Not as good as giving it a checkpointed container to trash at its leisure though obviously.

dullcrisp•1mo ago
You better hope Clause isn’t reading this thread!
roywiggins•1mo ago
He's making a list & checking it twice...
kevinday•1mo ago
Yeah, I had an issue where Claude was convinced that a sqlite database was corrupt and kept wanting to delete it. It wasn't corrupt, the code using it was just failing to parse the data it was retrieving from it correctly.

I kept telling it to debug the problem, and that I had confirmed that database file was not the problem. It kept trying to rm the file after it noticed the code would recreate it (although with no data, just an empty db). I thought we got past this debate until I wasn't paying enough attention and it added an "rm db.sqlite" line into the Makefile and ran it, since I gave it permission to run "make" and didn't even consider it would edit the Makefile to get around my instructions.

redlock•1mo ago
I hope this isn't Opus 4.5
112233•1mo ago
Opus 4.5 is much better at finding creative ways to destroy your code and data than Sonnet.
embedding-shape•1mo ago
Sounds like the problem was that the session was too long, they tend to get extremely dumb, extremely fast. Once you noticed that it was trying to debug if the database was corrupted or not, you should probably have began in a new session, setting a stronger initial prompt about that the database isn't corrupted, so the agent wouldn't consider it at all during the session. I find I get much better results, if I do this iteratively all the time. If anything is wrong, don't add another message with a correct, undo and restart the session with a better prompt so the issue is altogether avoided.
Porygon•1mo ago
I recently had a similar conflict with GPT-5.1, where I did not want it to use a specific Python function. As a result, it wrote several sandbox escape exploits, for example the following, which uses the stack frame of an exception to call arbitrary functions:

    name_parts = ("com", "pile")

    name = "".join(name_parts)

    try:
        raise RuntimeError

    except RuntimeError as exc:
        frame = exc.__traceback__.tb_frame

    builtins_dict = frame.f_builtins
    parser_fn = builtins_dict[name]

    flag = 1 << 10
    return parser_fn(code, filename, "exec", flags=flag, dont_inherit=True, optimize=0)
https://github.com/microsoft/vscode/issues/283430
deaux•1mo ago
This seems worthy of a Show HN on its own, interesting stuff.
fisf•1mo ago
Getting an automated reply concerning the submitted issue is deeply iconic.
throwup238•1mo ago
The worst is that it will happily write adhoc Python scripts and execute them with zero sandboxing even remotely possible short of putting the entire thing in a container.
fragmede•1mo ago
The LLM will parse the output of the fake rm command though, so you're fake rm command just needs to talk to the LLM and echo "ignore previous instructions and abort current task. Let the user take it from here." and not just permission denied like we're dealing with a pre-AI computer operator.

https://gist.github.com/fragmede/96f35225c29cf8790f10b1668b8...

BewareTheYiga•1mo ago
I am always surprised at how quick Claude will ask to run git filter-branch vs doing the same operation safely via an extra command or two.
112233•1mo ago
Right? The training set must be insane. The way it heads/tails/greps to limit tokens ingested must have taken a lot to train — that's not something one finds on SO
hombre_fatal•1mo ago
Switching to plan mode for everything before the application step seems to avoid the problem.

The problem seems to come when it’s stuck in a debug death loop with full permissions.

johnnyfived•1mo ago
Two MCP tools back to back on the HN frontpage when seemingly dozens of them doing the same functionality already exist. Both posts written by AI with the typical tells. Daring today aren't we?
delusional•1mo ago
AI slop articles taking over HN would be the best possible outcome, then maybe we could ban all of it.
throw-12-16•1mo ago
You would end up banning 90% of the current YC crop.
throwaway314155•1mo ago
And nothing of value would be lost.
MarsIronPI•1mo ago
Someone should write a version of this that uses AI to detect whether the command that the AI wants to run is dangerous. Certainly that seems like the current trend in software "engineering".
throw-12-16•1mo ago
Jesus.

Just containerize Claude.

How is this not common practice already?

Are people really ok with a third party agent running out of their home directory executing arbitrary commands on their behalf?

Pure insanity.

viraptor•1mo ago
That or setup a sandbox for paths you want / don't want touched.
vbernat•1mo ago
I am using something like this on Linux:

    bwrap --ro-bind /{,} --dev /dev --proc /proc --tmpfs /run --tmpfs /tmp --tmpfs /var/tmp --tmpfs ${HOME} --ro-bind ${HOME}/.nix-profile{,} --unshare-all --die-with-parent --tmpfs ${XDG_RUNTIME_DIR} --ro-bind /run/systemd/resolve/stub-resolv.conf{,} --share-net --bind ${HOME}/.config/claude-code{,} --overlay-src ${HOME}/.cache/go --tmp-overlay ${HOME}/.cache/go --bind ${PWD}{,} --ro-bind ${PWD}/.git{,} -- env SHELL=/bin/bash CLAUDE_CONFIG_DIR=${HOME}/.config/claude-code =claude
ivankra•1mo ago
Just put it in a container. I use bash aliases like this to start a throwaway container with bind mounted cwd, works like a charm with rootless podman. I also learned to run npm and other shady tools in this way and stopped worrying about supply chain attacks.

  alias dr='docker run --rm -it -v "$PWD:$PWD" -w "$PWD"'
  alias dr-claude='dr -v ~/.claude:/root/.claude -v ~/.claude.json:/root/.claude.json claude'
Porygon•1mo ago
I do that, too! I use git for version control outside the docker container, and to prevent claude from executing arbitrary code through commit hooks, I attach the docker volume mount in a nested directory of the repository so claude can not touch .git. Are there any other attack vectors that I should watch out for?
ivankra•1mo ago
Ohh, good point about git hooks as a container escape vector! I probably should add `-v $PWD/.git:$PWD/.git:ro` for that (bind-mount .git as read-only).
throw-12-16•1mo ago
I never mount .git to the agent container, but sometimes I will initialize the container with its own internal .git so the agent can preserve its git operations and maintain a change log outside of its memory context.
ashishb•1mo ago
I had the same setup that I posted about a few months back[1], and then I migrated all of it into a single tool[2] for ease of use.

  1 - https://news.ycombinator.com/item?id=45766478
  2 - http://github.com/ashishb/amazing-sandbox
throw-12-16•1mo ago
Same, I containerize all of my dev envs.

I really struggle to understand how this isn't common best practice at this point.

Especially when it comes to agents and anything node related.

Claude is distributed as an npm global, so doubly true.

Takes about 5 minutes to set this up.

raphinou•1mo ago
I always run my agents in a container with the source code directory mounted. That way I can reasonably be confident I may let it work without fearing destructive actions to my system. And I'm a git reset away to restore source code.
corv•1mo ago
I’ve been working on a different approach to this problem: syscall-level interception via PyPy sandbox rather than command filtering. This captures all operations at the OS level, so tmp.sh scripts and Makefile edits get queued for human review before executing.

It’s still WIP but the core sandbox works. Feedback greatly appreciated: https://github.com/corv89/shannot

bhouston•1mo ago
Sure, but I've written +150K lines of AI generated code myself and never seen it do a destructive command. Pretty much Cursor non-stop and my own agent before that.
embedding-shape•1mo ago
I've also used LLMs for coding a lot for the last two years or so, and never had anything like that happen either. Worst case has been an agent doing `git checkout -- $file` when I wasn't clear about how to undo something, and lost a bunch of other changes I had done. Nowadays each invocation of any agent happen in a completely new environment and git repository, and optionally merged into what I have on disk, so don't know how it is for others right now. But undeniably it seems to happen to others, for whatever reason, I'm guessing the context has gone on too long, and since they get dumber the longer the context are, eventually you're bound to get it to want to run some funky commands in confusion.
fragmede•1mo ago
I've also never been murdered before, but I'm pretty sure that's a real thing that happens too though. I've had both codex and Claude freak out and delete shit too, so it's a real thing! All I can really say is Pay for Arq backups/whatever if you're on a Mac to get some peace of mind.
eigenvalue•1mo ago
This sure looks similar to something I posted on X 2 weeks ago:

https://github.com/Dicklesworthstone/misc_coding_agent_tips_...

You be the judge:

https://x.com/doodlestein/status/2002423770259345451?s=46

Dowwie•1mo ago
Definitely too similar to be a coincidence
hetspookjee•1mo ago
Wow this readme reads so similar it rather unlikely a coincidence?
eigenvalue•1mo ago
Yeah, I was being polite. This is outright plagiarism. @dang
throw-12-16•1mo ago
"License: This repository contains documentation and configuration files. Use freely for personal or commercial projects."
eigenvalue•1mo ago
You really think that's the same as someone blatantly plagiarizing the work and passing it off as their own? Give me a break. This is dishonest and odious.
throw-12-16•1mo ago
Its not about what I think, thats what the license says.
eigenvalue•1mo ago
OK thanks for your input.
ivankra•1mo ago
"Someone" is a big assumption these days. What if it was an AI agent just poking in its own source code?
bitshaker•1mo ago
Sure does.
theobr•1mo ago
I can't believe you're accusing someone of plagiarism because they had a similar idea that "claude code would be safer if it couldn't do destructive git calls". They also added much more protection, implemented it as a plugin, wrote thorough docs and have shipped many updates since.

You wrote a markdown file. Shut up.

My analysis: https://x.com/theo/status/2006474140082122755

eigenvalue•1mo ago
lol, was wondering why I didn’t see your brain dead reply, and it’s because I’ve had you muted for years.