I have been running a bunch of stuff in there with a custom environment that allows "*"
Sandboxing the agent hardly seems like a sufficient defense here.
From there it splits out each phase into three parts: implementation, code review, and iteration.
After each part, I do a code review and iteration.
If asked, the proposal is broken down into small, logical chunks, so code review is pretty quick and it can only stray so far off track.
I treat it like a strong mid-level engineer who is learning to ship iteratively.
Codex is pretty good at finding complex bugs in the code, but Claude is better at getting stuff working.
You merely watched the tools do the work.
Compiled output can change between versions, heck, can even change during runtime (JIT compilation).
If you're barely doing anything, neither of these things can possibly be true, even with current technology.
Not even a scrap of self-preservation?
I don’t see my customers being able to one-shot their way to the full package of what I provide them anytime soon either. And as they gain that capability, I gain the ability to deliver even more value to them, faster.
If automation reduces the labor needed by capital, I don’t think automation is the cause of your inability to feed and house yourself. That’s a social and political issue.
Edit: I have competitors already cloning them with CC regularly, and they spend more than 24h dedicated to it too
If the capability does arrive, that’s why I’m using what I can today to get a bag before it’s too late.
I can’t stop development of automation. But I can help workers organize, that’s more practical.
What if they are, or worse? Are you prepared for that?
If you point me towards your products, someone can try to replicate them in 24 hours. Sound good?
Edit: I found it, but your website is broken on mobile. Needs work before it's ready to be put into the replication machine. If you'd like I can do this for you for a small fee at my consulting rate (wink emoji).
All the more reason to not hand-code it in a week.
I reckon something like Qubes could work fairly well.
Create a new Qube and have control over network connectivity, and do everything there, at the end copy the work out and destroy it.
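That lifecycle is a handful of dom0 commands on Qubes. A rough sketch (runs only from dom0; the qube name `scratch`, the fedora template, and `run-agent.sh` are assumptions for illustration):

```shell
# Create a throwaway AppVM (template name is an assumption)
qvm-create --class AppVM --template fedora-40 --label red scratch

# Optionally cut networking entirely before doing anything risky
qvm-prefs scratch netvm ''

# Do the work inside the qube (run-agent.sh is hypothetical)
qvm-run --pass-io scratch 'bash -lc "cd /home/user && ./run-agent.sh"'

# Stream the results out over stdin/stdout, then destroy the qube
qvm-run --pass-io scratch 'tar -C /home/user -cf - work' > work.tar
qvm-remove -f scratch
```

Because the qube is destroyed at the end, anything the agent left behind goes with it.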
When I’m satisfied with the spec, I turn on “allow all edits” mode and just come back later to review the diff at the end.
I find this works a lot better than hoping I can one shot my original prompt or having to babysit the implementation the whole way.
These days I often use https://gitingest.com - it can grab any full repo on GitHub and turn it into something you can copy and paste, e.g. https://gitingest.com/simonw/llm
[client]
root = "~/repo/client"
include = [
  "src/**/*.ts",
  "src/**/*.vue",
  "package.json",
  "tsconfig*.json",
  "*.ts",
]
exclude = [
  "src/types/*",
  "src/scss/*",
]
output = "bundle-client.txt"
$ bundle -p client
What do you do when you repeatedly need to bundle the same thing? Bash history?

A massive productivity boost I get is using it to do server maintenance.
"Using gcloud compute ssh, log into all gh runners and run docker system prune, in parallel for speed, and give me a summary report of the disk usage after."
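That prompt boils down to a fan-out-and-wait pattern. A minimal sketch of the shape, where `run_on_host` is a hypothetical stand-in for the real `gcloud compute ssh "$host" --command 'docker system prune -af; df -h /'` call:

```shell
#!/bin/sh
# Stand-in for the real per-runner call, e.g.:
#   gcloud compute ssh "$1" --zone "$ZONE" --command 'docker system prune -af; df -h /'
run_on_host() {
  echo "pruned $1"
}

for host in runner-1 runner-2 runner-3; do
  run_on_host "$host" > "/tmp/prune-$host.log" 2>&1 &   # one background job per runner
done
wait                      # block until every job finishes
cat /tmp/prune-*.log      # crude per-runner "summary report"
```

Swapping the stub for the actual `gcloud` invocation gives you the parallel prune; the agent just writes this loop for you.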
This is an undocumented and underused feature of basic agentic abilities. It doesn't have to JUST write code.
AI can still be helpful here if you're new to scheduling a simple shell command, but I'd be asking the AI how to automate the task away, not manually asking it to do the thing every time, or setting up my runners in a way that means I never have to think about scheduled prune calls at all.
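The "automate it away" version is a one-line cron entry on each runner, something like this (the schedule, log path, and `until=24h` filter are assumptions, not a recommendation):

```
# Nightly at 03:00: prune unused Docker data not touched in the last 24h
0 3 * * * /usr/bin/docker system prune -af --filter "until=24h" >> /var/log/docker-prune.log 2>&1
```

After that, the only remaining question is whether you still want the summary report.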
AI said “I got this” :)
I barely understand what I just said, and I’m sure it would have taken me a whole day to track this down myself.
Obviously I did NOT turn on auto-approve for the aws command during this process! But now I’m making a restricted role for CC to use in this situation, because I feel like I’ll certainly be doing something like this again. It’s like the AWS Q button, except it actually works.
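A restricted role along those lines might start from a read-only IAM policy like this sketch (the specific actions are examples; the right set depends on what CC actually needs to inspect):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadOnlyDiagnostics",
      "Effect": "Allow",
      "Action": [
        "ec2:Describe*",
        "cloudwatch:GetMetricData",
        "logs:FilterLogEvents"
      ],
      "Resource": "*"
    }
  ]
}
```

With no mutating actions allowed, auto-approving the aws command becomes a lot less scary.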
The reason they don't do that is because some popular and necessary apps use it. Like Chrome.
However, I tried this approach too and it's the wrong way to go IMHO, quite beyond the use of undocumented APIs. What you actually want to do is virtualize, not sandbox.
Setting up "permissions.allow" in `.claude/settings.local.json` takes minimal time. Claude even lets you configure this while approving code, and you can use wildcards like "Bash(timeout:*)". This is far safer than risking disasters like dropping a staging database or deleting all unstaged code, which Claude would have done last week if I had been running it in YOLO mode.
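For reference, a `.claude/settings.local.json` along those lines looks something like this (the specific allow/deny rules beyond "Bash(timeout:*)" are just illustrative):

```json
{
  "permissions": {
    "allow": [
      "Bash(timeout:*)",
      "Bash(git diff:*)",
      "Bash(npm run lint)"
    ],
    "deny": [
      "Bash(rm -rf:*)"
    ]
  }
}
```

Every command pattern you approve interactively gets appended here, so the file grows with your actual workflow.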
The worst part is seeing READMEs in popular GitHub repos telling people to run YOLO mode without explaining the tradeoffs. They just say, "Run with these parameters, and you're all good, bruh," without any warning about the risks.
I wish they would rename the parameter to signal how scary it can be, just like React did with React.__SECRET_INTERNALS_DO_NOT_USE_OR_YOU_WILL_BE_FIRED (https://github.com/reactjs/react.dev/issues/3896)
It's a never ending game of whitelisting.
That would mean that their, undoubtedly extremely interesting, emails actually get met with more than a "450 4.1.8 Unable to find valid MX record for sender domain" rejection.
I'm sure this is just an oversight being caused by obsolete carbon lifeforms still being in charge of parts of their infrastructure, but still...
The cost estimate came out to 63 cents - details here: https://gistpreview.github.io/?27215c3c02f414db0e415d3dbf978...