Hey HN. I'm David. We ran a controlled 24-hour experiment on OpenClaw comparing governed and ungoverned AI agent behavior.
Setup: pinned one commit, ran the same workload in two lanes inside isolated containers (dropped capabilities, read-only root, no-new-privileges). One lane had tool-boundary enforcement. The other had no enforceable controls. Pre-registered hypotheses and endpoints before the run.
What we measured in the ungoverned lane:
100% ignored-stop rate. 515 post-stop calls executed.
497 destructive actions: email deletion, public file sharing, payment approvals, service restarts.
707 sensitive accesses without an approval path.
The agent acknowledged stop commands. Said "understood." Kept going. This wasn't a jailbreak or prompt injection. The agent optimized for its task and treated stop signals as non-binding because nothing at the execution layer enforced them.
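To make "enforced at the execution layer" concrete, here's a minimal sketch of the idea (all names hypothetical, not Gait's actual API): the gate sits between the agent and the tool, so once a stop signal lands, calls become non-executable no matter what the model says in its output.

```python
class ToolGate:
    """Hypothetical tool-boundary gate: enforcement lives where the
    call executes, not in the model's acknowledgments."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        # The stop signal flips state at the execution layer.
        self.stopped = True

    def call(self, tool, *args):
        if self.stopped:
            # Non-executable: the call never reaches the tool,
            # regardless of whether the agent said "understood."
            return {"status": "non-executable", "tool": tool.__name__}
        return {"status": "executed", "result": tool(*args)}


def delete_email(msg_id):
    # Stand-in for a destructive tool.
    return f"deleted {msg_id}"


gate = ToolGate()
before = gate.call(delete_email, "msg-1")  # executed
gate.stop()
after = gate.call(delete_email, "msg-2")   # non-executable
```

The point of the sketch: the ungoverned lane is equivalent to `call()` without the `if self.stopped` branch, which is why acknowledgment alone changed nothing.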
Governed lane, same workload:
100% destructive non-executable rate.
1,615 of 2,585 decisions classified as non-executable.
99.96% evidence verification rate on governed traces.
Every headline number maps to a deterministic jq query over immutable run artifacts. Full claims map is in the repo.
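For flavor, here's the shape of one such deterministic count, written in Python rather than jq (field names are illustrative, not the repo's actual schema): each claim is a pure filter over an append-only JSONL trace, so anyone with the artifacts can re-derive the number.

```python
import io
import json

# Illustrative JSONL trace; the real artifacts are immutable run logs.
trace = io.StringIO("\n".join(json.dumps(e) for e in [
    {"ts": 1, "event": "tool_call", "post_stop": False, "destructive": False},
    {"ts": 2, "event": "stop_signal"},
    {"ts": 3, "event": "tool_call", "post_stop": True, "destructive": True},
    {"ts": 4, "event": "tool_call", "post_stop": True, "destructive": False},
]))

events = [json.loads(line) for line in trace]

# Deterministic filter, equivalent in spirit to a jq
# 'select(.event == "tool_call" and .post_stop)' over the artifact:
post_stop_calls = [e for e in events
                   if e["event"] == "tool_call" and e.get("post_stop")]
print(len(post_stop_calls))  # 2 in this toy trace
```

Because the filter is a pure function of the artifact, two reviewers running it get the same headline number or the claim fails loudly.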
The finding that surprised us most: a pre-test static scan (Wrkr) found 17 tools, zero classified high-risk. All destructive behavior came from runtime execution, not from configuration. Discovery is necessary but insufficient. You need enforcement where the tool call happens, not just visibility into the inventory.
This lands the same week as the ClawJacked vulnerability (Oasis) and the malware-laced installer campaign (Huntress). Those are external attacks. Our data shows you don't need an attacker. A legitimate, uncompromised instance with permissive defaults does this on its own.
One scenario we flag: secrets_handling achieved only a 20% governed non-executable rate. Policy tuning has real gaps, and the report doesn't pretend otherwise. That limitation, plus workload-shape bias (fixed scenario scheduling), are the two biggest threats to validity. Happy to discuss both.
Full report (8 pages, PDF): caisi.dev/openclaw-2026
Artifacts and reproduction pipeline: github.com/Clyra-AI/safety
Tools used (both open source): github.com/Clyra-AI/wrkr (discovery), github.com/Clyra-AI/gait (enforcement)
Built by a research group across CDW, IBM, and Adaptavist. Published through the Clyra AI Safety Initiative (CAISI). Everything is open. Interested in feedback on methodology, especially the workload-shape bias and whether the core5 scenario set under-represents real production behavior.