- /think wins ~69% of comparisons overall.
- Risk coverage is the clearest advantage (17-2 across all tests): it consistently surfaces failure modes the organic response misses.
- Decision impact is nearly even: organic Claude is often more actionable for practical problems.
- Novel insight is mostly a wash: both find similar core insights, just different ones.
- No decisive gaps in either direction. The advantage is depth and rigor, not dramatic superiority.
Honest limitations:
- All judges so far are AI. The whole point of publishing the blind test is to get human validation.
- ~21 comparisons is enough to suggest a pattern, not enough for statistical significance.
- Anonymization isn't perfect: /think responses have stylistic tells (confidence assessments, "what would change this conclusion" sections).
- The framework costs significantly more tokens.
The skill itself is a recursive learning agent: it persists what it learns to a .think/ directory and loads that context in future sessions. Over time it builds project-specific knowledge. It also used its own framework to diagnose and fix its own weaknesses after the first round of testing.

Everything is open source: https://github.com/bengiaventures/effective-thinking-skill

I'd genuinely like to know if the blind test matches what the AI judges found, or if humans see something different. Takes about 15 minutes.
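If you're curious what the persist-and-reload idea looks like, here's a minimal sketch in Python. It's purely illustrative: the .think/ location comes from the post above, but the file name, JSON format, and function names are my assumptions, not the actual skill code (that's in the repo).

```python
# Hypothetical sketch of the persist-and-reload pattern described above.
# The real skill's storage format and structure may differ; see the repo.
import json
from datetime import datetime, timezone
from pathlib import Path

THINK_DIR = Path(".think")                      # persisted-context directory from the post
LEARNINGS_FILE = THINK_DIR / "learnings.json"   # assumed file name for illustration

def load_learnings() -> list[dict]:
    """Load previously persisted insights at the start of a session."""
    if not LEARNINGS_FILE.exists():
        return []
    return json.loads(LEARNINGS_FILE.read_text())

def save_learning(topic: str, insight: str) -> None:
    """Append one learned insight so future sessions can reload it."""
    THINK_DIR.mkdir(exist_ok=True)
    entries = load_learnings()
    entries.append({
        "topic": topic,
        "insight": insight,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    })
    LEARNINGS_FILE.write_text(json.dumps(entries, indent=2))
```

The point is just that knowledge accumulates in the project directory itself, so each new session starts with whatever earlier sessions figured out.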