
Show HN: 20+ Claude Code agents coordinating on real work (open source)

https://github.com/mutable-state-inc/lean-collab
18•austinbaggio•1h ago
Single-agent LLMs suck at long-running complex tasks.

We’ve open-sourced a multi-agent orchestrator that we’ve been using to handle long-running LLM tasks. We found that single LLM agents tend to stall, loop, or generate non-compiling code, so we built a harness for agents to coordinate over shared context while work is in progress.

How it works:
1. Orchestrator agent that manages task decomposition
2. Sub-agents for parallel work
3. Subscriptions to task state and progress
4. Real-time sharing of intermediate discoveries between agents
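The four pieces above can be illustrated with a small, self-contained Python sketch. It is a hypothetical stand-in, not lean-collab's code. Threads play the role of Claude Code sub-agents, and an in-process `SharedMemory` object plays the role of the shared store. The names `decompose`, `sub_agent` and `orchestrate` are invented for the illustration.

```python
from __future__ import annotations

import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SharedMemory:
    """Hypothetical in-process stand-in for the shared store agents coordinate through."""
    discoveries: list[str] = field(default_factory=list)  # intermediate findings
    status: dict[str, str] = field(default_factory=dict)  # task -> state
    subscribers: list[Callable[[str, str], None]] = field(default_factory=list)
    lock: threading.Lock = field(default_factory=threading.Lock, repr=False, compare=False)

    def publish(self, finding: str) -> None:
        with self.lock:
            self.discoveries.append(finding)

    def snapshot(self) -> list[str]:
        with self.lock:
            return list(self.discoveries)

    def set_status(self, task: str, state: str) -> None:
        with self.lock:
            self.status[task] = state
            callbacks = list(self.subscribers)
        for notify in callbacks:  # call subscribers outside the lock
            notify(task, state)


def decompose(problem: str) -> list[str]:
    # Hypothetical: a real orchestrator would ask an LLM to split the problem.
    return [f"{problem} / subgoal {i}" for i in range(1, 4)]


def sub_agent(task: str, memory: SharedMemory) -> str:
    memory.set_status(task, "running")
    prior = memory.snapshot()  # read what other agents have already found
    finding = f"{task}: done after reading {len(prior)} shared findings"
    memory.publish(finding)  # share the result while others are still working
    memory.set_status(task, "done")
    return finding


def orchestrate(problem: str, workers: int = 3) -> SharedMemory:
    memory = SharedMemory()
    # Subscription: the orchestrator hears about every state change as it happens.
    memory.subscribers.append(lambda task, state: print(f"[{state}] {task}"))
    subtasks = decompose(problem)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda t: sub_agent(t, memory), subtasks))
    return memory


if __name__ == "__main__":
    print(orchestrate("Putnam A2").status)
```

In the real harness each sub-agent is an LLM session working on an actual subgoal. Here the bodies are trivial, so only the coordination shape is visible: decompose, fan out, subscribe, share.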

We tested this on a Putnam-level math problem, but the pattern generalizes to things like refactors, app builds, and long research. It’s packaged as a Claude Code skill and designed to be small, readable, and modifiable.

Use it, break it, and tell me what workloads we should try running next!

Comments

clairekart•1h ago
What’s the failure mode you see with single-agent Claude Code on complex tasks? (looping, context drift, plan collapse, tool misuse?)
austinbaggio•1h ago
All of the above. The most frustrating one with the Putnam example was Claude generating solutions that obviously didn't compile. This feels like plan collapse: not verifying its own work. I'm sure that if you just had a dumb two-model setup, it would eventually get to compiling code after n runs, but that only addresses this one failure mode.
Atotalnoob•58m ago
You can use hooks to prevent it from stopping without a successful build.
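Here is a minimal Python sketch of that suggestion, written as a Claude Code `Stop` hook. It assumes the hook contract as described in the Claude Code hooks documentation, so verify it there:
- The hook receives a JSON payload on stdin that includes a `stop_hook_active` flag.
- Exiting with code 2 blocks the stop and feeds stderr back to Claude.

The build command (`lake build`, since the example is a Lean project), the `--hook` flag and the helper names are all hypothetical.

```python
#!/usr/bin/env python3
"""Sketch of a Claude Code Stop hook that blocks stopping while the build fails.

Assumed contract (check the hooks docs): JSON on stdin with `stop_hook_active`;
exit code 2 blocks the stop and stderr is shown to Claude. BUILD_CMD is hypothetical.
"""
from __future__ import annotations

import json
import subprocess
import sys

BUILD_CMD = ["lake", "build"]  # hypothetical: swap in your project's build command


def run_build(cmd: list[str]) -> tuple[bool, str]:
    """Run the build and return (succeeded, combined stdout + stderr)."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True)
    except FileNotFoundError as exc:  # build tool not installed
        return False, str(exc)
    return proc.returncode == 0, proc.stdout + proc.stderr


def decide(hook_input: dict, build_ok: bool) -> int:
    """Exit code for the hook: 0 allows the stop, 2 blocks it."""
    if hook_input.get("stop_hook_active"):
        return 0  # already continuing because of this hook; avoid an endless loop
    return 0 if build_ok else 2


def main() -> int:
    raw = sys.stdin.read()
    hook_input = json.loads(raw) if raw.strip() else {}
    ok, output = run_build(BUILD_CMD)
    code = decide(hook_input, ok)
    if code == 2:
        # Keep only the tail of the log so the feedback to Claude stays short.
        print("Build failed; fix it before stopping:\n" + output[-2000:], file=sys.stderr)
    return code


# Hypothetical --hook flag: run only when invoked as the hook, so the file can be imported safely.
if __name__ == "__main__" and "--hook" in sys.argv[1:]:
    sys.exit(main())
```

To use it, register the script (with `--hook`) as a `command` hook under the `Stop` event in your Claude Code settings; the exact settings schema is in the hooks docs. The `stop_hook_active` check trades strictness for safety: Claude gets one forced retry and can then stop, rather than looping forever on a build it can't fix.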
miligauss•1h ago
It's more of a black box with Claude; at least with this you see the proof strategy and the mistakes the model makes when it decomposes the problem. I think instead of Ralph looping you get something that is top-down. If models were smarter and context windows bigger, I am sure complex tasks like this one would be simpler, but breaking the problem down into sub-agents and having a collective "we already tried this strategy and it backtracked" intelligence is a nice way to scope a limited context window to an independent subproblem.
raphaelmolly8•1h ago
The Lean angle here is really interesting: most multi-agent demos dodge hard verification, but tying each agent’s output to Lean's checker makes the feedback loop objective. Curious how you’re handling goal-claim conflicts/duplication when two agents find competing tactic sequences for the same subgoal. Do you keep both in memory with some ranking signal (time-to-verify, proof term size, etc.)?
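One hypothetical way to do what the question describes is to keep every verified candidate instead of letting the first agent win. The Python sketch below does this. It is not lean-collab's implementation; `Candidate`, `CandidateStore` and the ranking key are illustrative assumptions. The key sorts by time-to-verify first, then proof term size, the two signals the question mentions.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    agent: str                # which sub-agent produced it
    tactics: tuple[str, ...]  # the tactic sequence that closed the subgoal
    verify_seconds: float     # how long the check took
    term_size: int            # size of the resulting proof term


class CandidateStore:
    """Keeps every verified candidate per subgoal instead of first-writer-wins."""

    def __init__(self) -> None:
        self._by_goal: dict[str, list[Candidate]] = defaultdict(list)

    def submit(self, goal: str, cand: Candidate) -> bool:
        """Record a candidate; return False if this exact tactic sequence is already stored."""
        if any(c.tactics == cand.tactics for c in self._by_goal[goal]):
            return False  # duplicate work from another agent
        self._by_goal[goal].append(cand)
        return True

    def ranked(self, goal: str) -> list[Candidate]:
        # Faster to verify first; ties broken by smaller proof term.
        return sorted(self._by_goal[goal], key=lambda c: (c.verify_seconds, c.term_size))

    def best(self, goal: str) -> Candidate | None:
        ranked = self.ranked(goal)
        return ranked[0] if ranked else None


if __name__ == "__main__":
    store = CandidateStore()
    store.submit("h1", Candidate("agent-a", ("simp", "ring"), 0.8, 120))
    store.submit("h1", Candidate("agent-b", ("nlinarith",), 0.3, 400))
    print(store.best("h1"))
```

Deduplicating on the exact tactic sequence is deliberately crude. Two agents that reach the same goal by different tactics are both kept and simply ranked.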
yodon•1h ago
Can you add a license.txt file so we know we have permission to run this? (e.g., MIT and GPLv3 are very different)
austinbaggio•1h ago
Oversight; added MIT. How are you thinking of using it?
yodon•47m ago
For the moment, researching multi-agent orchestration. At first glance, your work looks to be among the best-in-class published work I've seen. I'm particularly interested in understanding the memory/communication/search model you're using, as it sounds like you're trying to think well past the GasTown/Beads/Claude-Code-Swarms concepts.
austinbaggio•5m ago
Very kind of you to say. Our whole vision is that agents can produce way better results, compounding their intelligence, when they lean on shared memory.

I'm curious to see how it feels for you when you run it. I'm happy to help however I can.

christinetyip•1h ago
Cool, what’s a good first task to try this on where it’s likely to beat a single agent?
miligauss•58m ago
We tried Putnam A2.
austinbaggio•58m ago
Math proofs are really easy to run with this specific harness. Our next experiments are going to be bigger; think full codebase refactors. We're working on applying RLM to improve context window limits so we can keep more of the actual code in RAM.

Any workloads you want to see? The best are ones that have a way to measure whether the output is successful. We're thinking about recreating the C compiler example Anthropic did, but doing it for less than the $20k in tokens they used.

zmanian•1h ago
How does progress subscription work? Are agents watching specific signals (test failures, TODO list, build status), or just a global feed?
miligauss•59m ago
Claude Code doesn't support subscriptions out of the box, so we just use the subscription feature to alert the orchestrator to a single polling file. It's not the most elegant thing, but it's still a token saving over reading a bunch of sub-agent logs. It is as reactive as you can be given Claude Code's current feature set.
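The single-polling-file pattern can be sketched in Python as an append-only JSON Lines file:
- Sub-agents append one line per event.
- The orchestrator remembers a byte offset, so each poll reads only what is new rather than whole logs.

The file name, the event fields and the `report`/`StatusPoller` names are hypothetical, not the harness's actual format.

```python
from __future__ import annotations

import json
import tempfile
import time
from pathlib import Path


def report(path: Path, agent: str, event: str) -> None:
    """Sub-agent side: append one JSON object per line for each status change."""
    line = json.dumps({"agent": agent, "event": event, "ts": time.time()})
    with path.open("a", encoding="utf-8") as f:
        f.write(line + "\n")


class StatusPoller:
    """Orchestrator side: return only events appended since the previous poll."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.offset = 0  # bytes already consumed

    def poll(self) -> list[dict]:
        if not self.path.exists() or self.path.stat().st_size <= self.offset:
            return []  # nothing new: one stat() call, no log reading
        with self.path.open("rb") as f:  # binary mode so seek() uses byte offsets
            f.seek(self.offset)
            chunk = f.read()
        end = chunk.rfind(b"\n") + 1  # consume complete lines only
        self.offset += end  # a half-written line waits for the next poll
        return [json.loads(raw) for raw in chunk[:end].splitlines() if raw.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        status = Path(tmp) / "status.jsonl"  # hypothetical file name
        poller = StatusPoller(status)
        report(status, "agent-1", "started")
        report(status, "agent-2", "found lemma")
        print([e["event"] for e in poller.poll()])
        print(poller.poll())  # nothing new since the last poll
```

The design choice is that polling cost scales with what changed, not with how much the sub-agents have logged so far.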
slopusila•51m ago
seems like it requires an API key to your proprietary Ensue memory system
austinbaggio•45m ago
Yeah we're using Ensue since it already handles the annoying infra pieces you’d otherwise have to build to make this work (shared task state + updates, event streams/subscriptions, embeddings + retrieval over intermediate artifacts). You can run the example with a free key from ensue-network.ai. This repo focuses on the orchestration harness.
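The "embeddings + retrieval over intermediate artifacts" piece can be illustrated with a toy Python index. The point is that an agent can look up earlier findings by similarity instead of rereading everything. This is not Ensue's API. Word-count vectors stand in for a real embedding model, and `ArtifactIndex`, `embed` and `cosine` are hypothetical names.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def embed(text: str) -> Counter[str]:
    """Stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter[str], b: Counter[str]) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ArtifactIndex:
    """In-memory index over intermediate artifacts (notes, failed attempts, lemmas)."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter[str]]] = []

    def add(self, artifact: str) -> None:
        self._items.append((artifact, embed(artifact)))

    def search(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        scored = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in scored[:k]]


if __name__ == "__main__":
    index = ArtifactIndex()
    index.add("induction on n failed at the base case")
    index.add("lemma: f is monotone on positive reals")
    index.add("tried AM-GM bound, backtracked")
    print(index.search("is f monotone", k=1))
```

Swapping `embed` for a real embedding model changes the quality of the matches, not the shape of the interface.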
yodon•38m ago
The first screen of your signup flow asks for "organization". Is that used as a username, as an organization name, or both? (I can't tell what, if anything, will be on the next screen.)

If your registration process is eventually going to ask me for a username, can the org name and user name be the same?

austinbaggio•30m ago
username==orgname for now, so yes, just treat them as one and the same.
austinbaggio•30m ago
We're working on improvements to make it easier to join orgs as a user so you can add friends/colleagues, but for now treat them as the same object
yodon•16m ago
When you get a chance to work on your login flow, I recommend giving users an opportunity to request the key rather than automatically showing it once only on the first screen.

I created the account from my phone, and don't have access to the dev tools I'd want to paste the key into. I can deal with it, but I don't know if I'll be able to regenerate the key if I lose it, I'd rather not store it on my phone, and I don't trust my accuracy in manually typing it in on my laptop while looking at my phone, so all the options feel not great. Again, not an actual roadblock, but still something I'd encourage fixing.

Edit added: Good thing I copied the key to my phone before writing this message. Jumping over to this page seems to have forced a refresh/logout on the Ensue page in the other tab, so my token would (I think? maybe?) be lost at this point if I'd done it in the other order.
