
Mad Bugs: Claude Wrote a Full FreeBSD Remote Kernel RCE with Root Shell

https://blog.calif.io/p/mad-bugs-claude-wrote-a-full-freebsd
1•dnqthao•6m ago•0 comments

China can survive without the Strait of Hormuz

https://www.reuters.com/graphics/IRAN-CRISIS/CHINA-OIL/egpbeormkvq/
3•giuliomagnifico•10m ago•0 comments

Beyond Bestsellers: How We're Teaching IKEA's Recommender to Think Differently

https://medium.com/flat-pack-tech/beyond-the-bestsellers-how-were-teaching-ikea-s-recommender-to-...
1•robin_reala•13m ago•0 comments

Claude Code Leaks

https://github.com/rosaboyle/awesome-cc-oss
1•dheerajmp•14m ago•1 comment

Mad Bugs: Vim vs. Emacs vs. Claude

https://blog.calif.io/p/mad-bugs-vim-vs-emacs-vs-claude
5•Munksgaard•18m ago•0 comments

NASA: Artemis II Live Views from Orion

https://www.youtube.com/watch?v=6RwfNBtepa4
1•nstj•18m ago•0 comments

The World Sees Trump's America as a Sad Joke

https://newrepublic.com/article/205701/europe-american-decline-trump-greenland
2•chmaynard•18m ago•0 comments

TuxCraft – easy open-source tool to run Minecraft servers

1•Cheesehamster•19m ago•0 comments

The most-disliked people in the publishing industry

https://www.woman-of-letters.com/p/the-most-disliked-people-in-the-publishing
2•Caiero•19m ago•0 comments

How do the likely/unlikely macros in the Linux kernel work?

https://stackoverflow.com/questions/109710/how-do-the-likely-unlikely-macros-in-the-linux-kernel-...
2•tosh•24m ago•0 comments

NumPy as Synth Engine

https://kennethreitz.org/essays/2026-03-29-numpy_as_synth_engine
1•signa11•25m ago•0 comments

The potential of erroneous outbound traffic

https://blog.apnic.net/2026/04/01/the-potential-of-erroneous-outbound-traffic/
1•jruohonen•28m ago•0 comments

Why heroism is bad, and what we can do to stop it

https://sre.google/resources/practices-and-processes/no-heroes/
1•walterbell•29m ago•1 comment

When AI Fails

https://whenaifail.com/
1•clubdorothe•31m ago•0 comments

Microsoft closes worst quarter since 2008: 'Redmond is in a pickle'

https://www.cnbc.com/2026/03/31/microsofts-stock-closes-worst-quarter-since-2008-financial-crisis...
2•1vuio0pswjnm7•32m ago•2 comments

Multi-agent systems have a distributed systems problem

https://christophermeiklejohn.com/ai/agents/distributed/zabriskie/2026/03/30/multi-agent-systems-...
2•azhenley•33m ago•0 comments

Index providers shouldn't bend the rules for Elon Musk

https://economist.com/leaders/2026/03/31/index-providers-shouldnt-bend-the-rules-for-elon-musk
4•andsoitis•34m ago•1 comment

From Organizational Hierarchy to Intelligence

https://block.xyz/inside/from-hierarchy-to-intelligence
1•walterbell•37m ago•0 comments

Dnf5-ageist: Age verification for DNF5 [video]

https://www.youtube.com/watch?v=flH17X32MrY
1•goode•38m ago•0 comments

Ask HN: Is it weird that Anthropic raised my API limit from $500/mo to $200k?

1•noduerme•39m ago•1 comment

Better Blog AI | Automated Blog publishing to any CMS

https://betterblogai.com
1•leula_t•42m ago•2 comments

Reverse-engineering the Alesis MMT8 firmware

https://github.com/dnewcome/mmt8
2•dnewcome•44m ago•1 comment

OnlyOffice Gets Forked as "Made in Europe", Sparks Licensing and Trust Debate

https://itsfoss.com/news/onlyoffice-forked/
2•abdelhousni•50m ago•0 comments

Show HN: SwiftLM – Qwen Chat on iPhone, 100B+ MoE on M5 Pro 64GB (Native Swift)

https://github.com/SharpAI/SwiftLM
1•aegis_camera•50m ago•0 comments

The Sims Creator's Quest to Turn His and Your Own Mind into a Video Game

https://www.vulture.com/article/will-wright-proxi-the-sims-games.html
2•chaostheory•51m ago•0 comments

Improving my focus by giving up my big monitor

https://ounapuu.ee/posts/2026/04/01/focus/
2•Fudgel•54m ago•1 comment

Allbirds, once valued at $4B, just sold its assets for next to nothing

https://www.msn.com/en-us/money/companies/allbirds-the-tech-bro-favorite-once-valued-at-4-billion...
5•timr•56m ago•0 comments

DSTs Are Just Polymorphically Compiled Generics

https://faultlore.com/blah/dsts-are-polymorphic-generics/
1•g0xA52A2A•56m ago•0 comments

Create polished, pro-grade screen recordings – MIT Licensed

https://github.com/webadderall/Recordly
1•pbd•57m ago•1 comment

Claude Wrote a Full FreeBSD Remote Kernel RCE with Root Shell (CVE-2026-4747)

https://github.com/califio/publications/blob/main/MADBugs/CVE-2026-4747/write-up.md
3•ishqdehlvi•1h ago•0 comments

Show HN: Calx – track and compile corrections humans make with AI agents

https://github.com/getcalx
1•spenceships•1h ago
Last year I got laid off and started building a company. Fast forward to a month ago: I built a production system with 6 AI agents across 82,000 lines of code in 20 days for $250. I kept obsessive correction logs. Every time an agent made a mistake and I told it what to do differently, I made sure it logged the correction itself.

When I transferred 237 of those corrections as rules to a new agent to save time with onboarding in a new repo, it made 44 new mistakes. 13 were in categories the rules explicitly covered. The rules were present in context. The behavior wasn't there. I published the field study with full correction logs.

Then Meta's Superintelligence Labs published HyperAgents (arXiv:2603.19461, March 2026). They found the complementary result: improvements DO transfer across domains when embodied in executable mechanisms (persistent memory, performance tracking, eval loops), not when written as rule text. Two independent studies, same boundary: documentation is not behavior.

So I built Calx. pip install getcalx gives you a CLI + MCP server that:

- Captures corrections developers make to AI agents
- Detects recurrence via keyword similarity (Jaccard), auto-promoting at a 3x threshold
- Promotes recurring corrections to enforced rules and hooks, injected at session start
- Scopes rules per domain/directory so each agent gets only what's relevant
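To make the recurrence-detection step concrete, here is a minimal sketch of Jaccard similarity over keyword sets with a 3x promotion threshold. This is my own illustrative reconstruction, not Calx's actual code: the function names, the naive keyword extraction, and the 0.5 similarity cutoff are assumptions.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def keywords(correction: str) -> set[str]:
    # Naive keyword extraction (assumption): lowercase and drop short tokens.
    return {w for w in correction.lower().split() if len(w) > 3}

# Hypothetical thresholds, not Calx's documented defaults.
SIMILARITY_THRESHOLD = 0.5
PROMOTION_COUNT = 3  # "auto-promotes at 3x threshold"

def count_recurrences(new: str, history: list[str]) -> int:
    """Count prior corrections similar enough to count as recurrences."""
    new_kw = keywords(new)
    return sum(
        1 for old in history
        if jaccard(new_kw, keywords(old)) >= SIMILARITY_THRESHOLD
    )

def should_promote(new: str, history: list[str]) -> bool:
    # Once a correction has recurred PROMOTION_COUNT times, promote it
    # from a logged correction to an enforced rule/hook.
    return count_recurrences(new, history) >= PROMOTION_COUNT
```

The appeal of Jaccard here is that it needs no model calls or embeddings, which keeps the recurrence check cheap enough to run on every logged correction.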

It runs as a FastMCP server over Streamable HTTP (SQLite locally), so any MCP-compatible client can connect: Claude Code, Claude Desktop, Cursor, custom agents. It is primarily designed for Claude Code. It also handles token discipline (preventing context compaction from destroying the correction signal), multi-agent orchestration, session lifecycle hooks, orientation gates, and dirty-exit recovery.
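For readers unfamiliar with MCP: connecting a client to a Streamable HTTP server like this is typically a small config entry. A hypothetical `.mcp.json` for Claude Code might look like the following; the server name and localhost URL are assumptions for illustration, not Calx's documented defaults.

```json
{
  "mcpServers": {
    "calx": {
      "type": "http",
      "url": "http://localhost:8000/mcp"
    }
  }
}
```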

The difference from agent memory tools: existing agent memory systems store information for retrieval. Calx tracks the behavioral plane: how an agent works with a specific person, not just what it knows. The data shows the information plane alone doesn't reliably change behavior.

v0.5.0, 443 tests, MIT license. Paper with full evidence: https://doi.org/10.5281/zenodo.19159223

Comments

spenceships•1h ago
Here's some context on how this happened:

The origin was accidental. I was building a startup (AI career translation platform), not running an experiment. The correction logs were just how I managed the agents.

When the transfer failed, honestly it didn't occur to me that I had measured it at all until well after. I was pivoting the platform to go fully agentic and had burned through 1.9B tokens in 4 days or something. So, I did an audit to see what fell through the cracks. The audit was when I began realizing what I had found. At that point the paper just made sense, because I hadn't seen anyone else talking about it.

What surprised me the most: architectural corrections (changing how something is structured) had zero recurrence. Process corrections ("always do X before Y") had roughly 50% persistence, with recurring failure chains. One correction chain went eight entries deep, each referencing the previous ones. The agent kept making the same category of mistake with slight variations.

HyperAgents landing the same week I was writing this up was genuinely lucky timing, and I didn't find out about it until last week. In my opinion, their imp@50 = 0.630 on math (where traditional transfer scored 0.0) is the clearest evidence that the mechanism vs documentation distinction is real and measurable.

What I'd love feedback on:

- Is the MCP server the right distribution mechanism, or do people want this as IDE plugins? I have always strongly believed in meeting people where they are when it comes to open source, but I'm curious what this community thinks.
- The recurrence detection uses Jaccard similarity on keyword sets. This is simple and works for my data, but I suspect it breaks on large teams. Anyone have experience with correction clustering at scale?
- The paper methodology is N=1. HyperAgents converged on the same boundary, but that doesn't account for everything, and I know the limitations. If anyone wants to replicate with their own correction logs, the framework is designed for it and I'd actively help.

I'm quite eager to have people mess around with the tool and let me know their thoughts.

As a note, I am still in the process of shipping the hook and orchestration methodology to work with the MCP server; at the time of this writing I'm about a third of the way through the build. I'm hoping to have it live and packaged by morning EST.

Happy to answer questions about the correction dynamics, the MCP architecture, or anything in the paper.