Show HN: Open-Source SDK for AI Knowledge Work

https://github.com/ClioAI/kw-sdk
11•ankit219•2h ago
GitHub: https://github.com/ClioAI/kw-sdk

Most AI agent frameworks target code. Write code, run tests, fix errors, repeat. That works because code has a natural verification signal. It works or it doesn't.

This SDK treats knowledge work like an engineering problem:

Task → Brief → Rubric (hidden from executor) → Work → Verify → Fail? → Retry → Pass → Submit

The orchestrator coordinates subagents, web search, code execution, and file I/O, then checks its own work against criteria it can't game (the rubric is generated in a separate call and the executor never sees it directly).
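To make the loop concrete, here is a minimal sketch of the Task → Brief → Rubric → Work → Verify → Retry flow. It is not the SDK's API: `run_task`, `Verdict`, and the four callables are hypothetical names, and the model calls are left as plain functions you would supply yourself.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Verdict:
    passed: bool
    feedback: str  # what the verifier reports back to the executor on failure


def run_task(
    task: str,
    make_brief: Callable[[str], str],
    make_rubric: Callable[[str], List[str]],
    execute: Callable[[str, str], str],
    verify: Callable[[str, List[str]], Verdict],
    max_attempts: int = 3,
) -> str:
    """Run the brief -> rubric -> execute -> verify loop (hypothetical sketch)."""
    brief = make_brief(task)
    # The rubric comes from its own call and is never passed to execute().
    rubric = make_rubric(brief)
    feedback = ""
    for _ in range(max_attempts):
        work = execute(brief, feedback)  # executor sees brief + feedback only
        verdict = verify(work, rubric)   # verifier is the only rubric consumer
        if verdict.passed:
            return work
        feedback = verdict.feedback
    raise RuntimeError("no attempt passed verification")
```

The executor only ever receives the brief and the verifier's feedback, so it learns about the rubric indirectly at most.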

We originally built this as a harness for RL training on knowledge tasks. The rubric is the reward function. If you're training models on knowledge work, the brief→rubric→execute→verify loop gives you a structured reward signal for tasks that normally don't have one.
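As a rough illustration of "the rubric is the reward function", one simple way to turn rubric verdicts into a scalar is the fraction of criteria passed. This scoring rule is my assumption for the sketch, and `rubric_reward` is a hypothetical helper, not something taken from the repo.

```python
from typing import Dict


def rubric_reward(results: Dict[str, bool]) -> float:
    """Fraction of rubric criteria the verifier marked as satisfied.

    Assumed scoring rule for illustration; an empty rubric scores 0.0.
    """
    if not results:
        return 0.0
    return sum(results.values()) / len(results)
```

A weighted or all-or-nothing rule would slot in the same way; the point is only that the verify step yields something a trainer can consume.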

What makes knowledge work different from code, apart from the feedback loop? I believe today's agents are missing some functionality when it comes to knowledge work, and I tried to include it in this release. Examples:

Explore mode: Mapping the solution space, identifying the set-level gaps, and giving options.

Most agents optimize for a single answer and end up with a median one. For strategy, design, and creative problems, you want to see the options, their tradeoffs, and what you can do. Explore mode generates N distinct approaches, each with explicit assumptions and counterfactuals ("this works if X, breaks if Y"). The output ends with set-level gaps, i.e. the angles the entire set missed. The gaps are often more valuable than the takes. I think this is what many of us do on a daily basis, but no agent directly captures it today. See https://github.com/ClioAI/kw-sdk/blob/main/examples/explore_... and the output for a sense of how this is different.
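For a sense of the shape of that output, here is a small sketch of how N approaches plus set-level gaps could be represented. The class and field names (`Approach`, `ExploreResult`, `render`) are hypothetical and only mirror the description above, not the SDK's actual types.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Approach:
    title: str
    assumptions: List[str]
    works_if: str   # counterfactual: condition under which this holds up
    breaks_if: str  # counterfactual: condition under which it fails


@dataclass
class ExploreResult:
    approaches: List[Approach]
    # Angles that none of the approaches covered.
    set_level_gaps: List[str] = field(default_factory=list)


def render(result: ExploreResult) -> str:
    """Format approaches first, then the gaps shared by the whole set."""
    lines = []
    for i, a in enumerate(result.approaches, start=1):
        lines.append(f"{i}. {a.title}")
        lines.append(f"   assumes: {'; '.join(a.assumptions)}")
        lines.append(f"   works if {a.works_if}, breaks if {a.breaks_if}")
    lines.append("Set-level gaps:")
    lines.extend(f"- {g}" for g in result.set_level_gaps)
    return "\n".join(lines)
```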

Checkpointing: With many AI agents, and especially multi-agent systems, I can see where a run went wrong but can't rerun inference from that stage. (Or you may want multiple explorations once an agent has done some tasks, such as search, and is now looking at ideas.) I used this a lot for rollouts, and I think it's a great feature for rerunning or forking from a specific checkpoint.
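The idea of forking from a checkpoint can be sketched in a few lines. This is an in-memory toy under my own assumptions: `RunState` and `CheckpointStore` are hypothetical names, and the real SDK may persist state quite differently.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class RunState:
    stage: str = "start"
    data: Dict[str, Any] = field(default_factory=dict)


class CheckpointStore:
    """Keeps named snapshots of a run so later work can branch from them."""

    def __init__(self) -> None:
        self._snapshots: Dict[str, RunState] = {}

    def save(self, name: str, state: RunState) -> None:
        # Deep copy so later mutations of the live state don't alter the snapshot.
        self._snapshots[name] = copy.deepcopy(state)

    def fork(self, name: str) -> RunState:
        # Each fork gets its own copy, so branches can't interfere with each other.
        return copy.deepcopy(self._snapshots[name])
```

For example, you could save a checkpoint once search has finished, then fork it several times to run independent ideation rollouts without repeating the search.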

A note on the verification loop: The verify step is where the real leverage is. A model that can accurately assess its own work against a rubric is more valuable than one that generates slightly better first drafts. The rubric makes quality legible: to the agent, to the human, and potentially to a training signal.

Some things I like about this:

- You can pass a remote execution environment (including your browser as a sandbox) and it will work. It can be Docker, E2B, your local environment, anything: the model executes commands in your context and iterates on the feedback. Code execution is a protocol here.

- Tool calling: You don't need complex functions. Models are good at writing terminal code and can iterate on feedback, so you can either pass functions in context for the model to call, or pass docs and let the model write the code (the same idea as Anthropic's programmatic tool calling). Details: https://github.com/ClioAI/kw-sdk/blob/main/TOOL_CALLING_GUID...
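To show what "code execution is a protocol" can mean in practice, here is a hedged sketch using Python's `typing.Protocol`. The `Executor` interface, `LocalExecutor`, and `run_and_report` are hypothetical names for illustration; the SDK's real interface is in the linked guides.

```python
import subprocess
from typing import List, Protocol, Tuple


class Executor(Protocol):
    """Anything that can run a command and return (exit code, output)."""

    def run(self, command: List[str]) -> Tuple[int, str]: ...


class LocalExecutor:
    """Runs commands on the local machine via subprocess."""

    def run(self, command: List[str]) -> Tuple[int, str]:
        proc = subprocess.run(command, capture_output=True, text=True, timeout=30)
        return proc.returncode, proc.stdout + proc.stderr


def run_and_report(executor: Executor, command: List[str]) -> str:
    """Turn an execution result into feedback text a model could iterate on."""
    code, output = executor.run(command)
    status = "ok" if code == 0 else f"failed with exit code {code}"
    return f"{status}\n{output}"
```

A Docker, E2B, or browser-backed executor would only need to provide the same `run` method; the loop that feeds output back to the model stays unchanged.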

Lastly, some guides:

- SDK guide: https://github.com/ClioAI/kw-sdk/blob/main/SDK_GUIDE.md
- Extensible. See the bizarro example where I add a new mode: https://github.com/ClioAI/kw-sdk/blob/main/examples/custom_m...
- Working with files: https://github.com/ClioAI/kw-sdk/blob/main/examples/with_fil...
- This is simple, but I love the CSV example: https://github.com/ClioAI/kw-sdk/blob/main/examples/csv_rese...
- Remote execution: https://github.com/ClioAI/kw-sdk/blob/main/examples/with_cus...

And a lot more. This was completely refactored by Opus; given the extent of the rework, it would probably have taken much longer to release otherwise.

MIT licensed. Would love your feedback.

Comments

Noel25•2h ago
One design goal here was to make “knowledge work” verifiable in the same way code is. The rubric/verify loop was our attempt to give agents a signal beyond “sounds good,” especially for research or strategy tasks where correctness isn’t binary. Curious how others here handle verification for non-code agent workflows.
