Research-Driven Agents: What Happens When Your Agent Reads Before It Codes

https://blog.skypilot.co/research-driven-agents/
41•hopechong•2h ago

Comments

hopechong•2h ago
Coding agents that read papers before writing code find optimizations that code-only agents miss.

We added a literature review phase to Karpathy’s autoresearch loop and pointed it at llama.cpp. The agent autonomously read arxiv papers, studied competing forks and spun up VMs to run parallel experiments.

dataviz1000•1h ago
(Sorry to spam.)

I'm working on this also from a different angle. Hopefully sharing adds to the conversation.

First, about the loop: Claude's (the coding agent's) context and attention are big enough for it to self-reflect. Agent Tuning shows a technique that not only demonstrates this but also offers a way to quantify it. [0] The difference is that autoresearch's val_bpb measures what the agent built; Agent Tuning's p̂ measures the agent itself.

> Claude's attention doesn't distinguish between "instructions I'm writing" and "instructions I'm following" -- they're both just tokens in context.

Second, doing research, i.e. finding academic papers to add to context, helps. Here is an example of an implementation that creates trading strategies by reading research and recreating them in creative new ways. [1]

The biggest problem is that coding agents don't "fail fast and loud". They fail deceptively.

[0] https://github.com/adam-s/agent-tuning

[1] https://github.com/adam-s/alphadidactic

phendrenad2•45m ago
This is obvious, right? If you want to build a Facebook clone, you wouldn't tell the agent "build Facebook". You would provide it with a description of every page on Facebook, behaviors, interactions, UI, etc.
faeyanpiraat•30m ago
Have you even read the TL;DR in the linked article??
phendrenad2•23m ago
You mean this part?

> TL;DR: Coding agents generate better optimizations when they read papers and study competing projects before touching code

What made you think I hadn't read the article, let alone that TL;DR? I'm really curious. Jumping to an insulting "have you read the article" is a big step, so it'll be really interesting to see where your mind went.

KingOfCoders•42m ago
I use #PPPCDC for prompting: plan, plan, plan, then verify with: Compare the plan to the existing Code. Reread and compare the plan to the Docs. Fix the areas you're not Confident about.
hungryhobbit•38m ago
I think anyone who uses Claude knows that it works smarter when you have it make a plan first and ask it to research the existing code as much as possible ... so the results in this article don't surprise me at all.

However, I'd be curious to hear back from others who have tried adding the shell script (at the end of the article) to their flow: does it (really) improve Claude?

doctorpangloss•32m ago
The skypilot devs need to focus on decoupling their offering, so that their very valuable "find the cheapest cloud" functionality isn't married to a glitchy reinvention of Kubernetes JobSet and MLflow
simlevesque•13m ago
I've been making skills from arXiv papers for a while. I have one for multi-object tracking, for example. It has a SKILL.md describing all the important papers (over 30) on the subject and a folder with each paper's full content as reStructuredText.

To feed arXiv papers to LLMs, I found that RST gives the best token count/fidelity ratio. Markdown lacks precision. LaTeX is too verbose. I have a script with the papers' URLs, names and dates that downloads the LaTeX zips from arXiv, extracts them, transforms them to RST and then adds them to the right folder. Then I ask an LLM to make a summary from the full text, then I give other LLMs the full paper again with the summary and ask them to improve on and proofread it. While this goes on I read the papers myself, and at the end I read the summaries and, if I approve them, I add them to the skill. I also add, for each paper, info on how well the algorithms described do in common benchmarks.

I highly recommend doing something similar if you're working in a cutting-edge domain. Also I'd like to know if anyone has recommendations to improve what I do.
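
For readers who want to try the download-and-convert part of this, here is a minimal sketch, not the commenter's actual script. It assumes arXiv serves a paper's source as a tar archive at `/e-print/<id>`, that the `pandoc` binary is installed for the LaTeX to RST conversion, and that the largest `.tex` file is the main one. The `Paper` record, the `papers/` folder layout and all function names are made up for illustration.

```python
import io
import subprocess
import tarfile
import urllib.request
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Paper:
    """One manifest entry (hypothetical shape: id, name, ISO date)."""
    arxiv_id: str
    name: str
    date: str


def source_url(paper: Paper) -> str:
    # Assumption: arXiv serves the submitted source at /e-print/<id>.
    return f"https://arxiv.org/e-print/{paper.arxiv_id}"


def rst_path(skill_dir: Path, paper: Paper) -> Path:
    # Hypothetical layout: <skill>/papers/<date>-<slugified-name>.rst
    slug = "-".join(paper.name.lower().split())
    return skill_dir / "papers" / f"{paper.date}-{slug}.rst"


def tex_files(archive: bytes) -> dict[str, str]:
    """Return {member name: text} for every .tex file in a tar archive."""
    out: dict[str, str] = {}
    # "r:*" lets tarfile detect gzip/bzip2/plain tar on its own.
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:*") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(".tex"):
                data = tar.extractfile(member).read()
                out[member.name] = data.decode("utf-8", "replace")
    return out


def latex_to_rst(latex: str) -> str:
    # Shells out to pandoc; raises CalledProcessError if conversion fails.
    result = subprocess.run(
        ["pandoc", "--from=latex", "--to=rst"],
        input=latex, capture_output=True, text=True, check=True,
    )
    return result.stdout


def ingest(paper: Paper, skill_dir: Path) -> Path:
    """Download one paper's source, convert it, and file it under the skill."""
    with urllib.request.urlopen(source_url(paper)) as resp:
        archive = resp.read()
    sources = tex_files(archive)
    if not sources:
        raise ValueError(f"no .tex files found for {paper.arxiv_id}")
    # Simplification: treat the largest .tex file as the main document.
    main = max(sources.values(), key=len)
    target = rst_path(skill_dir, paper)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(latex_to_rst(main), encoding="utf-8")
    return target
```

Calling `ingest(paper, Path("my-skill"))` for each manifest entry would fill the folder. Papers whose source is a single compressed `.tex` file rather than a tar archive, or that are split across many `\input` files, would need extra handling that this sketch leaves out. The summarize, cross-check and human-approval steps are not shown.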

alex000kim•5m ago
sounds similar to "LLM Knowledge Bases" https://xcancel.com/karpathy/status/2039805659525644595
austinbaggio•9m ago
Research step makes sense. I can also confirm that running multiple agents with diverse strategies compounds results more quickly than single agents.
alex000kim•3m ago
I am sure this would work well in general. There is a challenge wrt how to make them communicate effectively to e.g. 1) avoid duplicative work and 2) allow them to combine/overlay each other's findings to yield even better results.
outside1234•2m ago
A research step (gather insights from across the codebase and internet for how to accomplish the next step), a planning step (how should I sequence implementation given that research), an implementation step, and a verification step (code review of the implementation) is a super effective workflow for me.
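
The four steps above could be chained roughly as follows. This is only a sketch: `agent` stands in for whatever coding agent or LLM call you use (any function from prompt text to response text), and the prompt wording and the names `STEPS` and `run_workflow` are invented for illustration, not taken from the article or from any tool.

```python
from typing import Callable

# Stand-in for a real agent call: takes a prompt, returns the response text.
Agent = Callable[[str], str]

# Each step's template may reference the task and any earlier step's output.
STEPS = [
    ("research",
     "Gather insights from the codebase and the internet for: {task}"),
    ("plan",
     "Given this research, sequence the implementation.\n\n{research}"),
    ("implement",
     "Implement this plan.\n\n{plan}"),
    ("verify",
     "Review the implementation against the plan.\n\n"
     "Plan:\n{plan}\n\nImplementation:\n{implement}"),
]


def run_workflow(task: str, agent: Agent) -> dict[str, str]:
    """Run research -> plan -> implement -> verify, feeding outputs forward."""
    outputs = {"task": task}
    for name, template in STEPS:
        # str.format ignores unused keys, so every earlier output is available.
        outputs[name] = agent(template.format(**outputs))
    return outputs
```

Keeping every intermediate output in one dictionary makes it easy to inspect where a run went wrong, which matters given the point made earlier in the thread that agents tend to fail quietly.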

What Game Engines Know About Data That Databases Forgot

https://nockawa.github.io/blog/what-game-engines-know-about-data/
58•speckx•1h ago•33 comments

Hegel, a universal property-based testing protocol and family of PBT libraries

https://hegel.dev
26•PaulHoule•54m ago•4 comments

Research-Driven Agents: What Happens When Your Agent Reads Before It Codes

https://blog.skypilot.co/research-driven-agents/
41•hopechong•2h ago•13 comments

Show HN: I built a Cargo-like build tool for C/C++

https://github.com/randerson112/craft
71•randerson_112•3h ago•71 comments

PicoZ80 – Drop-In Z80 Replacement

https://eaw.app/picoz80/
5•rickcarlino•38m ago•1 comments

EFF is leaving X

https://www.eff.org/deeplinks/2026/04/eff-leaving-x
606•gregsadetsky•2h ago•540 comments

Top laptops to use with FreeBSD

https://freebsdfoundation.github.io/freebsd-laptop-testing/
208•fork-bomber•10h ago•110 comments

Reallocating $100/Month Claude Code Spend to Zed and OpenRouter

https://braw.dev/blog/2026-04-06-reallocating-100-month-claude-spend/
216•kisamoto•10h ago•169 comments

Introduction to Nintendo DS Programming

https://www.patater.com/files/projects/manual/manual.html
180•medbar•1d ago•33 comments

A WebGPU implementation of Augmented Vertex Block Descent

https://github.com/jure/webphysics
100•juretriglav•7h ago•11 comments

Doing Impressions: Monet's Early Caricatures (ca. late 1850s)

https://publicdomainreview.org/collection/claude-monet-caricatures/
34•prismatic•3d ago•1 comments

Little Snitch comes to Linux, but the core logic is closed source

https://the.unknown-universe.co.uk/privacy-security/little-snitch-linux/
16•TheIPW•2h ago•5 comments

Meta removes ads for social media addiction litigation

https://www.axios.com/2026/04/09/meta-social-media-addiction-ads
440•giuliomagnifico•6h ago•179 comments

Wit, unker, Git: The lost medieval pronouns of English intimacy

https://www.bbc.com/future/article/20260408-the-extinct-english-words-for-just-the-two-of-us
143•eigenspace•9h ago•82 comments

Unfolder for Mac – A 3D model unfolding tool for creating papercraft

https://www.unfolder.app/
20•codazoda•2h ago•2 comments

Show HN: CSS Studio. Design by hand, code by agent

https://cssstudio.ai
111•SirHound•8h ago•82 comments

ChatGPT Pro now starts at $100/month

https://chatgpt.com/pricing/
116•strongpigeon•1h ago•123 comments

Open source security at Astral

https://astral.sh/blog/open-source-security-at-astral
326•vinhnx•15h ago•78 comments

Launch HN: Relvy (YC F24) – On-call runbooks, automated

https://www.relvy.ai
34•behat•7h ago•22 comments

Aggro Is the Foundation (2022)

https://radimentary.wordpress.com/2022/11/07/aggro-is-the-foundation/
4•surprisetalk•6d ago•0 comments

Lichess and Take Take Take Sign Cooperation Agreement

https://lichess.org/@/Lichess/blog/lichess-and-take-take-take-sign-cooperation-agreement/DZS0S0Dy
148•stevage•7h ago•39 comments

Help Keep Thunderbird Alive

https://updates.thunderbird.net/en-US/thunderbird/140.0/apr26-1e/donate/
418•playfultones•12h ago•305 comments

How Pizza Tycoon simulated traffic on a 25 MHz CPU

https://pizzalegacy.nl/blog/traffic-system.html
231•FinnKuhn•6h ago•47 comments

The Vercel plugin on Claude Code wants to read your prompts

https://akshaychugh.xyz/writings/png/vercel-plugin-telemetry
237•akshay2603•4h ago•88 comments

Haunted Paper Toys

http://ravensblight.com/papertoys.html
213•exvi•3d ago•28 comments

LittleSnitch for Linux

https://obdev.at/products/littlesnitch-linux/index.html
1222•pluc•19h ago•402 comments

Building a framework-agnostic Ruby gem (and making sure it doesn't break)

https://newsletter.masilotti.com/p/on-building-a-framework-agnostic
35•joemasilotti•2d ago•4 comments

Creating the Futurescape for the Fifth Element (2019)

https://theasc.com/articles/fantastic-voyage-creating-the-futurescape-for-the-fifth-element
106•nixass•10h ago•86 comments

Small Engines

https://scottlocklin.wordpress.com/2026/03/25/very-small-engines/
64•surprisetalk•3d ago•15 comments

Tree Calculus

https://treecalcul.us/
103•tosh•6d ago•24 comments