
Fine-tuned small LLMs can beat large ones with programmatic data curation

https://www.tensorzero.com/blog/fine-tuned-small-llms-can-beat-large-ones-at-5-30x-lower-cost-with-programmatic-data-curation/
34•GabrielBianconi•7h ago

Comments

alchemist1e9•4h ago
I’ve been thinking about curating primary sources themselves and then using those for fine-tuning.

Anyone gone that route and know of projects with very high quality curated source materials? Ideally categorized and labeled.

k8si•4h ago
Maybe this is a nitpick, but CoNLL NER is not a "challenging task". Even pre-LLM systems were getting >90 F1 on it as far back as 2016.

Also, just in case people want to lit review further on this topic: they call their method "programmatic data curation" but I believe this approach is also called model distillation and/or student-teacher training.

GabrielBianconi•3h ago
Thanks for the feedback!

We chose a set of tasks with different levels of complexity to see how this approach would scale. For LLMs, the "challenge" with NER is not the task itself but the arbitrariness of the labels in the dataset. I agree it's still much simpler than the other tasks we present (agentic RAG, agentic tool use, maze navigation).

There are definitely strong parallels to model distillation and student-teacher training, with the primary difference being that we don't simply take all the data from the larger model but rather filter the dataset based on metrics from the environment. In the "Does curation even matter?" section, we show that this generally improves the result by a good margin.
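Concretely, the loop looks something like this minimal Python sketch (generate and score are placeholder callables standing in for the large-model call and the environment metric, not our actual API):

    from typing import Callable, Iterable

    def curate(prompts: Iterable[str],
               generate: Callable[[str], str],
               score: Callable[[str, str], float],
               keep_fraction: float = 0.2) -> list[dict]:
        # 1. Generate candidate outputs with the large (teacher) model.
        episodes = [(p, generate(p)) for p in prompts]
        # 2. Score each episode with a metric from the environment
        #    (task success, exact match, reward from the agentic env, ...).
        scored = sorted(episodes, key=lambda e: score(*e), reverse=True)
        # 3. Keep only the top-scoring fraction -- the curation step that
        #    plain output distillation skips.
        kept = scored[: max(1, int(len(scored) * keep_fraction))]
        # 4. Return prompt/completion pairs ready for supervised fine-tuning.
        return [{"prompt": p, "completion": out} for p, out in kept]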

We link to Vicuna, which might be the closest reference as prior art: https://lmsys.org/blog/2023-03-30-vicuna/

Thanks!

mwigdahl•3h ago
Is this just distillation but with a step to filter out low-quality responses first?

GabrielBianconi•3h ago
AFAIK, distillation typically refers to tuning on the logits of the larger model, so you wouldn't be able to do that with fine-tuning APIs (OpenAI + Google in our blog post). We fine-tune on the outputs themselves.
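In rough PyTorch terms (random tensors standing in for real model outputs), the two objectives differ like this:

    import torch
    import torch.nn.functional as F

    vocab, seq = 32000, 16
    student_logits = torch.randn(seq, vocab)
    teacher_logits = torch.randn(seq, vocab)        # not exposed by hosted fine-tuning APIs
    teacher_tokens = teacher_logits.argmax(dim=-1)  # what you can actually collect

    # Logit (soft-label) distillation: match the teacher's full distribution.
    T = 2.0
    distill_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * T * T

    # Fine-tuning on outputs: cross-entropy against the teacher's sampled tokens.
    sft_loss = F.cross_entropy(student_logits, teacher_tokens)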

But broadly speaking, yes, we generate data using a large model, curate the best samples using metrics from the environment, and fine-tune on that data. This isn't a novel technique from an academic perspective; our focus is on applying it to different use cases (e.g. agentic RAG, agentic tool use) and models (OpenAI, Google, Qwen).

Thanks!

mwigdahl•3h ago
Thanks for the explanation and the clarification on terminology! I've used a similar approach myself, and it sounded like you were doing much the same thing.

littlestymaar•1h ago
> AFAIK, distillation typically refers to tuning on the logits of the larger model

I think this is called “logit distillation”, which is a particular form of distillation but not the only one.

> so you wouldn't be able to do that with fine-tuning APIs (OpenAI + Google in our blog post)

Distillation from competitors' APIs is so common that it has been given a name: “distealing”.

6510•3h ago
Noob question: Would it be possible to train a small model for a single prompt?

GabrielBianconi•3h ago
With supervised fine-tuning (SFT), you'll often see good results with 100-1000+ datapoints (they can be variations of the same prompt template). If you have more limited data, reinforcement fine-tuning (RFT) can work well in the 10-100 range.
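For example, variations of one prompt template can be written out as chat-style JSONL of the kind OpenAI-style fine-tuning endpoints accept (made-up data below):

    import json

    template = "Extract the person names from: {text}"
    examples = [  # (input text, reference answer) pairs -- placeholder data
        ("Alice met Bob in Paris.", "Alice, Bob"),
        ("Dr. Chen presented the results.", "Chen"),
    ]

    with open("sft_dataset.jsonl", "w") as f:
        for text, answer in examples:
            record = {"messages": [
                {"role": "user", "content": template.format(text=text)},
                {"role": "assistant", "content": answer},
            ]}
            f.write(json.dumps(record) + "\n")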

Good luck!

Show HN: I spent 6 years building a ridiculous wooden pixel display

https://benholmen.com/blog/kilopixel/
642•benholmen•7h ago•95 comments

Is It FOSS?

https://isitreallyfoss.com/
66•exiguus•2h ago•11 comments

Qwen-Image: Crafting with native text rendering

https://qwenlm.github.io/blog/qwen-image/
249•meetpateltech•7h ago•74 comments

NASA's Curiosity picks up new skills

https://www.jpl.nasa.gov/news/marking-13-years-on-mars-nasas-curiosity-picks-up-new-skills/
77•Bluestein•4h ago•24 comments

AWS European Sovereign Cloud to be operated by EU citizens

https://www.aboutamazon.eu/news/aws/aws-european-sovereign-cloud-to-be-operated-by-eu-citizens
45•pulisse•2h ago•36 comments

How we made JSON.stringify more than twice as fast

https://v8.dev/blog/json-stringify
132•emschwartz•9h ago•24 comments

What Does One Billion Dollars Look Like?

https://whatdoesonebilliondollarslooklike.website/
21•alexrustic•1h ago•16 comments

Indian Sign Painting: A typeface designer's take on the craft

https://bl.ag/indian-sign-painting-a-typeface-designers-take-on-the-craft/
100•detaro•2d ago•16 comments

Content-Aware Spaced Repetition

https://www.giacomoran.com/blog/content-aware-sr/
60•ran3000•4h ago•15 comments

Show HN: I've been building an ERP for manufacturing for the last 3 years

https://github.com/crbnos/carbon
5•barbinbrad•1h ago•0 comments

Job-seekers are dodging AI interviewers

https://fortune.com/2025/08/03/ai-interviewers-job-seekers-unemployment-hiring-hr-teams/
474•robtherobber•15h ago•730 comments

EconTeen – Financial Literacy Lessons and Tools for Teens

https://econteen.com/
4•Chrisjackson4•23m ago•1 comment

Hiroshima (1946)

https://www.newyorker.com/magazine/1946/08/31/hiroshima
25•pseudolus•2d ago•17 comments

OpenIPC: Open IP Camera Firmware

https://openipc.org/
180•zakki•3d ago•105 comments

Cellular Starlink expands to support IoT devices

https://me.pcmag.com/en/networking/31452/spacexs-cellular-starlink-expands-to-support-iot-devices
57•teleforce•3d ago•38 comments

Once a death sentence, cardiac amyloidosis is finally treatable

https://www.nytimes.com/2025/08/04/well/cardiac-amyloidosis.html
74•elektor•3h ago•2 comments

DrawAFish.com Postmortem

https://aldenhallak.com/blog/posts/draw-a-fish-postmortem.html
221•hallak•11h ago•52 comments

How we built Bluey’s world

https://www.itsnicethat.com/features/how-we-built-bluey-s-world-cartoon-background-scenery-art-director-catriona-drummond-animation-090725
299•skrebbel•3d ago•137 comments

Perplexity is using stealth, undeclared crawlers to evade no-crawl directives

https://blog.cloudflare.com/perplexity-is-using-stealth-undeclared-crawlers-to-evade-website-no-crawl-directives/
901•rrampage•9h ago•523 comments

What Can a Cell Remember?

https://www.quantamagazine.org/what-can-a-cell-remember-20250730/
42•chapulin•4d ago•4 comments

A deep dive into Rust and C memory interoperability

https://notashes.me/blog/part-1-memory-management/
127•hyperbrainer•8h ago•57 comments

Customizing tmux

https://evgeniipendragon.com/posts/customizing-tmux-and-making-it-less-dreadful/
75•EPendragon•7h ago•71 comments

My Ideal Array Language

https://www.ashermancinelli.com/csblog/2025-7-20-Ideal-Array-Language.html
109•bobajeff•10h ago•50 comments

Show HN: Sidequest.js – Background jobs for Node.js using your database

https://docs.sidequestjs.com/quick-start
42•merencia•7h ago•11 comments

Read your code

https://etsd.tech/posts/rtfc/
156•noeclement•10h ago•89 comments

Century-old “tsunami stones” dot Japan's coastline (2015)

https://www.smithsonianmag.com/smart-news/century-old-warnings-against-tsunamis-dot-japans-coastline-180956448/
124•deegles•10h ago•43 comments

Objects should shut up

https://dustri.org/b/objects-should-shut-the-fuck-up.html
263•gm678•9h ago•202 comments

Show HN: Tiny logic and number games I built for my kids

https://quizmathgenius.com/
66•min2bro•8h ago•25 comments

Is the interstellar object 3I/ATLAS alien technology? [pdf]

https://lweb.cfa.harvard.edu/~loeb/HCL25.pdf
72•jackbravo•10h ago•94 comments

Circadian justice (2022)

https://eprints.lse.ac.uk/112431/
54•anigbrowl•5h ago•21 comments