A $196 fine-tuned 7B model outperforms OpenAI o3 on document extraction

https://arxiv.org/abs/2509.22906
75•henriquegodoy•1h ago

Comments

giancarlostoro•41m ago
So it's a model designed exclusively for one purpose? Then the results should not be that surprising. It's still impressive, don't get me wrong.
mvieira38•34m ago
The results are interesting for showing the efficacy of small, fine-tuned models that can be run locally. AI providers as a business need their do-all models to be better than these if they want long-term revenue through the APIs, right?
giancarlostoro•24m ago
It depends on the provider and their goals. We have recently seen a schism between OpenAI and Anthropic, whereby Anthropic is going all in on automation and programming, and OpenAI is going all in on, I guess, a personal assistant / personal-tasks AI.
empath75•10m ago
They can sell fine tuned models running on cheaper hardware in bulk, too. Scale is a thing.
uxcolumbo•39m ago
Anybody have a link to this model?

Can't seem to see it on the arxiv site.

dhaivat•33m ago
I see the dataset published by the author (https://huggingface.co/datasets/HenriqueGodoy/extract-0); however, the model is not published/public yet.
trjordan•35m ago
It really seems like all the next big leaps in AI are going to be fine-tuning fit-for-purpose models.

Everything past GPT-5 has been ... fine. It's better at chat (sort of, depending on your tone preference) and way better at coding/tool use. In our product (planning out a migration with AI), the models have gotten worse, because they want to chat or code. I'd have expected the coding knowledge to generalize, but no! Claude especially really wants to change our code or explain the existing plan to me.

We're getting around it with examples and dynamic prompts, but it's pretty clear that fine-tuning is in our future. I suspect most of the broad-based AI success is going to look like that in the next couple years.
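
The "examples and dynamic prompts" workaround is roughly the following shape; a minimal sketch (the example store and the keyword-overlap selection here are illustrative placeholders, not our actual setup):

    # Sketch: build the prompt dynamically by picking the stored examples that
    # best match the current request, instead of one static mega-prompt.
    # EXAMPLES and the keyword-overlap scoring are illustrative placeholders.

    EXAMPLES = [
        {"task": "rename a database column during a migration",
         "plan": "1. Add the new column. 2. Backfill. 3. Swap reads. 4. Drop the old column."},
        {"task": "split a monolith service",
         "plan": "1. Identify seams. 2. Extract the module behind an interface. 3. Cut over traffic."},
    ]

    def select_examples(request: str, k: int = 2) -> list[dict]:
        """Rank stored examples by naive keyword overlap with the request."""
        req_words = set(request.lower().split())
        return sorted(
            EXAMPLES,
            key=lambda ex: len(req_words & set(ex["task"].lower().split())),
            reverse=True,
        )[:k]

    def build_prompt(request: str) -> str:
        shots = "\n\n".join(
            f"Task: {ex['task']}\nPlan: {ex['plan']}" for ex in select_examples(request)
        )
        return (
            "You are planning a migration. Output a step-by-step plan only; "
            "do not modify code or explain the existing plan.\n\n"
            f"{shots}\n\nTask: {request}\nPlan:"
        )

    print(build_prompt("rename the users.email column without downtime"))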

lenerdenator•16m ago
We'll need to find a way to make fine-tuning happen on consumer hardware. I hope we do that sooner rather than later. $196 is not awful, but still pretty high up on the cost side for hobbyists.
era37•32m ago
LLMs are only going to improve by fragmenting into specialized systems that deliver high performance at low parameter counts. We've reached the point where models will get smaller and more compact.
Imustaskforhelp•30m ago
Yes, this was my understanding too.

Like, I've wanted this for a year or two now: a model that is, say, genuinely really good at SvelteKit, as an example, instead of a model that is decent at a lot of different things.

A model for SvelteKit, a model for React, and one for general-purpose coding too. Preferably there'd be a site that makes it easy to find and run these models; Ollama comes to mind, but it has enshittified a bit since I first thought about this, so maybe a little competition on that side wouldn't hurt, I suppose.

christkv•28m ago
I guess we are going to end up using multiple small specialized models, orchestrated by a reasoning model with tooling.
verbify•24m ago
I thought "The Bitter Lesson" was that while a specialised system will outperform in the short term, generalized systems with lots of data win in the long term.

http://www.incompleteideas.net/IncIdeas/BitterLesson.html

jvanderbot•14m ago
Over time, yes. But at any given instant, specialization will always win. That message is for researchers, who seek long-term impact, and it's bitter because it goes against their desire for that impact to come from their own clever abstractions or insights.

But it's informative for engineers who need something right now, because it means that taking the best general-purpose tool and specializing it will outperform the general tool, and you can sustain that advantage if you're willing to keep hopping tools and respecializing. As we may.

just-the-wrk•30m ago
We're seeing the insect-ization of neural nets: smaller specialists evolving for their relevant tasks.
tom_wilde•24m ago
Sceptical. It performs really well on 1,000 docs. Let's see it for real! No model supplied.

https://github.com/herniqeu/extract0

To quote Mulder: I want to believe.

jrm4•19m ago
Seems this gives us a clear bifurcation of what AI is about to do.

Open-Source style small players will actually solve problems with AI.

And the big-money-invested efforts are going to do stupid, pointless, bubbly things at best, or enshittify other good things at worst.

Govern yourselves accordingly.

lenerdenator•18m ago
It's not surprising given specialization typically leads to better outcomes.

I guess this is a small step, if nothing else, toward the day when I can actually teach a model something in situ on my personal machine (notice I said machine, not "machines") in a very short amount of time. I feel that until then, LLMs and similar technologies won't be maximally helpful. They're very useful, but not maximally helpful.

mountainriver•16m ago
It's wild to me how many people still think that fine-tuning doesn't work. There was just a thread the other day with numerous people arguing that RL should only be done by the big labs.

There is so much research showing that you can beat frontier models with very little investment. It's confusing that the industry at large hasn't caught up with that.

esafak•11m ago
A LoRA fine-tune of DeepSeek-R1-Distill-Qwen-7B with a training cost of $196.
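
For context, that kind of LoRA fine-tune looks roughly like this (a minimal sketch using Hugging Face peft; the rank, target modules, and dtype are illustrative guesses, not the paper's reported settings):

    # Minimal LoRA setup sketch; hyperparameters are guesses, not the paper's settings.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    base = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)

    lora_cfg = LoraConfig(
        r=16,                    # adapter rank (guess)
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_cfg)
    model.print_trainable_parameters()  # only adapter weights train, hence the low cost
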
whakim•9m ago
Ok, but what was the cost of labor put into curation of the training dataset and performing the fine-tuning? Hasn’t the paper’s conclusion been repeatedly demonstrated - that it is possible to get really good task-specific performance out of fine-tuned smaller models? There just remains the massive caveat that closed-source models are pretty cheap and so the ROI isn’t there in a lot of cases.
mnkv•9m ago
> the generation of 281,128 augmented examples, from which 1,000 were held out as a benchmark test set.

This model is trained on a custom dataset of ~280k examples, then tested on 1k very similar examples held out from the same dataset. Of course it is specialized enough to outperform general models on this specific task, in this specific domain, with this specific JSON output format.

This is a reasonable hobby project and an interesting approach to synthetic data generation, but not impressive research.

At minimum you should test your model on other benchmarks with similar tasks, e.g. DocBench.

m3kw9•3m ago
So they tested using training examples? Lmao
Jimmc414•1m ago
The LoRA + GRPO training pipeline and the use of a semantic-similarity reward function rather than exact matching are actually interesting, but there is an evaluation issue that is a significant problem if you want to accept the headline at face value.
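
(For anyone unfamiliar, a semantic-similarity reward is roughly the following; this is a sketch of the general idea using sentence-transformers, not the paper's actual reward function, and the embedding model and field-averaging are my assumptions.)

    # Sketch of a semantic-similarity reward for extraction outputs: score each
    # predicted field against the reference by embedding cosine similarity
    # instead of exact string match. Embedding model and field-averaging are
    # my assumptions, not the paper's.
    from sentence_transformers import SentenceTransformer, util

    embedder = SentenceTransformer("all-MiniLM-L6-v2")

    def extraction_reward(predicted: dict, reference: dict) -> float:
        """Average cosine similarity over reference fields; missing fields score 0."""
        scores = []
        for key, ref_value in reference.items():
            pred_value = predicted.get(key)
            if pred_value is None:
                scores.append(0.0)
                continue
            emb = embedder.encode([str(pred_value), str(ref_value)], convert_to_tensor=True)
            scores.append(float(util.cos_sim(emb[0], emb[1])))
        return sum(scores) / max(len(scores), 1)

    print(extraction_reward({"title": "Extract-0 technical report"},
                            {"title": "Extract-0: a technical report"}))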

They trained on synthetic extractions like "extract equations from arXiv papers" and "extract regulatory information from FDA documents," then tested on more synthetic extractions from the same sources. Essentially, "model trained on synthetic arXiv/PubMed/FDA extractions performs better on more synthetic arXiv/PubMed/FDA extractions than a model that never saw this distribution."

I'd like to see how it handles extractions from a real contract, or a low quality scan of a financial document, or processes a format it didn't see in training. o3 very likely handles these variations better, but we don't have that data to compare.

We need the model weights or tests on standard benchmarks to verify if this generalizes beyond documents that look like the training distribution.

OpenAI releases prompt library for any role

https://academy.openai.com/public/clubs/work-users-ynjqu/resources/chatgpt-for-any-role
1•linhns•1m ago•0 comments

Show HN: Simple 'photobooth' app inspired by "The 28 AI tools I wish existed"

https://mix-re.web.app
1•maxaw•6m ago•0 comments

MCP and AI Agent Authorization. A Guide to Securing the New AI Perimeter

https://www.cerbos.dev/blog/mcp-security-ai-agent-authorization-a-ciso-and-architects-guide
2•emreb•7m ago•0 comments

Vercel Notches $9.3B Valuation in Latest AI Funding Round

https://www.bloomberg.com/news/articles/2025-09-30/vercel-notches-9-3-billion-valuation-in-latest...
1•the_mitsuhiko•8m ago•0 comments

Sora 2

https://blog.samaltman.com/sora-2
2•sroussey•8m ago•0 comments

OpenAI's New Sora Video Generator to Require Copyright Holders to Opt Out

https://www.wsj.com/tech/ai/openais-new-sora-video-generator-to-require-copyright-holders-to-opt-...
1•mfiguiere•9m ago•0 comments

NTK (1997–2007)

http://www.ntk.net/
1•robin_reala•11m ago•0 comments

Show HN: AI tools for high lvl researchers who need assistance – not answers

https://www.ubik.studio/
1•ieuanking•11m ago•0 comments

Hegseth uses rare meeting of generals to announce new military standards

https://www.politico.com/news/2025/09/30/hegseth-meeting-generals-standards-00586122
3•cosmicgadget•12m ago•1 comments

Nuclear Thermal Rocket Emulator for a Hardware-in-the-Loop Test Bed

https://www.mdpi.com/1996-1073/18/16/4439
1•PaulHoule•13m ago•0 comments

Answering your top questions about Android developer verification

https://android-developers.googleblog.com/2025/09/lets-talk-security-answering-your-top.html
1•rom1v•13m ago•1 comments

A 127-year old ham sandwich ekiben

https://frankbear.substack.com/p/the-story-behind-the-story-of-a-127
1•hglaser•13m ago•0 comments

Do you think Liquid Glass will be widespread outside the Apple ecosystem?

1•andraskindler•13m ago•1 comments

ColdFusion (2025)'s CFOAUTH Tag

https://www.raymondcamden.com/2025/09/30/coldfusion-2025s-cfoauth-tag
1•mooreds•14m ago•0 comments

Ruby Central's "security measures" leave front door wide open

https://joel.drapper.me/p/ruby-central-security-measures/
3•zorpner•14m ago•0 comments

Why is Linux still trash in 2025?

1•coolThingsFirst•15m ago•4 comments

IP over Lasers

https://www.mikekohn.net/micro/ip_over_lasers.php
3•sgt•16m ago•0 comments

Save Quantum Computing from Regulation

https://www.technologylaw.ai/p/quantum-computing-regulation
1•pcaharrier•17m ago•0 comments

AAUP vs. Rubio [pdf]

https://storage.courtlistener.com/recap/gov.uscourts.mad.282460/gov.uscourts.mad.282460.261.0.pdf
2•coloneltcb•20m ago•1 comments

Hedge Funds Have to Be Big

https://www.bloomberg.com/opinion/newsletters/2025-09-30/hedge-funds-have-to-be-big
10•feross•23m ago•5 comments

Sneakernet

https://en.wikipedia.org/wiki/Sneakernet
1•helle253•23m ago•0 comments

The bold gamble that helped Wiz CEO Assaf Rappaport win a $32B deal

https://fortune.com/article/wiz-cloud-security-ceo-assaf-rappaport-google-sundar-pichai/
1•jgeralnik•23m ago•0 comments

Tunix: A JAX-native LLM Post-Training Library

https://github.com/google/tunix
1•saikatsg•25m ago•0 comments

White House Announces ‘TrumpRx’ Drug-Buying Site, and Pricing Deal With Pfizer

https://www.wsj.com/health/pharma/white-house-to-announce-trumprx-drug-buying-website-and-deal-wi...
13•impish9208•28m ago•4 comments

Shellshock

https://dwheeler.com/essays/shellshock.html
2•udev4096•29m ago•0 comments

Sesame – Maya: A Personal Companion

https://app.sesame.com/
1•tomaytotomato•30m ago•1 comments

Show HN: Desktop app to self-host static sites on a VPS without sysadmin skills

https://judi.systems/sprouts/
1•hsn915•30m ago•0 comments

Ask HN: What are the major world tarot cards?

2•phoenixhaber•30m ago•0 comments

Show HN: A USB controller makes one flash drive 4 independent disks (no drivers)

https://xusb.net/project-1/logical-split-disk-embodiment/
2•xusbnet•31m ago•0 comments

Ask HN: If AI results in UBI, will everyone stop criticizing AI on social media?

1•amichail•33m ago•3 comments