frontpage.

Canada unveils auto industry plan in latest pivot away from US

https://www.bbc.com/news/articles/cvgd2j80klmo
1•breve•53s ago•0 comments

The essential Reinhold Niebuhr: selected essays and addresses

https://archive.org/details/essentialreinhol0000nieb
1•baxtr•3m ago•0 comments

Rentahuman.ai Turns Humans into On-Demand Labor for AI Agents

https://www.forbes.com/sites/ronschmelzer/2026/02/05/when-ai-agents-start-hiring-humans-rentahuma...
1•tempodox•5m ago•0 comments

StovexGlobal – Compliance Gaps to Note

1•ReviewShield•8m ago•0 comments

Show HN: Afelyon – Turns Jira tickets into production-ready PRs (multi-repo)

https://afelyon.com/
1•AbduNebu•9m ago•0 comments

Trump says America should move on from Epstein – it may not be that easy

https://www.bbc.com/news/articles/cy4gj71z0m0o
2•tempodox•9m ago•0 comments

Tiny Clippy – A native Office Assistant built in Rust and egui

https://github.com/salva-imm/tiny-clippy
1•salvadorda656•14m ago•0 comments

LegalArgumentException: From Courtrooms to Clojure – Sen [video]

https://www.youtube.com/watch?v=cmMQbsOTX-o
1•adityaathalye•17m ago•0 comments

US moves to deport 5-year-old detained in Minnesota

https://www.reuters.com/legal/government/us-moves-deport-5-year-old-detained-minnesota-2026-02-06/
2•petethomas•20m ago•1 comments

If you lose your passport in Austria, head for McDonald's Golden Arches

https://www.cbsnews.com/news/us-embassy-mcdonalds-restaurants-austria-hotline-americans-consular-...
1•thunderbong•24m ago•0 comments

Show HN: Mermaid Formatter – CLI and library to auto-format Mermaid diagrams

https://github.com/chenyanchen/mermaid-formatter
1•astm•40m ago•0 comments

RFCs vs. READMEs: The Evolution of Protocols

https://h3manth.com/scribe/rfcs-vs-readmes/
2•init0•47m ago•1 comments

Kanchipuram Saris and Thinking Machines

https://altermag.com/articles/kanchipuram-saris-and-thinking-machines
1•trojanalert•47m ago•0 comments

Chinese chemical supplier causes global baby formula recall

https://www.reuters.com/business/healthcare-pharmaceuticals/nestle-widens-french-infant-formula-r...
1•fkdk•50m ago•0 comments

I've used AI to write 100% of my code for a year as an engineer

https://old.reddit.com/r/ClaudeCode/comments/1qxvobt/ive_used_ai_to_write_100_of_my_code_for_1_ye...
2•ukuina•52m ago•1 comments

Looking for 4 Autistic Co-Founders for AI Startup (Equity-Based)

1•au-ai-aisl•1h ago•1 comments

AI-native capabilities, a new API Catalog, and updated plans and pricing

https://blog.postman.com/new-capabilities-march-2026/
1•thunderbong•1h ago•0 comments

What changed in tech from 2010 to 2020?

https://www.tedsanders.com/what-changed-in-tech-from-2010-to-2020/
2•endorphine•1h ago•0 comments

From Human Ergonomics to Agent Ergonomics

https://wesmckinney.com/blog/agent-ergonomics/
1•Anon84•1h ago•0 comments

Advanced Inertial Reference Sphere

https://en.wikipedia.org/wiki/Advanced_Inertial_Reference_Sphere
1•cyanf•1h ago•0 comments

Toyota Developing a Console-Grade, Open-Source Game Engine with Flutter and Dart

https://www.phoronix.com/news/Fluorite-Toyota-Game-Engine
1•computer23•1h ago•0 comments

Typing for Love or Money: The Hidden Labor Behind Modern Literary Masterpieces

https://publicdomainreview.org/essay/typing-for-love-or-money/
1•prismatic•1h ago•0 comments

Show HN: A longitudinal health record built from fragmented medical data

https://myaether.live
1•takmak007•1h ago•0 comments

CoreWeave's $30B Bet on GPU Market Infrastructure

https://davefriedman.substack.com/p/coreweaves-30-billion-bet-on-gpu
1•gmays•1h ago•0 comments

Creating and Hosting a Static Website on Cloudflare for Free

https://benjaminsmallwood.com/blog/creating-and-hosting-a-static-website-on-cloudflare-for-free/
1•bensmallwood•1h ago•1 comments

"The Stanford scam proves America is becoming a nation of grifters"

https://www.thetimes.com/us/news-today/article/students-stanford-grifters-ivy-league-w2g5z768z
4•cwwc•1h ago•0 comments

Elon Musk on Space GPUs, AI, Optimus, and His Manufacturing Method

https://cheekypint.substack.com/p/elon-musk-on-space-gpus-ai-optimus
2•simonebrunozzi•1h ago•0 comments

X (Twitter) is back with a new X API Pay-Per-Use model

https://developer.x.com/
3•eeko_systems•1h ago•0 comments

Zlob.h: 100% POSIX and glibc compatible globbing lib that is faster and better

https://github.com/dmtrKovalenko/zlob
3•neogoose•1h ago•1 comments

Show HN: Deterministic signal triangulation using a fixed .72% variance constant

https://github.com/mabrucker85-prog/Project_Lance_Core
2•mav5431•1h ago•1 comments

Extract-0: A specialized language model for document information extraction

https://arxiv.org/abs/2509.22906
195•henriquegodoy•4mo ago

Comments

giancarlostoro•4mo ago
So it's a model designed exclusively for one purpose? Then the results should not be that surprising. It's still impressive, don't get me wrong.
mvieira38•4mo ago
The results are interesting for showing the efficacy of small, fine-tuned models that can be run locally. AI providers as a business need their do-all models to be better than these if they want long-term revenue through the APIs, right?
giancarlostoro•4mo ago
It depends on the provider and their goals. We have recently seen a schism between OpenAI and Anthropic, whereby Anthropic is going all in on automation and programming and OpenAI is going all in on, I guess, a personal assistant / personal-tasks AI.
empath75•4mo ago
They can sell fine tuned models running on cheaper hardware in bulk, too. Scale is a thing.
lemonlearnings•4mo ago
The car analogy is that Tesla needs to make tractors to compete.
uxcolumbo•4mo ago
Anybody have a link to this model?

Can't seem to see it on the arxiv site.

dhaivat•4mo ago
I see the dataset published by the author (https://huggingface.co/datasets/HenriqueGodoy/extract-0); however, the model is not published/public yet.
trjordan•4mo ago
It really seems like all the next big leaps in AI are going to be fine-tuning fit-for-purpose models.

Everything past GPT-5 has been ... fine. It's better at chat (sort of, depending on your tone preference) and way better at coding/tool use. In our product (planning out a migration with AI), they've gotten worse, because they want to chat or code. I'd have expected the coding knowledge to generalize, but no! Claude especially really wants to change our code or explain the existing plan to me.

We're getting around it with examples and dynamic prompts, but it's pretty clear that fine-tuning is in our future. I suspect most of the broad-based AI success is going to look like that in the next couple years.

lenerdenator•4mo ago
We'll need to find a way to make fine-tuning happen on consumer hardware. I hope we do that sooner rather than later. $196 is not awful, but still pretty high up on the cost side for hobbyists.
selim-now•4mo ago
Well, fine-tuning is possible on consumer hardware; the problem is that it would be slow and that you're limited in the size of the dataset you can use in the process.

If you want to follow the approach in this paper and synthetically augment a dataset, using an LLM for that (instead of a smaller model) just makes sense, and then the entire process can't easily be run on your local machine.
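For illustration, a minimal sketch of that kind of LLM-based augmentation step, which typically calls out to a hosted model; the model name, prompt, and schema handling are placeholders, not the paper's actual pipeline:

# Illustrative only: augment extraction examples with a hosted LLM.
# The model name and prompt are placeholders, not the paper's pipeline.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def augment_example(document_chunk: str, schema: dict) -> dict:
    prompt = (
        "Rewrite the following passage with different wording but the same facts, "
        "then return JSON with keys 'passage' and 'extraction', where 'extraction' "
        f"fills this schema: {json.dumps(schema)}\n\n{document_chunk}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)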

derac•4mo ago
GPT-5 came out less than two months ago, lol.
era37•4mo ago
LLMs are only going to improve by fragmenting them into specialized systems for low-parameter, high-performance results. We've reached the point where models will get smaller and more compact.
Imustaskforhelp•4mo ago
Yes this was my understanding too.

I've wanted this for a year or two: a model that is, say, genuinely really good at SvelteKit specifically, as an example, instead of a model that is decent at a lot of different things, y'know.

A model for SvelteKit, a model for React, plus a general-purpose coding model, and preferably a website that makes it easy to find and run these models. Ollama comes to mind, but it has enshittified a bit since I first thought about this, so maybe a little competition on that side wouldn't hurt, I suppose.

christkv•4mo ago
I guess we are going to be using multiple small specialized models with a reasoning model and tooling.
selim-now•4mo ago
isn't that the premise of the Nvidia paper? https://arxiv.org/pdf/2506.02153
verbify•4mo ago
I thought "The Bitter Lesson" was that while a specialised system will outperform in the short term, generalized systems with lots of data win in the long term.

http://www.incompleteideas.net/IncIdeas/BitterLesson.html

jvanderbot•4mo ago
Over time, yes. But at any given instant, specialization will always win. That message is for researchers, who seek long-term impact, and it's bitter because it goes against their desire to provide that impact through their own clever abstractions or insights.

But it's informative for engineers who need something right now, because it means taking the best general-purpose tool and specializing it will outperform the general tool, and you can sustain that if you're willing to keep hopping tools and respecializing. As we may.

jvanderbot•4mo ago
Unbeknownst to me, parent edited to incorporate my comment. Move along.
lemonlearnings•4mo ago
I think there is a bitter lesson to the bitter lesson.

Sure, you can throw more compute at it. But it costs a lot of money and you hit resource limits.

We have been doing an end run around the bitter lesson with prompt engineering. Also by using different models for vision vs. text. By getting (human coding) agents to "think" and run code.

The bitter lesson might be that you can't predict what will be most optimal tomorrow, and any player in the AI game can be innovated out of existence at any time.

Maybe anyone except TSMC.

just-the-wrk•4mo ago
We're seeing the insect-ization of neural nets. Smaller specialists are evolving for their relevant tasks
tom_wilde•4mo ago
Sceptical. Performs really well on 1000 docs. Let's see it for real! No model supplied.

https://github.com/herniqeu/extract0

To quote Mulder: I want to believe.

jrm4•4mo ago
Seems this gives us a clear bifurcation of what AI is about to do.

Open-Source style small players will actually solve problems with AI.

And the big money invested things are going to do stupid pointless bubbly things at best, or enshittify other good things at worst.

Govern yourselves accordingly.

NitpickLawyer•4mo ago
> stupid pointless bubbly things at best, or enshittify other good things at worst.

OpenAI just announced something like $5bn in revenue for half a year, with $13bn projected through the end of the year. Doesn't seem so pointless now, does it?

jrm4•4mo ago
I'm sorry, did you just defend some big money company purely on the basis of the money they can project in one year?

Did you miss when I said "bubble?" Sigh, y'all are not serious.

lenerdenator•4mo ago
It's not surprising given specialization typically leads to better outcomes.

I guess this is a small step forward, if nothing else, to the day when I can actually teach a model something in situ on my personal machine (notice I said machine, not "machines") in a very short amount of time. I feel that until then, LLMs and similar technologies won't be maximally helpful. They're very useful, but not maximally helpful.

mountainriver•4mo ago
It's wild to me how many people still think that fine-tuning doesn't work. There was just a thread the other day with numerous people arguing that RL should only be done by the big labs.

There is so much research that shows you can beat frontier models with very little investment. It's confusing that the industry at large hasn't caught up with that

jokethrowaway•4mo ago
The subject of this news likely doesn't generalise to random documents.

You need some serious resources to do this properly; think of the Granite Docling model by IBM.

For LLMs: fine-tuning makes sense for light style adjustments with large models (e.g. customizing a chat assistant to sound a certain way) or for teaching some simple transformations (e.g. a new output format). You can get away with 100-1000 samples.

If you want to teach new behaviour you need a lot of data, likely too much to justify the investment for your average ChatGPT-wrapper AI company. The pragmatic choice is often to just prompt engineer, and maybe split your task and combine multiple prompts.

selim-now•4mo ago
Interesting. I would argue that fine-tuning makes sense especially in cases where you want to narrow a small model down to a single task; in that case you get the most bang per parameter, using a small model that performs very well in a very narrow space.
mountainriver•4mo ago
The pragmatic case is always prompt engineering. I'm speaking to the common idea that fine-tuning doesn't work; if you need it and have the capital (which isn't all that much), it's helpful.
esafak•4mo ago
A LoRA fine-tune of DeepSeek-R1-Distill-Qwen-7B with a training cost of $196.
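For reference, a minimal sketch of what such a LoRA fine-tune can look like with Hugging Face transformers + peft; the hyperparameters and the dataset column names ('input', 'output') are illustrative assumptions, not the authors' exact setup:

# Sketch of a LoRA fine-tune of the base model above (not the authors' code).
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16, device_map="auto")

# Train only low-rank adapters on the attention projections.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"]))

ds = load_dataset("HenriqueGodoy/extract-0", split="train")

def to_tokens(ex):
    # Concatenate prompt and target into one causal-LM training sequence.
    text = f"{ex['input']}\n{ex['output']}{tok.eos_token}"
    return tok(text, truncation=True, max_length=2048)

ds = ds.map(to_tokens, remove_columns=ds.column_names)

Trainer(
    model=model,
    args=TrainingArguments("extract0-lora", per_device_train_batch_size=1,
                           gradient_accumulation_steps=16, num_train_epochs=1,
                           learning_rate=2e-4, bf16=True, logging_steps=50),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
).train()
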
whakim•4mo ago
Ok, but what was the cost of labor put into curation of the training dataset and performing the fine-tuning? Hasn’t the paper’s conclusion been repeatedly demonstrated - that it is possible to get really good task-specific performance out of fine-tuned smaller models? There just remains the massive caveat that closed-source models are pretty cheap and so the ROI isn’t there in a lot of cases.
selim-now•4mo ago
If the cost of getting the model is $200, then the cost of the trade-off seems to be quite clear.

You are right that the labor is a factor, unless you use a platform like https://www.distillabs.ai/, where the process is automated. (I'm affiliated.)

mnkv•4mo ago
> the generation of 281,128 augmented examples, from which 1,000 were held out as a benchmark test set.

This model is trained on a custom dataset of 280k examples then tested on 1k very similar examples from the same dataset. Of course it is specialized to outperform general models on this specific task in this specific domain with this specific json format for output.

This is a reasonable hobby project and interesting approach to synthetic data generation but not impressive research.

At minimum you should test your model on other benchmarks that have similar tasks, e.g. DocBench.

m3kw9•4mo ago
So they tested using training examples? Lmao
fxwin•4mo ago
> held out
Aperocky•4mo ago
Actually in this case that's not exactly true:

> generation of 281,128 augmented examples

All examples are already correlated because they are generated in the same way.

littlestymaar•4mo ago
> All examples are already correlated because they are generated in the same way.

All examples of “document information extraction” would be correlated no matter where they come from because they all would be “document information extraction” examples…

The real question is whether or not the examples are representative of the broad “document information extraction” use-case.

_carltg•4mo ago
The problem is the methodology they use to hold them out. For a truly independent validation set, they need to hold out the material before augmentation, not after. If you hold out after augmentation, the held-out set already carries the biases of the training regimen, and hence you artificially boost your model's performance. This is not sufficient to demonstrate that your model is generalizing properly.

In analogy: instead of taking leaves off of different trees, they are taking leaves from different branches from the same tree.

selim-now•4mo ago
That would definitely make the evaluation more robust. My fear is that with LLMs at hand, people have become allergic to preparing good human-labelled evaluation sets and will always, to some degree, use an LLM as a crutch.
fxwin•4mo ago
I would agree with that
bangaladore•4mo ago
> Of course, it is specialized to outperform general models on this specific task in this specific domain with this specific json format for output.

My understanding is that this is not generally considered an obvious result, in that high-parameter generalist models largely outperform lower-parameter specialists.

The real issue is they tested on data in their training set. *

* Incorrect -- edit: I misread the parent comment.

disiplus•4mo ago
They did not test on the data they trained on; that's not what he wrote.
DetroitThrow•4mo ago
They synthetically generated ~281k examples and kept 1k of them for testing.

It's worth pointing out that that's technically not testing on the training set, but looking at how similar examples are in the dataset, it's clear that severe overfitting would be unavoidable. That also makes the headline very misleading.

The weights may not have been published because using the model for document extraction on even the same format, but with slightly different content or lengths, would show how poorly this finetune does outside the synthetic data.

bangaladore•4mo ago
Thanks, rereading it makes it clear that you are correct.
littlestymaar•4mo ago
> The real issue is they tested on data in their training set.

Hm, no.

They trained on a part of their synthetic set and tested on another part of the set. Or at least that's what they said they did:

> from which 1,000 were *held out* as a benchmark test set.

Emphasis mine.

bangaladore•4mo ago
Thanks, rereading it makes it clear that you are correct.
_carltg•4mo ago
Yes, but because it is derived from the same underlying source dataset, it is effectively evaluating on the training dataset, not an independent validation/test dataset.

The difference is subtle but important. If we expect the model to truly outperform a general model, it should generalize to a completely independent set.

kingjimmy•4mo ago
In today's news: overfit models are overfit.
gundmc•4mo ago
It's not novel research, but I think it drives home the point that many narrow applications of AI do not require the largest, latest (and most expensive) models. And in many of those cases, a small fine-tuned model is the most performant and cost-effective.

It is probably obvious to most who follow the space closely, but you'd be surprised how many engineers don't recognize this.

Garlef•4mo ago
It's a matter of ROI: When is it worth it to build something specialized?
ImJasonH•4mo ago
Is anybody working on making building specialized things easier and cheaper?
-_-•4mo ago
Yes! At https://RunRL.com we offer hosted RL fine-tuning, so all you need to provide is a dataset and reward function or environment.
selim-now•4mo ago
yes! check out https://distillabs.ai/ – follows a similar approach except the evaluation set is held out before the synthetic data generation, which I would argue makes it more robust (I'm affiliated)
sigbottle•4mo ago
Well, one day it might be at the level of shell scripting. I don't think about "the tradeoffs of building a specialized shell script", I just do it because it's cheap and easy and solves a problem right then and there.

I don't know how you would even begin to make this kind of same observation for ML models, but seems possible. The 2010s weren't exactly building out "trivial" models, but compared to the architectures and optimizations out now, yeah those models are toy by comparison.

Jimmc414•4mo ago
The LoRA + GRPO training pipeline and the semantic similarity reward function over exact matching is actually interesting, but there is an evaluation issue if you want to accept the headline at face value.

They trained on synthetic extractions like "extract equations from arXiv papers" and "extract regulatory information from FDA documents," then tested on more synthetic extractions from the same sources. Essentially, "model trained on synthetic arXiv/PubMed/FDA extractions performs better on more synthetic arXiv/PubMed/FDA extractions than a model that never saw this distribution."

I'd like to see how it handles extractions from a real contract, or a low quality scan of a financial document, or processes a format it didn't see in training. o3 very likely handles these variations better, but we don't have that data to compare.

We need the model weights or tests on standard benchmarks to verify if this generalizes beyond documents that look like the training distribution.
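
For context, a minimal sketch of the kind of semantic-similarity reward mentioned above (field-level comparison of predicted vs. reference JSON, scored by embedding similarity rather than exact match); the embedding model and field handling are assumptions, not the paper's exact implementation:

# Sketch: partial-credit reward for structured extraction outputs.
import json
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def extraction_reward(predicted_json: str, reference_json: str) -> float:
    try:
        pred = json.loads(predicted_json)
    except json.JSONDecodeError:
        return 0.0  # malformed output gets no reward
    ref = json.loads(reference_json)

    scores = []
    for key, ref_val in ref.items():
        pred_val = pred.get(key)
        if pred_val is None:
            scores.append(0.0)                 # missing field
        elif str(pred_val) == str(ref_val):
            scores.append(1.0)                 # exact match short-circuits
        else:
            emb = embedder.encode([str(pred_val), str(ref_val)])
            scores.append(float(cos_sim(emb[0], emb[1])))  # partial credit
    return sum(scores) / max(len(scores), 1)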

manishsharan•4mo ago
I would also love to see how the model performs on management-type presentations, i.e. the kind produced by Gartner, McKinsey, BCG, etc.
dcastm•4mo ago
Hey OP, I found some issues with your code:

During SFT, it uses the full training dataset[1]:

df = pd.read_csv('data/extraction_training_data.csv')

And during the evaluation, it uses the middle part of the same dataset[2]:

df = pd.read_csv('data/extraction_training_data.csv')

df = df[100000:100000+NUM_TEST_SAMPLES]

Also, you split train/test/val by chunk and not by document[3]. So the model "has seen" the documents that you're using to evaluate it (even if you're not evaluating it on the same chunks). A sketch of what a document-level split could look like follows the links below.

[1]: https://github.com/herniqeu/extract0/blob/0f8696a6fb1b620658...

[2]: https://github.com/herniqeu/extract0/blob/0f8696a6fb1b620658...

[3]: https://github.com/herniqeu/extract0/blob/0f8696a6fb1b620658...
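
A minimal sketch of that document-level split, assuming the CSV has a doc_id column identifying each chunk's source document (the column name is hypothetical):

# Group the split by source document so no document contributes chunks to
# both the training and the test set. 'doc_id' is a hypothetical column name.
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

df = pd.read_csv('data/extraction_training_data.csv')

splitter = GroupShuffleSplit(n_splits=1, test_size=0.1, random_state=42)
train_idx, test_idx = next(splitter.split(df, groups=df['doc_id']))
train_df, test_df = df.iloc[train_idx], df.iloc[test_idx]

# Sanity check: no overlap in source documents between the two splits.
assert set(train_df['doc_id']).isdisjoint(set(test_df['doc_id']))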

_carltg•4mo ago
Yes, this is the main concern I have with this result as well.

In other words, rather than plucking different leaves (augments) from the same branch or tree (source dataset), you should be evaluating it on an entirely different tree.

This paper in essence does not have a validation dataset, it only has a training dataset and evaluates on a subpopulation (even though that population was never trained on)