frontpage.

Show HN: Identifier for files and directories (like ISBN for Books)

https://github.com/skorotkiewicz/fsid
1•modinfo•37s ago•0 comments

Show HN: Holy Grail: Open-Source Autonomous Development Agent

https://github.com/dakotalock/holygrailopensource
1•Moriarty2026•7m ago•1 comment

Show HN: Minecraft Creeper meets 90s Tamagotchi

https://github.com/danielbrendel/krepagotchi-game
1•foxiel•14m ago•1 comment

Show HN: Termiteam – Control center for multiple AI agent terminals

https://github.com/NetanelBaruch/termiteam
1•Netanelbaruch•15m ago•0 comments

The only U.S. particle collider shuts down

https://www.sciencenews.org/article/particle-collider-shuts-down-brookhaven
1•rolph•17m ago•1 comment

Ask HN: Why do purchased B2B email lists still have such poor deliverability?

1•solarisos•18m ago•2 comments

Show HN: Remotion directory (videos and prompts)

https://www.remotion.directory/
1•rokbenko•20m ago•0 comments

Portable C Compiler

https://en.wikipedia.org/wiki/Portable_C_Compiler
2•guerrilla•22m ago•0 comments

Show HN: Kokki – A "Dual-Core" System Prompt to Reduce LLM Hallucinations

1•Ginsabo•22m ago•0 comments

Software Engineering Transformation 2026

https://mfranc.com/blog/ai-2026/
1•michal-franc•24m ago•0 comments

Microsoft purges Win11 printer drivers, devices on borrowed time

https://www.tomshardware.com/peripherals/printers/microsoft-stops-distrubitng-legacy-v3-and-v4-pr...
3•rolph•24m ago•1 comment

Lunch with the FT: Tarek Mansour

https://www.ft.com/content/a4cebf4c-c26c-48bb-82c8-5701d8256282
2•hhs•27m ago•0 comments

Old Mexico and her lost provinces (1883)

https://www.gutenberg.org/cache/epub/77881/pg77881-images.html
1•petethomas•31m ago•0 comments

'AI' is a dick move, redux

https://www.baldurbjarnason.com/notes/2026/note-on-debating-llm-fans/
4•cratermoon•32m ago•0 comments

The source code was the moat. But not anymore

https://philipotoole.com/the-source-code-was-the-moat-no-longer/
1•otoolep•32m ago•0 comments

Does anyone else feel like their inbox has become their job?

1•cfata•32m ago•1 comment

An AI model that can read and diagnose a brain MRI in seconds

https://www.michiganmedicine.org/health-lab/ai-model-can-read-and-diagnose-brain-mri-seconds
2•hhs•35m ago•0 comments

Dev with 5 years of experience switched to Rails, what should I be careful about?

1•vampiregrey•38m ago•0 comments

AlphaFace: High Fidelity and Real-Time Face Swapper Robust to Facial Pose

https://arxiv.org/abs/2601.16429
1•PaulHoule•39m ago•0 comments

Scientists discover “levitating” time crystals that you can hold in your hand

https://www.nyu.edu/about/news-publications/news/2026/february/scientists-discover--levitating--t...
2•hhs•41m ago•0 comments

Rammstein – Deutschland (C64 Cover, Real SID, 8-bit – 2019) [video]

https://www.youtube.com/watch?v=3VReIuv1GFo
1•erickhill•41m ago•0 comments

Tell HN: Yet Another Round of Zendesk Spam

4•Philpax•41m ago•0 comments

Postgres Message Queue (PGMQ)

https://github.com/pgmq/pgmq
1•Lwrless•45m ago•0 comments

Show HN: Django-rclone: Database and media backups for Django, powered by rclone

https://github.com/kjnez/django-rclone
2•cui•48m ago•1 comment

NY lawmakers proposed statewide data center moratorium

https://www.niagara-gazette.com/news/local_news/ny-lawmakers-proposed-statewide-data-center-morat...
2•geox•49m ago•0 comments

OpenClaw AI chatbots are running amok – these scientists are listening in

https://www.nature.com/articles/d41586-026-00370-w
3•EA-3167•50m ago•0 comments

Show HN: AI agent forgets user preferences every session. This fixes it

https://www.pref0.com/
6•fliellerjulian•52m ago•0 comments

Introduce the Vouch/Denouncement Contribution Model

https://github.com/ghostty-org/ghostty/pull/10559
2•DustinEchoes•54m ago•0 comments

Show HN: SSHcode – Always-On Claude Code/OpenCode over Tailscale and Hetzner

https://github.com/sultanvaliyev/sshcode
1•sultanvaliyev•54m ago•0 comments

Microsoft appointed a quality czar. He has no direct reports and no budget

https://jpcaparas.medium.com/microsoft-appointed-a-quality-czar-he-has-no-direct-reports-and-no-b...
3•RickJWagner•56m ago•0 comments

AI2: Open Coding Agents

https://allenai.org/blog/open-coding-agents
253•publicmatt•1w ago

Comments

jauntywundrkind•1w ago
Awesome stuff. Output speed looks crazy fast too.

I wonder if this will indeed start prompting more language-specific work.

Afaik training still requires not just looking at sample code but also being able to write loss functions and to pose problems the AI can work at. That seems hard.

One random thought, are there training styles of just deleting some code from "good" projects then making the AI make it work again?

CuriouslyC•1w ago
The technique people use is to capture PR diffs from public repos and extract the tests, then use those to see if agents can reconstruct a patch that satisfies the tests.
ahmadyan•1w ago
Claims in the article are incorrect. They conveniently ignore Meta's CWM models, which are open-source [1] and open-weight [2], sit at 65% SWE-bench Verified (with TTS) and 54% pass@1, and are the same size (32B dense). So claims like "surpassing prior open-source state-of-the-art coding models of comparable sizes and context lengths", while conveniently leaving the previous OSS SOTA out of your eval tables, are ... sketch.

[1] https://github.com/facebookresearch/cwm [2] https://huggingface.co/facebook/cwm

philipkglass•1w ago
The difference is that the Allen Institute models have open training data, not just open code and weights. Meta doesn't share the training data you would need to reproduce their final models. For many uses open-weight models are nearly as good, but for advancing research it's much better to have everything in the open.
kevmo314•1w ago
Reading their paper, it wasn't trained from scratch; it's a fine-tune of a Qwen3-32B model. I think this approach is correct, but it does mean that only a subset of the training data is really open.
mhitza•1w ago
The linked open-weight model disallows commercial use and is licensed only for research purposes.
ethan_l_shen•1w ago
Hey! These are great observations. So first, while TTS can improve performance, we wanted to evaluate the raw capability of our model. This meant generating only one rollout per evaluation instance, which follows other papers in the space like SWE-smith and BugPilot. In addition, TTS adds extra inference cost and is reliant on how rollouts are ranked, two confounding factors for deployable models where memory and inference speed are extremely important.

Following that line of reasoning, context length is another very large confounding factor. Longer context lengths improve performance, but also result in enormous increases in KV cache size and memory requirements. We decided to control for this in our paper and focus on a 32K context length for 32B-size models, a context length that already pushes the bounds of what can be "deployable" locally.

Still, we evaluate at 64K context length using YARN and are able to outperform CWM's 54% performance (non TTS), which it achieves using 128K context, a substantial increase over what we use. This is also pretty significant because we only ever train at 32K context, but CWM trains for a full 128K.
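
The KV-cache point above is easy to make concrete with back-of-envelope arithmetic. The dimensions below are assumptions (round, roughly Qwen3-32B-like: 64 layers, 8 KV heads, head dim 128, fp16), not figures from the paper; the takeaway is only that cache size scales linearly with context length.

```python
# Back-of-envelope KV-cache size per sequence. Dimensions are hypothetical
# (roughly Qwen3-32B-like), chosen only to illustrate the linear scaling
# of serving memory with context length.

def kv_cache_bytes(seq_len, layers=64, kv_heads=8, head_dim=128, dtype_bytes=2):
    # 2x for keys and values, per layer, per KV head, per cached position.
    return 2 * layers * kv_heads * head_dim * dtype_bytes * seq_len

gib = 1024 ** 3
for ctx in (32 * 1024, 64 * 1024, 128 * 1024):
    print(f"{ctx // 1024:>4}K context -> {kv_cache_bytes(ctx) / gib:.1f} GiB per sequence")
# ->  32K context -> 8.0 GiB per sequence
# ->  64K context -> 16.0 GiB per sequence
# -> 128K context -> 32.0 GiB per sequence
```

Under these assumed dimensions, serving at 128K costs 4x the cache memory of 32K, which is the confound the comment is controlling for.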

khimaros•1w ago
it's great to see this kind of progress in reproducible weights, but color me confused. this claims to be better and smaller than Devstral-Small-2-24B, while clocking in at 32B (larger) and scoring more poorly?
ethan_l_shen•1w ago
Hey! We are able to outperform Devstral-Small-2-24B when specializing on repositories, and come well within the range of uncertainty with our best SERA-32B model. That being said, our model is a bit larger than Devstral 24B. Could you point out what in the paper gave the impression that we were smaller? If there's something unclear, we would love to revise.
khimaros•1w ago
"SERA-32B is the first model in Ai2's Open Coding Agents series. It is a state-of-the-art open-source coding agent that achieves 49.5% on SWE-bench Verified, matching the performance of much larger models like Devstral-Small-2 (24B)" from https://huggingface.co/allenai/SERA-32B
ethan_l_shen•1w ago
Ah, great catch; I don't know how we missed that. Thanks! Will fix.
nickandbro•1w ago
Great work! Really respect AI2; they open-source everything: the model, the weights, the training pipeline, the inference stack, and the corpus.
Imustaskforhelp•1w ago
Hey, this looks great! Is it available on OpenRouter?

I wish AI2 would release a denser model than the 8B on OpenRouter for free, as I was using the Devstral model for agentic purposes.

If we can get a good agentic 32B-class model on OpenRouter for ~free, I feel it will be very interesting to see how things go.

Good luck to AI2! The premise of truly open-source models is really interesting, and I feel it could bring more innovation to the space!

ripped_britches•1w ago
One claim in the article is definitely very wrong, or at least needs to be narrowed: Claude is the only closed agent harness, and there are about two dozen open ones. Many models may be closed, but when people say "agent" they are generally referring to the harness, not the underlying model.
janmue•1w ago
“Strong closed-weight coding agents like Devstral Small 2 are an important point of comparison.”

Devstral Small 2 is an open-weights model: https://huggingface.co/mistralai/Devstral-Small-2-24B-Instru...

evilduck•1w ago
They either updated it or you quoted it wrong but the article says Devstral is open-weights now.
janmue•1w ago
Yeah, they’ve updated it. Here’s the old version: https://web.archive.org/web/20260128034831mp_/https://allena...
Kyle-Wiggers•1w ago
Yes! We updated the blog, thanks for flagging the mistake.
hogehoge51•1w ago
What's the practical benefit of fine-tune training on a local repo, vs. putting a summary of local information in the context? I.e., every team has their own style and preferences for coding patterns that could be generalized, but I imagine a large-scale model has seen them all, so they could be described in the context. Or are there specific domain-level patterns, generalizable but never seen outside an org, that are difficult for a model to infer without fresh tuning?
hdjrudni•1w ago
I work on the biggest codebase in the world. We have a model fine-tuned on our codebase. I've not been impressed with it. It does not produce better code than the non-tuned model.

Maybe there are certain problems it excels at, but probably 99% of what I throw at it can be gleaned from the context/nearby code anyway, like you said. Even if I'm using some in-house library (pretty much all of our code), the models are good enough to dig into that library and read the headers if they need to.

Maybe it can help with speed, if it needs to do less research before it can start coding.

metadat•1w ago
Fine-tuning coder models is not nearly as effective as intelligently managing the context with frontier models (opus, gpt-5.2-codex).
NitpickLawyer•1w ago
I don't think it's even a question. A 32b model will not compete with SotA for years to come (if ever). The idea behind this release is to fine-tune on your codebase and compare to non-finetuned open models from the same class (or one higher). So if you need local processing, without access to SotA (security, compliance, whatever) then this is an interesting avenue for you. And the cost is fairly low. They are releasing the method to do this on your own codebase / docs / processes.
miki123211•1w ago
Is this how you say "I work at Google" without explicitly saying that?
Der_Einzige•1w ago
Prove it's the biggest codebase in the world. No way do you know that for sure!
grim_io•1w ago
"Hey Claude, please scaffold me the biggest codebase in the world"
forty•1w ago
How many lines of code is there in the biggest codebase in the world?
lostmsu•1w ago
AFAIK gpt-oss-20b on high reasoning has a SWE-bench score of just over 60. It is smaller than all comparable models. Maybe I am missing something, but it is still state of the art all the way up to 50B parameters vs. all models released since.

At least the https://huggingface.co/facebook/cwm team had the balls to compare to it directly (sort of, see TTS).

What does this model do that gpt-oss-20b does not? AFAIU the base model it was fine-tuned from is not reproducible, and if I flipped a single bit in gpt-oss-20b and told you how (instructions under MIT), that would satisfy the "fully open finetuning" they claim as an advantage. But that "open" fine-tuned gpt-oss-20b would probably still beat their model.

Am I missing something?

mirekrusin•1w ago
For low-cost tuning, wouldn't something like LoRA via e.g. Unsloth on e.g. GLM-4.7-Flash be the way to go?
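
Why LoRA makes this cheap can be shown with a quick parameter count: a rank-r adapter trains r*(d_in + d_out) weights per adapted matrix instead of d_in*d_out. The dimensions below are hypothetical round numbers, not GLM-4.7-Flash's actual architecture.

```python
# LoRA cost arithmetic: trainable parameters for a rank-r adapter on a
# d_in x d_out weight matrix vs. full fine-tuning of that matrix.
# Dimensions are illustrative assumptions, not any real model's config.

def lora_params(d_in, d_out, r):
    # Two low-rank factors: A is (r x d_in), B is (d_out x r).
    return r * (d_in + d_out)

def full_params(d_in, d_out):
    return d_in * d_out

# One hypothetical 4096x4096 attention projection with a rank-16 adapter:
d = 4096
full = full_params(d, d)          # 16,777,216 weights to train fully
lora = lora_params(d, d, 16)      # 131,072 weights with LoRA
print(f"LoRA trains {lora / full:.2%} of the matrix")  # -> LoRA trains 0.78% of the matrix
```

At well under 1% of the weights per adapted matrix (under these assumed dimensions), optimizer state and gradient memory shrink accordingly, which is what makes single-GPU tuning of a codebase-specialized model plausible.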
nl•1w ago
Note that this is also a super interesting technique for specialising consumer-facing apps like Lovable that need to generate code that matches your API very well.

It's also a great approach for building custom languages.

lrvick•1w ago
So this "open" system still requires you to use Claude to actually use it?
somebodythere•1w ago
No. You can point e.g. Opencode/Cline/Roo Code/Kilo Code at your inference endpoint. But CC has high install base and users are used to it, so it makes sense to target it.
d4rkp4ttern•1w ago
An interesting shift I’ve seen over the past few weeks is that we’re starting to refer to bare LLMs themselves as “agents”.

Used to be that agent = LLM + scaffold/harness/loop/whatever.

eudoxus•1w ago
I think some of the distinction here is that the more recent "bare LLMs" have been more purpose built, augmented with "agent" specific RL, and in general more fine tuned for the requirements of "agents". Things such as specific reasoning capabilities, tool calling, etc.

These all make the "bare LLMs" better suited to be used within the "agent" harness.

I think the more accurate term would be "agentic LLMs" instead of calling them "agents" outright. As to why it's the case now: probably just human laziness and colloquialism.

fassssst•1w ago
Yes, the post training is the special sauce.
cjonas•1w ago
My definition of agent has always been an LLM with "effectful" tools, run in a loop where the LLM gets to decide when the task is complete. In other words, an LLM with "agency".
d4rkp4ttern•1w ago
This is exactly how I think of it. An agent has three elements: intelligence (LLM), autonomy (loop) and tools to do anything interesting/useful.
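
The definition above (LLM + tools, run in a loop, where the model decides when the task is complete) fits in a few lines. This is a sketch, not any framework's real API: `fake_llm` is a stub standing in for a model call, and the tool names are made up.

```python
# Minimal sketch of "agent = intelligence + autonomy + tools": an LLM picks
# actions in a loop, calls effectful tools, and decides when it is done.
# `fake_llm` and the tool set are illustrative stubs, not a real API.

def run_agent(llm, tools, task, max_steps=10):
    history = [("task", task)]
    for _ in range(max_steps):
        action, arg = llm(history)        # the model picks the next step...
        if action == "done":              # ...including when to stop (agency)
            return arg
        result = tools[action](arg)       # "effectful" tool call
        history.append((action, result))
    raise RuntimeError("agent did not finish")

# Stub model: read a file, then declare the task done with its contents.
def fake_llm(history):
    last_action, last_result = history[-1]
    if last_action == "task":
        return ("read_file", "notes.txt")
    return ("done", f"file says: {last_result}")

fs = {"notes.txt": "ship it"}
tools = {"read_file": lambda path: fs[path]}
answer = run_agent(fake_llm, tools, "summarize notes.txt")
print(answer)  # -> file says: ship it
```

Swapping `fake_llm` for a real model call (and the dict of lambdas for real tools) is essentially all a harness adds on top of this loop.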
bob1029•1w ago
GPT 5.2 in a simple while loop runs circles around most things right now. It was released barely a month ago and many developers have been on vacation/hibernating/etc. during this time.

I give it 3-4 more weeks before we start to hear about the death of agentic frameworks. Pointing GPT-5+ at a PowerShell or C#/Python REPL is looking way more capable than wiring up a bunch of domain-specific tools. A code-based REPL is the ultimate tool: you only need one, and you can force the model to always call it (100% chance of picking the right tool). The amount of integration work around Process.Start is approximately 10-15 minutes, even if you don't use AI assistance.
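
The "one REPL tool" loop described above can be sketched with `subprocess` as the Python analogue of Process.Start. `scripted_model` is a stub; a real loop would feed the captured stdout/stderr back to an actual LLM and let it iterate.

```python
# Sketch of the single-tool pattern: instead of many domain-specific tools,
# the model emits code and a while loop executes it in a subprocess,
# returning the output. The model here is a scripted stub, not a real LLM.
import subprocess
import sys

def run_code(source, timeout=10):
    """The one and only tool: execute model-written code, return its output."""
    proc = subprocess.run([sys.executable, "-c", source],
                          capture_output=True, text=True, timeout=timeout)
    return proc.stdout + proc.stderr

def repl_loop(model, task, max_steps=5):
    transcript = [task]
    for _ in range(max_steps):
        code = model(transcript)
        if code is None:                  # model signals completion
            return transcript[-1]
        transcript.append(run_code(code)) # always the "right tool"
    return transcript[-1]

# Stub model: emit one snippet, then stop.
steps = iter(["print(sum(range(101)))", None])
out = repl_loop(lambda transcript: next(steps), "sum 0..100")
print(out.strip())  # -> 5050
```

Error output flows back through the same channel as results, so the model can self-correct without any extra tooling, which is the appeal of the pattern.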

d4rkp4ttern•1w ago
Yes, this “REPL/CLI is all you need” realization is exactly what’s behind the wild success of Claude Code and derivative CLI coding agents.