frontpage.
newsnewestaskshowjobs

Made with ♥ by @iamnishanth

Open Source @Github

fp.

Show HN: Django-rclone: Database and media backups for Django, powered by rclone

https://github.com/kjnez/django-rclone
1•cui•1m ago•1 comments

NY lawmakers proposed statewide data center moratorium

https://www.niagara-gazette.com/news/local_news/ny-lawmakers-proposed-statewide-data-center-morat...
1•geox•3m ago•0 comments

OpenClaw AI chatbots are running amok – these scientists are listening in

https://www.nature.com/articles/d41586-026-00370-w
2•EA-3167•3m ago•0 comments

Show HN: AI agent forgets user preferences every session. This fixes it

https://www.pref0.com/
4•fliellerjulian•5m ago•0 comments

Introduce the Vouch/Denouncement Contribution Model

https://github.com/ghostty-org/ghostty/pull/10559
2•DustinEchoes•7m ago•0 comments

Show HN: SSHcode – Always-On Claude Code/OpenCode over Tailscale and Hetzner

https://github.com/sultanvaliyev/sshcode
1•sultanvaliyev•7m ago•0 comments

Microsoft appointed a quality czar. He has no direct reports and no budget

https://jpcaparas.medium.com/microsoft-appointed-a-quality-czar-he-has-no-direct-reports-and-no-b...
1•RickJWagner•9m ago•0 comments

Multi-agent coordination on Claude Code: 8 production pain points and patterns

https://gist.github.com/sigalovskinick/6cc1cef061f76b7edd198e0ebc863397
1•nikolasi•10m ago•0 comments

Washington Post CEO Will Lewis Steps Down After Stormy Tenure

https://www.nytimes.com/2026/02/07/technology/washington-post-will-lewis.html
4•jbegley•10m ago•0 comments

DevXT – Building the Future with AI That Acts

https://devxt.com
2•superpecmuscles•11m ago•4 comments

A Minimal OpenClaw Built with the OpenCode SDK

https://github.com/CefBoud/MonClaw
1•cefboud•11m ago•0 comments

The silent death of Good Code

https://amit.prasad.me/blog/rip-good-code
2•amitprasad•12m ago•0 comments

The Internal Negotiation You Have When Your Heart Rate Gets Uncomfortable

https://www.vo2maxpro.com/blog/internal-negotiation-heart-rate
1•GoodluckH•13m ago•0 comments

Show HN: Glance – Fast CSV inspection for the terminal (SIMD-accelerated)

https://github.com/AveryClapp/glance
2•AveryClapp•14m ago•0 comments

Busy for the Next Fifty to Sixty Bud

https://pestlemortar.substack.com/p/busy-for-the-next-fifty-to-sixty-had-all-my-money-in-bitcoin-...
1•mithradiumn•15m ago•0 comments

Imperative

https://pestlemortar.substack.com/p/imperative
1•mithradiumn•16m ago•0 comments

Show HN: I decomposed 87 tasks to find where AI agents structurally collapse

https://github.com/XxCotHGxX/Instruction_Entropy
1•XxCotHGxX•20m ago•1 comments

I went back to Linux and it was a mistake

https://www.theverge.com/report/875077/linux-was-a-mistake
3•timpera•21m ago•1 comments

Octrafic – open-source AI-assisted API testing from the CLI

https://github.com/Octrafic/octrafic-cli
1•mbadyl•22m ago•1 comments

US Accuses China of Secret Nuclear Testing

https://www.reuters.com/world/china/trump-has-been-clear-wanting-new-nuclear-arms-control-treaty-...
2•jandrewrogers•23m ago•1 comments

Peacock. A New Programming Language

2•hashhooshy•28m ago•1 comments

A postcard arrived: 'If you're reading this I'm dead, and I really liked you'

https://www.washingtonpost.com/lifestyle/2026/02/07/postcard-death-teacher-glickman/
3•bookofjoe•29m ago•1 comments

What to know about the software selloff

https://www.morningstar.com/markets/what-know-about-software-stock-selloff
2•RickJWagner•33m ago•0 comments

Show HN: Syntux – generative UI for websites, not agents

https://www.getsyntux.com/
3•Goose78•33m ago•0 comments

Microsoft appointed a quality czar. He has no direct reports and no budget

https://jpcaparas.medium.com/ab75cef97954
2•birdculture•34m ago•0 comments

AI overlay that reads anything on your screen (invisible to screen capture)

https://lowlighter.app/
1•andylytic•35m ago•1 comments

Show HN: Seafloor, be up and running with OpenClaw in 20 seconds

https://seafloor.bot/
1•k0mplex•35m ago•0 comments

Tesla turbine-inspired structure generates electricity using compressed air

https://techxplore.com/news/2026-01-tesla-turbine-generates-electricity-compressed.html
2•PaulHoule•37m ago•0 comments

State Department deleting 17 years of tweets (2009-2025); preservation needed

https://www.npr.org/2026/02/07/nx-s1-5704785/state-department-trump-posts-x
5•sleazylice•37m ago•1 comments

Learning to code, or building side projects with AI help, this one's for you

https://codeslick.dev/learn
1•vitorlourenco•37m ago•0 comments
Open in hackernews

Show HN: Randomly switching between LMs at every step boosts SWE-bench score

https://www.swebench.com/SWE-bench/blog/2025/08/19/mini-roulette/
5•lieret•5mo ago
What if your agent uses a different LM at every turn? We let mini-SWE-agent randomly switch between GPT-5 and Sonnet 4 and it scored higher on SWE-bench than with either model separately.

GPT-5 by itself gets 65.0%, Sonnet 4 64.8%, but randomly switching at every step gets us 67.2%

This result came pretty surprising to us. There's a few more experiments in the blog post.

Comments

NitpickLawyer•5mo ago
This is really cool! And even cooler that it's tested on their mini agent harness (only has access to "terminal", no other tools) because this implies it's "raw model power" rather than "software glue".

My speculation: this is an "emergent" capability out of good / scalable / "solved" RL. Both Anthropic and oAI seem to have made huge advances in RL. (xAI as well, but haven't yet released their coding model, so we'll see if that continues). In contrast to other RLd models out there (i.e. the deepseeks, the qwens, etc) that score really well on tasks similar to those in benchmarks, both claude4 and gpt5 seem to have "learned" what agentic means at a different level. They can be guided through tasks, asked to do one particular subpart of a task, or a particular approach, etc. And they do it well. The other implementations feel "stubborn". Can't explain it better.

It will be interesting to see what Gemini3 will bring about. goog / deepmind are experts at RL, and gemini2.5 is a bit too old now, so curious to see what they can deliver on this front. My guess is that we'll see the same kind of "it gets it" after scaled RL.

One note, that I've made after using gpt5 for a bit is that it seems to have this "gettheritis" with solving tasks. It wants to solve them so bad, that sometimes it forgets the plan, or rushes through step 5 after solving 1-4 pretty throughly. Might be prompting as well, maybe those havent yet caught up.