Ask HN: How do you find SOTA LLMs for a task?

1•throwaw12•7h ago
There are thousands of models available on Hugging Face at the moment. But whenever I need a model for a specific task, I struggle to find the SOTA one. Can you recommend how to find it?

I am not an ML practitioner, I just need models for my work. For coding, for example, I know we can use Claude/Gemini models, but sometimes I want to compare them to SOTA open source models. Every week something better comes out, and reading month-old articles or finding an LLM leaderboard for a specific task is sometimes difficult. I think some kind of model picker already exists, but I don't know where.

Comments

Oras•7h ago
I usually check the OpenRouter usage rankings to see that by category: https://openrouter.ai/rankings

Scroll down to categories, and select one from the dropdown at the top right of the chart.

throwaw12•6h ago
That's a nice addition to my tool set :) thanks!

But it seems to mostly reflect proprietary models (because they are easier to serve).

incomingpain•7h ago
For open source, you're not going to find good usage stats online. OpenHands + Devstral doesn't touch the internet, so it won't make it into many stats.

You can look at benchmarks.

https://livebench.ai/#/?Agentic+Coding=a

Keep scrolling until you see something your size. DeepSeek R1 is nice, but 600B isn't running on my hardware. You'll also notice they aren't covering everything; it's dominated by the SaaS options.

https://huggingface.co/models

This is sorted by trending by default. That tends to show interest, but not necessarily which model is best.
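
The "keep scrolling until you see something your size" step can be roughed out in a few lines. This is only a sketch: the entries are made-up placeholders (not real models or real scores), you would copy the names, parameter counts, and scores from a leaderboard by hand, and the estimate assumes 4-bit quantized weights and ignores KV cache and runtime overhead, so real memory use will be higher.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str        # placeholder name, not a real model
    params_b: float  # parameter count in billions
    score: float     # benchmark score copied by hand from a leaderboard


def weight_gb(params_b: float, bits_per_weight: int = 4) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_b * bits_per_weight / 8


def fits(entries, budget_gb, bits_per_weight=4):
    """Keep entries whose weights fit the memory budget, best score first."""
    ok = [e for e in entries
          if weight_gb(e.params_b, bits_per_weight) <= budget_gb]
    return sorted(ok, key=lambda e: e.score, reverse=True)


# Hypothetical leaderboard rows, for illustration only.
leaderboard = [
    Entry("big-600b", 600, 70.0),
    Entry("mid-32b", 32, 60.0),
    Entry("small-7b", 7, 45.0),
]

# With a 24 GB budget, the 600B entry (about 300 GB at 4 bits) is dropped.
for e in fits(leaderboard, budget_gb=24):
    print(e.name, f"{weight_gb(e.params_b):.1f} GB")
```

Changing `bits_per_weight` to 8 or 16 shows how quickly the same budget rules out mid-sized models as well.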

throwaw12•6h ago
> DeepSeek R1 is nice, but 600B isn't running on my hardware.

Yeah, this is my concern as well. The top generic SOTA models are usually good at many tasks, but I can't quickly test them locally on my machine. Especially when I see claims that a 32B model outperforms proprietary models in benchmarks, I really want to test it myself on my own tasks, but after a while those models drop out of the news/trends and become difficult to find.
