frontpage.

Ask HN: Will LLMs/AI Decrease Human Intelligence and Make Expertise a Commodity?

1•mc-0•10s ago•0 comments

From Zero to Hero: A Brief Introduction to Spring Boot

https://jcob-sikorski.github.io/me/writing/from-zero-to-hello-world-spring-boot
1•jcob_sikorski•21s ago•0 comments

NSA detected phone call between foreign intelligence and person close to Trump

https://www.theguardian.com/us-news/2026/feb/07/nsa-foreign-intelligence-trump-whistleblower
2•c420•1m ago•0 comments

How to Fake a Robotics Result

https://itcanthink.substack.com/p/how-to-fake-a-robotics-result
1•ai_critic•1m ago•0 comments

It's time for the world to boycott the US

https://www.aljazeera.com/opinions/2026/2/5/its-time-for-the-world-to-boycott-the-us
1•HotGarbage•1m ago•0 comments

Show HN: Semantic Search for terminal commands in the Browser (No Back end)

https://jslambda.github.io/tldr-vsearch/
1•jslambda•1m ago•0 comments

The AI CEO Experiment

https://yukicapital.com/blog/the-ai-ceo-experiment/
2•romainsimon•3m ago•0 comments

Speed up responses with fast mode

https://code.claude.com/docs/en/fast-mode
2•surprisetalk•6m ago•0 comments

MS-DOS game copy protection and cracks

https://www.dosdays.co.uk/topics/game_cracks.php
3•TheCraiggers•7m ago•0 comments

Updates on GNU/Hurd progress [video]

https://fosdem.org/2026/schedule/event/7FZXHF-updates_on_gnuhurd_progress_rump_drivers_64bit_smp_...
2•birdculture•8m ago•0 comments

Epstein took a photo of his 2015 dinner with Zuckerberg and Musk

https://xcancel.com/search?f=tweets&q=davenewworld_2%2Fstatus%2F2020128223850316274
7•doener•9m ago•2 comments

MyFlames: Visualize MySQL query execution plans as interactive FlameGraphs

https://github.com/vgrippa/myflames
1•tanelpoder•10m ago•0 comments

Show HN: LLM of Babel

https://clairefro.github.io/llm-of-babel/
1•marjipan200•10m ago•0 comments

A modern iperf3 alternative with a live TUI, multi-client server, QUIC support

https://github.com/lance0/xfr
3•tanelpoder•11m ago•0 comments

Famfamfam Silk icons – also with CSS spritesheet

https://github.com/legacy-icons/famfamfam-silk
1•thunderbong•12m ago•0 comments

Apple is the only Big Tech company whose capex declined last quarter

https://sherwood.news/tech/apple-is-the-only-big-tech-company-whose-capex-declined-last-quarter/
2•elsewhen•15m ago•0 comments

Reverse-Engineering Raiders of the Lost Ark for the Atari 2600

https://github.com/joshuanwalker/Raiders2600
2•todsacerdoti•16m ago•0 comments

Show HN: Deterministic NDJSON audit logs – v1.2 update (structural gaps)

https://github.com/yupme-bot/kernel-ndjson-proofs
1•Slaine•20m ago•0 comments

The Greater Copenhagen Region could be your friend's next career move

https://www.greatercphregion.com/friend-recruiter-program
2•mooreds•20m ago•0 comments

Do Not Confirm – Fiction by OpenClaw

https://thedailymolt.substack.com/p/do-not-confirm
1•jamesjyu•21m ago•0 comments

The Analytical Profile of Peas

https://www.fossanalytics.com/en/news-articles/more-industries/the-analytical-profile-of-peas
1•mooreds•21m ago•0 comments

Hallucinations in GPT5 – Can models say "I don't know" (June 2025)

https://jobswithgpt.com/blog/llm-eval-hallucinations-t20-cricket/
1•sp1982•21m ago•0 comments

What AI is good for, according to developers

https://github.blog/ai-and-ml/generative-ai/what-ai-is-actually-good-for-according-to-developers/
1•mooreds•21m ago•0 comments

OpenAI might pivot to the "most addictive digital friend" or face extinction

https://twitter.com/lebed2045/status/2020184853271167186
1•lebed2045•23m ago•2 comments

Show HN: Know how your SaaS is doing in 30 seconds

https://anypanel.io
1•dasfelix•23m ago•0 comments

ClawdBot Ordered Me Lunch

https://nickalexander.org/drafts/auto-sandwich.html
3•nick007•24m ago•0 comments

What the News media thinks about your Indian stock investments

https://stocktrends.numerical.works/
1•mindaslab•25m ago•0 comments

Running Lua on a tiny console from 2001

https://ivie.codes/page/pokemon-mini-lua
1•Charmunk•26m ago•0 comments

Google and Microsoft Paying Creators $500K+ to Promote AI Tools

https://www.cnbc.com/2026/02/06/google-microsoft-pay-creators-500000-and-more-to-promote-ai.html
3•belter•28m ago•0 comments

New filtration technology could be game-changer in removal of PFAS

https://www.theguardian.com/environment/2026/jan/23/pfas-forever-chemicals-filtration
1•PaulHoule•29m ago•0 comments

LLM leaderboard – Comparing models from OpenAI, Google, DeepSeek and others

https://artificialanalysis.ai/leaderboards/models
64•bookofjoe•6mo ago

Comments

dang•6mo ago
Related:

Benchmarks and comparison of LLM AI models and API hosting providers - https://news.ycombinator.com/item?id=39014985 - Jan 2024 (70 comments)

energy123•6mo ago
You can consider the o3/o4-mini price to be half that due to flex processing. Flex gives the benefits of the batch API without the downside of waiting for a response. It's not marketed that way, but that's been my experience. With 20% cache hits I'm averaging around $0.80/million input tokens and $4/million output tokens.
Incipient•6mo ago
Do you use them for code generation? I'm just using Copilot, as $10/mo is a reasonable budget... but a quick guess based on my usage would put code generation via an API at potentially $10/day?
energy123•6mo ago
o3 is a unique model. For difficult math problems, it generates long reasoning traces (e.g. 10-20k tokens). For coding questions, the reasoning tokens are consistently small, unlike Gemini 2.5 Pro, which generates longer reasoning traces for coding questions.

Cost for o3 code generation is therefore driven primarily by context size. If your programming questions have short contexts, then o3 API with flex is really cost effective.

For 30k input tokens and 3k output tokens, the cost is 30000 * 0.8 / 1000000 + 3000 * 4 / 1000000 = $0.036

But if you have contexts between 100k-200k, then the monthly plans that give you a budget of prompts instead of tokens are probably going to be cheaper.
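
As a rough illustration of the arithmetic above, here is a minimal Python sketch using the effective rates quoted in this comment ($0.80/million input and $4/million output tokens with flex and cache hits); the rates come from the comment, not from official pricing.

    # Rough per-request cost under the effective flex rates quoted above
    # (assumed: $0.80 per million input tokens, $4 per million output
    # tokens). Illustrative only, not official pricing.
    INPUT_RATE = 0.80 / 1_000_000   # USD per input token
    OUTPUT_RATE = 4.00 / 1_000_000  # USD per output token

    def estimate_cost(input_tokens: int, output_tokens: int) -> float:
        """Estimated cost in USD for a single request."""
        return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

    # The example above: 30k input tokens, 3k output tokens.
    print(f"${estimate_cost(30_000, 3_000):.3f}")  # -> $0.036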

qmmmur•6mo ago
I'm shocked people are signing up to pay even these fees to build presumably CRUD apps. I sense a complete divergence in the profession between the people who use this and those who don't.
thedevilslawyer•6mo ago
A whole codebase of 100k lines (~1M tokens) for about a dollar. I'd like to understand why signing up for this would be shocking.
rowanG077•6mo ago
That's really misrepresenting how it works. Most lines will be written, rewritten and adjusted multiple times. Yesterday I did approx 5 hours of pair-coding with Claude 4 Opus, and I have these stats:

Total tokens in: 3,644,200
Total tokens out: 92,349

And of that, only approx 2.3k lines were actually committed for PRs.

simonw•6mo ago
I calculate that as $61.59 https://www.llm-prices.com/#it=3644200&ot=92349&ic=15&oc=75

So that's about $12/hour, or 2.6 cents per line of finished code.

Still pretty cheap! Very few unassisted human programmers can churn out 2300/(5 * 60) = 7.6 lines of code per minute consistently over a five hour time span.

That said, I think Claude Code, while impressive, is incredibly quick to burn through tokens. I still mostly copy and paste into Claude or ChatGPT as my main AI-assisted workflow, which keeps me in more control and uses a lot fewer tokens.
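
For anyone who wants to reproduce these figures, a small sketch using the Claude Opus 4 rates encoded in the llm-prices.com link above ($15/million input, $75/million output tokens):

    # Reproduces the cost figures above from the per-million-token rates
    # in the llm-prices.com link ($15/M input, $75/M output for Opus 4).
    input_tokens, output_tokens = 3_644_200, 92_349
    cost = input_tokens * 15 / 1e6 + output_tokens * 75 / 1e6  # ~$61.59

    hours, lines_committed = 5, 2_300
    print(f"total cost:     ${cost:.2f}")                           # $61.59
    print(f"per hour:       ${cost / hours:.2f}")                   # $12.32
    print(f"cents per line: {100 * cost / lines_committed:.2f}")    # ~2.68
    print(f"lines per min:  {lines_committed / (hours * 60):.1f}")  # ~7.7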

rowanG077•6mo ago
Yes, I can confirm that's approximately what I paid. It was my first time using Claude 4 Opus, and I used aider. The estimate aider gives seems to be very wrong, as it was telling me I'd used approximately $15. I only noticed because my credit ran out. The price/performance tells me I should check what Grok 4 can do; I haven't used it seriously yet.
simonw•6mo ago
Claude Opus 4 is 5x the price of Claude Sonnet 4. I don't think it's 5x as good. I default to Sonnet and rarely use Opus - in this case Sonnet would have cost about $12.31 for the same volume of tokens.
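
A quick sanity check of that comparison, assuming Sonnet 4's commonly listed $3/million input and $15/million output rates (consistent with the 5x ratio mentioned):

    # Same token volume as above, priced at assumed Sonnet 4 rates
    # ($3/M input, $15/M output), versus the Opus 4 rates used earlier.
    tokens_in, tokens_out = 3_644_200, 92_349
    sonnet_cost = (tokens_in * 3 + tokens_out * 15) / 1e6
    print(f"Sonnet 4 for the same tokens: ${sonnet_cost:.2f}")          # ~$12.32
    print(f"Opus/Sonnet price ratio: {15 / 3:.0f}x in, {75 / 15:.0f}x out")  # 5x, 5x
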
0points•6mo ago
There are code generators for CRUD. You could be a 10x AI programmer without AI if the measure is how fast you bang out CRUDs.
koakuma-chan•6mo ago
Some people are struggling to build CRUDs.
molticrystal•6mo ago
For those curious about a few of the metrics: besides $/token, tokens/s, latency, and context size, they use the results from:

    MMLU-Pro (Reasoning & Knowledge)  
    GPQA Diamond (Scientific Reasoning)  
    Humanity's Last Exam (Reasoning & Knowledge)  
    LiveCodeBench (Coding)  
    SciCode (Coding)  
    HumanEval (Coding)  
    MATH-500 (Quantitative Reasoning)  
    AIME 2024 (Competition Math)  
    Chatbot Arena (selectively used)
NitpickLawyer•6mo ago
> Humanity's Last Exam (Reasoning & Knowledge)

An article yesterday said that ~30% of the chemistry/biology questions on HLE were either wrong, misleading, or highly contested in the scientific literature.

pogue•6mo ago
Look at that bar graph comparing the price of every model to Claude Opus.

It's a shame it's so good for coding

https://artificialanalysis.ai/models/claude-4-opus-thinking/...

teaearlgraycold•6mo ago
I’ve had very mixed results with 4 Opus. It’s still just a language model and can’t understand some basic concepts.
matltc•6mo ago
Do you think it is demonstrably better than Sonnet? I grabbed a Pro sub last month shortly after the CLI tool dropped, but haven't used it in the past couple of weeks because I found myself spending way more time correcting it than getting useful output.
cc-d•6mo ago
How about adding a freedom measurement in those columns?
andy99•6mo ago
Impossible to be objective about what that means. I can see having a "baggage" field that lists non-performance-related concerns for each.
LeoPanthera•6mo ago
Is there an index for judging how much a model distorts the truth in order to comply with a political agenda?
OsrsNeedsf2P•6mo ago
How would you create the base "truth" for these models? People are adamant about both sides of many topics.

"Which country started the Korean war?", "Did Israel genocide the people of Gaza?", "Does China have lawful rights over Taiwan?"

LeoPanthera•6mo ago
Hopefully obviously, by testing it against objective facts which are nonetheless "controversial" politically.
thedevilslawyer•6mo ago
In the end many of these are "political facts" and not objective ones like what year a person was born in. The answer to your question is as simple as: come up with the actual list of "facts", and then run a simple eval with every model on them.

The implementation is trivial; compiling the list of "political facts" is the hard part.
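
A minimal sketch of what that "simple eval" loop might look like; the facts, model names, and ask_model() are hypothetical placeholders, and the hard part (curating the facts and judging answers) is exactly what is omitted:

    # Hypothetical skeleton of the "simple eval" described above. The
    # facts, models, and ask_model() are placeholders, not a real API.
    FACTS = [
        ("In what year did the Korean War begin?", "1950"),
        # ... the curated list of "political facts" would go here
    ]
    MODELS = ["model-a", "model-b"]

    def ask_model(model: str, question: str) -> str:
        """Placeholder: call the model's API and return its answer."""
        raise NotImplementedError

    def run_eval() -> dict[str, float]:
        scores = {}
        for model in MODELS:
            correct = sum(
                expected.lower() in ask_model(model, question).lower()
                for question, expected in FACTS
            )
            scores[model] = correct / len(FACTS)
        return scores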

mattigames•6mo ago
For a start you don't ask such subjective questions; that's a bit silly. Instead you ask for, e.g., the death toll of Israel vs Palestine in the last year, or the number of deaths surrounding the Tiananmen Square protests. If it gives you straight answers with numbers (or at least a consistent estimate) and cites its sources, that's a good start.
thedevilslawyer•6mo ago
Let's take the examples you listed:

1) Where would you get the death toll from? What would be the sources of truth?

2) Are there conflicting sources?

3) If yes, what is your expectation for the correct response?

mattigames•6mo ago
They are all controversial matters, so conflicting sources are not only expected but should be surfaced by the LLM when it is asked about them. Reports by well-funded, likely-biased sources (e.g. the Israeli government) would obviously need to be given less credibility, estimates that are wildly different from all the rest would also need to be given less credibility, and so on.
thedevilslawyer•6mo ago
Thanks; these handwavy and subjective answers hopefully tell you why the grandparent's questions are not "silly".
k4rli•6mo ago
Perhaps universal truth or objective facts simply don't exist anymore? Or did they ever?

Tiananmen Square might have been bad (I'm not too familiar with events in Asia), but so are the post-WW2 conflicts started by Western nations.

kouteiheika•6mo ago
It's not perfect, but, yes: https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard
l5870uoo9y•6mo ago
It is interesting that it ranks `GPT-4.1 mini` higher than `GPT-4.1` (the latter costing five times more).
__mharrison__•6mo ago
Here's my plot based on Aider benchmarks

https://www.linkedin.com/posts/panela_important-plot-for-fol...

witnessme•6mo ago
Surprised to find out Grok 3 mini is so economical and ranks higher than equivalent GPT models. I run most of my agents on GPT-4.1 mini; I might switch now.
Garlef•6mo ago
Is there an option to filter the list based on the measurements? E.g. "context window > X, intelligence > Y, price < Z"? I'd love that.

It seems the only filter options available are unrelated to the measured metrics.

(I might have missed this since the UI is a bit cluttered.)
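
For illustration, the requested filter is easy to express once the leaderboard rows are available as records; a hypothetical sketch with made-up field names (per the comment above, the site itself doesn't currently offer this):

    # Hypothetical sketch of the requested filter over leaderboard rows;
    # the field names and values are made up for illustration.
    models = [
        {"name": "model-a", "context_window": 200_000, "intelligence": 60, "price_per_m": 3.0},
        {"name": "model-b", "context_window": 32_000,  "intelligence": 45, "price_per_m": 0.5},
    ]

    def filter_models(rows, min_context=0, min_intelligence=0, max_price=float("inf")):
        return [
            r for r in rows
            if r["context_window"] >= min_context
            and r["intelligence"] >= min_intelligence
            and r["price_per_m"] <= max_price
        ]

    # "context window > 100k, intelligence > 50, price < $5/M"
    print(filter_models(models, min_context=100_000, min_intelligence=50, max_price=5.0))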

globular-toast•6mo ago
Whenever you present a table with sorting ability, you might as well make the first click sort ascending or descending according to what makes the most sense for that column. For example, I'm highly unlikely to be interested in which model has the smallest context window, but it's always two clicks to find which one has the largest.

Sorting null values first isn't very useful either.
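
A sketch of the suggested default: sort descending on first click for "bigger is better" columns and always push missing values to the end (field names hypothetical):

    # Sort rows by a column, defaulting to descending for columns where
    # larger is more interesting, and always pushing missing values last.
    def sort_column(rows, key, descending=True):
        present = [r for r in rows if r.get(key) is not None]
        missing = [r for r in rows if r.get(key) is None]
        return sorted(present, key=lambda r: r[key], reverse=descending) + missing

    # e.g. first click on "context_window" shows the largest windows first:
    # sort_column(models, "context_window")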

lm28469•6mo ago
Vibe coded websites be like
koakuma-chan•6mo ago
Not necessarily vibe coded. Sometimes developers don't actually care about the product, and just want to get it over with.
archargelod•6mo ago
Still sounds like vibe coding. If they don't care about the product, nothing is stopping them from taking the AI shortcut.
esafak•6mo ago
More like some people just have no product sensibility. What is the user trying to do?
loehnsberg•6mo ago
Interesting to learn that o4-mini-high has the highest intelligence/$ score here, on par with o3-pro, which is twice as expensive and slower.