

GPT5 is worse than 4.1-mini for text and worse than Sonnet 4 for coding

10•hitradostava•6mo ago
It seems that OpenAI have got the PR machine working amazingly. The Cursor CEO said it's the best, as did Simon Willison (https://simonwillison.net/2025/Aug/7/gpt-5/).

But I've found it terrible. For coding (in Cursor), it's slow, often fails at tool calls (no MCP, just the stock Cursor tools), and it stored some new application state in globalThis, something no model has ever attempted in over a year of very heavy Cursor / Claude Code use.

For a summarization/insights API that I work on, it was way worse than gpt-4.1-mini. I tried both GPT-5 mini and full GPT-5, with different reasoning settings. It didn't follow instructions, and its output was worse across all my evals, even after heavy prompt adjustment. I did a lot of sampling and the results were objectively bad.
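Comparing several model/reasoning-setting combinations by eye gets unreliable once there are many sampled runs. A minimal sketch of one way to tally them, assuming each sampled output has already been graded pass/fail by some eval (the `EvalRun` shape and the config labels are hypothetical, not the poster's actual harness):

```typescript
// One sampled run of one eval case against one model configuration.
// Field names are hypothetical; adapt them to whatever your harness records.
interface EvalRun {
  config: string; // e.g. a model name plus a reasoning setting
  caseId: string; // which eval case was sampled
  passed: boolean; // did the output meet the eval's criteria?
}

interface ConfigSummary {
  config: string;
  runs: number;
  passRate: number; // fraction of runs that passed, between 0 and 1
}

// Group runs by configuration, compute each pass rate,
// and sort configurations from best to worst.
function summarizeRuns(runs: EvalRun[]): ConfigSummary[] {
  const totals: { [config: string]: { runs: number; passed: number } } = {};
  for (const run of runs) {
    const t = totals[run.config] || (totals[run.config] = { runs: 0, passed: 0 });
    t.runs += 1;
    if (run.passed) {
      t.passed += 1;
    }
  }
  return Object.keys(totals)
    .map((config) => ({
      config,
      runs: totals[config].runs,
      passRate: totals[config].passed / totals[config].runs,
    }))
    .sort((a, b) => b.passRate - a.passRate);
}
```

Sampling each case several times per configuration matters because model outputs are nondeterministic; a single run per case can make one model look better or worse by chance.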

Am I the only one? Has anyone seen actual real-world benefits of GPT-5 vs other models?

Comments

cranberryturkey•6mo ago
It solved a huge bug I've been struggling with.
hitradostava•6mo ago
Had Sonnet 4 not been able to?
revskill•6mo ago
Sure.
cranberryturkey•6mo ago
No, it kept going in circles... I spent about 3 weeks trying to fix it. Got access to GPT-5 yesterday and all the major bugs are resolved.
wdb•6mo ago
Interesting. I tried it to fix some failing unit tests, but it made the problem worse. Sonnet was able to fix both the failing unit tests and the new problems introduced by GPT-5. I used Claude Code for Sonnet and Cursor Agent for GPT-5. Maybe Cursor Agent is just bad?
cranberryturkey•6mo ago
I don't know. I use roocode.
gaws•6mo ago
What was the bug?
tim_angus•6mo ago
And yet the media keeps using the term "exponential improvement"...
8thcross•6mo ago
I tried it with cursor-agent, their CLI, and it generated better code than expected. YMMV. It was more thoughtful and strategic than the other frontier models.
hitradostava•6mo ago
Planning was OK for me: much slower than Sonnet, but comparable. But some of the code it produces is just terrible. Maybe the routing layer sends some code-generation tasks to a much smaller model, but then I don't get why it's so slow!

The only thing that seems better to me is the parallel tool calling.

canerdogan•6mo ago
GPT-5 isn’t really a brand-new model in the way people think. From what I’ve seen, the goal was more about reducing costs and unifying the interface than releasing a totally different architecture. Under the hood it is still routing to models we already know, just picking what it thinks will give the “best” result for the request.

That can be fine for a lot of general use cases, but if you’re working in specific domains like coding agents or high-precision summarization, that routing can actually make results worse compared to sticking with a model you know performs well for your workload.

hitradostava•6mo ago
That's not what OpenAI are claiming. They claim that there are two new flagship models and a router that chooses between them.

"GPT‑5 is a unified system with a smart, efficient model that answers most questions, a deeper reasoning model (GPT‑5 thinking) for harder problems, and a real‑time router that quickly decides which to use"
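The quoted description implies a dispatcher sitting in front of two models. As a toy illustration only, here is what a rule-based router of that shape could look like; the model names, keyword list, and length threshold are all hypothetical and say nothing about how OpenAI's actual router decides:

```typescript
// Hypothetical model identifiers, not real API model names.
type ModelId = "fast-model" | "reasoning-model";

interface RoutedRequest {
  prompt: string;
  // Lets the caller force the deeper model (assumed option).
  requireReasoning?: boolean;
}

// Toy heuristic: phrases that hint at multi-step problems (assumption).
const HARD_HINTS: string[] = ["prove", "debug", "refactor", "step by step", "optimize"];

// Prompts longer than this many characters are treated as harder (assumption).
const LONG_PROMPT_CHARS = 2000;

function routeRequest(req: RoutedRequest): ModelId {
  if (req.requireReasoning) {
    return "reasoning-model";
  }
  const text = req.prompt.toLowerCase();
  const looksHard = HARD_HINTS.some((hint) => text.indexOf(hint) !== -1);
  if (looksHard || req.prompt.length > LONG_PROMPT_CHARS) {
    return "reasoning-model";
  }
  // Default: the cheaper, lower-latency model handles most questions.
  return "fast-model";
}
```

Whatever the real decision rule is, the practical consequence is the one raised above: two similar-looking requests can land on different models, so results for a narrow workload may vary more than with a single pinned model.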

softwaredoug•6mo ago
I feel like they should have let GPT-5 overlap in experimental mode for a month or so. It took a while to work the kinks out of GPT-4 before people trusted it. Just switching it on is really hurting their brand.

The fact that they didn’t do this makes me think their finances are in very bad shape.

hitradostava•6mo ago
I agree. I just don't understand how the team at Cursor can say this:

“GPT-5 is the smartest coding model we've used. Our team has found GPT-5 to be remarkably intelligent, easy to steer, and even to have a personality we haven’t seen in any other model. It not only catches tricky, deeply-hidden bugs but can also run long, multi-turn background agents to see complex tasks through to the finish—the kinds of problems that used to leave other models stuck. It’s become our daily driver for everything from scoping and planning PRs to completing end-to-end builds.”

The cynic in me thinks that Cursor had to give positive PR in order to secure better pricing...

cellis•6mo ago
I have not found it to be better than Sonnet. I wrote this on another thread:

Claude Code is certainly not as easy to engineer with, though it is less expensive. For instance, the @ feature isn’t as robust as Cursor's, in my experience. The lack of Shift+Enter is also quite a pain, and linting doesn’t “just work”. Cursor with Claude 4.0 Max is really thorough, I think even better than GPT-5. It's not that Sonnet itself is better, but whatever “ensemble” of models Cursor uses with Sonnet seems to both follow instructions and make tool calls better than it does with GPT-5. GPT-5 often says what it will do and then says “say go and I’ll go” or “you should run command x”, but doesn’t just DO it. Also, for bug fixes in difficult codebases, nothing beats Gemini 2.5 Pro.

blurbleblurble•6mo ago
Really struggling with GPT-5 in all the clients I've used (currently claude code router, with the default OpenRouter endpoint). It's opaque, slow, ridiculously terse, botches basic tool calls and edits, and pauses constantly even when I ask it not to. It does seem "sharp" at analytical tasks, but it's just clunky to work with.

I'll try the Cursor CLI, but I'm hesitant to ditch Claude Code.