
From GPT-4 to GPT-5: Measuring Progress in Medical Language Understanding [pdf]

https://www.fertrevino.com/docs/gpt5_medhelm.pdf
42•fertrevino•2h ago
I recently worked on running a thorough healthcare eval on GPT-5. The results show a (slight) regression in GPT-5 performance compared to GPT-4 era models.

I found this interesting. Here are the detailed results: https://www.fertrevino.com/docs/gpt5_medhelm.pdf

Comments

woeirua•1h ago
Interesting topic, but I'm not opening a PDF from some random website. Post a summary of the paper or the key findings here first.
42lux•1h ago
It's hacker news. You can handle a PDF.
jeffbee•1h ago
I approve of this level of paranoia, but I would just like to know why PDFs are dangerous (reasonable) but HTML is not (inconsistent).
HeatrayEnjoyer•1h ago
PDFs can run almost anything and have an attack surface the size of Greece's coast.
zamadatix•1h ago
That's not very different from web browsers, but security-conscious people usually just disable scripting functionality and such in their viewer (browser, PDF reader, RTF viewer, etc.) instead of focusing on the file extension it comes in.

I think pdf.js even defaults to not running scripts in PDFs (would need to double check), if you want to view it in the browser's sandbox. Of course there are still text-rendering-based attacks and such but, again, nothing about that is unique to a PDF vs. a webpage in a browser.

hypoxia•1h ago
Did you try it with high reasoning effort?
ares623•40m ago
Sorry, not directed at you specifically. But every time I see questions like this I can’t help but rephrase in my head:

“Did you try running it over and over until you got the results you wanted?”

SequoiaHope•36m ago
What you describe is a person selecting the best results, but if you can get better results one shot with that option enabled, it’s worth testing and reporting results.
ares623•34m ago
I get that. But then if that option doesn't help, what I've seen is that the next followup is inevitably "have you tried doing/prompting x instead of y"
theshackleford•13m ago
> I get that. But then if that option doesn't help, what I've seen is that the next followup is inevitably "have you tried doing/prompting x instead of y"

Maybe I’m misunderstanding, but it sounds like you’re framing a completely normal process (try, fail, adjust) as if it’s unreasonable?

In reality, when something doesn’t work, it would seem to me that the obvious next step is to adapt and try again. This does not seem like a radical approach but instead seems to largely be how problem solving sort of works?

For example, when I was a kid trying to push start my motorcycle, it wouldn’t fire no matter what I did. Someone suggested a simple tweak, try a different gear. I did, and instantly the bike roared to life. What I was doing wasn’t wrong, it just needed a slight adjustment to get the result I was after.

furyofantares•8m ago
Something I've experienced with multiple new model releases is plugging them into my app makes my app worse. Then I do a bunch of work on prompts and now my app is better than ever. And it's not like the prompts are just better and make the old model work better too - usually the new prompts make the old model worse or there isn't any change.

So it makes sense to me that you should try until you get the results you want (or fail to do so). And it makes sense to ask people what they've tried. I haven't done the work yet to try this for gpt5 and am not that optimistic, but it is possible it will turn out this way again.

dcre•35m ago
This is not a good analogy because reasoning models are not choosing the best from a set of attempts based on knowledge of the correct answer. It really is more like what it sounds like: “did you think about it longer until you ruled out various doubts and became more confident?” Of course nobody knows quite why directing more computation in this way makes them better, and nobody seems to take the reasoning trace too seriously as a record of what is happening. But it is clear that it works!
aprilthird2021•15m ago
> Of course nobody knows quite why directing more computation in this way makes them better, and nobody seems to take the reasoning trace too seriously as a record of what is happening. But it is clear that it works!

One thing that's hard to wrap my head around is that we are giving more and more trust to something we don't understand, with the assumption (often unchecked) that it just works. Basically your refrain is used to justify all sorts of odd setups of AIs, agents, etc.

xnx•1h ago
Have you looked at comparing to Google's foundation models or specialty medical models like MedGemma (https://developers.google.com/health-ai-developer-foundation...)?
username135•1h ago
I wonder what changed in the models that caused the regression?
teaearlgraycold•48m ago
Not sure but with each release it feels like they’re just wiping the dirt around and not actually cleaning.
aresant•1h ago
Feels more like a mixed bag than a regression?

E.g., GPT-5 beats GPT-4 on factual recall and reasoning (HeadQA, Medbullets, MedCalc).

But then it slips on structured queries (EHRSQL), fairness (RaceBias), and evidence QA (PubMedQA).

Hallucination resistance is better, but only modestly.

Latency seems uneven (maybe it needs more testing?): faster on long tasks, slower on short ones.

woeirua•30m ago
Definitely seems like GPT-5 is a very incremental improvement. Not what you’d expect if AGI were imminent.
TrainedMonkey•2m ago
GPT-5 feels like cost engineering. The model is incrementally better, but they are optimizing for the least amount of compute. I am guessing investors love that.