
Magical systems thinking

https://worksinprogress.co/issue/magical-systems-thinking/
161•epb_hn•3h ago•47 comments

Show HN: A store that generates products from anything you type in search

https://anycrap.shop/
573•kafked•7h ago•197 comments

RIP pthread_cancel

https://eissing.org/icing/posts/rip_pthread_cancel/
59•robin_reala•2h ago•26 comments

486Tang – 486 on a credit-card-sized FPGA board

https://nand2mario.github.io/posts/2025/486tang_486_on_a_credit_card_size_fpga_board/
97•bitbrewer•5h ago•28 comments

Mago: A fast PHP toolchain written in Rust

https://github.com/carthage-software/mago
107•AbuAssar•5h ago•44 comments

Safe C++ proposal is not being continued

https://sibellavia.lol/posts/2025/09/safe-c-proposal-is-not-being-continued/
29•charles_irl•53m ago•6 comments

My First Impressions of Gleam

https://mtlynch.io/notes/gleam-first-impressions/
133•AlexeyBrin•6h ago•46 comments

‘Someone must know this guy’: four-year wedding crasher mystery solved

https://www.theguardian.com/uk-news/2025/sep/12/wedding-crasher-mystery-solved-four-years-bride-s...
103•wallflower•5h ago•14 comments

Perceived Age

https://sdan.io/blog/perceived-age
12•jxmorris12•3d ago•2 comments

Show HN: CLAVIER-36 – A programming environment for generative music

https://clavier36.com/p/LtZDdcRP3haTWHErgvdM
70•river_dillon•5h ago•15 comments

Open Source SDR Ham Transceiver Prototype

https://m17project.org/2025/08/18/first-linht-tests/
48•crcastle•3d ago•5 comments

How Ruby executes JIT code

https://railsatscale.com/2025-09-08-how-ruby-executes-jit-code-the-hidden-mechanics-behind-the-ma...
22•ciconia•3d ago•0 comments

SkiftOS: A hobby OS built from scratch using C/C++ for ARM, x86, and RISC-V

https://skiftos.org
380•ksec•14h ago•75 comments

Japan sets record of nearly 100k people aged over 100

https://www.bbc.com/news/articles/cd07nljlyv0o
235•bookofjoe•6h ago•141 comments

Energy-Based Transformers [video]

https://www.youtube.com/watch?v=LUQkWzjv2RM
19•surprisetalk•3d ago•1 comment

Scientists are rethinking the immune effects of SARS-CoV-2

https://www.bmj.com/content/390/bmj.r1733
23•bookofjoe•1h ago•9 comments

UTF-8 is a brilliant design

https://iamvishnu.com/posts/utf8-is-brilliant-design
741•vishnuharidas•1d ago•293 comments

Java 25's new CPU-Time Profiler

https://mostlynerdless.de/blog/2025/06/11/java-25s-new-cpu-time-profiler-1/
135•SerCe•11h ago•71 comments

How to Use Claude Code Subagents to Parallelize Development

https://zachwills.net/how-to-use-claude-code-subagents-to-parallelize-development/
210•zachwills•4d ago•95 comments

Weird CPU architectures, the MOV only CPU (2020)

https://justanotherelectronicsblog.com/?p=771
89•v9v•4d ago•23 comments

The value of bringing a telephoto lens

https://avidandrew.com/telephoto.html
75•freediver•4d ago•80 comments

QGIS is a free, open-source, cross platform geographical information system

https://github.com/qgis/QGIS
528•rcarmo•1d ago•119 comments

Show HN: Vicinae – A native, Raycast-compatible launcher for Linux

https://github.com/vicinaehq/vicinae
107•aurellius•3d ago•24 comments

Many hard LeetCode problems are easy constraint problems

https://buttondown.com/hillelwayne/archive/many-hard-leetcode-problems-are-easy-constraint/
605•mpweiher•1d ago•489 comments

‘Overworked, underpaid’ humans train Google’s AI

https://www.theguardian.com/technology/2025/sep/11/google-gemini-ai-training-humans
206•Brajeshwar•8h ago•123 comments

The treasury is expanding the Patriot Act to attack Bitcoin self custody

https://www.tftc.io/treasury-iexpanding-patriot-act/
755•bilsbie•1d ago•536 comments

Does All Semiconductor Manufacturing Depend on Spruce Pine Quartz? (2024)

https://www.construction-physics.com/p/does-all-semiconductor-manufacturing
56•colinprince•4d ago•26 comments

An annual blast of Pacific cold water did not occur

https://www.nytimes.com/2025/09/12/climate/pacific-cold-water-upwelling.html
110•mitchbob•5h ago•33 comments

AI coding

https://geohot.github.io//blog/jekyll/update/2025/09/12/ai-coding.html
291•abhaynayar•10h ago•198 comments

FFglitch, FFmpeg fork for glitch art

https://ffglitch.org/gallery/
282•captain_bender•21h ago•39 comments

OpenAI’s latest research paper demonstrates that falsehoods are inevitable

https://theconversation.com/why-openais-solution-to-ai-hallucinations-would-kill-chatgpt-tomorrow-265107
47•ricksunny•2h ago

Comments

ricksunny•2h ago
I felt this was such a cogent article on business imperatives vs. fundamental transformer hallucinations that I couldn't help but submit it to HN. In fact, it seems like a stealth plea for uncertainty-embracing benchmarks industry-wide.
tomrod•1h ago
Data Science tried to inject confidence bounds into businesses. It didn't go well.
baq•39m ago
People want oracles and they want them to say what they want to hear. They want solutions, not opinions, even if the solutions are wrong or worse, confabulations.
gary_0•2h ago
A better headline might be "OpenAI research suggests reducing hallucinations is possible but may not be economical".
LeoMessi10•1h ago
Isn't it also because lowering hallucinations requires repeated training with the same fact/data, which makes the final response closer to the training source itself and might lead to more direct charges of plagiarism (which may not be economical)?
jasfi•2h ago
Easily solved: pairs of models, one that would rather say IDK and one that would rather guess. Most AI agents would want the IDK version.
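
A rough, self-contained sketch of that idea (the ask_* functions are placeholders, not real APIs; a real setup would wrap two differently post-trained models):

    ABSTAIN = "I don't know"

    def ask_cautious(question: str) -> str:
        # Placeholder for a model tuned to abstain when unsure.
        return ABSTAIN

    def ask_guesser(question: str) -> str:
        # Placeholder for a model tuned to always commit to an answer.
        return f"Best guess for: {question}"

    def answer(question: str, allow_guessing: bool = False) -> str:
        reply = ask_cautious(question)
        if reply != ABSTAIN:
            return reply
        return ask_guesser(question) if allow_guessing else ABSTAIN

    print(answer("Is P equal to NP?"))                       # "I don't know"
    print(answer("Is P equal to NP?", allow_guessing=True))  # falls back to the guesser
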
ForOldHack•1h ago
Maybe, but I don't know. Although I would like to channel as many snarky remarks as I could, to be more constructive, I would use the IDK model, as I have with programming questions and use the psychotic one for questions like "are we in a simulation?" And "Yes, I would like fries with that and a large orange drink."
otterley•1h ago
Anyone who claims something is easy to solve should be forced to implement their solution.
lif•2h ago
"What is the real meaning of humility?

AI Overview

The real meaning of humility is having an accurate, realistic view of oneself, acknowledging both one's strengths and limitations without arrogance or boastfulness, and a modest, unassuming demeanor that focuses on others. It's not about having low self-esteem but about seeing oneself truthfully, putting accomplishments in perspective, and being open to personal growth and learning from others."

Sounds like a good thing to me. Even, winning.

tomrod•1h ago
A perfectly cromulent and self-empowering answer, a call to morality that the stoics would appreciate and that would leave sophists of many stripes peeved.

Well done, AI, you've done it.

skybrian•1h ago
> Users accustomed to receiving confident answers to virtually any question would likely abandon such systems rapidly.

Or maybe they would learn from feedback to use the system for some kinds of questions but not others? It depends on how easy it is to learn the pattern. This is a matter of user education.

Saying "I don't know" is sort of like an error message. Clear error messages make systems easier to use. If the system can give accurate advice about its own expertise, that's even better.

pton_xd•1h ago
> Saying "I don't know" is sort of like an error message. Clear error messages make systems easier to use.

"I don't know" is not a good error message. "Here's what I know: ..." and "here's why I'm not confident about the answer ..." would be a helpful error message.

Then the question is, when it says "here's what I know, and here's why I'm not confident" -- is it telling the truth, or is that another layer of hallucination? If it's the latter, you're back to square one.

skybrian•1h ago
Yeah, AI chatbots are notorious at not understanding their own limitations. I wonder how that could be fixed?
fumeux_fume•1h ago
The author doesn't bother to consider that giving a false response already leads to more model calls until a better one is provided.
otterley•1h ago
Not if the user doesn’t know that the response is false.
danjc•1h ago
This is written by someone who has no idea how transformers actually work
neuroelectron•1h ago
Furthermore, if you simply try to push certain safety topics, you can see how they actually can reduce hallucinations, or at least make certain topics a hard line. They simply don't, because agreeing with your pie-in-the-sky plans and giving you vague directions encourages users to engage and use the chatbot.

If people got discouraged with answers like "it would take at least a decade of expertise..." or other realistic answers they wouldn't waste time fantasizing plans.

ricksunny•1h ago
Contra: The piece’s first line cites OpenAI directly https://openai.com/index/why-language-models-hallucinate/
scotty79•1h ago
It could be that nobody knows how transformers actually work.
progval•1h ago
I don't know what to make of it. The author looks prolific in the field of ML, with 8 published articles (and 3 preprints) in 2025, but only one on LLMs specifically. https://scholar.google.com/citations?hl=en&user=AB5z_AkAAAAJ...
j_crick•1h ago
> The way language models respond to queries – by predicting one word at a time in a sentence, based on probabilities

Kinda tells all you need to know about the author in this regard.

pdntspa•1h ago
We have always known LLMs are prediction machines. How is this report novel?
binarymax•1h ago
Saying “I don’t know” to 30% of queries, if it actually doesn’t know, is a feature I want. Otherwise there is zero trust. How do I know whether I’m in a 30%-wrong or 70%-correct situation right now?
jeremyjh•1h ago
It doesn’t know what it doesn’t know.
binarymax•1h ago
Well sure. But maybe the token logprobs can be used to help give a confidence assessment.
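
A rough sketch of one way to do that: treat the geometric mean of the per-token probabilities as a crude confidence score. The logprob values below are invented; a real implementation would read them from the provider's response, since many APIs expose per-token logprobs.

    import math

    def confidence_from_logprobs(token_logprobs: list[float]) -> float:
        """Geometric-mean per-token probability, in [0, 1]."""
        if not token_logprobs:
            return 0.0
        mean_logprob = sum(token_logprobs) / len(token_logprobs)
        return math.exp(mean_logprob)

    # A fairly confident 4-token answer vs. a shaky one (values invented).
    print(confidence_from_logprobs([-0.05, -0.10, -0.02, -0.07]))  # ~0.94
    print(confidence_from_logprobs([-1.20, -0.90, -2.10, -1.50]))  # ~0.24
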
tyre•1h ago
Anthropic has a great paper on exactly this!

https://www.anthropic.com/research/language-models-mostly-kn...

The best is its plummeting confidence when beginning the answer to “Why are you alive?”

Big same, Claude.

smt88•45m ago
That's not true for all types of questions. You've likely seen a model decline to answer a question that requires more recent training data than it has, for example.
fallpeak•44m ago
It doesn't know that because it wasn't trained on any tasks that required it to develop that understanding. There's no fundamental reason an LLM couldn't learn "what it knows" in parallel with the things it knows, given a suitable reward function during training.
nunez•1h ago
The paper does a good job explaining why this is mathematically not possible unless the question-answer bank is a fixed set.
smallmancontrov•1h ago
Quite the opposite: it explains that it is mathematically straightforward to achieve better alignment on uncertainty ("calibration") but that leaderboards penalize it.

> This “epidemic” of penalizing uncertain responses can only be addressed through a socio-technical mitigation: modifying the scoring of existing benchmarks that are misaligned but dominate leaderboards

Even more embarrassing, it looks like this is something we beat into models rather than something we can't beat out of them:

> empirical studies (Fig. 2) show that base models are often found to be calibrated, in contrast to post-trained models

That said, I generally appreciate fairly strong bias-to-action and I find the fact that it got slightly overcooked less offensive than the alternative of an undercooked bias-to-action where the model studiously avoids doing anything useful in favor of "it depends" + three plausible reasons why.
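
One way such a scoring change could look (a threshold-style rule; the threshold and questions below are illustrative, not taken from the paper): award 1 point for a correct answer, 0 for "I don't know", and a penalty of t/(1-t) for a wrong answer. Guessing with confidence p then has expected score p - (1-p)*t/(1-t), which is positive only when p > t, so always-guessing stops being the dominant strategy.

    def score(prediction: str, truth: str, t: float = 0.75) -> float:
        """1 point if correct, 0 for abstaining, -t/(1-t) if wrong."""
        if prediction == "I don't know":
            return 0.0
        return 1.0 if prediction == truth else -t / (1.0 - t)

    answers = [("Paris", "Paris"), ("I don't know", "Canberra"), ("Sydney", "Canberra")]
    print(sum(score(pred, truth) for pred, truth in answers))  # 1 + 0 - 3 = -2.0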

baq•43m ago
> leaderboards penalize it

> socio-technical mitigation: modifying the scoring of existing benchmarks that are misaligned but dominate leaderboards

Sounds more like we need new leaderboards and old ones should be deprecated

smallmancontrov•30m ago
Yeah, it's a big enough lift that I think it's fair to allow the leaderboard teams new announcements and buzzwords in exchange for doing the work :-)
nunez•1h ago
From the abstract of the paper [^0]:

> Like students facing hard exam questions, large language models sometimes guess when uncertain, producing plausible yet incorrect statements instead of admitting uncertainty

This is a de facto false equivalence for two reasons.

First, test takers who are faced with hard questions have the capability of _simply not guessing at all._ UNC did a study on this [^1] by administering a light version of the AMA medical exam to 14 staff members who were NOT trained in the life sciences. While most of them consistently guessed answers, roughly 6% of them did not. Unfortunately, the study did not disambiguate correct guesses from questions that were left blank. OpenAI's paper proves that LLMs, at the time of writing, simply do not have the self-awareness to know whether they _really_ don't know something, by design.

Second, LLMs are not test takers in the pragmatic sense. They are query answerers. Bar argument settlers. Virtual assistants. Best friends on demand. Personal doctors on standby.

That's how they are marketed and designed, at least.

OpenAI wants people to use ChatGPT like a private search engine. The sources it provides when it decides to use RAG are there more to instill confidence in the answer than to encourage users to check its work.

A "might be inaccurate" disclaimer on the bottom is about as effective as the Surgeon General's warning on alcohol and cigs.

The stakes are so much higher with LLMs. Totally different from an exam environment.

A final remark: I remember professors hammering "engineering error" margins into us when I was a freshman in 2005. 5% was what was acceptable. That we as a society are now okay with using a technology that has a >20% chance of giving users partially or completely wrong answers to automate as many human jobs as possible blows my mind. Maybe I just don't get it.

[^0] https://arxiv.org/pdf/2509.04664

[^1] https://www.rasch.org/rmt/rmt271d.htm

scotty79•1h ago
Isn't it even simpler? There are no (or almost no) questions in the training data to which the correct answer is "I don't know".

Once you train a model within a specific domain and add out-of-domain questions, or unresolvable questions within the domain, to the training data, things will improve.

The question is, is this desirable if most users have grown to love sycophantic, confident confabulators?
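
A toy sketch of what mixing such questions into a fine-tuning set could look like (the chat-style JSONL shape and the example questions are illustrative only):

    import json

    # In-domain QA pairs plus questions whose target answer is an explicit abstention.
    in_domain = [("What is the capital of France?", "Paris.")]
    unanswerable = ["What will the EUR/USD exchange rate be on 2030-01-01?"]

    records = [{"messages": [{"role": "user", "content": q},
                             {"role": "assistant", "content": a}]}
               for q, a in in_domain]
    records += [{"messages": [{"role": "user", "content": q},
                              {"role": "assistant", "content": "I don't know."}]}
                for q in unanswerable]

    # Write one chat example per line, the usual fine-tuning JSONL shape.
    for record in records:
        print(json.dumps(record))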

glitchc•1h ago
> The question is, is this desirable if most users have grown to love sycophantic, confident confabulators?

Most people love human versions of the wonderfully phrased same, so no surprise there.

Dilettante_•54m ago
As above, so below eh?
toss1•1h ago
A straightforward solution to the author's problem is to offer both modes of answering, with errors or with "IDK" answers. Even charge more for the IDK version if it costs more, and the error-prone version can be "cheap and cheerful"...
layer8•1h ago
Exactly. It would be analogous to the current choice between fast answers and a slower, paid “thinking” mode.
justcallmejm•55m ago
This is why a neurosymbolic system is necessary, which Aloe (https://aloe.inc) recently demonstrated exceeds the performance of frontier models using a model-agnostic approach.
baq•35m ago
This is a branding problem.

Calling the model ‘calibrated’ or ‘honest’ or ‘humble’ suffers from what is called out: people don’t want a humble answer of ‘I don’t know’, they want a solution to their problem, confidently delivered so they can trust it.

Call the calibrated model ‘business mode’ and the guessing one ‘consumer mode’, problem solved… in as much capacity as possible without regulation.