
Small Programs and Languages

https://ratfactor.com/cards/pl-small
21•todsacerdoti•56m ago•3 comments

Odyc.js – A tiny JavaScript library for narrative games

https://odyc.dev
17•achtaitaipai•49m ago•0 comments

A masochist's guide to web development

https://sebastiano.tronto.net/blog/2025-06-06-webdev/
14•sebtron•46m ago•0 comments

Swift and Cute 2D Game Framework: Setting Up a Project with CMake

https://layer22.com/swift-and-cute-framework-setting-up-a-project-with-cmake
44•pusewicz•3h ago•28 comments

Dystopian tales of that time when I sold out to Google

https://wordsmith.social/elilla/deep-in-mordor-where-the-shadows-lie-dystopian-stories-of-my-time-as-a-googler
66•stego-tech•58m ago•27 comments

Weaponizing Dependabot: Pwn Request at its finest

https://boostsecurity.io/blog/weaponizing-dependabot-pwn-request-at-its-finest
30•chha•3h ago•22 comments

Self-hosting your own media considered harmful according to YouTube

https://www.jeffgeerling.com/blog/2025/self-hosting-your-own-media-considered-harmful
1146•DavideNL•9h ago•472 comments

Jepsen: TigerBeetle 0.16.11

https://jepsen.io/analyses/tigerbeetle-0.16.11
129•aphyr•3h ago•32 comments

The impossible predicament of the death newts

https://crookedtimber.org/2025/06/05/occasional-paper-the-impossible-predicament-of-the-death-newts/
519•bdr•1d ago•176 comments

Deepnote (YC S19) is hiring engineers to build an AI-powered data notebook

https://deepnote.com/join-us
1•Equiet•2h ago

ThornWalli/web-workbench: Old operating system as homepage

https://github.com/ThornWalli/web-workbench
3•rbanffy•2h ago•0 comments

Freight rail fueled a new luxury overnight train startup

https://www.freightwaves.com/news/how-freight-rail-fueled-a-new-luxury-overnight-train-startup
39•Ozarkian•5h ago•49 comments

Show HN: Air Lab – A portable and open air quality measuring device

https://networkedartifacts.com/airlab/simulator
426•256dpi•1d ago•175 comments

Test Postgres in Python Like SQLite

https://github.com/wey-gu/py-pglite
128•wey-gu•13h ago•41 comments

Aether: A CMS That Gets Out of Your Way

https://lebcit.github.io/post/meet-aether-a-cms-that-actually-gets-out-of-your-way/
24•LebCit•7h ago•15 comments

How we’re responding to The NYT’s data demands in order to protect user privacy

https://openai.com/index/response-to-nyt-data-demands/
225•BUFU•13h ago•202 comments

Tokasaurus: An LLM inference engine for high-throughput workloads

https://scalingintelligence.stanford.edu/blogs/tokasaurus/
188•rsehrlich•17h ago•23 comments

Show HN: Claude Composer

https://github.com/possibilities/claude-composer
138•mikebannister•15h ago•78 comments

What a developer needs to know about SCIM

https://tesseral.com/blog/what-a-developer-needs-to-know-about-scim
131•noleary•15h ago•24 comments

APL Interpreter – An implementation of APL, written in Haskell (2024)

https://scharenbroch.dev/projects/apl-interpreter/
121•ofalkaed•17h ago•49 comments

AMD Radeon 8050S “Strix Halo” Linux Graphics Performance Review

https://www.phoronix.com/review/amd-radeon-8050s-graphics
43•rbanffy•4h ago•16 comments

Seven Days at the Bin Store

https://defector.com/seven-days-at-the-bin-store
201•zdw•22h ago•101 comments

Czech Republic: Petition for open source in public administration

https://portal.gov.cz/e-petice/1205-petice-za-povinne-zverejneni-zdrojovych-kodu-softwaru-pouzitych-ve-verejne-sprave
126•harvie•4h ago•19 comments

How to (actually) send DTMF on Android without being the default call app

https://edm115.dev/blog/2025/01/22/how-to-send-dtmf-on-android
3•EDM115•3h ago•0 comments

I made a search engine worse than Elasticsearch (2024)

https://softwaredoug.com/blog/2024/08/06/i-made-search-worse-elasticsearch
106•softwaredoug•19h ago•17 comments

SkyRoof: New Ham Satellite Tracking and SDR Receiver Software

https://www.rtl-sdr.com/skyroof-new-ham-satellite-tracking-and-sdr-receiver-software/
103•rmason•19h ago•11 comments

Show HN: Ask-human-mcp – zero-config human-in-loop hatch to stop hallucinations

https://masonyarbrough.com/blog/ask-human
96•echollama•15h ago•47 comments

Open Source Distilling

https://opensourcedistilling.com/
66•nativeit•12h ago•27 comments

Defending adverbs exuberantly if conditionally

https://countercraft.substack.com/p/defending-adverbs-exuberantly-if
50•benbreen•18h ago•29 comments

Show HN: Lambduck, a Functional Programming Brainfuck

https://imjakingit.github.io/lambduck/
54•jorkingit•15h ago•23 comments

Differences in link hallucination and source comprehension across different LLM

https://mikecaulfield.substack.com/p/differences-in-link-hallucination
75•hveksr•1d ago

Comments

dr_kiszonka•1d ago
In such cases, I get better answers to questions starting with "What" and not "Did".
hereonout2•1d ago
Prompt engineering!
milleramp•1d ago
Took some time to realize the SIFT toolbox mentioned in the article is not a Scale-Invariant Feature Transform toolbox.
motorest•1d ago
Taken from the blog:

> Why are we talking about “graduate and PhD-level intelligence” in these systems if they can’t find and verify relevant links — even directly after a search?

This is one of my pet peeves, and recently OpenAI's models seem to have become very militant in how they stand by and push their obviously hallucinated sources. I'm talking about hallucinated answers: when pressed to cite sources they also hallucinate URLs that never existed, when repeatedly prompted to verify they stick to their clearly wrong output, and they ultimately fall back to claiming they were right but the URL somehow changed, even though it never existed in the first place.

In order to start talking about PhD-level intelligence, at the very least these LLMs must support PhD-level context-seeking and information verification. It is not enough to output a wall of text that reads fluently; you must stick to verifiable facts.

thom•1d ago
I have search enabled 100% of the time with ChatGPT and would never go back to raw-dogging LLM citations. O3 especially has passed the threshold of “not always annoying”. Had an argument with Gemini yesterday where it was insisting on some hallucinated implementation of a function even while giving me a GitHub link to the correct source.
vanschelven•1d ago
Including literal 404s... As an outsider it has always struck me as absurd that they don't just do the equivalent of wget over all provided sources.
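
A minimal sketch of that check in Python, using only the standard library (the URL in the usage line is just a placeholder; a real verifier would also want retries and a GET fallback for servers that reject HEAD):

    import urllib.request
    import urllib.error

    def check_sources(urls, timeout=10):
        """HEAD-request each cited URL and record its HTTP status (None = unreachable)."""
        results = {}
        for url in urls:
            req = urllib.request.Request(
                url, method="HEAD",
                headers={"User-Agent": "citation-checker/0.1"},
            )
            try:
                with urllib.request.urlopen(req, timeout=timeout) as resp:
                    results[url] = resp.status   # redirects are already followed
            except urllib.error.HTTPError as e:
                results[url] = e.code            # e.g. 404 for a hallucinated link
            except OSError:
                results[url] = None              # DNS failure, refused connection, timeout
        return results

    # Placeholder usage: flag anything that is not a 2xx response.
    for url, status in check_sources(["https://example.com/paper-that-may-not-exist"]).items():
        ok = status is not None and 200 <= status < 300
        print(("OK  " if ok else "DEAD") + " " + url)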
alkonaut•1d ago
Or why the LLM doesn’t do a lookup into a subset of the training data as a database and reject the output if it seems to be wrong. A billion of the most common URLs and the entirety of Wikipedia, arXiv and Stack Overflow would go a long way.
vrighter•1d ago
If that could be done, then we would be using that and skipping the llms entirely
alkonaut•1h ago
I can’t see why that couldn’t be done. You’d save an HTTP request for a ton of the URLs.
krzat•1d ago
The approach of generating something and then looking for hallucinations is just stupid. To validate the output I have to be an expert. How do I become an expert if I rely on LLMs? It's a dead end.
motorest•1d ago
> The approach of generating something and then looking for hallucinations is just stupid. To validate the output I have to be an expert.

No. You only need to check for sources, and then verify that these sources exist and that they support the claims.

It's the very definition of "fact".

In some cases, all you need to do is check if a URL that was cited does exist.

capnrefsmmat•1d ago
If the output is interpreting sources rather than just regurgitating quotes from them, you need to exert judgment to verify they support its claims. When the LLM output is about some highly technical subject, it can require expert knowledge just to judge whether the source supports the claims.
vrighter•1d ago
"and support the claims" is doing some *extremely* heavy lifting there.

I can't write a software program, give the source to the greengrocer and expect him to be able to say anything about its quality. Just like I can't really say much about vegetables.

nkrisc•1d ago
Seems like the LLM is giving correct output if it’s generating a plausible string of tokens in response to your string of tokens.
motorest•1d ago
> Seems like the LLM is giving correct output if it’s generating a plausible string of tokens in response to your string of tokens.

No. If you prompt it for a response and then ask it to cite sources, and it outputs broken links that never existed, then it clearly failed to deliver correct output.

nkrisc•1d ago
But are the links plausible text given the training data?

If the purpose is to accurately cite sources, how is it even possible to hallucinate them? Seems like folks are expecting way too much from these tools. They are not intelligent. Useful, perhaps.

Scarblac•1d ago
Seems that's just expecting things that LLMs were not designed for.

It's a token producer based on trained weights, it doesn't use any sources.

Even if it were "fixed" so that it only generates URLs that exist, it's still incorrect because it did not use any sources so those URLs are not sources.

soco•3h ago
Then let's face it: LLMs were not designed to give proper answers. Now that we settled this and the emperor is obviously naked, what?
vrighter•1d ago
"correct" for an llm means "fits the statistical distributions in the training data"

"correct" for you is "truth that corresponds to the real world"

They are two very different things. The LLM's output is, very much, correct. Because it was never meant to mean anything other than similarity of probability distributions.

It's not what you wanted, but that doesn't make it incorrect. You're just under a wrong assumption about what you were asking for. You were asking for something that looks like it could be true. Even if you ask it to not hallucinate, you're just asking it to make it look like it is not hallucinating. Meanwhile you thought you were asking for the actual, real, answer to your question.

Timwi•23h ago
Oh okay, guess all LLMs are just fine then and we don't need to do any further development on them.
vharuck•23h ago
Right, the dialogue between the user and the LLM closely resembles documents used in training the LLM. People argue with, lie to, and misunderstand others on the internet. Here's a totally plausible hypothetical forum discussion:

Person A: I believe X.

Person B: Do you have a source for that?

A: Yes, it was shown by blah blah in the paper yada yada.

B: I don't think that study exists. Share a link?

A: [posts a URL]

B: That's not a real paper. The URL doesn't even work!

A: Works on my machine.

---

I've seen those kinds of chats so many times online. Know what I haven't seen very often? When person A says "You're right, I made up that article. Let me look again for a real one, and I might change my opinion depending on what it says."

soco•3h ago
Why isn't the LLM under the wrong assumption? So I don't get what I need from my tool and it's still my fault? I am not yet ready to bow to the AI overlords, sorry.
esafak•1d ago
This is trivial to overcome by using a REST client to verify the link through MCP, and by caching results it wouldn't even add much latency.
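
A rough sketch of the caching side in Python, standard library only; the MCP tool plumbing that would expose it to the model is assumed and not shown, and the "verify_citation" name is hypothetical:

    import functools
    import urllib.request
    import urllib.error

    @functools.lru_cache(maxsize=4096)
    def link_is_live(url: str) -> bool:
        """True if the URL answers with a non-error status. Cached, so repeated
        citations of the same source cost a single request overall."""
        req = urllib.request.Request(
            url, method="HEAD",
            headers={"User-Agent": "citation-checker/0.1"},
        )
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                return resp.status < 400
        except urllib.error.HTTPError as e:
            return e.code < 400
        except OSError:
            return False

    # A hypothetical "verify_citation" MCP tool could simply call link_is_live()
    # on every URL the model emits before the answer is shown to the user.

lru_cache is process-local, so a long-running service would presumably swap it for a persistent store.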
zone411•1d ago
If anyone is interested in a larger sample size comparing how often LLMs confabulate answers based on provided texts, I have a benchmark at https://github.com/lechmazur/confabulations/. It's always interesting to test new models with it because the results can be unintuitive compared to those from my other benchmarks.
dr_kiszonka•1d ago
Useful benchmark. I noticed o3-high hallucinating too often for such a good model, but it is usually great with search. In my experience, Claude Opus & Sonnet 4 consistently lie, cheat, and try to hide their tracks. Maybe they are good at writing code, but I don't trust them with other things.
eviks•1d ago
> Why are we talking about “graduate and PhD-level intelligence” in these systems if they can’t find and verify relevant links

For exactly the same reason the author markets his tool as a research assistant

> It also models an approach that is less chatbot, and more research assistant in a way that is appropriate for student researchers, who can use it to aid research while coming to their own conclusions.

dedicate•1d ago
It's not just that they get links wrong, it's how they get them wrong – like, totally fabricating them and then doubling down! A human messing up a citation is one thing, but this feels... different, almost like a creative act of deception, lol.
simonw•1d ago
The key thing I got from this article is that the o3 and Claude 4 projects (I'm differentiating from the models here because the harness of tools around them is critical too) are massively ahead of GPT 4.1 and Gemini 2.5 when it comes to fact checking in a way that benefits from search and web usage.

The o3 finding matches my own experience: https://simonwillison.net/2025/Apr/21/ai-assisted-search/#o3...

Both o3 and Claude 4 have a crucial new ability: they can run tools such as their search tool as part of their "reasoning" phase. I genuinely think this is one of the most exciting new advances in LLMs in the last six months.

simonw•1d ago
Products, not projects.
diego_moita•1d ago
I have a strange feeling: it seems that original insights and hallucinations are related. One seems to come very frequently with the other.

I've noticed that o3 is the one that lies with the most conviction (compared to Gemini Pro and Claude Sonnet). It will be the hardest to convince that it is wrong, will invent excuses and complex explanations for its lies, almost to a Trump level of lying and deception.

But it is also the one that provides the most interesting insights, that will look at what others don't see.

There might be some kind of deep truth in this correlation. Or it might be me having a hallucination...

SubiculumCode•23h ago
I do wonder about the role of test-time compute in the blog post in terms of document understanding. A non-reasoning output (or a low test-time compute setting) might easily misinterpret the text, but reasoning models can second-guess, consider multiple objectives in turn, and right the ship.

I note that Gemini 2.5 has one of the lowest confabulation/hallucination rates according to this benchmark [1], so I am surprised by the results in the blog.

Also, I have found that link hallucination and output quality improve when you restrict searches to, for example, only PubMed sources, and when you provide the source link directly in the text (as opposed to Gemini Deep Research's usual method of citation).

One reason, I think, is that unrestricted search will pull in the paper, the related blog posts, and the press releases, and weight them as equal (and independent!) sources of a fact, when we know that nuance is lost in the latter; restricting the search may also mean more test-time compute is spent on the quality sources, not the press releases.

[1] https://github.com/lechmazur/confabulations/