frontpage.

Commodore 64 Ultimate Review [video]

https://www.youtube.com/watch?v=UtLR4nXAm4w
1•skibz•31s ago•0 comments

Show HN: A registry for curated, high quality Claude skills and skillsets

https://noriskillsets.dev/
1•ritammehta•1m ago•0 comments

We built a museum exhibit about a 1990s game hint line, with a physical binder

https://yarnspinner.dev/blog/hint-line-93/
1•parisidau•4m ago•0 comments

Primary Emotional Systems and Personality: An Evolutionary Perspective (2017)

https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2017.00464/full
1•bfoks•7m ago•0 comments

Paul Bertorelli on the Future of Aviation Journalism: It's Challenging

https://avbrief.com/paul-bertorelli-on-the-future-of-aviation-journalism-its-challenging/
1•Stevvo•8m ago•0 comments

Ask HN: Do you have side income as a software engineer?

1•andrewstetsenko•8m ago•1 comments

Show HN: isometric.nyc – giant isometric pixel art map of NYC

https://cannoneyed.com/isometric-nyc/
4•cannoneyed•9m ago•1 comments

Show HN: Brickify – Webapp to convert 3D models into Lego brick assemblies

https://brickify.ad-si.com
1•adius•9m ago•0 comments

GitHub "Files Changed" Tab Change?

1•nonethewiser•10m ago•0 comments

Show HN: Desktop‑2FA – offline open‑source TOTP authenticator for desktop

https://desktop-2fa.lukasz-perek.workers.dev/
1•wrogistefan•11m ago•0 comments

Show HN: I'm tired of my LLM bullshitting. So I fixed it

2•BobbyLLM•11m ago•0 comments

My friend built a tool to detect when to post on socials starting with this one

https://hadaa.app/hn_dashboard
2•muriithiKabogo•11m ago•1 comments

STL Editing with FreeCAD

https://hackaday.com/2026/01/22/stl-editing-with-freecad/
1•rbanffy•13m ago•0 comments

Iran has now been under a national internet blackout for two full weeks

https://twitter.com/netblocks/status/2014375236674675147
1•beejiu•14m ago•0 comments

Easy Measures Doing, Simple Measures Understanding

https://blog.jim-nielsen.com/2026/easy-vs-simple/
1•sibeliuss•15m ago•0 comments

Notebook.link: The Future of Notebook Sharing

https://medium.com/@QuantStack/introducing-notebook-link-the-future-of-notebook-sharing-5de900a97b4a
4•SylvainCorlay•15m ago•1 comments

Reverse engineering Lyft Bikes for fun (and profit?)

https://ilanbigio.com/blog/lyft-bikes.html
2•ibigio•16m ago•0 comments

Work-from-office mandate? Expect top talent turnover, culture rot

https://www.cio.com/article/4119562/work-from-office-mandate-expect-top-talent-turnover-culture-r...
3•CrankyBear•16m ago•0 comments

Ed tech is profitable. It is also mostly useless

https://www.economist.com/united-states/2026/01/22/ed-tech-is-profitable-it-is-also-mostly-useless
1•2OEH8eoCRo0•17m ago•0 comments

Why there's no European Google? And why it is a good thing

https://ploum.net/2026-01-22-why-no-european-google.html
3•zdw•19m ago•1 comments

A Protocol for Package Management

https://nesbitt.io/2026/01/22/a-protocol-for-package-management.html
1•zdw•19m ago•0 comments

Ask HN: What is your Claude Code setup? For common or spec projects

1•seky•20m ago•3 comments

Understanding LSM trees via read, write, and space amplification

https://www.bitsxpages.com/p/understanding-lsm-trees-via-read
1•agavra•21m ago•0 comments

Feynman on Why He Almost Quit Physics

https://www.youtube.com/watch?v=f9k7zd_9mAo
2•haxiomic•22m ago•0 comments

GraphRAG for Production Engineer Agent Memory

https://www.decodingai.com/p/designing-production-engineer-agent-graphrag
1•rbanffy•24m ago•0 comments

Show HN: Kubecfg – A CLI to manage Kubernetes contexts and namespaces

https://github.com/kadirbelkuyu/kubecfg
1•kadirbelkuyu•25m ago•0 comments

Coinbase Scaled Their Hiring to 150 Engineers per Month

https://newsletter.eng-leadership.com/p/how-coinbase-scaled-their-hiring
1•rbanffy•25m ago•0 comments

Humanizer: A Claude Code skill that removes signs of AI-generated writing

https://github.com/blader/humanizer
1•zdw•25m ago•2 comments

Securing Agents in Production

https://blog.palantir.com/securing-agents-in-production-agentic-runtime-1-5191a0715240
1•A-K•26m ago•0 comments

Test

https://vimeo.com/
1•ashishmathur•26m ago•0 comments

GPTZero finds 100 new hallucinations in NeurIPS 2025 accepted papers

https://gptzero.me/news/neurips/
233•segmenta•1h ago

Comments

cogman10•1h ago
Yuck, this is going to really harm scientific research.

There is already a problem with papers falsifying data/samples/etc; LLMs being able to put out plausible papers is just going to make it worse.

On the bright side, maybe this will get the scientific community and science journalists to finally take reproducibility more seriously. I'd love to see future reporting that instead of saying "Research finds amazing chemical x which does y" you see "Researcher reproduces amazing results for chemical x which does y. First discovered by z".

godzillabrennus•1h ago
Have they solved the issue where papers that cite already-invalidated research are themselves still being cited?
cogman10•1h ago
AFAIK, no, but I could see there being cause to push citations to also cite the validations. It'd be good if standard practice turned into something like

Paper A, by bob, bill, brad. Validated by Paper B by carol, clare, charlotte.

or

Paper A, by bob, bill, brad. Unvalidated.
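
To make the idea concrete, a machine-readable version of that convention might look something like the sketch below (the field names and rendering are hypothetical, purely illustrative):

    from dataclasses import dataclass, field

    @dataclass
    class Citation:
        """A cited work plus any independent replications of it (hypothetical schema)."""
        title: str
        authors: list[str]
        validated_by: list[str] = field(default_factory=list)  # titles of replicating papers

        def render(self) -> str:
            base = f"{self.title}, by {', '.join(self.authors)}."
            if self.validated_by:
                return f"{base} Validated by {'; '.join(self.validated_by)}."
            return f"{base} Unvalidated."

    # Example renderings matching the convention proposed above:
    print(Citation("Paper A", ["bob", "bill", "brad"], ["Paper B"]).render())
    print(Citation("Paper A", ["bob", "bill", "brad"]).render())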

gcr•1h ago
Academics typically use citation count and popularity as a rough proxy for validation. It's certainly not perfect, but it is something that people think about. Semantic Scholar in particular is doing great work in this area, making it easy to see who cites who: https://www.semanticscholar.org/

Google Scholar's PDF reader extension turns every hyperlinked citation into a popout card that shows citation counts inline in the PDF: https://chromewebstore.google.com/detail/google-scholar-pdf-...
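
For anyone who wants to script this kind of lookup, Semantic Scholar exposes a public Graph API; a minimal sketch is below (endpoint and field names are from memory, so verify them against the official docs before relying on this):

    import requests

    # Assumed Semantic Scholar Graph API endpoint; check the official docs.
    SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"

    def citation_count(title: str) -> int | None:
        """Look up a paper by title and return its citation count, if a match is found."""
        resp = requests.get(
            SEARCH_URL,
            params={"query": title, "fields": "title,citationCount", "limit": 1},
            timeout=10,
        )
        resp.raise_for_status()
        hits = resp.json().get("data", [])
        return hits[0]["citationCount"] if hits else None

    print(citation_count("Attention Is All You Need"))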

reliabilityguy•1h ago
Nope.

I am still reviewing papers that propose solutions based on a technique X, conveniently ignoring research from two years ago that shows that X cannot be used on its own. Both the paper I reviewed and the research showing X cannot be used are in the same venue!

b00ty4breakfast•1h ago
does it seem to be legitimate ignorance or maybe folks pushing ahead regardless of x being disproved?
freedomben•51m ago
IMHO, it's mostly ignorance, coming from a push/drive to "publish or perish." When the stakes are so high and output is so valued, and when reproducibility isn't required, it disincentivizes thorough work. The system is set up in a way that makes it fail.

There is also the reality that "one paper" or "one study" can be found contradicting almost anything, so if you just went with "some other paper/study debunks my premise" then you'd end up producing nothing. Plus many insiders know that there's a lot of slop out there that gets published, so they can (sometimes reasonably, IMHO) dismiss that "one paper" even when they do know about it.

It's (mostly) not fraud or malicious intent or ignorance, it's (mostly) humans existing in the system in which they must live.

f311a•1h ago
For ML/AI/Comp sci articles, providing reproducible code is a great option. Basically, PoC or GTFO.
StableAlkyne•21m ago
The most annoying ones are those which loosely discuss the methodology but then fail to publish the weights or any real algorithms.

It's like buying a piece of furniture from IKEA, except you just get an Allen key, a hint at what parts to buy, and blurry instructions.

j45•1h ago
It will better expose the behaviour of false scientists.
StableAlkyne•46m ago
> I'd love to see future reporting that instead of saying "Research finds amazing chemical x which does y" you see "Researcher reproduces amazing results for chemical x which does y. First discovered by z".

Most people (that I talk to, at least) in science agree that there's a reproducibility crisis. The challenge is there really isn't a good way to incentivize that work.

Fundamentally (unless you're independently wealthy and funding your own work), you have to measure productivity somehow, whether you're at a university, government lab, or the private sector. That turns out to be very hard to do.

If you measure raw number of papers (more common in developing countries and low-tier universities), you incentivize a flood of junk. Some of it is good, but there is such a tidal wave of shit that most people write off your work as a heuristic based on the other people in your cohort.

So, instead it's more common to try to incorporate how "good" a paper is, to reward people with a high quantity of "good" papers. That's quantifying something subjective though, so you might try to use something like citation count as a proxy: if a work is impactful, usually it gets cited a lot. Eventually you may arrive at something like the H-index: the largest number H such that you have written H papers with at least H citations each. Now, the trouble with this method is people won't want to "waste" their time on incremental work.
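
For concreteness, the computation behind that definition fits in a few lines; a minimal sketch (illustrative only):

    def h_index(citations: list[int]) -> int:
        """Largest h such that the author has h papers with at least h citations each."""
        h = 0
        for i, c in enumerate(sorted(citations, reverse=True), start=1):
            if c >= i:
                h = i
            else:
                break
        return h

    # Five papers cited 10, 8, 5, 3 and 0 times give an h-index of 3.
    assert h_index([10, 8, 5, 3, 0]) == 3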

And that's the struggle here; even if we funded and rewarded people for reproducing results, they will always be bumping up the citation count of the original discoverer. But it's worse than that, because literally nobody is going to cite your work. In 10 years, they just see the original paper, a few citing works reproducing it, and to save time they'll just cite the original paper only.

There's clearly a problem with how we incentivize scientific work. And clearly we want to be in a world where people test reproducibility. However, it's very very hard to get there when one's prestige and livelihood is directly tied to discovery rather than reproducibility.

warkdarrior•43m ago
> If you measure raw number of papers (more common in developing countries and low-tier universities), you incentivize a flood of junk.

This is exactly what rewarding replication papers (that reproduce and confirm an existing paper) will lead to.

pixl97•37m ago
And yet if we can't reproduce an existing paper, it's very possible that existing paper is junk itself.

Catch-22 is a fun game to get caught in.

maerF0x0•40m ago
> The challenge is there really isn't a good way to incentivize that work.

What if we got Undergrads (with hope of graduate studies) to do it? Could be a great way to train them on the skills required for research without the pressure of it also being novel?

StableAlkyne•33m ago
Those undergrads still need to be advised and they use lab resources.

If you're a tenure-track academic, your livelihood is much safer if you have them try new ideas (that you will be the corresponding author on, increasing your prestige and ability to procure funding) instead of doing incremental replication work.

And if you already have tenure, maybe you have the undergrad do just that. But the tenure process heavily filters for ambitious researchers, so it's unlikely this would be a priority.

If instead you did it as coursework, you could get them to maybe reproduce the work, but if you only have the students for a semester, that's not enough time to write up the paper and make it through peer review (which can take months between iterations)

suddenlybananas•19m ago
Unfortunately, that might just lead to a bunch of type II errors instead, if an effect requires very precise experimental conditions that undergrads lack the expertise for.
jimbokun•33m ago
> The challenge is there really isn't a good way to incentivize that work.

Ban publication of any research that hasn't been reproduced.

poulpy123•27m ago
> I'd love to see future reporting that instead of saying "Research finds amazing chemical x which does y" you see "Researcher reproduces amazing results for chemical x which does y. First discovered by z".

But nobody wants to pay for it.

geokon•15m ago
Usually you reproduce previous research as a byproduct of doing something novel "on top" of the previous result. I don't really see the problem with the current setup.

Sometimes you can just do something new and assume the previous result, but that's more the exception. You're almost always going to at least in part reproduce the previous one, and if issues come up, it's often evident.

That's why citations work as a good proxy: X number of people have done work based around this finding and nobody has seen a clear problem.

weslleyskah•7m ago
This is why universities should not make scientific research mandatory or even semi-mandatory for undergraduates; it should be reserved for postgraduates and those pursuing an academic career: those who are truly passionate about it. Even PhD students can be uncertain about their path.

Not to mention the difficulty of finding people in the industry with the technical ability to reproduce the work from scientific papers.

agumonkey•16m ago
I think, or at least I hope, that part of the LLM value will be to create their own retirement for specific needs. Instead of asking an LLM to solve any problem, restrict the space to a tool that can then help you reach your goal faster, without the statistical nature of LLMs.
mike_hearn•10m ago
Reproducibility is overrated and if you could wave a wand to make all papers reproducible tomorrow, it wouldn't fix the problem. It might even make it worse.

https://blog.plan99.net/replication-studies-cant-fix-science...

qwertox•1h ago
It would be great if those scientists who use AI without disclosing it get fucked for life.
direwolf20•1h ago
"scientists" FYI. Making shit up isn't science.
yesitcan•1h ago
One fuck seems appropriate.
oofbey•1h ago
Harsh sentiment. Pretty soon every knowledge worker will use AI every day. Should people disclose spellcheckers powered by AI? Disclosing is not useful. Being careful in how you use it and checking work is what matters.
ambicapter•1h ago
> Should people disclose spellcheckers powered by AI?

Thank you for that perfect example of a strawman argument! No, spellcheckers that use AI are not the main concern behind disclosing the use of AI in generating scientific papers, government reports, or any large block of nonfiction text that you paid for and that is supposed to make sense.

fisf•1h ago
People are accountable for the results they produce using AI. So a scientist is responsible for made up sources in their paper, which is plain fraud.
oofbey•1h ago
I completely agree. But “disclosing the use of AI” doesn’t solve that one bit.
barbazoo•1h ago
I don’t disclose what keyboard I use to write my code or if I applied spellcheck afterward. The result is 100% theirs.
eichin•15m ago
"responsible for made up sources" leads to the hilarious idea that if you cite a paper that doesn't exist, you're now obliged to write that paper (getting it retroactively published might be a challenge though)
Proziam•1h ago
False equivalence. This isn't about "using AI" it's about having an AI pretend to do your job.

What people are pissed about is the fact their tax dollars fund fake research. It's just fraud, pure and simple. And fraud should be punished brutally, especially in these cases, because the long tail of negative effects produces enormous damage.

freedomben•33m ago
I was originally thinking you were being way too harsh with your "punish criminally" take, but I must admit, you're winning me over. I think we would need to be careful to ensure we never (or realistically, very rarely) convict an innocent person, but this is in many cases outright theft/fraud when someone is making money or being "compensated" for producing work that is fraudulent.

For people who think this is too harsh, just remember we aren't talking about undergrads who cheat on a course paper here. We're talking about people who were given money (often from taxpayers) and committed fraud. This is textbook white collar crime, not some kid being lazy. At a minimum we should be taking all that money back from them and barring them from ever receiving grant money again. In some cases I think fines exceeding the money they received would be appropriate.

geremiiah•1h ago
What they are doing is plain cheating the system to get their 3 conference papers so they can get their $150k+ job at FAANG. It's plain cheating with no value.
barbazoo•1h ago
People that cheat with AI now probably found ways to cheat before as well.
shermantanktop•1h ago
Cheating by people in high status positions should get the hammer. But it gets the hand-wringing what-have-we-come-to treatment instead.
WarmWash•54m ago
We are only looking at one side of the equation here, in this whole thread.

This feels a bit like the "LED stoplights shouldn't be used because they don't melt snow" argument.

vimda•1h ago
"Pretty soon every knowledge worker will use AI every day" is a wild statement considering the reporting that most companies deploying AI solutions are seeing little to no benefit, but also, there's a pretty obvious gap between spell checkers and tools that generate large parts of the document for you
PunchyHamster•57m ago
nice job moving the goalpost from "hallucinated the research/data" to "spellchecker error"
duskdozer•54m ago
>Pretty soon every knowledge worker will use AI every day.

Maybe? There's certainly a push to force the perception of inevitability.

Sharlin•7m ago
In general we're pretty good at drawing a line between purely editorial stuff like using a spellchecker, or even the services of a professional editor (no need to acknowledge), and independent intellectual contribution (must be acknowledged). There's no slippery slope.
bwfan123•52m ago
> It would be great if those scientists who use AI without disclosing it get fucked for life.

There need to be dis-incentives for sloppy work. There is a tension between quality and quantity in almost every product. Unfortunately academia has become a numbers-game with paper-mills.

pandemic_region•27m ago
Instead of publishing their papers in the prestigious zines - which is what they're after - we will publish them in "AI Slop Weekly" with name and picture. Up the submission risk a bit.
jordanpg•1h ago
If these are so easy to identify, why not just incorporate some kind of screening into the early stages of peer review?
DetectDefect•1h ago
Because real work takes time and effort, and there is no real incentive for it here.
tossandthrow•1h ago
What makes you believe they are easy to identify?
emil-lp•1h ago
One could require DOIs for each reference. That's both realistic to achieve and easy to verify.

Although then why not just cite existing papers for bogus reasons?
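
Checking that a DOI at least resolves is indeed cheap to automate; a minimal sketch against the public doi.org resolver is below (this only establishes existence, not that the citation is relevant or accurate):

    import requests

    def doi_resolves(doi: str) -> bool:
        """True if doi.org knows the DOI (it answers with a redirect), False otherwise."""
        resp = requests.head(f"https://doi.org/{doi}", allow_redirects=False, timeout=10)
        return 300 <= resp.status_code < 400

    print(doi_resolves("10.1038/nature14539"))            # a real DOI -> True
    print(doi_resolves("10.9999/surely-not-a-real-doi"))  # -> False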

direwolf20•1h ago
Wow! They're literally submitting references to papers by Firstname Lastname, John Doe and Jane Smith and nobody is noticing or punishing them.
emil-lp•1h ago
They might (I hope) still be punished after discovery.
an0malous•1h ago
It’s the way of the future
heliumtera•1h ago
Maybe "muh science" was always a fucking joke and the only difference being now we can point to an undeniable proof it is a fucking joke?
azan_•58m ago
Yes, it only led to all advancements in the history of humanity, what a joke!
Sharlin•27m ago
Aaand "the insane take of the day" award goes to…
CGMthrowaway•1h ago
Which is worse:

a) p-hacking and suppressing null results

b) hallucinations

c) falsifying data

Would be cool to see an analysis of this

Proziam•1h ago
All 3 of these should be categorized as fraud, and punished criminally.
internetter•1h ago
criminally feels excessive?
Proziam•1h ago
If I steal hundreds of thousands of dollars (salary, plus research grants and other funds) and produce fake output, what do you think is appropriate?

To me, it's no different than stealing a car or tricking an old lady into handing over her fidelity account. You are stealing, and society says stealing is a criminal act.

WarmWash•58m ago
We have a civil court system to handle stuff like this already.
Proziam•46m ago
Stealing more than a few thousand dollars is a felony, and felonies are handled in criminal court, not civil.

EDIT - The threshold amount varies. Sometimes it's as low as a few hundred dollars. However, the point stands on its own, because there's no universe where the sum in question is in misdemeanor territory.

WarmWash•35m ago
It would fall under the domain of contract law, because maybe the contract of the grant doesn't prohibit what the researcher did. The way to determine that would be in court - civil court.

Most institutions aren't very chill with grant money being misused, so we already don't need to burden the state with getting Johnny municipal prosecutor to try and figure out if gamma crystallization imaging sources were incorrect.

wat10000•41m ago
We also have a criminal court system to handle stuff like this.
WarmWash•33m ago
No we don't. I've never seen a private contract dispute go to criminal court, probably because it's a civil matter.

If they actually committed theft, well then that already is illegal too.

But right now, doing "shitty research" isn't illegal and it's unlikely it ever will be.

jacquesm•59m ago
You could make a good case for a white collar crime here, fraud for instance.
fulafel•1h ago
Is there a comparison to rate of reference errors in other forums?
dtartarotti•1h ago
It is very concerning that these hallucinations passed through peer review. It's not like peer review is a fool-proof method or anything, but the fact that reviewers did not check the references and notice the clearly bogus ones is alarming, and could be a sign that the article authors weren't the only ones using LLMs in the process...
amanaplanacanal•1h ago
Is it common for peer reviewers to check references? Somehow I thought they mostly focused on whether the experiment looked reasonable and the conclusions followed.
emil-lp•1h ago
In journal publications it is, but without DOIs it's difficult.

In conference publications, it's less common.

Conference publications (like NeurIPS) are treated as announcements of results, not verified work.

empiko•58m ago
Nobody in ML or AI is verifying all your references. Reviewers will point out if you miss a super related work, but that's it. This is especially true with the recent (last two decades?) inflation in citation counts. You regularly have papers with 50+ references for all kinds of claims and random semirelated work. The citation culture is really uninspiring.
smallpipe•1h ago
Could you run a similar analysis for pre-2020 papers? It'd be interesting to know how prevalent making up sources was before LLMs.
tasuki•44m ago
Also, it'd be interesting how many pre-2020 papers their "AI detector" marks as AI-generated. I distrust LLMs somewhat, but I distrust AI detectors even more.
theptip•41m ago
Yeah, it’s kind of meaningless to attribute this to AI without measuring the base rate.

It’s for sure plausible that it’s increasing, but I’m certain this kind of thing happened with humans too.

bonsai_spool•1h ago
This suggests that nobody was screening these papers in the first place, so is it actually significant that people are using LLMs in a setting without meaningful oversight?

These clearly aren't being peer-reviewed, so there's no natural check on LLM usage (which is different than what we see in work published in journals).

emil-lp•1h ago
As one who reviews 20+ papers per year, we don't have time to verify each reference.

We verify: is the stuff correct, and is it worthy of publication (in the given venue) given that it is correct.

There is still some trust in the authors not to submit made-up stuff, though it is diminishing.

paulmist•44m ago
I'm surprised the conference doesn't provide tooling to validate all references automatically.
Sharlin•18m ago
How would you do that? Even in cases where there's a standard format, a DOI on every reference, and some giant online library of publication metadata, including everything that only exists in dead tree format, that just lets you check whether the cited work exists, not whether it's actually a relevant thing to cite in the context.
gcr•1h ago
Academic venues don't have enough reviewers. This problem isn't new, and as publication volumes increase, it's getting sharply worse.

Consider the unit economics. Suppose NeurIPS gets 20,000 papers in one year. Suppose each author should expect three good reviews, so area chairs assign five reviewers per paper. In total, 100,000 reviews need to be written. It's a lot of work, even before factoring emergency reviewers in.

NeurIPS is one venue alongside CVPR, [IE]CCV, COLM, ICML, EMNLP, and so on. Not all of these conferences are as large as NeurIPS, but the field is smaller than you'd expect. I'd guess there are 300k-1m people in the world who are qualified to review AI papers.

khuey•51m ago
Seems like using tooling like this to identify papers with fake citations and auto-rejecting them before they ever get in front of a reviewer would kill two birds with one stone.
gcr•47m ago
It's not always possible to distinguish between fake citations and citations that are simply hard to find (e.g. wonderful old books that aren't on the Internet).

Another problem is that conferences move slowly and it's hard to adjust the publication workflow in such an invasive way. CVPR only recently moved from Microsoft's CMT to OpenReview to accept author submissions, for example.

There's a lot of opportunity for innovation in this space, but it's hard when everyone involved would need to agree to switch to a different workflow.

(Not shooting you down. It's just complicated because the people who would benefit are far away from the people who would need to do the work to support it...)

alain94040•55m ago
When I was reviewing such papers, I didn't bother checking that 30+ citations were correctly indexed. I focused on the article itself, and maybe 1 or 2 citations that are important. That's it. For most citations, they are next to an argument that I know is correct, so why would I bother checking. What else do you expect? My job was to figure out if the article ideas are novel and interesting, not if they got all their citations right.
geremiiah•1h ago
A lot of research in AI/ML seems to me to be "fake it and never make it". Literally it's all about optics, posturing, connections, publicity. Lots of bullshit and little substance. This was true before AI slop, too. But the fact that AI slop can make it pass the review really showcases how much a paper's acceptance hinges on things, other than the substance and results of the paper.

I even know PIs who got fame and funding based on some research direction that was supposedly going to be revolutionary. Except all they had were preliminary results from which, if you squint at them from one angle, you can envision some good result. But then the result never comes. That's why I say, "fake it, and never make it".

gcr•1h ago
I was getting completely AI-generated reviews for a WACV publication back in 2024. The area chairs are so overworked that authors don't have much recourse, which sucks but is also really hard to handle unless more volunteers step up to the bat to help organize the conference.

(If you're qualified to review papers, please email the program chair of your favorite conference and let them know -- they really need the help!)

As for my review, the review form has a textbox for a summary, a textbox for strengths, a textbox for weaknesses, and a textbox for overall thoughts. The review I received included one complete set of summary/strengths/weaknesses/closing thoughts in the summary text box, another distinct set of summary/strengths/weaknesses/closing thoughts in the strengths, another complete and distinct review in the weaknesses, and a fourth complete review in the closing thoughts. Each of these four reviews was slightly different and contradicted the others.

The reviewer put my paper down as a weak reject, but also said "the pros greatly outweigh the cons."

They listed "innovative use of synthetic data" as a strength, and "reliance on synthetic data" as a weakness.

Tom1380•1h ago
No ETH Zurich, let's go
gcr•1h ago
NeurIPS leadership doesn’t think hallucinated references are necessarily disqualifying; see the full article from Fortune for a statement from them: https://archive.ph/yizHN

> When reached for comment, the NeurIPS board shared the following statement: “The usage of LLMs in papers at AI conferences is rapidly evolving, and NeurIPS is actively monitoring developments. In previous years, we piloted policies regarding the use of LLMs, and in 2025, reviewers were instructed to flag hallucinations. Regarding the findings of this specific work, we emphasize that significantly more effort is required to determine the implications. Even if 1.1% of the papers have one or more incorrect references due to the use of LLMs, the content of the papers themselves are not necessarily invalidated. For example, authors may have given an LLM a partial description of a citation and asked the LLM to produce bibtex (a formatted reference). As always, NeurIPS is committed to evolving the review and authorship process to best ensure scientific rigor and to identify ways that LLMs can be used to enhance author and reviewer capabilities.”

Analemma_•1h ago
Kinda gives the whole game away, doesn’t it? “It doesn’t actually matter if the citations are hallucinated.”

In fairness, NeurIPS is just saying out loud what everyone already knows. Most citations in published science are useless junk: it’s either mutual back-scratching to juice h-index, or it’s the embedded and pointless practice of overcitation, like “Human beings need clean water to survive (Franz, 2002)”.

Really, hallucinated citations are just forcing a reckoning which has been overdue for a while now.

jacquesm•1h ago
There should be a way to drop any kind of circular citation ring from the indexes.
gcr•50m ago
It's tough because some great citations are hard to find/procure still. I sometimes refer to papers that aren't on the Internet (eg. old wonderful books / journals).
jacquesm•39m ago
But that actually strengthens those citations. The I-scratch-your-back-you-scratch-mine ones are the ones I'm getting at, and that is quite hard to do with old and wonderful stuff; the authors there are probably not in a position to reciprocate, by virtue of observing the grass from the other side.
gcr•24m ago
I think it's a hard problem. The semanticscholar folks are doing the sort of work that would allow them to track this; I wonder if they've thought about it.

A somewhat-related parable: I once worked in a larger lab with several subteams submitting to the same conference. Sometimes the work we did was related, so we both cited each other's paper which was also under review at the same venue. (These were flavor citations in the "related work" section for completeness, not material to our arguments.) In the review copy, the reference lists the other paper as written by "anonymous (also under review at XXXX2025)," also emphasized by a footnote to explain the situation to reviewers. When it came time to submit the camera-ready copy, we either removed the anonymization or replaced it with an arxiv link if the other team's paper got rejected. :-) I doubt this practice improved either paper's chances of getting accepted.

Are these the sorts of citation rings you're talking about? If authors misrepresented the work as if it were accepted, or pretended it was published last year or something, I'd agree with you, but it's not too uncommon in my area for well-connected authors to cite manuscripts in process. I don't think it's a problem as long as they don't lean on them.

jacquesm•20m ago
No, I'm talking about the ones where the citation itself is almost or even completely irrelevant and used as a way to inflate the citation count of the authors. You could find those by checking whether or not the value as a reference (ie: contributes to the understanding of the paper you are reading) is exceeded by the value of the linkage itself.
fc417fc802•26m ago
> Most citations in published science are useless junk:

Can't say that matches my experience at all. Once I've found a useful paper on a topic thereafter I primarily navigate the literature by traveling up and down the citation graph. It's extremely effective in practice and it's continued to get easier to do as the digitization of metadata has improved over the years.

empath75•1h ago
I think a _single_ instance of an LLM hallucination should be enough to retract the whole paper and ban further submissions.
gcr•59m ago
Going through a retraction and blacklisting process is also a lot of work -- collecting evidence, giving authors a chance to respond and mediate discussion, etc.

Labor is the bottleneck. There aren't enough academics who volunteer to help organize conferences.

(If a reader of this comment is qualified to review papers and wants to step up to the plate and help do some work in this area, please email the program chairs of your favorite conference and let them know. They'll eagerly put you to work.)

pessimizer•50m ago
That's exactly why the inclusion of a hallucinated reference is actually a blessing. Instead of going back and forth with the fraudster, just tell them to find the paper. If they can't, case closed. Massive amount of time and money saved.
gcr•43m ago
Isn't telling them to find the paper just "going back and forth with a fraudster"?

One "simple" way of doing this would be to automate it. Have authors step through a lint step when their camera-ready paper is uploaded. Authors would be asked to confirm each reference and link it to a google scholar citation. Maybe the easy references could be auto-populated. Non-public references could be resolved by uploading a signed statement or something.

There's no current way of using this metadata, but it could be nice for future systems.

Even the Scholar team within Google is woefully understaffed.

My gut tells me that it's probably more efficient to just drag authors who do this into some public execution or twitter mob after-the-fact. CVPR does this every so often for authors who submit the same paper to multiple venues. You don't need a lot of samples for deterrence to take effect. That's kind of what this article is doing, in a sense.
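
A very small version of that lint step could run locally before upload: scan the camera-ready .bib and flag every entry that carries neither a doi nor a url field for manual confirmation. A rough sketch, assuming conventionally formatted BibTeX (regex-based, not a real parser):

    import re
    import sys

    # Rough .bib entry splitter: good enough for a lint pass, not a full BibTeX parser.
    ENTRY_RE = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,(.*?)\n\}", re.S)

    def flag_unlinked_entries(path: str) -> list[str]:
        """Return keys of entries with neither a doi nor a url field."""
        text = open(path, encoding="utf-8").read()
        return [
            key for key, body in ENTRY_RE.findall(text)
            if not re.search(r"\b(doi|url)\s*=", body, re.I)
        ]

    if __name__ == "__main__":
        for key in flag_unlinked_entries(sys.argv[1]):
            print(f"confirm manually (no doi/url): {key}")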

wing-_-nuts•54m ago
I dunno about banning them, humans without LLMs make mistakes all the time, but I would definitely place them under much harder scrutiny in the future.
pessimizer•44m ago
Hallucinations aren't mistakes, they're fabrications. The two are probably referred to by the same word in some languages.

Institutions can choose an arbitrary approach to mistakes; maybe they don't mind a lot of them because they want to take risks and be on the bleeding edge. But any flexible attitude towards fabrications is simply corruption. The connected in-crowd will get mercy and the outgroup will get the hammer. Anybody criticizing the differential treatment will be accused of supporting the outgroup fraudsters.

gcr•36m ago
Fabrications carry intent to deceive. I don't think hallucinations necessarily do. If anything, they're a matter of negligence, not deception.

Think of it this way: if I wanted to commit pure academic fraud maliciously, I wouldn't make up a fake reference. Instead, I'd find an existing related paper and merely misrepresent it to support my own claims. That way, the deception is much harder to discover and I'd have plausible deniability -- "oh I just misunderstood what they were saying."

I think most academic fraud happens in the figures, not the citations. Researchers are more likely to be successful at making up data points than making up references, because it's impossible to know without the data files.

andy99•50m ago

> For example, authors may have given an LLM a partial description of a citation and asked the LLM to produce bibtex

This is equivalent to a typo. I’d like to know which “hallucinations” are completely made up, and which have a corresponding paper but contain some error in how it’s cited. The latter I don’t think matters.
burkaman•22m ago
If you click on the article you can see a full list of the hallucinations they found. They did put in the effort to look for plausible partial matches, but most of them are some variation of "No author or title match. Doesn't exist in publication."
jklinger410•51m ago
> the content of the papers themselves are not necessarily invalidated. For example, authors may have given an LLM a partial description of a citation and asked the LLM to produce bibtex (a formatted reference)

Maybe I'm overreacting, but this feels like an insanely biased response. They found the one potentially innocuous reason and latched onto that as a way to hand-wave the entire problem away.

Science already had a reproducibility problem, and it now has a hallucination problem. Considering the massive influence the private sector has on both the work and the institutions themselves, the future of open science is looking bleak.

paulmist•39m ago
Isn't disqualifying X months of potentially great research due to a malformed, but existing, reference harsh? I don't think they'd be okay with references that are actually made up.
suddenlybananas•17m ago
It's a sign of dishonesty, not a perfect one, but an indicator.
orbital-decay•38m ago
The wording is not hand-wavy. They said "not necessarily invalidated", which could mean that innocuous reason and nothing extra.
derf_•41m ago
This will continue to happen as long as it is effectively unpunished. Even retracting the paper would do little good, as odds are it would not have been written if the author could not have used an LLM, so they are no worse off for having tried. Scientific publications are mostly a numbers game at this point. It is just one more example of a situation where behaving badly is much cheaper than policing bad behavior, and until incentives are changed to account for that, it will only get worse.
Aurornis•40m ago
> Even if 1.1% of the papers have one or more incorrect references due to the use of LLMs, the content of the papers themselves are not necessarily invalidated.

This statement isn’t wrong, as the rest of the paper could still be correct.

However, when I see a blatant falsification somewhere in a paper I’m immediately suspicious of everything else. Authors who take lazy shortcuts when convenient usually don’t just do it once, they do it wherever they think they can get away with it. It’s a slippery slope from letting an LLM handle citations to letting the LLM write things for you to letting the LLM interpret the data. The latter opens the door to hallucinated results and statistics, as anyone who has experimented with LLMs for data analysis will discover eventually.

mlmonkey•11m ago
Why not run every submitted paper through GPTZero (before sending to reviewers) and summarily reject any paper with a hallucination?
Molitor5901•1h ago
AI might just extinguish the entire paradigm of publish or perish. The sheer volume of papers makes it nearly impossible to properly decide which papers have merit, which are non-replicable and suspect, and which are just a desperate rush to publish. The entire practice needs to end.
shermantanktop•1h ago
But how could we possibly evaluate faculty and researcher quality without counting widgets on an assembly line? /s

It’s a problem. The previous regime prior to publishing-mania was essentially a clubby game of reputation amongst peers based on cocktail party socialization.

The publication metrics came out of the harder sciences, I believe, and then spread to the softest of humanities. It was always easy to game a bit if you wanted to try, but now it’s trivial to defeat.

TAULIC15•1h ago
OHHH IS GOOD
armcat•59m ago
This is awful but hardly surprising. Someone mentioned reproducible code with the papers - but there is a high likelihood of the code being partially or fully AI generated as well. I.e. AI generated hypothesis -> AI produces code to implement and execute the hypothesis -> AI generates paper based on the hypothesis and the code.

Also: there were 15,000 submissions that were rejected at NeurIPS; it would be very interesting to see what % of those rejected were partially or fully AI generated/hallucinated. Are the ratios comparable?

blackbear_•46m ago
Whether the code is AI generated or not is not important; what matters is that it really works.

Sharing code enables others to validate the method on a different dataset.

Even before LLMs came around there were lots of methods that looked good on paper but turned out not to work outside of accepted benchmarks

depressionalt•56m ago
This is nice and all, but what repercussions does GPTZero face when their bullshit AI detection hallucinates a student using AI? And when that student receives academic discipline because of it?

Many such cases of this. More than 100!

They claim to have custom detection for GPT-5, Gemini, and Claude. They're making that up!

freedomben•40m ago
Indeed. My son has been accused by bullshit AI detection of having used AI, and it has devastated his work quality. After being "disciplined" for using AI (when he didn't), he now intentionally tries to "dumb down" his writing so that it doesn't sound so much like AI. The result is he writes much worse. What a shitty, shitty outcome. I've even found myself leaving typos and things in (even on sites like HN) because if you write too well, inevitably some comment replier will call you out as being an LLM even when you aren't. I'm as annoyed by the LLM posts as everybody else, but the answer surely is not to dumb us down into Idiocracy.
Sharlin•15m ago
It's almost as if this whole LLM stuff wasn't a net benefit to the society after all.
theptip•51m ago
This is mostly an ad for their product. But I bet you can get pretty good results with a Claude Code agent using a couple simple skills.

Should be extremely easy for AI to successfully detect hallucinated references as they are semi-structured data with an easily verifiable ground truth.
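
Even without an agent, existence checks are scriptable against a public index like Crossref; a minimal sketch is below (the similarity threshold is an arbitrary illustration, and this only checks that a similarly titled work exists, not that it supports the claim it's cited for):

    import requests
    from difflib import SequenceMatcher

    def reference_likely_exists(title: str, threshold: float = 0.9) -> bool:
        """Ask Crossref for the best match on a cited title and compare the two."""
        resp = requests.get(
            "https://api.crossref.org/works",
            params={"query.title": title, "rows": 1},
            timeout=10,
        )
        resp.raise_for_status()
        items = resp.json()["message"]["items"]
        if not items:
            return False
        best = (items[0].get("title") or [""])[0]
        return SequenceMatcher(None, title.lower(), best.lower()).ratio() >= threshold

    print(reference_likely_exists("Attention Is All You Need"))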

leggerss•50m ago
I don't understand: why aren't there automated tools to verify citations' existence? A citation follows a structured style (APA, MLA, Chicago), and paper metadata is available via e.g. a web search, even if the paper contents are not.

I guess GPTZero has such a tool. I'm confused why it isn't used more widely by paper authors and reviewers

gh02t•40m ago
Citations are too open ended and prone to variation, and legitimate minor mistakes that wouldn't bother a human verifier would break automated tools, which makes citations hard to verify automatically in their current form. DOI was supposed to solve the literal mechanical question of whether a source exists, but journal paywalls and limited adoption mean that is not a universal solution. Plus, DOI still doesn't easily verify the factual accuracy of a citation, like "does the source say what the citation says it does," which is the most important part.

In my experience you will see considerable variation in citation formats, even in journals that strictly define it and require using BibTeX. And lots of journals leave their citation format rules very vague. It's a problem that runs deep.

eichin•20m ago
Looks like GPTZero Source Finder was only released a year ago - if anything, I'm surprised slop-writers aren't using it preemptively, since they're "ahead of the curve" relative to reviewers on this sort of thing...
yepyeaisntityea•44m ago
No surprises. Machine learning has, at least since 2012, been the go-to field for scammers and grifters. Machine learning, and technology in general, is basically a few real ideas, a small number of honest hard workers, and then millions of fad chasers and scammers.
mt_•44m ago
It would be ironic if the very detection of hallucinations contained hallucinations of its own.
doug_durham•31m ago
Getting papers published is now more about embellishing your CV than about a sincere desire to present new research. I see this everywhere at every level. Getting a paper published anywhere is a checkbox in completing your resume. As an industry we need to stop taking this into consideration when reviewing candidates or deciding pay. In some sense it has become an anti-signal.
nerdjon•30m ago
The downstream effects of this are extremely concerning. We have already seen the damage caused by human written research that was later retracted like the “research” on vaccines causing autism.

As we get more and more papers that may be citing information that was originally hallucinated, we have a major reliability issue. What is worse, people who did not use AI in the first place will be caught in the crosshairs, since they will be referencing incorrect information.

There needs to be a serious amount of education done on what these tools can and cannot do and importantly where they fail. Too many people see these tools as magic since that is what the big companies are pushing them as.

Other than that we need to put in actual repercussions for publishing work created by an LLM without validating it (or just say you can’t in the first place but I guess that ship has sailed) or it will just keep happening. We can’t just ignore it and hope it won’t be a problem.

And yes, humans can make mistakes too. The difference is accountability and the ability to actually be unsure about something so you question yourself to validate.

pandemic_region•29m ago
What if they only accepted handwritten papers? Basically the current system is beyond repair, so we may as well go back to receiving 20 decent papers instead of 20k hallucinated ones.
ctoth•26m ago
How you know it's really real is that they clearly tell the FPR and compare against a pre-LLM baseline.

But I saw it in Apple News, so MISSION ACCOMPLISHED!

yobbo•26m ago
As long as these sorts of papers serve more important purposes for the careers of the authors than anything related to science or discovery of knowledge, then of course this happens and continues.

The best possible outcome is that these two purposes are disconflated, with follow-on consequences for the conferences and journals.

poulpy123•25m ago
All papers proven to have used an LLM beyond writing improvement should be automatically retracted.
brador•21m ago
The problem isn’t scale.

The problem is consequences (lack of).

Doing this should get you barred from research. It won’t.

CrzyLngPwd•19m ago
This is not the AI future we dreamed of, or feared.
nospice•16m ago
We've been talking about a "crisis of reproducibility" for years, and about the incentive to crank out high volumes of low-quality research. We now have a tool that brings the cost of producing plausible-looking research down to zero. So of course we're going to see that tool abused on a galactic scale.

But here's the thing: let's say you're an university or a research institution that wants to curtail it. You catch someone producing LLM slop, and you confirm it by analyzing their work and conducting internal interviews. You fire them. The fired researcher goes public saying that they were doing nothing of the sort and that this is a witch hunt. Their blog post makes it to the front page of HN, garnering tons of sympathy and prompting many angry calls to their ex-employer. It gets picked up by some mainstream outlets, too. It happened a bunch of times.

In contrast, there are basically no consequences to institutions that let it slide. No one is angrily calling the employers of the authors of these 100 NeurIPS papers, right? If anything, there's the plausible deniability of "oh, I only asked ChatGPT to reformat the citations, the rest of the paper is 100% legit, my bad".

meindnoch•15m ago
Jamie, bring up their nationalities.