
GPTZero finds 100 new hallucinations in NeurIPS 2025 accepted papers

https://gptzero.me/news/neurips/
273•segmenta•2h ago•151 comments

In Europe, Wind and Solar Overtake Fossil Fuels

https://e360.yale.edu/digest/europe-wind-solar-fossil-fuels
234•speckx•3h ago•177 comments

Qwen3-TTS Family Is Now Open Sourced: Voice Design, Clone, and Generation

https://qwen.ai/blog?id=qwen3tts-0115
156•Palmik•3h ago•31 comments

Tree-sitter vs. Language Servers

https://lambdaland.org/posts/2026-01-21_tree-sitter_vs_lsp/
99•ashton314•2h ago•28 comments

It looks like the status/need-triage label was removed

https://github.com/google-gemini/gemini-cli/issues/16728
39•nickswalker•1h ago•10 comments

Design Thinking Books You Must Read

https://www.designorate.com/design-thinking-books/
188•rrm1977•5h ago•89 comments

AnswerThis (YC F25) Is Hiring

https://www.ycombinator.com/companies/answerthis/jobs/r5VHmSC-ai-agent-orchestration
1•ayush4921•22m ago

Show HN: isometric.nyc – giant isometric pixel art map of NYC

https://cannoneyed.com/isometric-nyc/
26•cannoneyed•30m ago•4 comments

Launch HN: Constellation Space (YC W26) – AI for satellite mission assurance

https://constellation-io.com/
4•kmajid•19m ago•0 comments

Miami, Your Waymo Ride Is Ready

https://waymo.com/blog/2026/01/miami-your-waymo-ride-is-ready
14•ChrisArchitect•55m ago•3 comments

Ubisoft cancels six games including Prince of Persia and closes studios

https://www.bbc.co.uk/news/articles/c6200g826d2o
58•piqufoh•53m ago•36 comments

ISO PDF spec is getting Brotli – ~20% smaller documents with no quality loss

https://pdfa.org/want-to-make-your-pdfs-20-smaller-for-free/
89•whizzx•6h ago•38 comments

30 Years of ReactOS

https://reactos.org/blogs/30yrs-of-ros/
151•Mark_Jansen•9h ago•77 comments

Joe Armstrong and Jeremy Ruston – Intertwingling the Tiddlywiki with Erlang [video]

https://www.youtube.com/watch?v=Uv1UfLPK7_Q
21•kerim-ca•2d ago•1 comment

Show HN: Sweep, Open-weights 1.5B model for next-edit autocomplete

https://huggingface.co/sweepai/sweep-next-edit-1.5B
467•williamzeng0•18h ago•91 comments

Doctors in Brazil using tilapia fish skin to treat burn victims

https://www.pbs.org/newshour/health/brazilian-city-uses-tilapia-fish-skin-treat-burn-victims
219•kaycebasques•12h ago•71 comments

Your brain on ChatGPT: Accumulation of cognitive debt when using an AI assistant

https://www.media.mit.edu/publications/your-brain-on-chatgpt/
471•misswaterfairy•18h ago•343 comments

We will ban you and ridicule you in public if you waste our time on crap reports

https://curl.se/.well-known/security.txt
747•latexr•6h ago•454 comments

Show HN: Interactive physics simulations I built while teaching my daughter

https://www.projectlumen.app/
40•anticlickwise•3d ago•4 comments

In Praise of APL (1977)

https://www.jsoftware.com/papers/perlis77.htm
74•tosh•8h ago•41 comments

Douglas Adams on the English–American cultural divide over "heroes"

https://shreevatsa.net/post/douglas-adams-cultural-divide/
244•speckx•3h ago•244 comments

Pragmatic Bitmap Filters in Microsoft SQL Server

https://www.vldb.org/cidrdb/2026/i-cant-believe-its-not-yannakakis-pragmatic-bitmap-filters-in-mi...
4•tanelpoder•5d ago•0 comments

eBay explicitly bans AI "buy for me" agents in user agreement update

https://www.valueaddedresource.net/ebay-bans-ai-agents-updates-arbitration-user-agreement-feb-2026/
258•bdcravens•20h ago•275 comments

Threat actors expand abuse of Microsoft Visual Studio Code

https://www.jamf.com/blog/threat-actors-expand-abuse-of-visual-studio-code/
243•vinnyglennon•17h ago•247 comments

Meet the Alaska Student Arrested for Eating an AI Art Exhibit

https://www.thenation.com/article/society/alaska-student-arrested-eating-ai-art-exhibit/
74•petethomas•3h ago•33 comments

Claude's new constitution

https://www.anthropic.com/news/claude-new-constitution
534•meetpateltech•1d ago•622 comments

Waiting for dawn in search: Search index, Google rulings and impact on Kagi

https://blog.kagi.com/waiting-dawn-search
413•josephwegner•23h ago•228 comments

Downtown Denver's office vacancy rate grows to 38.2%

https://coloradosun.com/2026/01/22/denver-downtown-office-vacancy-rate-tenants-workplace/
7•mooreds•10m ago•2 comments

The Science of Life and Death in Mary Shelley's Frankenstein

https://publicdomainreview.org/essay/the-science-of-life-and-death-in-mary-shelleys-frankenstein/
13•Anon84•5d ago•1 comment

Gathering Linux Syscall Numbers in a C Table

https://t-cadet.github.io/programming-wisdom/#2026-01-17-gathering-linux-syscall-numbers
82•phi-system•5d ago•34 comments

GPTZero finds 100 new hallucinations in NeurIPS 2025 accepted papers

https://gptzero.me/news/neurips/
267•segmenta•2h ago

Comments

cogman10•1h ago
Yuck, this is going to really harm scientific research.

There is already a problem with papers falsifying data/samples/etc.; LLMs being able to put out plausible papers is just going to make it worse.

On the bright side, maybe this will get the scientific community and science journalists to finally take reproducibility more seriously. I'd love to see future reporting where, instead of "Research finds amazing chemical x which does y", you see "Researcher reproduces amazing results for chemical x which does y. First discovered by z".

godzillabrennus•1h ago
Have they solved the issue where papers that cite already-invalidated research are themselves still being cited?
cogman10•1h ago
AFAIK, no, but I could see there being cause to push citations to also cite the validations. It'd be good if standard practice turned into something like

Paper A, by bob, bill, brad. Validated by Paper B by carol, clare, charlotte.

or

Paper A, by bob, bill, brad. Unvalidated.

gcr•1h ago
Academics typically use citation count and popularity as a rough proxy for validation. It's certainly not perfect, but it is something that people think about. Semantic Scholar in particular is doing great work in this area, making it easy to see who cites who: https://www.semanticscholar.org/

Google Scholar's PDF reader extension turns every hyperlinked citation into a popout card that shows citation counts inline in the PDF: https://chromewebstore.google.com/detail/google-scholar-pdf-...
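
If you want to poke at this programmatically: a minimal sketch using what I understand to be the public Semantic Scholar Graph API (the endpoint path and field names here are my assumptions, so check their docs):

    import requests

    # Fetch papers that cite a given paper. IDs can be a DOI ("DOI:..."),
    # an arXiv ID ("ARXIV:..."), or a Semantic Scholar paper ID.
    url = "https://api.semanticscholar.org/graph/v1/paper/ARXIV:1706.03762/citations"
    resp = requests.get(url, params={"fields": "title,year", "limit": 5}, timeout=10)
    resp.raise_for_status()
    for item in resp.json().get("data", []):
        citing = item["citingPaper"]
        print(citing.get("year"), citing.get("title"))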

reliabilityguy•1h ago
Nope.

I am still reviewing papers that propose solutions based on a technique X, conveniently ignoring research from two years ago that shows that X cannot be used on its own. Both the paper I reviewed and the research showing X cannot be used are in the same venue!

b00ty4breakfast•1h ago
does it seem to be legitimate ignorance or maybe folks pushing ahead regardless of x being disproved?
freedomben•1h ago
IMHO, it's mostly ignorance coming from the push/drive to "publish or perish." When the stakes are so high and output is so valued, and when reproducibility isn't required, thorough work is disincentivized. The system is set up in a way that makes it fail.

There is also the reality that "one paper" or "one study" can be found that contradicts almost anything, so if you just went with "some other paper/study debunks my premise" then you'd end up producing nothing. Plus many insiders know that there's a lot of slop out there that gets published, so they can (sometimes reasonably, IMHO) dismiss that "one paper" even when they do know about it.

It's (mostly) not fraud or malicious intent or ignorance, it's (mostly) humans existing in the system in which they must live.

reliabilityguy•18m ago
Poor scholarship.

However, given the feedback by other reviewers, I was the only one who knew that X doesn’t work. I am not sure how these people mark themselves as “experts” in the field if they are not following the literature themselves.

f311a•1h ago
For ML/AI/Comp sci articles, providing reproducible code is a great option. Basically, PoC or GTFO.
StableAlkyne•42m ago
The most annoying ones are those which loosely discuss the methodology but then fail to publish the weights or any real algorithms.

It's like buying a piece of furniture from IKEA, except you just get an Allen key, a hint at what parts to buy, and blurry instructions.

j45•1h ago
It will better expose the behaviour of false scientists.
StableAlkyne•1h ago
> I'd love to see future reporting where, instead of "Research finds amazing chemical x which does y", you see "Researcher reproduces amazing results for chemical x which does y. First discovered by z".

Most people (that I talk to, at least) in science agree that there's a reproducibility crisis. The challenge is there really isn't a good way to incentivize that work.

Fundamentally (unless you're independently wealthy and funding your own work), you have to measure productivity somehow, whether you're at a university, government lab, or the private sector. That turns out to be very hard to do.

If you measure raw number of papers (more common in developing countries and low-tier universities), you incentivize a flood of junk. Some of it is good, but there is such a tidal wave of shit that most people will, as a heuristic, write off your work based on the other people in your cohort.

So, instead it's more common to try to incorporate how "good" a paper is, to reward people with a high quantity of "good" papers. That's quantifying something subjective though, so you might try to use something like citation count as a proxy: if a work is impactful, usually it gets cited a lot. Eventually you may arrive at something like the H-index, defined as the largest number H such that you have written H papers with at least H citations each. Now, the trouble with this method is people won't want to "waste" their time on incremental work.
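
To make that definition concrete, a minimal sketch of the H-index computation:

    def h_index(citations):
        # Largest h such that the h-th most-cited paper has at least h citations.
        ranked = sorted(citations, reverse=True)
        return max((i + 1 for i, c in enumerate(ranked) if c >= i + 1), default=0)

    print(h_index([10, 8, 5, 4, 3]))  # five papers with these counts -> H-index of 4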

And that's the struggle here; even if we funded and rewarded people for reproducing results, they will always be bumping up the citation count of the original discoverer. But it's worse than that, because literally nobody is going to cite your work. In 10 years, they just see the original paper, a few citing works reproducing it, and to save time they'll just cite the original paper only.

There's clearly a problem with how we incentivize scientific work. And clearly we want to be in a world where people test reproducibility. However, it's very very hard to get there when one's prestige and livelihood is directly tied to discovery rather than reproducibility.

warkdarrior•1h ago
> If you measure raw number of papers (more common in developing countries and low-tier universities), you incentivize a flood of junk.

This is exactly what rewarding replication papers (that reproduce and confirm an existing paper) will lead to.

pixl97•58m ago
And yet if we can't reproduce an existing paper, it's very possible that existing paper is junk itself.

Catch-22 is a fun game to get caught in.

maerF0x0•1h ago
> The challenge is there really isn't a good way to incentivize that work.

What if we got Undergrads (with hope of graduate studies) to do it? Could be a great way to train them on the skills required for research without the pressure of it also being novel?

StableAlkyne•54m ago
Those undergrads still need to be advised and they use lab resources.

If you're a tenure-track academic, your livelihood is much safer if you have them try new ideas (that you will be the corresponding author on, increasing your prestige and ability to procure funding) instead of doing incremental work.

And if you already have tenure, maybe you have the undergrad do just that. But the tenure process heavily filters for ambitious researchers, so it's unlikely this would be a priority.

If instead you did it as coursework, you could get them to maybe reproduce the work, but if you only have the students for a semester, that's not enough time to write up the paper and make it through peer review (which can take months between iterations).

suddenlybananas•39m ago
Unfortunately, that might just lead to a bunch of type II errors instead, if an effect requires very precise experimental conditions that undergrads lack the expertise for.
jimbokun•53m ago
> The challenge is there really isn't a good way to incentivize that work.

Ban publication of any research that hasn't been reproduced.

wpollock•18m ago
> Ban publication of any research that hasn't been reproduced.

Unless it is published, nobody will know about it and thus nobody will try to reproduce it.

gcr•16m ago
lol, how would the first paper carrying some new discovery get published?
poulpy123•47m ago
> I'd love to see future reporting where, instead of "Research finds amazing chemical x which does y", you see "Researcher reproduces amazing results for chemical x which does y. First discovered by z".

But nobody wants to pay for it.

geokon•36m ago
usually you reproduce previous research as a byproduct of doing something novel "on top" of the previous result. I don't really see the problem with the current setup.

sometimes you can just do something new and assume the previous result, but that's more the exception. you're almost always going to at least in part reproduce the previous one. and if issues come up, it's often evident.

that's why citations work as a good proxy: X number of people have done work based around this finding and nobody has seen a clear problem.

gcr•9m ago
It's often quite common to see a citation say "BTW, we weren't able to reproduce X's numbers, but we got a fairly close number Y, so Table 1 includes that one next to an asterisk."

The difficult part is surfacing that information to readers of the original paper. The semantic scholar people are beginning to do some work in this area.

gcr•10m ago
I'd personally like to see top conferences grow a "reproducibility" track. Each submission would be a short tech report that chooses some other paper to re-implement. Cap 'em at three pages, have a lightweight review process. Maybe there could be artifacts (git repositories, etc) that accompany each submission.

This would especially help newer grad students learn how to begin to do this sort of research.

Maybe doing enough reproductions could unlock incentives. Like if you do 5 reproductions then the AC would assign your next paper double the reviewers. Or, more invasively, maybe you can't submit to the conference until you complete some reproduction.

agumonkey•37m ago
I think, at least I hope, that part of the value of LLMs will be to engineer their own retirement for specific needs. Instead of asking one to solve any problem, use it to build a restricted tool that can then help you reach your goal faster, without the statistical nature of LLMs.
mike_hearn•31m ago
Reproducibility is overrated and if you could wave a wand to make all papers reproducible tomorrow, it wouldn't fix the problem. It might even make it worse.

https://blog.plan99.net/replication-studies-cant-fix-science...

biophysboy•8m ago
? More samples reduce the variance of a statistic. Obviously that cannot identify systematic bias in a model, or establish causality, or make a "bad" question "good". It's not overrated though -- it would strengthen or weaken the case for many papers.
vld_chk•14m ago
In my mental model, the fundamental problem of reproducibility is that scientists have a very hard time finding a penny to fund such research. No one wants to grant “hey, I need $1M and 2 years to validate that paper from last year which looks suspicious”.

Until we can change how we fund science at the fundamental level, i.e. how we assign grants, it will indeed be a very hard problem to deal with.

benob•13m ago
Maybe it will also change the whole practice of using publication as the evaluation of science.
qwertox•1h ago
It would be great if those scientists who use AI without disclosing it get fucked for life.
direwolf20•1h ago
"scientists" FYI. Making shit up isn't science.
yesitcan•1h ago
One fuck seems appropriate.
oofbey•1h ago
Harsh sentiment. Pretty soon every knowledge worker will use AI every day. Should people disclose spellcheckers powered by AI? Disclosing is not useful. Being careful in how you use it and checking work is what matters.
ambicapter•1h ago
> Should people disclose spellcheckers powered by AI?

Thank you for that perfect example of a strawman argument! No, spellcheckers that use AI are not the main concern behind disclosing the use of AI in generating scientific papers, government reports, or any large block of nonfiction text that you paid for and that is supposed to make sense.

fisf•1h ago
People are accountable for the results they produce using AI. So a scientist is responsible for made up sources in their paper, which is plain fraud.
oofbey•1h ago
I completely agree. But “disclosing the use of AI” doesn’t solve that one bit.
barbazoo•1h ago
I don’t disclose what keyboard I use to write my code or if I applied spellcheck afterward. The result is 100% theirs.
eichin•35m ago
"responsible for made up sources" leads to the hilarious idea that if you cite a paper that doesn't exist, you're now obliged to write that paper (getting it retroactively published might be a challenge though)
Proziam•1h ago
False equivalence. This isn't about "using AI" it's about having an AI pretend to do your job.

What people are pissed about is the fact their tax dollars fund fake research. It's just fraud, pure and simple. And fraud should be punished brutally, especially in these cases, because the long tail of negative effects produces enormous damage.

freedomben•54m ago
I was originally thinking you were being way too harsh with your "punish criminally" take, but I must admit, you're winning me over. I think we would need to be careful to ensure we never (or realistically, very rarely) convict an innocent person, but this is in many cases outright theft/fraud when someone is making money or being "compensated" for producing work that is fraudulent.

For people who think this is too harsh, just remember we aren't talking about undergrads who cheat on a course paper here. We're talking about people who were given money (often from taxpayers) and committed fraud. This is textbook white collar crime, not some kid being lazy. At a minimum we should be taking all that money back from them and barring them from ever receiving grant money again. In some cases I think fines exceeding the money they received would be appropriate.

geremiiah•1h ago
What they are doing is plainly cheating the system to get their 3 conference papers so they can get their $150k+ job at FAANG. It's plain cheating with no value.
barbazoo•1h ago
People that cheat with AI now probably found ways to cheat before as well.
shermantanktop•1h ago
Cheating by people in high status positions should get the hammer. But it gets the hand-wringing what-have-we-come-to treatment instead.
WarmWash•1h ago
We are only looking at one side of the equation here, in this whole thread.

This feels a bit like the "LED stoplights shouldn't be used because they don't melt snow" argument.

vimda•1h ago
"Pretty soon every knowledge worker will use AI every day" is a wild statement considering the reporting that most companies deploying AI solutions are seeing little to no benefit, but also, there's a pretty obvious gap between spell checkers and tools that generate large parts of the document for you
PunchyHamster•1h ago
nice job moving the goalpost from "hallucinated the research/data" to "spellchecker error"
duskdozer•1h ago
>Pretty soon every knowledge worker will use AI every day.

Maybe? There's certainly a push to force the perception of inevitability.

Sharlin•28m ago
In general we're pretty good at drawing a line between purely editorial help, like using a spellchecker or even the services of a professional editor (no need to acknowledge), and independent intellectual contribution (must be acknowledged). There's no slippery slope.
bwfan123•1h ago
> It would be great if those scientists who use AI without disclosing it get fucked for life.

There need to be dis-incentives for sloppy work. There is a tension between quality and quantity in almost every product. Unfortunately academia has become a numbers-game with paper-mills.

pandemic_region•48m ago
Instead of publishing their papers in the prestigious zines - which is what they're after - we will publish them in "AI Slop Weekly" with name and picture. Up the submission risk a bit.
jordanpg•1h ago
If these are so easy to identify, why not just incorporate some kind of screening into the early stages of peer review?
DetectDefect•1h ago
Because real work takes time and effort, and there is no real incentive for it here.
tossandthrow•1h ago
What makes you believe they are easy to identify?
emil-lp•1h ago
One could require DOIs for each reference. That's both realistic to achieve and easy to verify.

Although then why not just cite existing papers for bogus reasons?
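
Existence is the easy part to automate, though. A minimal sketch of such a check, assuming the doi.org resolver behaves as documented (registered DOIs redirect to the publisher, unregistered ones return 404):

    import requests

    def doi_exists(doi: str) -> bool:
        # doi.org redirects registered DOIs to the publisher's landing page;
        # unregistered DOIs come back as 404.
        resp = requests.head(f"https://doi.org/{doi}", allow_redirects=False, timeout=10)
        return resp.status_code in (301, 302, 303)

    print(doi_exists("10.1038/nature14539"))     # real DOI -> True
    print(doi_exists("10.9999/not.a.real.doi"))  # fabricated -> False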

direwolf20•1h ago
Wow! They're literally submitting references to papers by Firstname Lastname, John Doe and Jane Smith and nobody is noticing or punishing them.
emil-lp•1h ago
They might (I hope) still be punished after discovery.
an0malous•1h ago
It’s the way of the future
heliumtera•1h ago
Maybe "muh science" was always a fucking joke and the only difference being now we can point to an undeniable proof it is a fucking joke?
azan_•1h ago
Yes, it only led to all advancements in the history of humanity, what a joke!
heliumtera•4m ago
I am sure all advancements in the history of humanity were properly peer reviewed!

Including Coca-Cola and Linux!

Sharlin•48m ago
Aaand "the insane take of the day" award goes to…
sigbottle•17m ago
I'm a Feyerabend sympathizer, but even he wouldn't have gone this far.

He was against establishment dogma, not in favor of anti-intellectualism.

CGMthrowaway•1h ago
Which is worse:

a) p-hacking and suppressing null results

b) hallucinations

c) falsifying data

Would be cool to see an analysis of this

Proziam•1h ago
All 3 of these should be categorized as fraud, and punished criminally.
internetter•1h ago
criminally feels excessive?
Proziam•1h ago
If I steal hundreds of thousands of dollars (salary, plus research grants and other funds) and produce fake output, what do you think is appropriate?

To me, it's no different than stealing a car or tricking an old lady into handing over her Fidelity account. You are stealing, and society says stealing is a criminal act.

WarmWash•1h ago
We have a civil court system to handle stuff like this already.
Proziam•1h ago
Stealing more than a few thousand dollars is a felony, and felonies are handled in criminal court, not civil.

EDIT - The threshold amount varies. Sometimes it's as low as a few hundred dollars. However, the point stands on its own, because there's no universe where the sum in question is in misdemeanor territory.

WarmWash•55m ago
It would fall under the domain of contract law, because maybe the contract of the grant doesn't prohibit what the researcher did. The way to determine that would be in court - civil court.

Most institutions aren't very chill with grant money being misused, so we already don't need to burden the state with getting Johnny municipal prosecutor to try and figure out if gamma crystallization imaging sources were incorrect.

wat10000•1h ago
We also have a criminal court system to handle stuff like this.
WarmWash•54m ago
No we don't. I've never seen a private contract dispute go to criminal court, probably because it's a civil matter.

If they actually committed theft, well then that already is illegal too.

But right now, doing "shitty research" isn't illegal and it's unlikely it ever will be.

wat10000•8m ago
The claim is that this would qualify as fraud, which is also illegal.

If you do a search for "contractor imprisoned for fraud" you'll find plenty of cases where a private contract dispute resulted in criminal convictions for people who took money and then didn't do the work.

I don't know if taking money and then merely pretending to do the research would rise to the level of criminal fraud, but it doesn't seem completely outlandish.

jacquesm•1h ago
You could make a good case for a white collar crime here, fraud for instance.
fulafel•1h ago
Is there a comparison to the rate of reference errors in other venues?
dtartarotti•1h ago
It is very concerning that these hallucinations passed through peer review. It's not like peer review is a fool-proof method or anything, but the fact that reviewers did not check the references and notice the clearly bogus ones is alarming, and could be a sign that the article authors weren't the only ones using LLMs in the process...
amanaplanacanal•1h ago
Is it common for peer reviewers to check references? Somehow I thought they mostly focused on whether the experiment looked reasonable and the conclusions followed.
emil-lp•1h ago
In journal publications it is, but without DOIs it's difficult.

In conference publications, it's less common.

Conference publications (like NeurIPS) are treated as announcements of results, not verified results.

empiko•1h ago
Nobody in ML or AI is verifying all your references. Reviewers will point out if you miss a super related work, but that's it. This is especially true with the recent (last two decades?) inflation in citation counts. You regularly have papers with 50+ references for all kinds of claims and random semi-related work. The citation culture is really uninspiring.
smallpipe•1h ago
Could you run a similar analysis for pre-2020 papers? It'd be interesting to know how prevalent making up sources was before LLMs.
tasuki•1h ago
Also, it'd be interesting to see how many pre-2020 papers their "AI detector" marks as AI-generated. I distrust LLMs somewhat, but I distrust AI detectors even more.
theptip•1h ago
Yeah, it’s kind of meaningless to attribute this to AI without measuring the base rate.

It’s for sure plausible that it’s increasing, but I’m certain this kind of thing happened with humans too.

bonsai_spool•1h ago
This suggests that nobody was screening these papers in the first place—so is it actually significant that people are using LLMs in a setting without meaningful oversight?

These clearly aren't being peer-reviewed, so there's no natural check on LLM usage (which is different than what we see in work published in journals).

emil-lp•1h ago
Speaking as someone who reviews 20+ papers per year: we don't have time to verify each reference.

We verify: is the stuff correct, and is it worthy of publication (in the given venue) given that it is correct.

There is still some trust in the authors not to submit made-up stuff, though it is diminishing.

paulmist•1h ago
I'm surprised the conference doesn't provide tooling to validate all references automatically.
Sharlin•39m ago
How would you do that? Even in cases where there's a standard format, a DOI on every reference, and some giant online library of publication metadata, including everything that only exists in dead tree format, that just lets you check whether the cited work exists, not whether it's actually a relevant thing to cite in the context.
gcr•1h ago
Academic venues don't have enough reviewers. This problem isn't new, and as publication volumes increase, it's getting sharply worse.

Consider the unit economics. Suppose NeurIPS gets 20,000 papers in one year. Suppose each author should expect three good reviews, so area chairs assign five reviewers per paper. In total, 100,000 reviews need to be written. It's a lot of work, even before factoring emergency reviewers in.

NeurIPS is one venue alongside CVPR, [IE]CCV, COLM, ICML, EMNLP, and so on. Not all of these conferences are as large as NeurIPS, but the field is smaller than you'd expect. I'd guess there are 300k-1m people in the world who are qualified to review AI papers.

khuey•1h ago
Seems like using tooling like this to identify papers with fake citations and auto-rejecting them before they ever get in front of a reviewer would kill two birds with one stone.
gcr•1h ago
It's not always possible to distinguish between fake citations and citations that are simply hard to find (e.g. wonderful old books that aren't on the Internet).

Another problem is that conferences move slowly and it's hard to adjust the publication workflow in such an invasive way. CVPR only recently moved from Microsoft's CMT to OpenReview to accept author submissions, for example.

There's a lot of opportunity for innovation in this space, but it's hard when everyone involved would need to agree to switch to a different workflow.

(Not shooting you down. It's just complicated because the people who would benefit are far away from the people who would need to do the work to support it...)

khuey•6m ago
Sure, I agree that it's far from trivial to implement.
alain94040•1h ago
When I was reviewing such papers, I didn't bother checking that 30+ citations were correctly indexed. I focused on the article itself, and maybe 1 or 2 citations that are important. That's it. For most citations, they are next to an argument that I know is correct, so why would I bother checking. What else do you expect? My job was to figure out if the article ideas are novel and interesting, not if they got all their citations right.
geremiiah•1h ago
A lot of research in AI/ML seems to me to be "fake it and never make it". Literally it's all about optics, posturing, connections, publicity. Lots of bullshit and little substance. This was true before AI slop, too. But the fact that AI slop can make it pass the review really showcases how much a paper's acceptance hinges on things, other than the substance and results of the paper.

I even know PIs who got fame and funding based on some research direction that supposedly is going to be revolutionary. Except all they had were preliminary results that from one angle, if you squint, you can envision some good result. But then the result never comes. That's why I say, "fake it, and never make it".

gcr•1h ago
I was getting completely AI-generated reviews for a WACV publication back in 2024. The area chairs are so overworked that authors don't have much recourse, which sucks but is also really hard to handle unless more volunteers step up to the plate to help organize the conference.

(If you're qualified to review papers, please email the program chair of your favorite conference and let them know -- they really need the help!)

As for my review, the review form has a textbox for a summary, a textbox for strengths, a textbox for weaknesses, and a textbox for overall thoughts. The review I received included one complete set of summary/strengths/weaknesses/closing thoughts in the summary text box, another distinct set of summary/strengths/weaknesses/closing thoughts in the strengths, another complete and distinct review in the weaknesses, and a fourth complete review in the closing thoughts. Each of these four reviews were slightly different and contradicted each other.

The reviewer put my paper down as a weak reject, but also said "the pros greatly outweigh the cons."

They listed "innovative use of synthetic data" as a strength, and "reliance on synthetic data" as a weakness.

Tom1380•1h ago
No ETH Zurich, let's go
gcr•1h ago
NeurIPS leadership doesn’t think hallucinated references are necessarily disqualifying; see the full article from Fortune for a statement from them: https://archive.ph/yizHN

> When reached for comment, the NeurIPS board shared the following statement: “The usage of LLMs in papers at AI conferences is rapidly evolving, and NeurIPS is actively monitoring developments. In previous years, we piloted policies regarding the use of LLMs, and in 2025, reviewers were instructed to flag hallucinations. Regarding the findings of this specific work, we emphasize that significantly more effort is required to determine the implications. Even if 1.1% of the papers have one or more incorrect references due to the use of LLMs, the content of the papers themselves are not necessarily invalidated. For example, authors may have given an LLM a partial description of a citation and asked the LLM to produce bibtex (a formatted reference). As always, NeurIPS is committed to evolving the review and authorship process to best ensure scientific rigor and to identify ways that LLMs can be used to enhance author and reviewer capabilities.”

Analemma_•1h ago
Kinda gives the whole game away, doesn’t it? “It doesn’t actually matter if the citations are hallucinated.”

In fairness, NeurIPS is just saying out loud what everyone already knows. Most citations in published science are useless junk: it’s either mutual back-scratching to juice h-index, or it’s the embedded and pointless practice of overcitation, like “Human beings need clean water to survive (Franz, 2002)”.

Really, hallucinated citations are just forcing a reckoning which has been overdue for a while now.

jacquesm•1h ago
There should be a way to drop any kind of circular citation ring from the indexes.
gcr•1h ago
It's tough because some great citations are hard to find/procure still. I sometimes refer to papers that aren't on the Internet (eg. old wonderful books / journals).
jacquesm•59m ago
But that actually strengthens those citations. The I-scratch-your-back-you-scratch-mine ones are the ones I'm getting at, and that is quite hard to do with old and wonderful stuff; the authors there are probably not in a position to reciprocate, by virtue of observing the grass from the other side.
gcr•45m ago
I think it's a hard problem. The semanticscholar folks are doing the sort of work that would allow them to track this; I wonder if they've thought about it.

A somewhat-related parable: I once worked in a larger lab with several subteams submitting to the same conference. Sometimes the work we did was related, so we both cited each other's paper which was also under review at the same venue. (These were flavor citations in the "related work" section for completeness, not material to our arguments.) In the review copy, the reference lists the other paper as written by "anonymous (also under review at XXXX2025)," also emphasized by a footnote to explain the situation to reviewers. When it came time to submit the camera-ready copy, we either removed the anonymization or replaced it with an arxiv link if the other team's paper got rejected. :-) I doubt this practice improved either paper's chances of getting accepted.

Are these the sorts of citation rings you're talking about? If authors misrepresented the work as if it were accepted, or pretended it was published last year or something, I'd agree with you, but it's not too uncommon in my area for well-connected authors to cite manuscripts in process. I don't think it's a problem as long as they don't lean on them.

jacquesm•40m ago
No, I'm talking about the ones where the citation itself is almost or even completely irrelevant and used as a way to inflate the citation count of the authors. You could find those by checking whether or not the value as a reference (ie: contributes to the understanding of the paper you are reading) is exceeded by the value of the linkage itself.
fc417fc802•46m ago
> Most citations in published science are useless junk:

Can't say that matches my experience at all. Once I've found a useful paper on a topic thereafter I primarily navigate the literature by traveling up and down the citation graph. It's extremely effective in practice and it's continued to get easier to do as the digitization of metadata has improved over the years.

empath75•1h ago
I think a _single_ instance of an LLM hallucination should be enough to retract the whole paper and ban further submissions.
gcr•1h ago
Going through a retraction and blacklisting process is also a lot of work -- collecting evidence, giving authors a chance to respond and mediate discussion, etc.

Labor is the bottleneck. There aren't enough academics who volunteer to help organize conferences.

(If a reader of this comment is qualified to review papers and wants to step up to the plate and help do some work in this area, please email the program chairs of your favorite conference and let them know. They'll eagerly put you to work.)

pessimizer•1h ago
That's exactly why the inclusion of a hallucinated reference is actually a blessing. Instead of going back and forth with the fraudster, just tell them to find the paper. If they can't, case closed. Massive amount of time and money saved.
gcr•1h ago
Isn't telling them to find the paper just "going back and forth with a fraudster"?

One "simple" way of doing this would be to automate it. Have authors step through a lint step when their camera-ready paper is uploaded. Authors would be asked to confirm each reference and link it to a google scholar citation. Maybe the easy references could be auto-populated. Non-public references could be resolved by uploading a signed statement or something.

There's no current way of using this metadata, but it could be nice for future systems.

Even the Scholar team within Google is woefully understaffed.

My gut tells me that it's probably more efficient to just drag authors who do this into some public execution or twitter mob after-the-fact. CVPR does this every so often for authors who submit the same paper to multiple venues. You don't need a lot of samples for deterrence to take effect. That's kind of what this article is doing, in a sense.

wing-_-nuts•1h ago
I dunno about banning them; humans without LLMs make mistakes all the time. But I would definitely place them under much harder scrutiny in the future.
pessimizer•1h ago
Hallucinations aren't mistakes, they're fabrications. The two are probably referred to by the same word in some languages.

Institutions can choose an arbitrary approach to mistakes; maybe they don't mind a lot of them because they want to take risks and be on the bleeding edge. But any flexible attitude towards fabrications is simply corruption. The connected in-crowd will get mercy and the outgroup will get the hammer. Anybody criticizing the differential treatment will be accused of supporting the outgroup fraudsters.

gcr•57m ago
Fabrications carry intent to deceive. I don't think hallucinations necessarily do. If anything, they're a matter of negligence, not deception.

Think of it this way: if I wanted to commit pure academic fraud maliciously, I wouldn't make up a fake reference. Instead, I'd find an existing related paper and merely misrepresent it to support my own claims. That way, the deception is much harder to discover and I'd have plausible deniability -- "oh I just misunderstood what they were saying."

I think most academic fraud happens in the figures, not the citations. Researchers are more likely to be successful at making up data points than making up references, because it's impossible to know without the data files.

andy99•1h ago

   For example, authors may have given an LLM a partial description of a citation and asked the LLM to produce bibtex
This is equivalent to a typo. I’d like to know which “hallucinations” are completely made up, and which have a corresponding paper but contain some error in how it’s cited. The latter I don’t think matters.
burkaman•42m ago
If you click on the article you can see a full list of the hallucinations they found. They did put in the effort to look for plausible partial matches, but most of them are some variation of "No author or title match. Doesn't exist in publication."

Here's a random one I picked as an example.

Paper: https://openreview.net/pdf?id=IiEtQPGVyV

Reference: Asma Issa, George Mohler, and John Johnson. Paraphrase identification using deep contextualized representations. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 517–526, 2018.

Asma Issa and John Johnson don't appear to exist. George Mohler does, but it doesn't look like he works in this area (https://www.georgemohler.com/). No paper with that title exists. There are some with sort of similar titles (https://arxiv.org/html/2212.06933v2 for example), but none that really make sense as a citation in this context. EMNLP 2018 exists (https://aclanthology.org/D18-1.pdf), but that page range is not a single paper. There are papers in there that contain the phrases "paraphrase identification" and "deep contextualized representations", so you can see how an LLM might have come up with this title.

jklinger410•1h ago
> the content of the papers themselves are not necessarily invalidated. For example, authors may have given an LLM a partial description of a citation and asked the LLM to produce bibtex (a formatted reference)

Maybe I'm overreacting, but this feels like an insanely biased response. They found the one potentially innocuous reason and latched onto that as a way to hand-wave the entire problem away.

Science already had a reproducibility problem, and it now has a hallucination problem. Considering the massive influence the private sector has on both the work and the institutions themselves, the future of open science is looking bleak.

paulmist•1h ago
Isn't disqualifying X months of potentially great research due to a malformed but existing reference harsh? I don't think they'd be okay with references that are actually made up.
suddenlybananas•37m ago
It's a sign of dishonesty; not a perfect one, but an indicator.
orbital-decay•58m ago
The wording is not hand-wavy. They said "not necessarily invalidated", which could mean that innocuous reason and nothing extra.
derf_•1h ago
This will continue to happen as long as it is effectively unpunished. Even retracting the paper would do little good, as odds are it would not have been written if the author could not have used an LLM, so they are no worse off for having tried. Scientific publications are mostly a numbers game at this point. It is just one more example of a situation where behaving badly is much cheaper than policing bad behavior, and until incentives are changed to account for that, it will only get worse.
Aurornis•1h ago
> Even if 1.1% of the papers have one or more incorrect references due to the use of LLMs, the content of the papers themselves are not necessarily invalidated.

This statement isn’t wrong, as the rest of the paper could still be correct.

However, when I see a blatant falsification somewhere in a paper I’m immediately suspicious of everything else. Authors who take lazy shortcuts when convenient usually don’t just do it once, they do it wherever they think they can get away with it. It’s a slippery slope from letting an LLM handle citations to letting the LLM write things for you to letting the LLM interpret the data. The latter opens the door to hallucinated results and statistics, as anyone who has experimented with LLMs for data analysis will discover eventually.

mlmonkey•31m ago
Why not run every submitted paper through GPTZero (before sending to reviewers) and summarily reject any paper with a hallucination?
gcr•19m ago
That's how GPTZero wants to situate themselves.

Who would pay them? Conference organizers are already unpaid and understaffed, and most conferences aren't profitable.

I think rejections shouldn't be automatic. Sometimes there are just typos. Sometimes authors don't understand BibTeX. This needs to be done in a way that reduces the workload for reviewers.

One way of doing this would be for GPTZero to annotate each paper during the review step. If reviewers could review a version of each paper with yellow-highlighted "likely-hallucinated" references in the bibliography, then they'd bring it up in their review and they'd know to be on their guard for other probable LLM-isms. If there's only a couple of likely typos in the references, then reviewers could understand that, and if they care about it, they'd bring it up in their reviews and the author would have the usual opportunity to rebut.

I don't know if GPTZero is willing to provide this service "for free" to the academic community, but if they are, it's probably worth bringing up at the next PAMI-TC meeting for CVPR.

Molitor5901•1h ago
AI might just extinguish the entire paradigm of publish or perish. The sheer volume of papers makes it nearly impossible to properly decide which papers have merit, which are non-replicable and suspect, and which are just a desperate rush to publish. The entire practice needs to end.
shermantanktop•1h ago
But how could we possibly evaluate faculty and researcher quality without counting widgets on an assembly line? /s

It’s a problem. The previous regime prior to publishing-mania was essentially a clubby game of reputation amongst peers based on cocktail party socialization.

The publication metrics came out of the harder sciences, I believe, and then spread to the softest of humanities. It was always easy to game a bit if you wanted to try, but now it’s trivial to defeat.

SJC_Hacker•18m ago
It's not publish or perish so much as get grant money or perish.

Publishing is just the way to get grants.

A PI explained it to me once, something like this

Idea(s) -> Grant -> Experiments -> Data -> Paper(s) -> Publication(s) -> Idea(s) -> Grant(s)

That's the current cycle ... remove any step and it's a dead end

TAULIC15•1h ago
OHHH IS GOOD
armcat•1h ago
This is awful but hardly surprising. Someone mentioned reproducible code with the papers - but there is a high likelihood of the code being partially or fully AI generated as well. I.e. AI generated hypothesis -> AI produces code to implement and execute the hypothesis -> AI generates paper based on the hypothesis and the code.

Also: there were 15,000 submissions that were rejected at NeurIPS; it would be very interesting to see what % of those were partially or fully AI-generated/hallucinated. Are the ratios comparable?

blackbear_•1h ago
Whether the code is AI-generated or not is not important; what matters is that it really works.

Sharing code enables others to validate the method on a different dataset.

Even before LLMs came around there were lots of methods that looked good on paper but turned out not to work outside of accepted benchmarks.

depressionalt•1h ago
This is nice and all, but what repercussions does GPTZero face when their bullshit AI detection hallucinates a student using AI? And when that student receives academic discipline because of it?

Many such cases of this. More than 100!

They claim to have custom detection for GPT-5, Gemini, and Claude. They're making that up!

freedomben•1h ago
Indeed. My son has been accused by bullshit AI detection of having used AI, and it has devastated his work quality. After being "disciplined" for using AI (when he didn't), he now intentionally tries to "dumb down" his writing so that it doesn't sound so much like AI. The result is he writes much worse. What a shitty, shitty outcome. I've even found myself leaving typos and things in (even on sites like HN) because if you write too well, inevitably some comment replier will call you out as being an LLM even when you aren't. I'm as annoyed by the LLM posts as everybody else, but the answer surely is not to dumb us down into Idiocracy.
Sharlin•36m ago
It's almost as if this whole LLM stuff wasn't a net benefit to the society after all.
theptip•1h ago
This is mostly an ad for their product. But I bet you can get pretty good results with a Claude Code agent using a couple simple skills.

Should be extremely easy for AI to successfully detect hallucinated references as they are semi-structured data with an easily verifiable ground truth.
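
As a rough illustration, a sketch of that kind of check against the Crossref REST API (the query parameter and response shape here are my assumptions about the current API):

    import requests
    from difflib import SequenceMatcher

    def closest_crossref_title(cited_title: str):
        # Ask Crossref for its best bibliographic match, then score how well
        # the returned title matches the title as cited in the paper.
        resp = requests.get(
            "https://api.crossref.org/works",
            params={"query.bibliographic": cited_title, "rows": 1},
            timeout=10,
        )
        items = resp.json()["message"]["items"]
        if not items:
            return None, 0.0
        found = (items[0].get("title") or [""])[0]
        score = SequenceMatcher(None, cited_title.lower(), found.lower()).ratio()
        return found, score

    title, score = closest_crossref_title(
        "Paraphrase identification using deep contextualized representations"
    )
    print(f"{score:.2f} {title}")  # a low score hints the reference may not exist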

leggerss•1h ago
I don't understand: why aren't there automated tools to verify citations' existence? The data for a citation has a structured styling (APA, MLA, Chicago) and paper metadata is available via e.g. a web search, even if the paper contents are not

I guess GPTZero has such a tool. I'm confused why it isn't used more widely by paper authors and reviewers

gh02t•1h ago
Citations are too open-ended and prone to variation, and legitimate minor mistakes that wouldn't bother a human verifier would break automated tools, making citations hard to verify automatically in their current form. DOI was supposed to solve some of the literal mechanical variation in checking the existence of a source, but journal paywalls and limited adoption mean that is not a universal solution. Plus a DOI still doesn't easily verify the factual accuracy of a citation, like "does the source say what the citation says it does," which is the most important part.

In my experience you will see considerable variation in citation formats, even in journals that strictly define them and require using BibTeX. And lots of journals leave their citation format rules very vague. It's a problem that runs deep.

eichin•41m ago
Looks like GPTZero Source Finder was only released a year ago - if anything, I'm surprised slop-writers aren't using it preemptively, since they're "ahead of the curve" relative to reviewers on this sort of thing...
yepyeaisntityea•1h ago
No surprises. Machine learning has, at least since 2012, been the go-to field for scammers and grifters. Machine learning, and technology in general, is basically a few real ideas, a small number of honest hard workers, and then millions of fad chasers and scammers.
mt_•1h ago
It would be ironic if the very detection of hallucinations contained hallucinations of its own.
doug_durham•52m ago
Getting papers published is now more about embellishing your CV than about a sincere desire to present new research. I see this everywhere at every level. Getting a paper published anywhere is a checkbox in completing your resume. As an industry we need to stop taking this into consideration when reviewing candidates or deciding pay. In some sense it has become an anti-signal.
londons_explore•8m ago
I'd like to see a financial approach to deciding pay by giving researchers a small and perhaps nonlinear or time-bounded share of any profits that arise from their research.

Then people's CVs could say "My inventions have led to $1M in licensing revenue" rather than "I presented a useless idea at a decent conference because I managed to make it sound exciting enough to get accepted".

nerdjon•51m ago
The downstream effects of this are extremely concerning. We have already seen the damage caused by human-written research that was later retracted, like the "research" on vaccines causing autism.

As we get more and more papers that may be citing information that was originally hallucinated in the first place, we have a major reliability issue. What is worse, people who did not use AI in the first place will be caught in the crosshairs, since they will be referencing incorrect information.

There needs to be a serious amount of education done on what these tools can and cannot do and importantly where they fail. Too many people see these tools as magic since that is what the big companies are pushing them as.

Other than that we need to put in actual repercussions for publishing work created by an LLM without validating it (or just say you can’t in the first place but I guess that ship has sailed) or it will just keep happening. We can’t just ignore it and hope it won’t be a problem.

And yes, humans can make mistakes too. The difference is accountability, and the ability to actually be unsure about something, so you question yourself and validate.

pandemic_region•50m ago
What if they only accepted handwritten papers? Basically the current system is beyond repair, so we may as well go back to receiving 20 decent papers instead of 20k hallucinated ones.
ctoth•47m ago
How you know it's really real is that they clearly tell the FPR and compare against a pre-LLM baseline.

But I saw it in Apple News, so MISSION ACCOMPLISHED!

yobbo•47m ago
As long as these sorts of papers serve purposes more important to the careers of the authors than anything related to science or the discovery of knowledge, of course this happens and continues.

The best possible outcome is that these two purposes are disconflated, with follow-on consequences for the conferences and journals.

poulpy123•46m ago
All papers proven to have used an LLM beyond writing improvement should be automatically retracted.
brador•42m ago
The problem isn’t scale.

The problem is consequences (lack of).

Doing this should get you barred from research. It won’t.

CrzyLngPwd•40m ago
This is not the AI future we dreamed of, or feared.
nospice•37m ago
We've been talking about a "crisis of reproducibility" for years, and about the incentive to crank out high volumes of low-quality research. We now have a tool that brings the cost of producing plausible-looking research down to zero. So of course we're going to see that tool abused on a galactic scale.

But here's the thing: let's say you're a university or a research institution that wants to curtail it. You catch someone producing LLM slop, and you confirm it by analyzing their work and conducting internal interviews. You fire them. The fired researcher goes public saying that they were doing nothing of the sort and that this is a witch hunt. Their blog post makes it to the front page of HN, garnering tons of sympathy and prompting many angry calls to their ex-employer. It gets picked up by some mainstream outlets, too. This has happened a bunch of times.

In contrast, there are basically no consequences to institutions that let it slide. No one is angrily calling the employers of the authors of these 100 NeurIPS papers, right? If anything, there's the plausible deniability of "oh, I only asked ChatGPT to reformat the citations, the rest of the paper is 100% legit, my bad".

meindnoch•35m ago
Jamie, bring up their nationalities.
neom•18m ago
I wrote before about my embarrassing time with ChatGPT during a period (https://news.ycombinator.com/item?id=44767601) - I decided to go back through those old 4o chats with 5.2 pro extended thinking, and the reply was pretty funny because it first slightly ridiculed me, heh - but what it showed was: basically I would say "what 5 research papers from any area of science talk to these ideas" and it would find 1 and invent 4 if it didn't know 4 others, and not tell me, and then I'd keep working with it and it would invent what it thought might be in the papers along the way, making up new papers in its own work to cite to make its own work valid, lol. Anyway, I'm a moron, sure, and no real harm came of it for me, just still slightly shook I let that happen to me.
londons_explore•14m ago
And this is the tip of the iceberg, because these are the easy to check/validate things.

I'm sure plenty of more nuanced facts are also entirely without basis.

techIA•10m ago
They will turn it into a party drug.