Show HN: Scholar Sidekick – citation verifier for the "real DOI, wrong paper"

https://scholar-sidekick.com/tools/citation-verifier

1•ProductivePhys•42m ago

One of the harder AI citation failures to catch is quite simple: the identifier is real, but the citation is still fake. The DOI resolves, but to a different paper, not the one the citation claims.

Topaz et al. reported their findings on citation hallucination in May in The Lancet. They scanned 2.5 million PubMed Central articles and estimated that 1 in 277 contained a fabricated citation. Some of their examples were this exact pattern: real identifier, fabricated title.

I originally built Scholar Sidekick as a formatter for my own use as a clinician-educator preparing talks, articles, etc. After reading the Topaz paper, I added a verifier to catch the most common pattern they found: a real identifier attached to the wrong paper.

My tool resolves the identifier, and then compares the title in your reference with the returned metadata (i.e. does this DOI, PMID, or arXiv ID actually point to the right paper?). It does not attempt to judge whether the cited paper actually supports the claim you make in your text. That still needs judgment, preferably human judgment.

I ran 350 previously unseen citations through the API once each in a test. It correctly identified all 37 fabricated references, but wrongly flagged 5 of 285 real references: 1.8% (95% CI 0.8–4.0%). (Plain similarity comparison, without the optional LLM screening - I would expect the LLM to rescue some of those borderline cases. A handful of citations returned no result on upstream timeouts and weren't scorable either way.) The test suite, results and failures are public, so you do not have to take my word for it. You can check them yourself.
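For anyone who wants to check the interval: the quoted 0.8–4.0% is consistent with a Wilson score interval on 5 false positives out of 285. The post does not say which method was used, so the function below (the name `wilson_interval` is mine) is only a sketch of one way to reproduce the figure.

```python
import math


def wilson_interval(events, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = events / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# 5 real references wrongly flagged out of 285, as reported above.
low, high = wilson_interval(5, 285)
print(f"{5 / 285:.1%} (95% CI {low:.1%} to {high:.1%})")
```

Wilson behaves better than the plain normal approximation when the rate is close to zero, which is the case here.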

The web version is free and anonymous. The REST API and MCP server use a RapidAPI key, with a free rate-limited tier and paid tiers above that. The MCP server is on npm, Smithery and Glama, and the Obsidian plugin is in the community store. The Chrome, Firefox and Edge browser extensions are in their respective stores as well.

I'm very open to feedback and look forward to hearing from anyone who tries it - what works? What fails? Thanks in advance.

Comments

ProductivePhys•35m ago

Direct link to the verifier itself: https://scholar-sidekick.com/tools/citation-verifier - I accidentally pointed the post at the homepage.

---

A few more points that didn't quite fit in my main post:

My citation verifier is not a wrapper around a language model. It is deterministic. It takes one or more identifiers, looks them up in authoritative sources (Crossref, NCBI eutils, DataCite, arXiv, ADS, WHO IRIS), and then compares the title and authors they return with yours.
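As a rough illustration of the first step, here is how an incoming identifier might be classified before deciding which source to query. This is a sketch under assumptions, not the verifier's actual code: the function name, the regular expressions and the prefix handling are all hypothetical, and real-world identifiers have more edge cases than these patterns cover.

```python
import re

# Hypothetical patterns; deliberately simple, not exhaustive.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
ARXIV_RE = re.compile(r"^(\d{4}\.\d{4,5}|[a-z\-]+(\.[A-Z]{2})?/\d{7})(v\d+)?$")
PMID_RE = re.compile(r"^\d{1,8}$")


def classify_identifier(raw):
    """Return (kind, cleaned_value) where kind is doi, arxiv, pmid or unknown."""
    value = raw.strip()
    # Strip common prefixes so the bare identifier is left.
    value = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", "", value, flags=re.I)
    value = re.sub(r"^arxiv:\s*", "", value, flags=re.I)
    value = re.sub(r"^pmid:\s*", "", value, flags=re.I)
    if DOI_RE.match(value):
        return "doi", value
    if ARXIV_RE.match(value):
        return "arxiv", value
    if PMID_RE.match(value):
        return "pmid", value
    return "unknown", value


print(classify_identifier("https://doi.org/10.1000/xyz123"))
print(classify_identifier("arXiv:2301.12345v2"))
print(classify_identifier("PMID: 12345678"))
```

The lookup itself would then go to whichever source matches the kind; that part needs network access and is left out here.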

I do normalise tricky things: HTML markup, Unicode characters, punctuation, letter case, stop words etc. Then a similarity score is calculated using token overlap and edit distance. This is harder than it looks! The biggest difficulty was determining reasonable thresholds. Too sensitive and you will flag legitimate variations; too loose and you will fail to catch fabrications. I used the validation fixture to tune this, but I am deliberately publishing the confidence level it produces rather than claiming a hard pass/fail binary.
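To make the normalise-then-score idea concrete, here is a minimal sketch using only the Python standard library. Everything in it is an assumption for illustration: the stop-word list, the equal weighting of Jaccard token overlap and Levenshtein similarity, and the function names are mine, and no threshold is implied.

```python
import html
import re
import unicodedata

# Hypothetical stop-word list, kept short for the example.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "in", "on", "for", "to", "with"})


def normalise_title(title):
    """Return a list of comparable tokens from a raw title string."""
    text = html.unescape(title)
    text = re.sub(r"</?[A-Za-z][^>]*>", " ", text)   # drop HTML tags
    text = unicodedata.normalize("NFKD", text)       # split accents from letters
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.casefold())  # punctuation to spaces
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1]


def title_similarity(cited, resolved):
    """Score in [0, 1]: mean of token overlap and edit-distance similarity."""
    a, b = normalise_title(cited), normalise_title(resolved)
    if not a or not b:
        return 0.0
    set_a, set_b = set(a), set(b)
    jaccard = len(set_a & set_b) / len(set_a | set_b)
    joined_a, joined_b = " ".join(a), " ".join(b)
    edit = 1 - levenshtein(joined_a, joined_b) / max(len(joined_a), len(joined_b))
    return (jaccard + edit) / 2


print(title_similarity("The <i>Effect</i> of Café Culture: A Review",
                       "Effect of cafe culture – a review"))
print(title_similarity("Deep learning for sepsis prediction",
                       "Soil erosion in alpine meadows"))
```

The first pair differs only in markup, accents, case, punctuation and stop words, so it scores 1.0; the second pair shares no content tokens and scores low. Where to draw the line between those extremes is exactly the threshold problem described above.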

The verifier actually performed worse the first time I ran a blind eval, with 5.3% of real citations flagged as mismatches. The problem was extremely simple: I hadn't allowed for author names recorded with initials first. After I fixed that, I drew a new citation set (so the fix couldn't have been tuned to the original test set) and re-ran the eval; that is the result published above, with a 1.8% false-positive rate. I've published both runs and the receipts, not just the latter.
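The initials-first problem can be illustrated with a small order-insensitive author key. This is a hypothetical sketch, not the fix that shipped: `author_key` and `is_initials` are names invented here, and the "short all-caps token means initials" heuristic will misfire on some real names.

```python
def is_initials(token):
    """Heuristic: a short all-uppercase token such as 'J' or 'JA'."""
    return token.isupper() and len(token) <= 3


def author_key(name):
    """Reduce an author name to (surname, first initial), whatever the order."""
    name = name.strip()
    if "," in name:
        # "Smith, J. A." style: surname comes before the comma.
        surname, given = (part.strip() for part in name.split(",", 1))
    else:
        tokens = name.replace(".", " ").split()
        if not tokens:
            return "", ""
        if len(tokens) == 1:
            surname, given = tokens[0], ""
        elif is_initials(tokens[-1]) and not is_initials(tokens[0]):
            # "Smith JA" style: trailing initials.
            surname, given = " ".join(tokens[:-1]), tokens[-1]
        else:
            # "J. A. Smith" or "Jane A. Smith" style: surname last.
            surname, given = tokens[-1], " ".join(tokens[:-1])
    initial = given.replace(".", "").strip()[:1]
    return surname.casefold(), initial.casefold()


for variant in ["J. A. Smith", "Smith, J. A.", "Smith JA", "Jane A. Smith"]:
    print(variant, "->", author_key(variant))
```

All four variants reduce to the same key, so a comparison on keys no longer treats "J. A. Smith" and "Smith JA" as different authors.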

The web SaaS addresses one of the two potential problems with citation verification. 'Real DOI but wrong title' can be checked mechanically against the underlying registry. 'Real article but doesn't support claim' is far harder: addressing it requires reading both the claim and the paper. I'm deliberately not trying to solve that problem. The furthest automation can easily go at that level appears to be something like: 'the abstract of the cited article does not appear to contain any of the concepts in the claim'. Sometimes useful, but easy to overstate.

The web SaaS is closed source, due to ongoing hosting and service costs which the anonymous free tier subsidises.

Yes, I am aware there are other tools that solve different problems: Retraction Watch for withdrawn papers; Unpaywall for open access; Scite for context analysis of citations. However, none directly answers the question behind what Topaz et al. identified as the most common pattern of fabrication: "Is this citation real and correctly attributable to this identifier?"

Areas for ongoing work: the edge cases will be addressed, and the validation corpus expanded. Later, possibly a streaming/batch verifier for large reference lists, or a conservative semantic-layer flag based on abstract-vs-claim concept overlap. Both of those carry significant risks of over-promising, particularly the latter.

Keen to hear thoughts on the project.
