Databow: a Rust CLI for any database with an ADBC driver

https://columnar.tech/blog/introducing-databow//
1•hckshr•1m ago•0 comments

Pluto.jl 1.0 release – reactive notebook for Julia

https://discourse.julialang.org/t/pluto-1-0-release/137296
1•fons-p•3m ago•0 comments

The Mathematics of Multi-Tenancy

https://www.bitsxpages.com/p/the-mathematics-of-multi-tenancy
1•birdculture•8m ago•0 comments

Normetrics: A unified API for norm-based linear models (white paper)

https://github.com/PPenelle/-NORMETRICS-
1•ppenelle•9m ago•0 comments

Why China got rich and India didn't

https://davidoks.blog/p/why-china-got-rich-and-india-didnt
2•rochansinha•9m ago•0 comments

Audio software to increase focus (EEG)

https://www.hosaka.fi/
2•cslr•10m ago•1 comments

We're going to put Codex inside ChatGPT

https://openai.com/business/intelligence-at-work/
1•marcuschong•10m ago•1 comments

Show HN: Junco, turn newsletters into short audio episodes

https://www.tryjunco.com/
3•alex-onecard•12m ago•2 comments

Generating Random Factored Numbers, Easily [pdf]

https://link.springer.com/content/pdf/10.1007/s00145-003-0051-5.pdf
1•luu•12m ago•0 comments

DeepSeek-V4-Flash (official FP8) running across 2x DGX Spark

https://forums.developer.nvidia.com/t/deepseek-v4-flash-official-fp8-running-across-2x-dgx-spark-...
1•pilooch•13m ago•0 comments

FBI charges two NIH researchers with smuggling monkeypox to US from Congo

https://www.justice.gov/usao-edmi/pr/feds-charge-foreign-nationals-working-national-institutes-he...
3•delichon•13m ago•0 comments

Python The Good Stuff: Humble Book Bundle

https://www.humblebundle.com/books/python-good-stuff-no-starch-books
2•teleforce•14m ago•0 comments

Use your Nvidia GPU's VRAM as swap space on Linux

https://github.com/c0dejedi/nbd-vram
5•tanelpoder•16m ago•0 comments

FullPAC files S-1 [pdf]

https://d1io3yog0oux5.cloudfront.net/gotv/sec/0001493152-26-026911/0001493152-26-026911.pdf
1•naryJane•18m ago•1 comments

Always Be Blaming: how Git blame answers the wrong question

https://matklad.github.io/2026/05/18/always-be-blaming.html
1•pgedge_postgres•22m ago•0 comments

Show HN: Reloops – Open-Source Frame.io Alternative for AI Agents and Teams

https://github.com/Reloops-App/reloops/
1•dheerajbhatia27•22m ago•0 comments

Show HN: Ordinary and Ordinaryd v0.6.0

https://codeberg.org/ordinarylabs/Ordinary/src/branch/main/docs/quick-start.md
1•seanwatters•23m ago•0 comments

Feds failing in bid to take a supercomputer from a climate research center

https://arstechnica.com/science/2026/06/judge-blocks-part-of-trump-admins-effort-to-hurt-colorado...
2•yodon•25m ago•0 comments

I hadn't coded in 30 years. Then I built a space game with Godot

2•CosmicGoldRush•26m ago•0 comments

AI enthusiasts are in race against time, AI skeptics are in race against entropy

https://charitydotwtf.substack.com/p/ai-enthusiasts-are-in-a-race-against
1•wapasta•26m ago•0 comments

Eupago for Python – The First Python SDK for Portugal's MB Way/Multibanco

https://github.com/bilouro/eupago-python
1•bilouro•27m ago•0 comments

This creepy blob robot will keep going even if you break its legs

https://www.popsci.com/technology/unstoppable-blob-robot/
1•mhb•33m ago•1 comments

Law Professors Prefer AI over Peer Answers [pdf]

https://law.stanford.edu/wp-content/uploads/2026/06/salinas_et_al.pdf
1•droidjj•34m ago•0 comments

Titan Network claims 5% of Asia's AI data market using crowdsourced home devices

https://www.coindesk.com/tech/2026/06/02/here-s-how-one-decentralized-cloud-provider-says-private...
1•Reaktornano•36m ago•1 comments

Paseo – Beautiful open-source coding agent interface (desktop, mobile, CLI)

https://github.com/getpaseo/paseo
5•timhigins•37m ago•1 comments

The Empty Field That Wasn't: GPS, OTAD and Two Decades of Encrypted Broadcasts

https://lsc-pagepro.mydigitalpublication.com/publication/?i=865273&p=62&view=issueViewer
1•ahlCVA•39m ago•0 comments

WinUtils: Shell-powered CLI tools for Windows 95

https://www.codenaked.com/winutils
3•code_naked•39m ago•1 comments

We tore down our no-code site and went back to code

https://twitter.com/chrismuccioli/status/2061909833893257389
5•nadis•40m ago•1 comments

ContextWall – Context firewall for AI agents and RAG pipelines

https://contextwall.io/
2•sumeshpk•40m ago•0 comments


Show HN: Scholar Sidekick – citation verifier for the "real DOI, wrong paper"

https://scholar-sidekick.com/tools/citation-verifier
1•ProductivePhys•42m ago
One of the harder-to-catch AI citation failures is quite simple: the identifier is real, but the citation is still fake. The DOI resolves, but to a different paper, not the one the citation claims it is.

Topaz et al. reported their findings on citation hallucination in May in The Lancet. They scanned 2.5 million PubMed Central articles and estimated that 1 in 277 contained a fabricated citation. Some of their examples were this exact pattern: real identifier, fabricated title.

I originally built Scholar Sidekick as a formatter for my own use as a clinician-educator preparing talks, articles, etc. After reading the Topaz paper, I added a verifier to catch the most common pattern they found: a real identifier attached to the wrong paper.

My tool resolves the identifier, and then compares the title in your reference with the returned metadata (i.e. does this DOI, PMID, or arXiv ID actually point to the right paper?). It does not attempt to judge whether the cited paper actually supports the claim you make in your text. That still needs judgment, preferably human judgment.

I ran 350 previously unseen citations through the API once each in a test. It correctly identified all 37 fabricated references, but wrongly flagged 5 of 285 real references: 1.8% (95% CI 0.8–4.0%). (Plain similarity comparison, without the optional LLM screening - I would expect the LLM to rescue some of those borderline cases. A handful of citations returned no result on upstream timeouts and weren't scorable either way.) The test suite, results and failures are public, so you do not have to take my word for it. You can check them yourself.
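For readers who want to check the arithmetic: the post does not say which interval method was used, but the quoted figures are consistent with a Wilson score interval on 5 false positives out of 285 real references. A minimal sketch under that assumption (the function name `wilson_interval` is illustrative, not part of the tool):

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed method)."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Figures taken from the post: 5 of 285 real references wrongly flagged.
low, high = wilson_interval(5, 285)
print(f"{5 / 285:.1%} (95% CI {low:.1%} to {high:.1%})")
# prints: 1.8% (95% CI 0.8% to 4.0%)
```

The Wilson interval is a common choice for small counts like this because, unlike the simple normal approximation, its lower bound cannot go below zero.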

The web version is free and anonymous. The REST API and MCP server use a RapidAPI key, with a free rate-limited tier and paid tiers above that. The MCP server is on npm, Smithery and Glama, and the Obsidian plugin is in the community store. The Chrome, Firefox and Edge browser extensions are in their respective stores as well.

I'm very open to feedback and look forward to hearing from anyone who tries it - what works? What fails? Thanks in advance.

Comments

ProductivePhys•35m ago
Direct link to the verifier itself: https://scholar-sidekick.com/tools/citation-verifier - I accidentally pointed the post at the homepage.

---

A few more points that didn't quite fit in my main post:

My citation verifier is not a wrapper around a language model. It is deterministic. It takes identifier(s), looks them up in authoritative lists (Crossref, NCBI eutils, DataCite, arXiv, ADS, WHO IRIS), and then compares their associated title and author(s) to yours.

I do normalise tricky things: HTML markup, Unicode characters, punctuation, letter case, stop words etc. Then a similarity score is calculated using token overlap and edit distance. This is harder than it looks! The biggest difficulty was determining reasonable thresholds. Too sensitive and you will flag legitimate variations; too loose and you will fail to catch fabrication. I used the validation fixture to tune this, but I am deliberately publishing the confidence level it produces rather than claiming a hard pass/fail binary.
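To make the normalise-then-score idea concrete, here is a rough sketch of that kind of pipeline. It is not the tool's actual code: the stop-word list, the 50/50 weighting, the helper names and the example titles are all invented for illustration, and the real thresholds are not reproduced here.

```python
import html
import re
import unicodedata

# Hypothetical stop-word list; the real tool's list is not published in the post.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def normalise(title: str) -> list[str]:
    """Strip markup, accents, case and punctuation; return content tokens."""
    text = re.sub(r"<[^>]+>", " ", title)          # drop tags such as <i>...</i>
    text = html.unescape(text)                      # &amp; -> &
    text = unicodedata.normalize("NFKD", text)      # split accents from letters
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.casefold())  # keep ASCII letters/digits only
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def token_overlap(a: list[str], b: list[str]) -> float:
    """Jaccard overlap of the two token sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def similarity(cited: str, resolved: str) -> float:
    """Blend token overlap and normalised edit similarity (assumed equal weights)."""
    ta, tb = normalise(cited), normalise(resolved)
    sa, sb = " ".join(ta), " ".join(tb)
    longest = max(len(sa), len(sb))
    edit_sim = 1.0 if longest == 0 else 1 - edit_distance(sa, sb) / longest
    return 0.5 * token_overlap(ta, tb) + 0.5 * edit_sim


# Invented example titles: same paper with different markup, then a different paper.
cited = "The <i>Staphylococcus aureus</i> Genome: A Review"
print(similarity(cited, "the staphylococcus aureus genome – a review"))
print(similarity(cited, "Deep learning for protein folding"))
```

Returning a score rather than a boolean leaves the threshold decision to the caller, which fits the choice described above to publish a confidence level instead of a hard pass/fail.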

The verifier actually performed worse the first time I ran a blind eval, with 5.3% of real citations flagged as mismatches. The problem was extremely simple: I hadn't allowed for author names recorded with initials first. After fixing that, I drew a new citation set (so the fix couldn't have been tuned to the first test set) and re-ran. That is the result published above, with 1.8% false positives. I've published both runs and the receipts, not just the latter.
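The initials-first problem can be illustrated with a small order-insensitive author key. This is only a guess at the general shape of such a fix, not the one actually shipped; `author_key` and the example names are hypothetical, and the heuristic will mis-handle some multi-word surnames written surname-last (e.g. "J. van der Berg").

```python
import re
import unicodedata


def author_key(name: str) -> tuple[str, str]:
    """Reduce an author name to (surname, first initial), whatever the ordering."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    if "," in text:
        # "Smith, John A." -> surname before the comma, given names after it
        surname, given = (part.strip() for part in text.split(",", 1))
    else:
        parts = text.replace(".", " ").split()
        if not parts:
            return "", ""
        if len(parts) > 1 and re.fullmatch(r"[A-Z]{1,3}", parts[-1]):
            # "Smith JA": a trailing run of 1-3 capitals is treated as initials
            surname, given = " ".join(parts[:-1]), parts[-1]
        else:
            # "J. A. Smith" or "John A Smith": last token is the surname
            surname, given = parts[-1], " ".join(parts[:-1])
    given = given.replace(".", " ").strip()
    return surname.casefold(), given[:1].casefold()


# All four spellings of this invented author collapse to the same key.
for spelling in ["Smith JA", "J. A. Smith", "Smith, John A.", "John A Smith"]:
    print(spelling, "->", author_key(spelling))
```

Comparing keys rather than raw strings means "J. A. Smith" in a reference and "Smith JA" in registry metadata no longer count as a mismatch.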

The web SaaS addresses one of the two potential problems with citation verification. 'Real DOI but wrong title' can be mechanically checked against the underlying system. 'Real article but doesn't support claim' is far harder: it requires reading both the claim and the paper, and I'm deliberately not trying to solve that problem. The furthest automation can easily go at that level appears to be something like: 'the abstract of the cited article does not appear to contain any of the concepts in the claim'. Sometimes useful, but easy to overstate.

The web SaaS is closed source because of ongoing hosting and service costs, which the anonymous free tier subsidises.

Yes, I am aware there are other tools that solve different problems: Retraction Watch for withdrawn papers, Unpaywall for open access, Scite for context analysis of citations. However, none directly answers the question behind what Topaz et al. identified as the most common pattern of fabrication: "Is this citation real and correctly attributable to this identifier?"

Areas for ongoing work: addressing the edge cases and expanding the validation corpus. Later, possibly a streaming / batch verifier for large reference lists, or a conservative semantic-layer flag based on abstract-vs-claim concept overlap. Both of those carry significant risks of over-promising, particularly the latter.

Keen to hear thoughts on the project.