LaTeXpOsEd: A Systematic Analysis of Information Leakage in Preprint Archives

https://arxiv.org/abs/2510.03761
71•oldfuture•3mo ago

Comments

SiempreViernes•3mo ago
As far as I can tell they trawled a big archive for sensitive information, (unsurprisingly) found some, and then didn't try to contact anyone affected before telling the world "hey, there are login credentials to be found in here".
crote•3mo ago
Don't forget giving it a fancy name in the hope that it'll go viral!

I am getting so tired of every vulnerability getting a cutesy pet name, as if each one were the new Heartbleed / Spectre / Meltdown...

wongarsu•3mo ago
Beats having to remember and communicate CVE numbers
KeplerBoy•3mo ago
It's not like every datapoint comes with the email of the corresponding author.
mseri•3mo ago
Google has a great aid to reduce the attack surface: https://github.com/google-research/arxiv-latex-cleaner
Y_Y•3mo ago
I use this before submission and recommend others do too. If I were in charge of arXiv, I'd have it integrated as an optional part of the submission process.
Jaxan•3mo ago
I do this by hand. In the whole process, cleaning up the TeX before submission is a small step. And I like to keep some comments, like ones explaining how some TikZ figures are made. They might help someone some day.
andrepd•3mo ago
Yes but then we wouldn't have https://xcancel.com/LeaksPh
JohnKemeny•3mo ago
Most people I know simply use `latexpand` which "flattens" all files into one tex-file and by default removes all comments.
barthelomew•3mo ago
Paper LaTeX files often contain surprising details. When a paper lacks code, looking at latex source has become a part of my reproduction workflow. The comments often reveal non-trivial insights. Often, they reveal a simpler version of the methodology section (which for poor "novelty" purposes is purposely obscured via mathematical jargon).
seg_lol•3mo ago
Reading the LaTeX equations also makes for easier (LLM) translation into code than trying to read the PDF.
kmm•3mo ago
I sort of understand the reasoning behind why arXiv prefers TeX to PDF [1], even though I feel it's a bit much to make submitting the original TeX file mandatory whenever they detect that a submitted PDF was produced from one. But I've never understood what the added value is in hosting the source publicly.

Though I have to admit, when I was still in academia, whenever I saw a beautiful figure or formatting in a preprint, I'd often try to take some inspiration from the source for my own work, occasionally learning a new neat trick or package.

1: https://info.arxiv.org/help/faq/whytex.html

irowe•3mo ago
A huge value in having authors upload the original source is that it (mostly) divorces the content from the presentation. Because the original sources were available, a large majority of the corpus could be rendered automatically into HTML for easier reading on many devices: https://info.arxiv.org/about/accessible_HTML.html. I don't think it would have been as simple if they had to convert PDFs.
cozzyd•3mo ago
This is why my forarxiv.tex make targets always include a call to latexpand --empty-comments

Though I doubt all my collaborators do something similar.
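For anyone curious what the comment-emptying step amounts to, here is a rough Python approximation (a sketch of my own, not latexpand's code). It drops the comment text but keeps the bare `%`, because in TeX a trailing `%` also swallows the end-of-line space, so emptying a comment rather than deleting it avoids changing the typeset output. It assumes default catcodes and ignores verbatim environments and `\url{...}` arguments, where `%` is not a comment, so check the result before uploading.

```python
def strip_tex_comments(source: str) -> str:
    r"""Blank out TeX comments, keeping a bare '%' where each comment began.

    Assumes default catcodes; does not special-case verbatim environments
    or \url{...}, where '%' may not start a comment.
    """
    return "\n".join(_strip_line(line) for line in source.split("\n"))


def _strip_line(line: str) -> str:
    r"""Cut one line just after its first unescaped '%'.

    A '%' is escaped when preceded by an odd number of backslashes:
    '\%' is a literal percent sign, while '\\%' is a forced line break
    followed by a comment.
    """
    backslashes = 0  # length of the backslash run just before `ch`
    for i, ch in enumerate(line):
        if ch == "\\":
            backslashes += 1
            continue
        if ch == "%" and backslashes % 2 == 0:
            # Unescaped '%': keep everything up to and including it.
            return line[: i + 1]
        backslashes = 0
    return line
```

Counting the backslash run, rather than just checking the previous character, is what keeps `\\%` (line break, then comment) from being mistaken for an escaped percent sign.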

sneela•3mo ago
I agree with other comments that this research treads a fine, unethical line. Did the authors responsibly disclose this, as is often done in the security research community? I cannot find any mention of it in the paper. The researchers seem to be involved in security-related research (first author is doing a PhD, last author holds a PhD).

At least arxiv could have run the cleaner [1] before the print of this pre-print (lol). If there was no disclosure, then I think this pre-print becomes unethical to put up.

> leading to the identification of nearly 1,200 images containing sensitive metadata. The types of data represented vary significantly. While device information (e.g., the camera used) or software details (such as the exact version of Photoshop) may already raise concerns, in over 600 cases the metadata contained GPS coordinates, potentially revealing the precise location where a photo was taken. In some instances, this could expose a researcher’s home address (when tied to a profile picture) or the location of research facilities (when images capture experimental equipment)

Oof, that's not too great.

[1] https://github.com/google-research/arxiv-latex-cleaner
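To check your own figures for the GPS metadata described in the quote before uploading, a minimal standard-library Python sketch (my own, not the paper's tooling) can walk a JPEG's segments and report whether the EXIF IFD0 contains the GPSInfo pointer, tag 0x8825. It assumes a well-formed baseline JPEG, does not decode the coordinates, and makes no attempt to handle malformed or truncated files; for actually stripping metadata, a dedicated tool such as exiftool is more thorough.

```python
import struct

GPS_IFD_POINTER = 0x8825  # TIFF/EXIF tag "GPSInfo": offset of the GPS sub-IFD


def has_gps_exif(jpeg: bytes) -> bool:
    """Return True if the JPEG's EXIF IFD0 contains a GPSInfo pointer.

    Sketch only: scans metadata segments up to the start of image data and
    checks for the tag's presence without decoding any coordinates.
    """
    if jpeg[:2] != b"\xff\xd8":  # SOI marker
        raise ValueError("not a JPEG file")
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("unexpected byte where a segment marker should be")
        marker = jpeg[pos + 1]
        if marker == 0xDA:  # SOS: compressed image data follows, no more metadata
            break
        # Segment length is big-endian and counts its own two bytes.
        (length,) = struct.unpack(">H", jpeg[pos + 2 : pos + 4])
        body = jpeg[pos + 4 : pos + 2 + length]
        if marker == 0xE1 and body.startswith(b"Exif\x00\x00"):  # APP1 / EXIF
            if _ifd0_has_tag(body[6:], GPS_IFD_POINTER):
                return True
        pos += 2 + length
    return False


def _ifd0_has_tag(tiff: bytes, wanted: int) -> bool:
    """Look for `wanted` among the 12-byte entries of the first IFD."""
    order = {b"II": "<", b"MM": ">"}.get(tiff[:2])  # little- or big-endian
    if order is None:
        return False
    (ifd0,) = struct.unpack(order + "I", tiff[4:8])  # offset from TIFF header
    (count,) = struct.unpack(order + "H", tiff[ifd0 : ifd0 + 2])
    for i in range(count):
        entry = ifd0 + 2 + 12 * i
        (tag,) = struct.unpack(order + "H", tiff[entry : entry + 2])
        if tag == wanted:
            return True
    return False
```

Reading the file with `open(path, "rb").read()` and passing the bytes in is enough for a quick pass over a figures directory.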

calvinmorrison•3mo ago
They responsibly disclosed it in their research paper. An unethical use would be to use those coordinates to gain state secrets about say, research facilities
michaelmior•3mo ago
Having arXiv run the cleaner automatically would definitely be cool. Although I've found it non-trivial to get working consistently for my own papers. That said, it would be nice if this was at least an option.
cycomanic•3mo ago
Leaks of read/write access to documents and of GitHub, Dropbox, etc. credentials are certainly worrying, but location and author/photographer details in photo metadata? That's quite a stretch, and it seems like the authors are just trying to boost the numbers.

The vast majority (I would wager more than 99.9999%) of research institution locations are public knowledge and can be found by simply googling the institution's address. (I am not aware of a single research institution that publishes publicly and whose location is confidential.)
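On the credential leaks, which do seem like the serious part, a quick self-check is to scan your own comments before submitting. Below is a minimal Python sketch; the three patterns are illustrative assumptions (GitHub's `ghp_`-style token prefixes, Google Docs edit links, and `password = ...` lines), not the detection rules used in the paper, and they will miss plenty, e.g. fine-grained `github_pat_` tokens.

```python
import re
from typing import List, Optional, Tuple

# Illustrative patterns only; an assumption, not the paper's detection rules.
SUSPICIOUS = {
    # GitHub's prefixed token formats (ghp_, gho_, ghu_, ghs_, ghr_) + 36 chars
    "GitHub token": re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36}\b"),
    # Shareable Google Docs/Sheets/Slides edit links
    "Google Docs edit link": re.compile(r"docs\.google\.com/\w+/d/[\w-]+/edit"),
    "password assignment": re.compile(r"(?i)\b(?:password|passwd|pwd)\s*[:=]\s*\S+"),
}


def comment_text(line: str) -> Optional[str]:
    r"""Return the text after the first unescaped '%' on a line, or None.

    A '%' preceded by an odd number of backslashes (e.g. '\%') is literal.
    """
    backslashes = 0
    for i, ch in enumerate(line):
        if ch == "\\":
            backslashes += 1
            continue
        if ch == "%" and backslashes % 2 == 0:
            return line[i + 1 :]
        backslashes = 0
    return None


def scan_comments(source: str) -> List[Tuple[int, str]]:
    """List (line number, label) for each pattern that matches inside a comment."""
    hits = []
    for lineno, line in enumerate(source.split("\n"), start=1):
        comment = comment_text(line)
        if comment is None:
            continue  # no comment on this line; body text is not scanned
        for label, pattern in SUSPICIOUS.items():
            if pattern.search(comment):
                hits.append((lineno, label))
    return hits
```

Only comment text is scanned here, since that is the part that never shows up in the PDF and so is easiest to forget about.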

agarttha•3mo ago
I offer free beer in a comment in my arxiv tex source.
fcpk•3mo ago
While EXIF data might be bad for private photos, I do think researchers should not tamper with it unless there is a clear security rationale (i.e., private photos or things that are meant to be hidden). Otherwise, leave the data alone.
