Scientific datasets are riddled with copy-paste errors

https://www.sciencedetective.org/scientific-datasets-are-riddled-with-copy-paste-errors/
29•jruohonen•6h ago

Comments

steve_adams_86•1h ago
This is legitimately so challenging to avoid, because loads of scientific processes are, to one degree or another, bespoke, which makes it difficult to fully streamline them and introduce efficient, well-structured, comprehensive QA.

A LOT of labour goes into making it work. Most scientists I know and work with are very diligent people who care a lot about the outputs being as correct as possible, but wow, their workflows aren't great.

My job is to try and address this in whatever ways are practical for the data and the people doing the science, and it's kind of like SaaS in that you think it should be easy enough to spot problems, solve them, and carry on/become a billionaire, but... The world is much more complicated than that, and it's easier to fail in this endeavour than it is to break even.

The classic "Dropbox is just rsync" or "I could build Airbnb in a weekend" sentiments have their counterparts in science, and the reality is similarly defeating and punishing on both sides. Making science go faster while maintaining correctness is exceedingly difficult. There are so many moving parts. So many disparate participants, some wildly technical and capable, others brilliant at studying bacteria in starfish yet terrified to run a command in a terminal. Your user base has virtually nothing in common in terms of ability and willingness to do anything other than get their own work done. It's brutal.

So, I sympathize with the authors of these papers and I hope readers don't assume they're bad at what they do or that it's done in bad faith. It's genuinely difficult.

An anecdote: I created a tool for validating biodiversity data against a specification called Darwin Core. Initially our published data failed validation so often that I thought I'd built the tool wrong. Rather, the spec is so complex and vast that the people I work with couldn't manage to get valid data into the public repositories. And yet! They were able to publish, because the public repositories' own validation is... Invalid. That's the state of things.

Granted, the data is still correct enough to be useful, and the errors don't cause the results to indicate anything that they shouldn't. It's more like minor metadata issues, failures to maintain referential integrity across different datasets, etc. But it's a very real, very difficult problem.
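
To make the two kinds of failure above concrete, here is a minimal sketch of what such checks might look like. It is not the tool I described, and it is nowhere near the full Darwin Core rules: the set of required terms, the function names, and the sample rows are all assumptions made up for illustration. Only the term names (occurrenceID, basisOfRecord, scientificName, eventID) come from Darwin Core itself.

```python
# Hypothetical sketch. The required-term list and the sample rows are
# illustrative assumptions, not the Darwin Core validation rules.
REQUIRED_OCCURRENCE_TERMS = ("occurrenceID", "basisOfRecord", "scientificName")


def find_missing_terms(rows, required=REQUIRED_OCCURRENCE_TERMS):
    """Return (row_index, term) pairs where a required term is absent or blank."""
    problems = []
    for index, row in enumerate(rows):
        for term in required:
            value = row.get(term) or ""
            if not str(value).strip():
                problems.append((index, term))
    return problems


def find_orphan_event_ids(occurrences, events):
    """Return eventID values that occurrences reference but no event defines."""
    known = {event["eventID"] for event in events if event.get("eventID")}
    return sorted({
        occ["eventID"]
        for occ in occurrences
        if occ.get("eventID") and occ["eventID"] not in known
    })


if __name__ == "__main__":
    # Made-up sample data: the second occurrence has a blank basisOfRecord
    # and points at an event that does not exist in the event table.
    events = [{"eventID": "E1"}]
    occurrences = [
        {"occurrenceID": "O1", "basisOfRecord": "HumanObservation",
         "scientificName": "Pisaster ochraceus", "eventID": "E1"},
        {"occurrenceID": "O2", "basisOfRecord": "",
         "scientificName": "Pisaster ochraceus", "eventID": "E2"},
    ]
    print(find_missing_terms(occurrences))
    print(find_orphan_event_ids(occurrences, events))
```

The first function covers the minor metadata kind of issue, and the second covers referential integrity across two tables. A real validator would also have to handle controlled vocabularies, date formats, and the many optional terms, which is where most of the difficulty lives.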

Science isn't easy at all. So many hoops to jump through, so much rigor, so much data. Mistakes are inevitable.

cyanydeez•35m ago
Just imagine if you scanned private industry. This is a generic problem that LLMs won't solve with their generative capabilities.

Vercel April 2026 security incident

https://www.bleepingcomputer.com/news/security/vercel-confirms-breach-as-hackers-claim-to-be-sell...
522•colesantiago•11h ago•313 comments

A Brief History of Fish Sauce

https://www.legalnomads.com/fish-sauce/
58•vinhnx•16h ago•23 comments

The Bromine Chokepoint

https://warontherocks.com/cogs-of-war/the-bromine-chokepoint-how-strife-in-the-middle-east-could-...
147•crescit_eundo•7h ago•72 comments

Show HN: TRELLIS.2 image-to-3D running on Mac Silicon – no Nvidia GPU needed

https://github.com/shivampkumar/trellis-mac
17•shivampkumar•1h ago•0 comments

The race to build the next WordPress

https://opencomputer.dev/blog/the-race-to-build-the-next-wordpress/
4•iacguy•15m ago•1 comment

2,100 Swiss municipalities showing which provider handles their official email

https://mxmap.ch/
57•doener•2h ago•15 comments

Show HN: A working reference implementation of context engineering

https://github.com/outcomeops/context-engineering
21•linsys•2d ago•8 comments

Uber’s Anthropic AI push hits a wall

https://finance.yahoo.com/sectors/technology/articles/ubers-anthropic-ai-push-hits-223109852.html
58•dakiol•7h ago•66 comments

Swiss AI Initiative (2023)

https://www.swiss-ai.org
10•doener•2h ago•2 comments

Ex-CEO, ex-CFO of bankrupt AI company charged with fraud

https://www.reuters.com/legal/government/ex-ceo-ex-cfo-bankrupt-ai-company-charged-with-fraud-202...
78•1vuio0pswjnm7•2h ago•29 comments

Turtle WoW classic server announces shutdown after Blizzard wins injunction

https://www.pcgamer.com/games/world-of-warcraft/turtle-wow-classic-server-announces-shutdown-afte...
114•Brajeshwar•9h ago•87 comments

Changes in the system prompt between Claude Opus 4.6 and 4.7

https://simonwillison.net/2026/Apr/18/opus-system-prompt/
198•pretext•14h ago•115 comments

Show HN: Faceoff – A terminal UI for following NHL games

https://www.vincentgregoire.com/faceoff/
97•vcf•7h ago•33 comments

Six Levels of Dark Mode (2024)

https://cssence.com/2024/six-levels-of-dark-mode/
42•Akcium•6h ago•12 comments

Prove you are a robot: CAPTCHAs for agents

https://browser-use.com/posts/prove-you-are-a-robot
42•lukasec•4d ago•28 comments

Scientific datasets are riddled with copy-paste errors

https://www.sciencedetective.org/scientific-datasets-are-riddled-with-copy-paste-errors/
29•jruohonen•6h ago•2 comments

Archive of BYTE magazine, starting with issue #1 in 1975

https://archive.org/details/byte-magazine-1975-09
528•DamnInteresting•2d ago•136 comments

The RAM shortage could last years

https://www.theverge.com/ai-artificial-intelligence/914672/the-ram-shortage-could-last-years
188•omer_k•18h ago•199 comments

The seven programming ur-languages (2022)

https://madhadron.com/programming/seven_ur_languages.html
284•helloplanets•17h ago•110 comments

Got an Old Kindle? It Might Not Work Anymore

https://www.nytimes.com/wirecutter/reviews/older-kindle-support-ending/
40•eigenhombre•2h ago•22 comments

Nanopass Framework: Clean Compiler Creation Language

https://nanopass.org/
114•NordStreamYacht•4d ago•26 comments

I wrote a CHIP-8 emulator in my own programming language

https://github.com/navid-m/chip8emu
38•pizza_man•6h ago•13 comments

SPEAKE(a)R: Turn Speakers to Microphones for Fun and Profit [pdf] (2017)

https://www.usenix.org/system/files/conference/woot17/woot17-paper-guri.pdf
159•Eridanus2•16h ago•67 comments

Notion leaks email addresses of all editors of any public page

https://twitter.com/weezerOSINT/status/2045849358462222720
323•Tiberium•10h ago•109 comments

What are skiplists good for?

https://antithesis.com/blog/2026/skiptrees/
261•mfiguiere•2d ago•64 comments

Recovering Windows Live Writer Files

https://benovermyer.com/blog/2026/04/recovering-windows-live-writer-files/
4•bovermyer•5d ago•2 comments

Interesting Map Geometry and Mathematics

https://www.markrjohnsongames.com/2026/04/11/ultima-ratio-regum-0-11-update-57-interesting-map-ge...
4•Hooke•1d ago•0 comments

Hot-wiring the Lisp machine

https://scheatkode.com/blog/019d463d-38b3-7e63-80fd-6ed97bd8815e/hot-wiring-the-lisp-machine/
24•spudlyo•8h ago•2 comments

The creative software industry has declared war on Adobe

https://www.theverge.com/tech/913765/adobe-rivals-free-creative-software-app-updates
183•tambourine_man•11h ago•142 comments

Show HN: Prompt-to-Excalidraw demo with Gemma 4 E2B in the browser (3.1GB)

https://teamchong.github.io/turboquant-wasm/draw.html
88•teamchong•14h ago•43 comments