I open-sourced a dataset of 5 synthetic bank and credit card statement PDFs designed for testing extraction/parsing accuracy. Each PDF uses a fictional bank, with realistic formatting from a different country.

I've been building a bank statement converter (Bankstatemently) and kept discovering edge cases across different banks. At some point I started cataloging them as "quirks", and I'm currently at 36 documented challenges and counting (think: dates without years across year boundaries, credit card charges shown as positive instead of negative, dates hiding inside description text, etc.).
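To make the first quirk concrete, here is a minimal Python sketch of year inference for rows that only print month and day. It assumes the statement's end date is known and that the statement covers less than 12 months; `infer_years` is a hypothetical helper for illustration, not code from the converter or the benchmark.

```python
from datetime import date


def infer_years(month_days, period_end):
    """Attach a year to (month, day) pairs from a statement ending on period_end.

    Assumption: the statement covers less than 12 months, so any month that
    falls after the end month must belong to the previous calendar year.
    """
    resolved = []
    for month, day in month_days:
        year = period_end.year if month <= period_end.month else period_end.year - 1
        resolved.append(date(year, month, day))
    return resolved


# A statement ending 15 Jan 2025 whose rows read "28 Dec", "31 Dec", "02 Jan".
rows = [(12, 28), (12, 31), (1, 2)]
print(infer_years(rows, date(2025, 1, 15)))
```

A parser that stamps every row with the statement year would put the two December rows eleven months in the future, which is the failure this quirk is meant to catch.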

Real bank data is private, so there's no shared dataset to test parsers against. Once I had these quirks cataloged, I realized I could use them to construct statements that deliberately include these challenges, so more people can test against them.

There's also a free evaluation API: submit your parsed JSON and get field-level accuracy scores back. Ground truth is held server-side, though that isn't bulletproof against overfitting.
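For anyone who wants a rough local approximation of what "field-level accuracy" could mean, here is a sketch in Python. The field names (`date`, `description`, `amount`) and the positional row matching are assumptions for illustration; this is not the scoring logic or the JSON schema of the actual API.

```python
def field_accuracy(parsed, truth, fields=("date", "description", "amount")):
    """Return the fraction of matching values per field.

    Transactions are compared by position, which is a simplifying assumption.
    """
    scores = {}
    # Divide by the longer list so missing or extra rows count as errors.
    total = max(len(parsed), len(truth), 1)
    for field in fields:
        matches = sum(
            1 for p, t in zip(parsed, truth) if p.get(field) == t.get(field)
        )
        scores[field] = matches / total
    return scores


truth = [
    {"date": "2025-01-02", "description": "COFFEE", "amount": -4.5},
    {"date": "2025-01-03", "description": "RENT", "amount": -1200.0},
]
# Same rows, but the second charge was parsed with the wrong sign.
parsed = [
    {"date": "2025-01-02", "description": "COFFEE", "amount": -4.5},
    {"date": "2025-01-03", "description": "RENT", "amount": 1200.0},
]
print(field_accuracy(parsed, truth))
```

Here the sign error from the "charges shown as positive" quirk only lowers the `amount` score (to 0.5), while `date` and `description` stay at 1.0, which is the kind of breakdown a per-field score gives you over a single pass/fail.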

I'd appreciate feedback on which edge cases are missing. I'm planning to make the next 10 statements a bit harder (scanned PDFs, multi-currency across multiple tables, Buddhist era dates).

https://github.com/bankstatemently/bank-statement-parsing-be...

You can browse all of the quirks here with real-world examples: https://bankstatemently.com/benchmark/challenges

Show HN: Open-source synthetic bank statements for testing parsers