Show HN: Validated Table Extractor – Verify PDF Tables Using Docling + Vision LLMs

https://github.com/2dogsandanerd/validated-table-extractor
2•2dogsanerd•57m ago
Hey HN,

I built this because I got tired of "silent failures" in traditional PDF table extraction tools.

In my day job working with financial and legal documents, tools like Camelot or Tabula often return data that looks plausible but has shifted columns or missing decimal points. In regulated environments, you can't afford to guess.

I built a pipeline that treats extraction as a hypothesis to be verified:

1. *Extraction:* Uses IBM's Docling to parse the page layout and export the table structure as Markdown.

2. *Visual Verification:* Captures a screenshot of the specific table region from the PDF.

3. *Validation:* Feeds both the Markdown and the screenshot into a local vision LLM (Llama 3.2 via Ollama).

4. *Scoring:* The LLM compares the rendered pixels (the ground truth) against the extracted text and outputs a confidence score plus an audit trail.
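
To make the validation and scoring steps concrete, here is a minimal Python sketch of the "extraction as a hypothesis" idea. It is not the code from the repo: `validate_table`, `ValidationResult`, the prompt wording, and the JSON reply format are all assumptions made for illustration, and the model call is passed in as a plain function so nothing here depends on a specific Docling or Ollama API.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable

# Hypothetical interface: takes a prompt and a PNG screenshot, returns the
# model's raw text reply. In practice this would wrap a local vision LLM.
VisionModel = Callable[[str, bytes], str]


@dataclass
class ValidationResult:
    confidence: float   # 0.0 (mismatch) to 1.0 (match), as reported by the model
    issues: list[str]   # discrepancies the model listed
    audit: dict         # what was checked, kept for later review


def build_prompt(markdown_table: str) -> str:
    # Assumed prompt format; the real project may phrase this differently.
    return (
        "Compare the table in the image with this extracted Markdown.\n"
        'Reply with JSON only: {"confidence": <0..1>, "issues": [<strings>]}\n\n'
        + markdown_table
    )


def validate_table(
    markdown_table: str, screenshot_png: bytes, ask_model: VisionModel
) -> ValidationResult:
    reply = ask_model(build_prompt(markdown_table), screenshot_png)
    try:
        data = json.loads(reply)
        # Clamp the score so a sloppy reply cannot exceed the 0..1 range.
        confidence = min(1.0, max(0.0, float(data["confidence"])))
        issues = [str(item) for item in data.get("issues", [])]
    except (ValueError, KeyError, TypeError):
        # Unparseable reply: treat the table as unverified instead of guessing.
        confidence, issues = 0.0, ["model reply was not valid JSON"]
    audit = {
        "markdown_sha256": hashlib.sha256(markdown_table.encode("utf-8")).hexdigest(),
        "screenshot_sha256": hashlib.sha256(screenshot_png).hexdigest(),
        "raw_reply": reply,
    }
    return ValidationResult(confidence, issues, audit)


# Usage with a stand-in model (a real setup would call the local LLM here).
def fake_model(prompt: str, image: bytes) -> str:
    return '{"confidence": 0.42, "issues": ["column 2 shifted right"]}'


result = validate_table("| a | b |\n|---|---|\n| 1 | 2 |", b"png-bytes", fake_model)
```

Hashing both inputs into the audit record is one way to tie a score to the exact Markdown and image that were compared; a malformed model reply falls back to a confidence of 0.0 so that a failed check never passes silently.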

The trade-off is speed for confidence: verification takes ~5s per table. It's designed to run 100% locally for privacy-critical documents.
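
For a rough sense of what ~5s per table means at batch scale, here is a back-of-the-envelope estimate in Python. It assumes tables are verified one at a time with no parallelism, and `estimated_minutes` is a hypothetical helper, not part of the repo.

```python
def estimated_minutes(num_tables: int, seconds_per_table: float = 5.0) -> float:
    """Sequential verification time in minutes, assuming a fixed cost per table."""
    return num_tables * seconds_per_table / 60.0


# 1,200 tables at ~5s each take about 100 minutes when run sequentially.
batch_minutes = estimated_minutes(1200)
```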

Repo is here: https://github.com/2dogsandanerd/validated-table-extractor

Would love to hear how you handle data integrity in RAG pipelines!

Little Rascals Go-Kart

https://enombic.com/little-rascals-go-kart
1•vxxzy•48s ago•0 comments

Removing juries: 'A move towards an authoritarian state'

https://www.theguardian.com/law/2025/dec/07/authoritarian-state-trial-by-jury-uk
2•binning•2m ago•0 comments

Solution to US debt crisis is severe austerity triggered by a fiscal calamity

https://fortune.com/2025/12/06/us-debt-crisis-soution-severe-austerity-fiscal-calamity-default-ra...
1•mohi-kalantari•3m ago•0 comments

The Data on Self-Driving Cars Is Clear. We Have to Change Course

https://www.nytimes.com/2025/12/02/opinion/self-driving-cars.html
1•alexcos•4m ago•1 comments

Cameroon's conflict is making 'widows' of women whose husbands are still alive

https://minorityafrica.org/invisible-widows-cameroons-conflict-is-making-widows-out-of-women-whos...
1•binning•7m ago•0 comments

How to Leave the U.S.A.

https://www.newyorker.com/magazine/2025/12/15/how-to-leave-the-usa
2•mitchbob•7m ago•1 comments

The secure open source fallacy

https://ulveon.net/p/2025-12-03-the-secure-open-source-fallacy/
1•kevin061•9m ago•0 comments

Being a Writer in the Age of the Influencer

https://www.robkhenderson.com/p/being-a-writer-in-the-age-of-the
1•binning•9m ago•0 comments

How the Brain Parses Language

https://www.quantamagazine.org/the-polyglot-neuroscientist-resolving-how-the-brain-parses-languag...
1•mylifeandtimes•9m ago•0 comments

Checkpointing the Message Processing

https://event-driven.io/en/checkpointing_message_processing/
1•ingve•11m ago•0 comments

The main features of Atlas Autocode [pdf]

https://www.ancientgeek.org.uk/EMAS/EMAS_Papers/The_Main_Features_Of_Atlas_Autocode.pdf
1•fanf2•14m ago•0 comments

Show HN: Chorus – Multi-agent debate through epistemological framework collision

https://chorusai.replit.app/
1•efoobz•15m ago•0 comments

Inadequate Equilibria: Where and How Civilizations Get Stuck

https://equilibriabook.com/toc/
1•jstanley•16m ago•0 comments

Show HN: Realisticaichecker.com, realistic AI generated text detector

https://realisticaichecker.com/
1•Tarmo362•17m ago•0 comments

Tier6 - Build global Ethernet networks using the sanctum protocol

https://github.com/jorisvink/tier6
1•jvink•17m ago•0 comments

Should You Trust Your VPN Location?

https://ipinfo.io/blog/vpn-location-mismatch-report
3•reincoder•18m ago•0 comments

Ask HN: Co-Founder Salary Dispute

2•throwawayround•18m ago•1 comments

Indie SaaS product GummySearch is winding down

https://newsletter.failory.com/p/when-reddit-pulls-over
1•nocodebcn•19m ago•0 comments

Show HN: Early-stage browser extension for Amazon Associates workflows

https://aadp.agilehero.com.br/en/
1•aadp-agilehero•20m ago•0 comments

Ask HN: Do LLMs know when you submit a different chat history?

2•Bombthecat•26m ago•0 comments

John Bowlby's Attachment Theory

https://www.simplypsychology.org/bowlby.html
1•thunderbong•27m ago•0 comments

Docker layers are a horrible dependency model

https://schizo.cooking/effort/docker-layers.html
2•RGBCube•30m ago•0 comments

Rebuilding Visi On reveals how Apple defined the GUI era

https://www.theregister.com/2025/12/08/visi_on_deep_dive/
1•giuliomagnifico•30m ago•0 comments

Why do these companies abuse us? [video]

https://www.youtube.com/shorts/lzdpS4s4xuI
1•eecc•30m ago•1 comments

Show HN: LLM Newsletter Kit – Automate expert newsletters for $0.20/issue

https://github.com/kimhongyeon/llm-newsletter-kit-core
2•hongyeon•31m ago•0 comments

Show HN: A tool to mass-leave Slack channels

https://github.com/SkyfallWasTaken/slack-mass-leave
1•JustSkyfall•32m ago•0 comments

Slax: Live Pocket Linux

https://www.slax.org/
1•Ulf950•32m ago•0 comments

Cory Doctorow on Channel 4 (UK) about how the door is open a crack [video]

https://www.youtube.com/watch?v=tZQaEeuuI3Q
1•t43562•34m ago•1 comments

Does anyone actually boot off NFS shares anymore?

https://old.reddit.com/r/linux/comments/1ikcd4g/does_anyone_actually_boot_off_nfs_shares_anymore/
1•sipofwater•36m ago•3 comments

Show HN: DocBeacon – See how people read your documents

https://docbeacon.io
1•howardshaw•37m ago•1 comments