
Robust ways to extract bank statements from PDF to CSV beyond raw LLMs?

https://exactstatement.com/

1•alexfefun1•1h ago

Comments

alexfefun1•1h ago

I’ve built a tool called ExactStatement to help users convert PDF bank statements into specific CSV formats.

Currently, I’m using the Gemini API (Pro/Flash) to directly transform PDF content into structured JSON. While it works surprisingly well for 95% of cases, the "last 5%" is a headache:

Hallucinations: Occasionally, the AI misinterprets a digit or skips a line item, which is unacceptable for financial data.

Context Limits: Very long statements (50+ pages) sometimes lead to degraded performance or missing rows.

I'm looking for a more robust engineering approach. Should I:

1. Stick with LLMs but add a validation layer (e.g., checking that the balance computed from the extracted rows matches the statement's final balance)?

2. Switch to a hybrid approach (e.g., using LayoutLM or Amazon Textract for OCR/layout analysis first, then LLMs for cleanup)?

3. Go back to rule-based parsing for major banks (though maintaining templates seems like a nightmare)?

How are you guys solving the "precision" problem in document extraction today? Would love to hear your experiences with specific libraries or workflows.
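For reference, the validation layer from the first option could look roughly like the sketch below. It is a minimal illustration and not ExactStatement's actual code: the `Txn` fields and the `reconcile` function are hypothetical names, and it assumes the extraction step also returns the opening and closing balances and, where the statement prints them, a running balance per row. Amounts are held as `Decimal` so that comparisons are exact.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class Txn:
    """One extracted row. Field names are hypothetical, not a real schema."""
    date: str
    description: str
    amount: Decimal                     # signed: credits > 0, debits < 0
    balance: Optional[Decimal] = None   # running balance printed on the row, if any


def reconcile(opening: Decimal, closing: Decimal, txns: list[Txn]) -> list[str]:
    """Return human-readable problems; an empty list means the rows add up."""
    problems = []
    running = opening
    for i, t in enumerate(txns):
        running += t.amount
        if t.balance is not None and t.balance != running:
            problems.append(
                f"row {i} ({t.date} {t.description!r}): "
                f"expected balance {running}, statement shows {t.balance}"
            )
            # Resync to the printed balance so one bad row is not
            # reported again on every later row.
            running = t.balance
    if running != closing:
        problems.append(
            f"closing balance: computed {running}, statement shows {closing}"
        )
    return problems


if __name__ == "__main__":
    rows = [
        Txn("2026-01-03", "COFFEE", Decimal("-28.00"), Decimal("80.00")),  # misread digit
        Txn("2026-01-04", "REFUND", Decimal("5.50"), Decimal("85.50")),
    ]
    for problem in reconcile(Decimal("100.00"), Decimal("85.50"), rows):
        print(problem)
```

When the statement prints a running balance on each row, a misread digit or a skipped line shows up at the row where the arithmetic first breaks, which narrows down what to re-extract or send for manual review. With only opening and closing balances available, the check still detects that something is off but not where. A passing check only shows that the amounts are arithmetically consistent; it says nothing about dates or descriptions, and two errors that cancel each other out would slip through.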
