Hybrid local and cloud LLM stack for regulated financial document processing?

2•rem_cam•39m ago
I'm scoping a hybrid AI pipeline for a consulting client in a regulated industry (GLBA-covered, NPI involved). Trying to validate the architecture before bringing on an engineer to build it.

The workflow: ingest financial PDFs (bank, brokerage, retirement statements, tax returns), classify by asset type, extract data, apply domain-specific business logic, populate Excel templates and fillable PDF forms. Compliance constraint: no NPI can hit a cloud API without ZDR-style controls.
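To make the stage boundaries concrete, here is a minimal sketch of that workflow as a chain of plain Python functions. Everything in it is an assumption for illustration: the `Doc` type, the keyword table, the "ending balance" pattern, and the stage names are hypothetical placeholders, and in a real build the classify and extract steps would call the local model rather than match strings.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Doc:
    name: str
    text: str                      # text already pulled from the PDF (OCR not shown)
    asset_type: str = "unclassified"
    fields: Dict[str, str] = field(default_factory=dict)

# Hypothetical keyword table standing in for a model-based classifier.
KEYWORDS = {
    "brokerage": "brokerage",
    "retirement": "retirement",
    "form 1040": "tax_return",
    "checking": "bank",
}

def classify(doc: Doc) -> Doc:
    lowered = doc.text.lower()
    for keyword, asset_type in KEYWORDS.items():
        if keyword in lowered:
            doc.asset_type = asset_type
            break
    return doc

def extract(doc: Doc) -> Doc:
    # Hypothetical field; real statements need per-institution handling.
    match = re.search(r"ending balance:\s*\$([\d,]+\.\d{2})", doc.text, re.IGNORECASE)
    if match:
        doc.fields["ending_balance"] = match.group(1).replace(",", "")
    return doc

def apply_rules(doc: Doc) -> Doc:
    # Placeholder for domain-specific business logic.
    doc.fields["needs_review"] = "no" if "ending_balance" in doc.fields else "yes"
    return doc

STAGES: List[Callable[[Doc], Doc]] = [classify, extract, apply_rules]

def run_pipeline(doc: Doc, stages: List[Callable[[Doc], Doc]] = STAGES) -> Doc:
    # Each stage takes and returns a Doc, so stages can be swapped or tested alone.
    for stage in stages:
        doc = stage(doc)
    return doc
```

Template population (Excel, fillable PDF) would be one more stage that consumes `doc.fields`; it is left out here because it depends on the specific templates.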

Current architecture sketch:
- Local LLM (Ollama or LM Studio) on dedicated hardware for OCR and first-pass extraction
- Local PII scrubber/tokenizer (Presidio or Skyflow) replaces identifiers with tokens before any cloud call
- Cloud LLM under enterprise terms (Claude API with ZDR, or Bedrock equivalent) for the reasoning layer
- Local de-tokenization and template population
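The tokenize / cloud call / de-tokenize round trip in that sketch could look roughly like the following. This is a standard-library stand-in, not Presidio's or Skyflow's API: the two regex patterns, the `TokenVault` class, and the token format are all hypothetical, and a real detector would need far broader coverage (names, addresses, dates of birth, and so on).

```python
import re

# Hypothetical patterns for illustration only; a production detector covers much more.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ACCOUNT": re.compile(r"\b\d{8,17}\b"),
}

class TokenVault:
    """Keeps the identifier-to-token mapping on the local machine only."""

    def __init__(self):
        self._forward = {}   # original value -> token
        self._reverse = {}   # token -> original value
        self._counter = 0

    def _token_for(self, label, value):
        # Reuse the same token for repeated values so the cloud model
        # can still tell that two mentions refer to the same thing.
        if value not in self._forward:
            self._counter += 1
            token = f"<{label}_{self._counter}>"
            self._forward[value] = token
            self._reverse[token] = value
        return self._forward[value]

    def tokenize(self, text):
        for label, pattern in PATTERNS.items():
            text = pattern.sub(
                lambda m, label=label: self._token_for(label, m.group(0)), text
            )
        return text

    def detokenize(self, text):
        for token, value in self._reverse.items():
            text = text.replace(token, value)
        return text
```

Only the output of `tokenize` would be sent to the cloud model; `detokenize` runs locally on the response before template population. The vault itself holds NPI, so it has to be stored and access-controlled like the source documents.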

Questions for anyone who's actually shipped this pattern:
1. What stack did you land on, and what would you do differently?
2. Local model for financial document OCR + structured extraction - is Qwen2.5-VL still the move, or has something better landed?
3. Tokenization layer: roll your own with Presidio, or pay for Skyflow / Private AI?
4. Orchestration: LangGraph, n8n, or custom Python?
5. Is an M4 Max Mac realistic for a single-user workflow at 50-200 PDFs per case, or do I need to plan for proper inference hardware?

Already evaluated turnkey hybrid platforms (LLM.co, PremAI, Petronella) - leaning toward an assembled stack for cost and control reasons, but open to being talked out of it if someone's had a great experience with one of these.

Not looking for "just go fully local" (reasoning quality is important for this build) or "just use the API" (data constraints are real). Production-tested stacks only.

Comments

coreyp_1•24m ago
There are so many variables here. My question is how much do you have to invest into getting it done right?

Local has come a long way, but it is still limited and slow. And while there are some people who have done stuff like this, the field is so new that you're probably going to get someone that doesn't have direct experience with everything. In other words, they're going to get stuff wrong. You will have to rebuild some part of it. You might not purchase the right hardware. Can you live with this?

In all fairness, though, if you have someone who has experience in evaluating new systems and using them to build something, then you can still be in good shape. I mention this simply because it's a skill that is not as common as we would like in this world. Just look for someone with a track record of delivering functional software using new technologies.

My personal bias is that I love to keep as much local as possible, but I also realize that I bought a $3,000 machine that so far has saved me $5 in tokens from an external API. As I see it, the only real reason to have local AI at the moment is privacy, but that does fit your use case.

As for turnkey solutions, they have their benefits, but their moat is significantly smaller now than it used to be. Quite frankly, you can vibe code the majority of a turnkey solution in a weekend. Well, at least the parts that you need.

Sorry to not give more specific answers, but a lot of your questions may depend on whichever developer you decide to use. In many cases there isn't a wrong answer; there are multiple paths to achieve what you are trying to do. If I were you, I would focus on long-term maintainability and security of your system. For example, you can have the best thing in the world, but if you can't pass a SOC 2 audit (or, even worse, your developer has never heard of something like that) then you are going to be in a lot of pain.
