The workflow: ingest financial PDFs (bank, brokerage, retirement statements, tax returns), classify by asset type, extract data, apply domain-specific business logic, and populate Excel templates and fillable PDF forms. Compliance constraint: no NPI (nonpublic personal information) can hit a cloud API without ZDR-style (zero data retention) controls.
Current architecture sketch:
- Local LLM (Ollama or LM Studio) on dedicated hardware for OCR and first-pass extraction
- Local PII scrubber/tokenizer (Presidio or Skyflow) replaces identifiers with tokens before any cloud call
- Cloud LLM under enterprise terms (Claude API with ZDR, or Bedrock equivalent) for the reasoning layer
- Local de-tokenization and template population
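To make the tokenize / cloud call / de-tokenize round trip concrete, here is a minimal sketch of that flow in plain Python. Everything in it is an assumption for illustration: the two regexes stand in for a real scrubber such as Presidio (whose recognizers are far broader), `fake_cloud_reasoning` is a hypothetical placeholder for the enterprise LLM call, and the token format is made up. It is not a production design, just the shape of the boundary.

```python
import re
from typing import Callable, Dict, Tuple

# Hypothetical patterns for illustration only. A real scrubber would use
# far broader recognizers than these two regexes.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ACCT": re.compile(r"\b\d{8,12}\b"),
}


def tokenize(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace identifiers with opaque tokens.

    Returns the scrubbed text and a token -> original-value vault.
    The vault stays on the local machine.
    """
    vault: Dict[str, str] = {}    # token -> original value
    reverse: Dict[str, str] = {}  # original value -> token (reuse on repeats)

    for label, pattern in PATTERNS.items():
        def repl(match: "re.Match[str]", label: str = label) -> str:
            value = match.group(0)
            if value not in reverse:
                token = f"<{label}_{len(reverse) + 1}>"
                reverse[value] = token
                vault[token] = value
            return reverse[value]

        text = pattern.sub(repl, text)
    return text, vault


def detokenize(text: str, vault: Dict[str, str]) -> str:
    """Restore original values locally after the cloud response comes back."""
    for token, value in vault.items():
        text = text.replace(token, value)
    return text


def fake_cloud_reasoning(scrubbed: str) -> str:
    """Hypothetical stand-in for the cloud LLM call; it only ever sees tokens."""
    return f"Summary: {scrubbed}"


def run_pipeline(raw: str, cloud_call: Callable[[str], str] = fake_cloud_reasoning) -> str:
    scrubbed, vault = tokenize(raw)
    # Guard at the boundary: refuse to send if any known raw value survived.
    if any(value in scrubbed for value in vault.values()):
        raise ValueError("raw identifier leaked into scrubbed text")
    return detokenize(cloud_call(scrubbed), vault)


if __name__ == "__main__":
    raw = "SSN 123-45-6789, account 000123456789, balance 1500.00"
    print(tokenize(raw)[0])
    print(run_pipeline(raw))
```

The point of the sketch is that the vault never leaves the local side and the check sits right at the boundary. Note the guard only catches values the scrubber already recognized; anything the recognizers miss still goes out, which is why recognizer coverage matters more than the plumbing.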
Questions for anyone who's actually shipped this pattern:
1. What stack did you land on, and what would you do differently?
2. Local model for financial document OCR + structured extraction - is Qwen2.5-VL still the move, or has something better landed?
3. Tokenization layer: roll your own with Presidio, or pay for Skyflow / Private AI?
4. Orchestration: LangGraph, n8n, or custom Python?
5. Is an M4 Max Mac realistic for a single-user workflow at 50-200 PDFs per case, or do I need to plan for proper inference hardware?
Already evaluated turnkey hybrid platforms (LLM.co, PremAI, Petronella) - leaning toward an assembled stack for cost and control reasons, but open to being talked out of it if someone's had a great experience with one of these.
Not looking for "just go fully local" (reasoning quality is important for this build) or "just use the API" (data constraints are real). Production-tested stacks only.
coreyp_1•24m ago
Local has come a long way, but it is still limited and slow. And while some people have built things like this, the field is so new that whoever you hire probably won't have direct experience with every piece. In other words, they're going to get some things wrong. You will have to rebuild some part of it. You might not purchase the right hardware. Can you live with this?
In all fairness, though, if you have someone with experience evaluating new systems and building with them, you can still be in good shape. I mention this simply because that skill is not as common as we would like. Just look for someone with a track record of delivering functional software using new technologies.
My personal bias is that I love to keep as much local as possible, but I also realize that I bought a $3,000 machine that so far has saved me $5 in tokens from an external API. As I see it, the only real reason to have local AI at the moment is privacy, but that does fit your use case.
As for turnkey solutions, they have their benefits, but their moat is significantly smaller than it used to be. Quite frankly, you can vibe code the majority of a turnkey solution in a weekend. Well, at least the parts that you need.
Sorry to not give more specific answers, but a lot of your questions may depend on whichever developer you decide to use. In many cases there isn't a wrong answer; there are multiple paths to what you are trying to do. If I were you, I would focus on the long-term maintainability and security of your system. For example, you can have the best thing in the world, but if you can't pass a SOC 2 audit (or, even worse, your developer has never heard of one) then you are going to be in a lot of pain.