I started with Google Gemini 3 Flash but switched to Ollama + Ministral 3:3b. The extraction is not exhaustive and there is much to improve, but it is working.
dwata runs locally: a web backend, with the GUI running in the browser. It connects to your email accounts and downloads the messages. Then we can run the financial template detection: it looks for similar-looking emails, grouped by sender, and sends a sample from each cluster to an LLM agent. The LLM is asked to point out the parts of the text that look like the data we are after. dwata then searches the email for the variables/values the LLM returned, creates a template by replacing those values with template tags, and saves the template to the DB. When extracting, dwata parses the data from each email using its saved template.
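The template step above can be sketched in a few lines. This is my own minimal illustration, not dwata's actual code: the function names, the regex-based tags, and the sample email are all assumptions. The idea is that once the LLM has identified the values in one sample, you can turn that sample into a reusable pattern for every other email in the cluster.

```python
import re

def make_template(email_text, llm_values):
    """Replace each LLM-identified value with a named template tag (hypothetical helper)."""
    template = re.escape(email_text)
    for name, value in llm_values.items():
        template = template.replace(re.escape(value), f"(?P<{name}>.+?)")
    return template

def extract(template, email_text):
    """Parse a new email from the same cluster using the saved template."""
    m = re.search(template, email_text, re.DOTALL)
    return m.groupdict() if m else None

# A made-up sample email plus the values an LLM agent might return for it.
sample = "Your payment of $42.50 to Acme Corp was received on Jan 5."
values = {"amount": "$42.50", "vendor": "Acme Corp", "date": "Jan 5"}
tpl = make_template(sample, values)

# The same template now parses a different email from the same sender cluster.
other = "Your payment of $9.99 to Foo Inc was received on Feb 2."
print(extract(tpl, other))  # {'amount': '$9.99', 'vendor': 'Foo Inc', 'date': 'Feb 2'}
```

The point of the design is that the LLM runs once per cluster, while the cheap template match runs per email.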
Roadmap: there is a long way to go; the extractor needs to work much, much better. dwata will also work on files soon (bank/CC statements).
I want to extract vendors, businesses, contacts, events, places, etc., connect to different APIs, and process everything locally.
dwata will also be able to download and process data from the Hacker News API (or other similar sources) and extract the entities you care about.
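For context, the Hacker News API is a plain JSON-over-HTTPS Firebase API, so the download side is simple. A sketch (the helper names are mine; entity extraction would then run locally on the fetched items):

```python
import json
import urllib.request

# Official Hacker News API base (Firebase-hosted, read-only, no auth).
HN_API = "https://hacker-news.firebaseio.com/v0"

def item_url(item_id):
    # Every HN item (story, comment, job) lives at /v0/item/<id>.json
    return f"{HN_API}/item/{item_id}.json"

def fetch(url):
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Grab the current top story IDs, then the first few items.
    for item_id in fetch(f"{HN_API}/topstories.json")[:3]:
        item = fetch(item_url(item_id))
        print(item.get("title"), "-", item.get("url"))
```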
Eventually, dwata will use only Ollama/llama.cpp with models that fit 6-8 GB graphics cards or 16 GB unified memory!
yubainu•2h ago