The core idea: a .chat file IS the conversation. No SQLite, no JSON logs, no shadow state. What you see in the buffer is exactly what the model receives. Edit an assistant reply to fix a hallucination, delete a tangent, fork by duplicating the file - it all works because there's nothing to fall out of sync.
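To give a feel for it, here's a minimal, simplified sketch of a .chat buffer; the exact role markers are illustrative, the README has the real syntax:

```
You: Why does this regex backtrack so badly?

Assistant: The nested quantifier (a+)+ re-tries every way of splitting the
input, so a near-miss string triggers catastrophic backtracking...
```

Editing the assistant line and re-sending is all there is to it; the buffer is the state.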
What's new since October:
- Tool calling. Models can run shell commands and read/edit/write files (the same four tools as Pi, nothing more). Results go straight into the buffer. There's an approval flow (Ctrl-] cycles preview -> execute -> send), so nothing runs without your say-so. Parallel tool use works too.
- Prompt caching for Anthropic, OpenAI and Vertex AI. Flemma places cache breakpoints automatically. Long conversations are now significantly cheaper (this was a major pain point for me).
- Extended thinking / reasoning support for all three providers.
- Per-buffer overrides via frontmatter. `flemma.opt` lets you pick which tools a buffer can use, set provider parameters and switch models, all scoped to that one file (see the first sketch after this list).
- Open registration APIs for both providers and tools. Custom tools can resolve their definitions asynchronously, e.g. from CLI subprocesses or remote APIs (see the second sketch after this list). I plan to add mcporter support at some point.
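To give a rough idea of the per-buffer overrides, an illustrative frontmatter block; the key names and values here are placeholders, not the documented schema:

```yaml
---
flemma.opt:
  model: claude-sonnet-4     # placeholder model name
  tools: [shell, read_file]  # restrict this buffer to two tools
  temperature: 0.2           # provider parameter override
---
```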
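And a hand-wavy sketch of tool registration; `register_tool`, `resolve`, `run` and `done` are my shorthand for the shape of the thing, not the documented names:

```lua
-- Illustrative only: function and field names are assumptions.
local flemma = require('flemma')

flemma.register_tool('jq', {
  -- The definition itself can resolve asynchronously, e.g. by probing a CLI:
  resolve = function(done)
    vim.system({ 'jq', '--version' }, { text = true }, function(out)
      done({
        description = 'Filter JSON with jq (' .. vim.trim(out.stdout or '') .. ')',
        parameters = { filter = 'string', input = 'string' },
      })
    end)
  end,
  run = function(args, done)
    vim.system({ 'jq', args.filter }, { stdin = args.input, text = true },
      function(out) done(vim.trim(out.stdout or '')) end)
  end,
})
```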
Flemma works with Anthropic, OpenAI and Vertex AI. You get cost tracking, presets, Lua template expressions, file attachments and a lualine.nvim component.
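The lualine side is standard lualine config; the component name below is a placeholder, check the README for the real one:

```lua
require('lualine').setup({
  sections = {
    -- placeholder component name
    lualine_x = { 'flemma' },
  },
})
```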
One thing I want to be upfront about: nearly every line of code in Flemma was written by AI (Claude Code lately; Amp and Aider before that). It says so in the README. Every change was personally architected, reviewed and tested by me: I decide what gets built and I vet every diff. I think this is where a lot of software development is heading, and I'd rather be honest about it than pretend otherwise.
I'm @StanAngeloff on GitHub - long-time Neovim user and open source enthusiast. Happy to answer questions.