On coding agents: are there any good / efficient ways to measure how well different implementations actually work? SWE-bench seems good, but it's expensive to run. Effectively I'm curious about things like: given tool definition X vs Y (e.g. diff vs full-file edit), the prompt for tool X vs Y (how it's described, whether it uses examples), model choice (e.g. MCP with Claude, but inline python-exec with GPT-5), sub-agents, todo lists, etc., how much does each ablation actually matter? And measure not just success, but cost to success too (efficiency).
Overall, it seems like in the phase space of options everything "kinda works," but I'm very curious whether there are any major lifts, big gotchas, etc.
I ask because it feels like the Claude Code CLI always does a little bit better, subjectively, for me, but I haven't seen an LLMarena-style or clear A-vs-B comparison or measure.
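To make "cost to success" concrete, here's a rough sketch of the kind of ablation harness I have in mind. Everything here is a placeholder: run_agent(), the task IDs, and the variant lists are assumptions, not a real benchmark API.

    # Sketch of a tiny ablation harness: run_agent(), the task IDs, and the variant
    # lists below are placeholders/assumptions, not a real benchmark API. The point
    # is the shape of the measurement: success rate AND tokens-per-success.
    from dataclasses import dataclass
    from itertools import product

    @dataclass
    class Result:
        passed: bool
        tokens_in: int
        tokens_out: int

    @dataclass
    class VariantStats:
        runs: int = 0
        passes: int = 0
        total_tokens: int = 0

        @property
        def success_rate(self) -> float:
            return self.passes / self.runs if self.runs else 0.0

        @property
        def tokens_per_success(self) -> float:
            # Cost-to-success: total spend divided by solved tasks (inf if none solved).
            return self.total_tokens / self.passes if self.passes else float("inf")

    def run_agent(model: str, edit_tool: str, prompt_style: str, task: str) -> Result:
        # Stub: replace with a real agent run plus test-based grading.
        return Result(passed=False, tokens_in=0, tokens_out=0)

    MODELS = ["claude", "gpt"]                  # assumed backends to compare
    EDIT_TOOLS = ["diff", "full_file"]          # the tool-definition ablation
    PROMPT_STYLES = ["terse", "with_examples"]  # how the tool is described
    TASKS = ["task_001", "task_002"]            # small, cheap task set (not SWE-bench scale)

    stats: dict[tuple, VariantStats] = {}
    for model, tool, style in product(MODELS, EDIT_TOOLS, PROMPT_STYLES):
        s = stats.setdefault((model, tool, style), VariantStats())
        for task in TASKS:
            r = run_agent(model, tool, style, task)
            s.runs += 1
            s.passes += int(r.passed)
            s.total_tokens += r.tokens_in + r.tokens_out

    # Rank variants by efficiency, not just raw pass rate.
    for key, s in sorted(stats.items(), key=lambda kv: kv[1].tokens_per_success):
        print(key, f"success={s.success_rate:.0%}", f"tokens/success={s.tokens_per_success:.0f}")

Ranking by tokens-per-success rather than raw pass rate is the part I rarely see reported.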
I've been using it a lot lately and anything beyond basic usage is an absolute chore.
binalpatel•1h ago
https://github.com/caesarnine/rune-code
Part of the reason I switched to it initially wasn't so much its niceties as being disgusted at how poor the documentation/experience of using LiteLLM was; I figured the folks who make Pydantic would do a better job of the "universal" interface.
ziftface•57m ago