Ask HN: What are you building with AI coding agents / tooling?

3•giancarlostoro•1h ago
I think we can all agree, a sizable number of us are using AI at home to build side projects, or even experiment with one concept or another. The "Show HN" section is flooded more than ever. I figured maybe a thread would help gather a lot more interesting projects in one shot and anyone who doesn't want to talk or read about AI can skip it.

If you want some structure, feel free to share things like:

* Repo or website

* What the project does / problem it solves

* How you're using AI (models, agents, etc.)

* What actually works for you vs. what’s hype

* Any lessons learned or unexpected challenges

* Whether it’s in production and how real users are using it

* Costs or scaling concerns

* Anything you’re stuck on or want feedback on

I'm really curious to see what people are building, even if it's not fully polished yet. If you have multiple projects, that's fine too, but pick the top ones you're currently working on. The idea is to see what's being built, as well as to learn from other people's experiences with AI.

Comments

agentura•1h ago
I'm building a tool to catch AI agent regressions. For example, behavior can silently shift for a number of reasons -- a prompt tweak, model swap, context change, routing -- and the impact on output won't be obvious until a few weeks later when refunds for edge cases spike!

Agentura is like pytest for AI agents. It's 100% free.

Try here: https://agentura.run
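To make the "pytest for AI agents" idea concrete, here is a minimal sketch of what pinning agent behavior in a test could look like. The `run_agent` function and the output fields are hypothetical stand-ins, not Agentura's actual API:

```python
# Hypothetical sketch of a pytest-style regression test for an AI agent.
# run_agent and its output schema are illustrative, not Agentura's API.

def run_agent(prompt: str) -> dict:
    # Stand-in for a real agent call (model + tools) returning structured output.
    return {"action": "refund", "amount": 25.0, "currency": "USD"}

def test_refund_edge_case_stays_stable():
    out = run_agent("Customer was double-charged $25 and requests a refund.")
    # Pin the behavior you care about so a prompt tweak or model swap
    # can't silently drift past CI.
    assert out["action"] == "refund"
    assert out["amount"] == 25.0
    assert out["currency"] == "USD"
```

Run with `pytest` like any other test; a silent behavior shift then fails the suite instead of surfacing weeks later.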

vibe42•1h ago
Building my own home lab for local AI inference and general-purpose servers. The purpose is to learn more about hardware, Linux, networking, and open source AI tools.

I decided, as a constraint, to exclusively use local AI! This made it fun: the first step became assembling a server able to run a small local model, which would then assist with everything else.

After I got the first one running, it was used for almost everything... except it could not assemble the 42U steel server rack (shoulders hurt a bit now, probably good exercise!).

The first thing I tried on the new servers, after the first boot of Debian, was feeding in the entire Linux dmesg log with one simple instruction: "Check all dmesg entries and provide recommendations for any errors, issues or other considerations".

This was very helpful even with smaller local models, as a complement to just searching for various errors (drivers etc). Learned a lot of new things like BMC network configs.
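The dmesg-triage step above can be sketched as a small script. This assumes an Ollama-style local HTTP API on `localhost:11434` and an illustrative model name; adjust both for whatever local inference setup you actually run:

```python
# Sketch: feed dmesg errors/warnings to a local model for triage.
# Assumes an Ollama-style local API (localhost:11434); model name is illustrative.
import json
import subprocess
import urllib.request

MODEL = "qwen2.5:7b"  # stand-in; use whatever small local model you run

INSTRUCTION = ("Check all dmesg entries and provide recommendations for any "
               "errors, issues or other considerations")

def build_prompt(log: str) -> str:
    # Pure helper: the instruction from the comment plus the raw log text.
    return f"{INSTRUCTION}:\n\n{log}"

def triage_dmesg() -> str:
    # Restrict to errors/warnings to keep the context small for a 7B model.
    log = subprocess.run(["dmesg", "--level=err,warn"],
                         capture_output=True, text=True).stdout
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",  # assumed Ollama-style endpoint
        data=json.dumps({"model": MODEL,
                         "prompt": build_prompt(log),
                         "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Running `triage_dmesg()` after first boot gives a quick first pass over driver and firmware noise before you start searching individual errors.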

Home lab networking in general was incredible to work through using local AI. Being a bit rusty on things like firewalls and local DNS, it was refreshing to ask questions so dumb that, given a history as a SWE, one might not want them in the logs of hosted AI providers... lol

And more complex things like how packets flow in mikrotik RouterOS.

Some general findings:

* The latest generation of local AI models are _way_ better than even just 6 months ago. In particular dense models 7B+ are surprisingly useful for anything Linux, network configs, small to medium sized scripts.

* Latest gen open models from small AI labs generally beat last gen models of the same size from larger labs.

* Don't trust recommendations for any specific model - try it for real stuff and get messy with it - feed it system/app logs, mad half-spelled ramblings late at night along with more clear and well written instructions the next day...

* Larger open models at a decent quant (Q5 and up) are now good enough that the bottleneck for many use cases is no longer the model, but your workflow.

* Simpler workflows beat complex prompts, skills, AGENT.md etc. I run most things with the pi-mono coding agent with no extensions.

* Have the same model verify a finding/claim in a fresh context. This drastically reduces false positives and improves correctness of findings. Going further, run a third verification with a different model.
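The fresh-context verification pattern in that last point can be sketched as a small loop. `ask_model` here is a placeholder for any stateless call to a local inference server (each call carries no prior conversation history):

```python
# Sketch of verify-in-a-fresh-context: re-check a finding with the same
# model in a clean context, then cross-check with a different model.
# ask_model is a placeholder for a stateless local inference call.

def ask_model(prompt: str, model: str = "primary") -> str:
    # In practice this would hit your local inference server with no
    # conversation history attached (a fresh context each call).
    return "CONFIRMED"

def verified_finding(finding: str) -> bool:
    question = ("Independently verify this claim. "
                f"Answer CONFIRMED or REFUTED:\n{finding}")
    # Pass 1: same model, fresh context, no shared history.
    if "CONFIRMED" not in ask_model(question):
        return False
    # Pass 2: a different model as an extra cross-check.
    return "CONFIRMED" in ask_model(question, model="secondary")
```

Only findings that survive both passes get reported, which is what cuts the false-positive rate.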

* If you grew up with the sounds of floppy disks, 56k modems etc, you might just like the coil whine of local GPUs... it's oddly comforting and different models sound different when working on the same tasks.

nathan_douglas•44m ago
What's your GPU setup like?

I'm doing a vaguely similar thing - I have a 10" rack minilab [1] and I've vibe-coded an MCP server that runs in the cluster to introspect, etc, but the main long-term goal is to set up some ML pipelines and maybe work toward formal verification via TLA+ or something. (_not_ vibecoding that... I'm thinking of moving into AI formal verification or compliance automation as a career move.)

I have a separate amd64 server with an RTX 2070 Super - which is obviously old and low-powered. Useful for some general ML stuff, but I don't think it's sufficient to run any non-trivial modern LLM.

I'm thinking about upgrading that GPU, but haven't committed to it or even really thought that hard about it.

[1] https://clog.goldentooth.net/