Are any of you using LLMs to create full features in big enterprise apps?

7•not_that_d•1mo ago

Let me be clear first: I don't dislike LLMs. I query them, and I trigger agents to do things where I roughly know the end goal and to analyze small parts of an application.

That said, every time I give it something a little more complex than a single-file script, it fails me horribly. Either the code is really bad, or the approach is as bad as that of someone who doesn't really know what to do, or it plainly starts doing things that I explicitly said not to do in the initial prompt.

I have sometimes asked my LLM-fan coworkers to come and help when that happens, and they are not able to "fix it" either, yet somehow I am the one doing it wrong due to a "wrong prompt" or "lack of correct context".

I have created a lot of "Agents.md" files and dropped files into the context window... Nothing.

When I need to do greenfield stuff or PoCs it delivers fast, but applying it inside an existing big application fails.

The only place where I feel as "productive" as I hear other people are is when I work in languages or technologies I don't know at all, but then again, I don't know whether the functional code I get at the end is broken in ways I am not aware of.

Are any of you guys really using LLMs to create full features in big enterprise apps?

Comments

linesofcode•1mo ago

The quality of an LLM’s output depends greatly on how many guard rails you have set up to keep it on track and on the heuristics that point it in the right direction (type checking + running tests after every change, for example).
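As an illustration of the "type checking + running tests after every change" guard rail, here is a minimal sketch of a gate script an agent could be told to run after each edit. It is not from the thread: `run_checks` and `DEFAULT_CHECKS` are hypothetical names, and the choice of mypy and pytest is only an assumption about the project's tooling.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple

# Hypothetical gates: swap in whatever type checker and test runner
# the project actually uses. mypy and pytest are assumptions here.
DEFAULT_CHECKS = [
    [sys.executable, "-m", "mypy", "."],
    [sys.executable, "-m", "pytest", "-q"],
]


def run_checks(checks: Sequence[Sequence[str]]) -> Tuple[bool, List[str]]:
    """Run each command in order and stop at the first failure.

    Returns (all_passed, log), where log has one line per command run.
    """
    log: List[str] = []
    for cmd in checks:
        result = subprocess.run(list(cmd), capture_output=True, text=True)
        log.append(f"{' '.join(cmd)} -> exit {result.returncode}")
        if result.returncode != 0:
            # Fail fast so the agent gets one clear signal to act on.
            return False, log
    return True, log
```

Calling `run_checks(DEFAULT_CHECKS)` after every change and feeding the log back to the agent is one way to give it a deterministic signal that does not rely on the prompt alone.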

What is the health of your enterprise code base? If it’s anything like the ones I’ve experienced and it’s a legacy mess, then it’s absolutely understandable that an LLM’s output is subpar when taking on larger tasks.

It also depends on the models and plan you’re on. There is a significant increase in quality when comparing Cursor’s default model on a free plan vs Opus 4.5 on a maximum Claude plan.

I think a good exercise is to prohibit yourself from writing any code manually and force yourself to use the LLM only. It might sound silly, but it will develop that skill set.

Try Claude Code in thinking mode with some superpowers - https://github.com/obra/superpowers

I routinely make an implementation plan with Claude and then step away for 15 mins while it spins - the results aren’t perfect but fixing that remaining 10% is better than writing 100% of it myself.

not_that_d•1mo ago

The code is quite easy to follow, to be honest. We have documented a lot of it and segmented functionality into libraries that follow an app/feature/models pattern. Almost every service we have has unit tests explicitly describing what the public API does or is supposed to do in several scenarios; we never test implementation details.

Giving it to new people of course raises questions, but most of them (juniors) could just follow the code given an entry point for the task, from BE to FE.

I use the GitHub Copilot premium models available.

> I routinely make an implementation plan with Claude and then step away for 15 mins while it spins - the results aren’t perfect but fixing that remaining 10% is better than writing 100% of it myself.

I have to be honest, I only did this two times, and the amount of code that needed to be fixed and the mental overload of finding open bugs were much more than just guiding the LLM on every step. But this was a couple of months ago.

not_that_d•1mo ago

Besides my other response, it could also be that I am not smart enough for it.

journal•1mo ago

The quality of an LLM's output depends greatly on the inputs. Your brain is Swiss cheese and LLMs are a filler.

raw_anon_1111•1mo ago

Rule #1: I don’t do agentic coding. I keep my hands on the steering wheel and have it build everything up step by step, validate the code, commit. Repeat.

kasey_junk•1mo ago

With agentic coding, people underestimate the agent and overestimate the model’s value. So it’s important to be specific. What agent are you using? You will see radical performance differences between Claude Code and Codex compared to Copilot, for instance. You will also see pretty big differences if you have well-groomed, agent-specific agents files. Especially if the code base is very large, the agents files need to be able to guide the agent to make connections in the code.

But other than that, what I’ve found to be the most important is static tooling. Do you have rules that require tests to be run? Do you have linters and code formatters that enforce your standards? Are you using well-known tools (build tools, dependency management tools, etc.) or is that bespoke?

But the less sexy answer is that no, you can’t drop an agent cold into a big codebase and expect it to perform miracles. You need to build out agentic flows as a process that you iterate and improve on. If you prompt an agent and it gets it wrong, evaluate why and build out the tools so next time it won’t get it wrong. You slowly level up the capabilities of the tool by improving it over time.

I can’t emphasize enough the difference in agents, though. I’ve been doing a lot of A/B tests with Copilot against other agents and it’s wild how bad it is, even backed by the same models.

kevinherron•1mo ago

The problem is you still think that the perfect prompt or AGENTS.md or whatever is going to get you a one-shotted (or close) feature in return. There isn't (yet) a model or orchestration framework that is going to take a large feature from start to finish for you.

The reality is that LLMs/agents are just a new way to write code. You still need to understand, more-or-less, how this feature is going to actually work, and how it needs to be implemented, from start to finish.

The difference is that you don't write the code, you tell the LLM to write the code. Once you've figured out the right "chunk size" an LLM can handle, it's faster than doing it yourself.

I've found it's actually a little _harder_ in green field projects because the LLM doesn't have guard rails and examples and existing patterns to follow.

rokoss21•1mo ago

Yes, but only when the LLM is treated as an implementation detail, not the feature itself.

In enterprise systems, “full features” built directly on model output tend to fail at the edges: permissions, retries, validation, and auditability. The teams that succeed put a deterministic layer around the model — schemas, tool boundaries, and explicit failure handling.

Once you do that, the LLM stops being the risky part. The architecture is.
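To make the "deterministic layer" idea concrete, here is a minimal sketch, assuming the model is asked to reply with a small JSON object. Nothing here comes from the thread: `Decision`, `ALLOWED_ACTIONS`, `parse_decision`, `decide`, and the `call_model` callable are all hypothetical names chosen for illustration.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical action set; a real system would define its own.
ALLOWED_ACTIONS = {"approve", "reject", "escalate"}


@dataclass(frozen=True)
class Decision:
    action: str
    reason: str


class ModelOutputError(ValueError):
    """Raised when the model's reply does not match the expected schema."""


def parse_decision(raw: str) -> Decision:
    """Validate the raw model reply against a strict schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ModelOutputError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ModelOutputError("expected a JSON object")
    action = data.get("action")
    reason = data.get("reason")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ModelOutputError(f"unknown action: {action!r}")
    if not isinstance(reason, str) or not reason.strip():
        raise ModelOutputError("missing reason")
    return Decision(action=action, reason=reason)


def decide(call_model: Callable[[str], str], prompt: str,
           max_attempts: int = 3) -> Decision:
    """Retry on invalid output, then fall back to a safe default."""
    last_error: Optional[ModelOutputError] = None
    for _ in range(max_attempts):
        try:
            return parse_decision(call_model(prompt))
        except ModelOutputError as exc:
            last_error = exc
    # Explicit failure handling: never pass unvalidated output downstream.
    return Decision("escalate", f"model output rejected: {last_error}")
```

With this shape, the rest of the system only ever sees a validated `Decision`, and a misbehaving model degrades to an escalation rather than an unexpected action.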
