> With Kiro, I spend more time upfront articulating what I want to build, but then I can step back and let it execute
This sounds like exactly the kind of exercise one does to /init a project with Claude - define tasks, write a spec, etc.
The model I want to train is ME, so a one-sentence sentiment analysis offers zero value to me, while a lot of distinct human perspectives are a gold mine.
It's kinda like the difference between being able to look at a single picture of a landscape and being able to walk around it.
I think the only early bit of feedback I had was that my tasks were also writing a lot of tests, and if the feedback loop for getting test results were tighter this would be insanely powerful - something like a sandboxed terminal. I'm less keen on a YOLO mode, and I had to keep authorising the terminal to run.
The LLM, however, asks me clarifying questions that I wouldn't have thought of myself. The thinking goes a step or two deeper than it did before - if the LLM comes up with good questions.
So Kiro wrote whatever Kiro "decided" - or, better said, guessed - it should write about, and did most of the "content generation" - a weird but fitting term for a machine writing a fake human blog. And the human kind of "directed" it, but we don't really know for sure, because language is our primary interface - shouldn't an author be able to express their thoughts without using a machine?
I'd be happier if the author had shared their actual experience of writing the software with this tool.
Is it not high quality output when a ghost writer writes someone’s work? Can I use a thesaurus? What about a translator?
As long as the person who is putting their name on the article has provided the ideas and ensures that the output matches their intent, I see no reason to draw arbitrary lines.
Translator: Only if it's not creative writing and only if the output doesn't need to be accurate / there is margin for error.
Granted, the line is hard to place - see also the other post about where the line for adult content should be. But the thing is, when writing is AI-generated, how can you know for sure that all the output was actually read and verified by the author? And text is not that important, but what about code?
The other argument: if an author didn't take the time and effort to write something, why should I take the time and effort to read and understand it? This applies to poorly written content as well. Ironically, a lot of people will already feed this article (and this comment) into an LLM to summarize it, if not to help them form an opinion about it. Summarizing isn't new, though; there were tools to summarize webpages years ago.
Do you really care if Spielberg’s team manually edits the movie or uses an AI powered video editing tool? In the end Spielberg is responsible for the end quality of the movie.
I’m not sure that’s true in an iron-clad, first-principles way. I think that many of the insights created by humans are combinations/integrations of existing concepts. That category of insight does not seem to require carbon-based reasoning.
I don’t claim that it can be achieved by statistical text generation, but I doubt the typical blog author is creating something that forever will be human-only.
This seems to be the dividing line in the AI writing debate.
If one cares about the interpersonal connection formed with the author, generally they seem to strongly dislike machine-generated content.
If one cares about the content in isolation, then generally the perceived quality is more important than authorship. "The author is dead" and all that.
Both points are valid IMO. It's okay to dislike things, and it's okay to enjoy things.
> I want new and genuine insights that only another human can create.
This is a good illustration of what I mean: you personally value the connection with the author, and you can't get a human connection when there was never a human to begin with.
If you take a look at the others in the thread who had a positive view of the work, they generally focused on the content instead.
LLMs, in contrast, experience nothing. When they "write", they are not even vaguely approximating what a human writer does.
You're making assumptions about the quality of the article just because it was written with the help of AI; I don't think that's justified in general.
You see, a movie is a work of fiction, but a blog article most likely isn't (or shouldn't be). In this case, I am reading the article because I want an objective, fair assessment of Kiro from a human, not random text generated by an LLM.
1. The constant whiplash of paragraphs which describe an amazing feature followed by paragraphs which walk it back ("The shift is subtle but significant / But I want to be clear", "Kiro would implement it correctly / That said, it wasn't completely hands-off", "The onboarding assistance was genuinely helpful / However, this is also where I encountered", "It's particularly strong at understanding / However, there are times when");
2. Bland analogies that detract from, rather than enhance, understanding ("It's the difference between being a hands-on manager who needs to check every detail versus setting clear expectations and trusting the process.", "It's like having a very knowledgeable colleague who..."); and
3. Literal content-free filler ("Here's where it got interesting", "Don't let perfect be the enemy of good", "Most importantly / More importantly"), and so on.
Kiro is a new agentic IDE which puts much more of a focus on detailed, upfront specification than competitors like Cursor. That's great. Just write about that.
What they’re describing can also be done with Claude Code, and it’s way too broad in scope to get any benefit at all from approving code before it’s written. These tools are the way for now.
Cursor is, in my opinion, not geared for this level of hands-off work.
Can't really get value out of reading this if you don't compare it to the leading coding agent.
> Each tool has carved out its own niche in the development workflow: Copilot excels at enhancing your typing speed with intelligent code completion, Cursor at debugging and helping you implement discrete tasks well, and recently pushing more into agentic territory.
Cursor's autocomplete blows Copilot's out of the water. And both Copilot and Cursor have pretty capable agents. Plus, Claude Code isn't even mentioned here.
This blog post is a Kiro advertisement, not a serious comparative analysis.
For the most part it's unlimited right now. VS Code's Copilot Agent mode is basically the same thing - tell it to write a list of tasks - but I have to pay for it.
I'm much happier with both of these options; both are much cheaper than Claude Code.
IMO the real race is to get LLM cost down. Someone smarter than me is going to figure out how to run a top LLM for next to nothing.
This person will be a billionaire. Nvidia and AMD are probably already working on it. I want DeepSeek running on a $100 computer that uses a nominal amount of power.
It's similar to how computing used to be restricted to megacorps decades ago, but today a smartphone has more computing power than any old mainframe. Today we need Elon Musk to buy 5 million GPUs to train a model; tomorrow, we should be able to train a top-of-the-line model on a budget RTX card.
I don't need my code assistant to be an expert on Greek myths. The future is probably highly specialized mini LLMs. I might train a model to code my way.
I'm not smart enough to figure this out, but the solution can't be to just brute-force training with more GPUs.
There is another answer.
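Part of that answer arguably already exists in early form: aggressive quantization. A 4-bit GGUF build of a small model already runs on commodity CPUs. A rough sketch with llama-cpp-python (the model filename below is hypothetical - substitute any small quantized GGUF from Hugging Face):

    # Rough sketch, not a benchmark: run a quantized GGUF model on CPU only.
    # pip install llama-cpp-python
    from llama_cpp import Llama

    llm = Llama(
        model_path="deepseek-r1-distill-qwen-1.5b-q4_k_m.gguf",  # hypothetical filename, ~1 GB on disk
        n_ctx=4096,    # context window
        n_threads=4,   # fine for a cheap 4-core box
    )
    out = llm("Write a Python function that reverses a string.", max_tokens=200)
    print(out["choices"][0]["text"])

Whether that ever scales from "usable small model on a $100 box" to "top model for next to nothing" is the open question, but it's the obvious direction.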
It wrote around 5000 LOC, including tests, and they... worked. It didn't look as nice as I would have liked, but I wasn't able to break it. However, 5000 lines was way too much code for such a simple task; the solution was over-engineered along every possible axis. I was able to (manually) get it down to ~800 LOC without losing any important functionality.
This is funny. Why would you a) care how many LOC it generated, and b) bother injecting a tedious, manual process into something otherwise fully automated?
Also because it was an experiment. I wanted to see how it would do and how reasonable the code it wrote was.
Another reason to favour shorter code.
It sounds like a very PM-type approach to coding.
Does that mean it fits PM types more than IC dev types?
I personally find code review more exhausting than writing code, and that goes 50x when I'm reviewing code from an intern, because they are trying to cheat me all the time. And I have always hated PM stuff, on both sides of that relationship.
> What I found interesting is how it forced me to think differently about the development process itself. Instead of jumping straight into code, I found myself spending more time articulating what I actually wanted to build and high level software architectural choices.
This is what I already do with Claude Code. Case in point, I spent 2.5 hours yesterday planning a new feature - first working with an agent to build out the plan, then 4 cycles of having that agent spit out a prompt for another agent to critique the plan and integrate the feedback.
In the end, once I got a clean bill of health on the plan from the “crusty-senior-architect” agent, I had Claude build it - took 12 minutes.
Two passes of the senior-architect and crusty-senior-architect debating how good the code quality was / fixing a few minor issues and the exercise was complete. The new feature worked flawlessly. It took a shade over 3 hours to implement what would have taken me 2 days by myself.
I have been doing this workflow for a while, but Claude Code released Agents yesterday (/agents) and I highly recommend them. You can define an agent on the basis of another agent, so crusty-architect is a clone of my senior-architect, but it's never happy unless the code is super simple, maintainable, and uses well-established patterns. The debates between the two remind me of sitting in conf rooms hashing an issue out with a good team.
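For anyone who hasn't poked at /agents yet: a subagent is just a Markdown file with YAML frontmatter under .claude/agents/ in your repo. A rough sketch of what a crusty-architect definition might look like (the prompt wording below is illustrative, not my exact file):

    ---
    name: crusty-senior-architect
    description: Reviews plans and diffs for over-engineering. Use after the senior-architect agent produces or revises a plan.
    tools: Read, Grep, Glob
    ---

    You are a skeptical senior architect. You are never satisfied unless the
    design is as simple as possible, uses well-established patterns, and adds
    no speculative abstraction. List concrete simplifications before you
    approve anything.

The /agents command will also scaffold one of these interactively, so you don't have to hand-write the file.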
....
That reminds me of when my manager (a very smart, very AI-bullish ex-IC) told us about how he used AI to implement a feature over the weekend and all it took him was 20 mins. It sounds absolutely magical to me and I make a note to use AI more. I then go to review the PR, and of course there are multiple bugs and unintended side effects in the code. Oh, and there are like 8 commits spread over a 60-hour window... I manually spin up a PR which accomplishes the same thing properly... takes me 30 mins.
How long does it typically take to spec something out? I'd say more than 20 mins, and typical artifacts to define requirements are much lossier than actual code - even if that code is buggy and sloppy.
What was claimed was that a complete feature was built in record time with AI. What was actually built was a useless and buggy piece of junk that wasted reviewer time and was ultimately thrown out, and it took far longer than claimed.
There were no useful insights or speed up coming out of this code. I implemented the feature from scratch in 30 mins - because it was actually quite easy to do manually (<100 loc).
You're bringing up various completely unrelated factors seemingly as a way of avoiding the obvious point of the anecdotal story - that AI for coding just isn't that great (yet).
What I have noticed is the forcing function of needing to think through the technical and business considerations of one's work up front, which can be tedious if you are the type that likes to jump in and hack at it.
For many types of coding needs, that is likely the smarter and ultimately more efficient approach. Measure twice, cut once.
What I have not yet figured out is how to reduce the friction in the UX of that process to make it more enjoyable. Perhaps sprinkling in some dopamine-triggering gamification around answering the questions.
Thanks for the tip!
I've been attempting to do this kind of thing manually w/ MCP - took a look at "claude swarm" https://github.com/parruda/claude-swarm - but in the short time I spent on it I wasn't having much success. Admittedly, I probably went a little too far into the "build an entire org chart of agents" territory.
[EDIT]: looks like I should be paying attention to the changelog on the gh repo instead of the release notes
https://github.com/anthropics/claude-code/blob/main/CHANGELO...
[EDIT 2]: so far this seems to suffer from the same problem I had in my own attempts, which is that I need to specifically tell it to use an agent when I would really like it to just figure that out on its own
like if I created an agent called "code-reviewer" and then I say - "review this code" ... use the agent!
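[EDIT 3]: partially answering my own gripe - automatic delegation appears to be keyed off the agent's description field, so being blunt there helps. Something like this (my guess at what works, not verified behaviour):

    ---
    name: code-reviewer
    description: Expert code reviewer. Use PROACTIVELY whenever the user asks for a review or after any non-trivial code change, without waiting to be told.
    tools: Read, Grep, Glob
    ---

    Review the changed code for bugs, style issues, and missing tests.
    Report findings as a prioritized list.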
I found the experience of using Kiro and Replit really similar, with one important difference: Replit mostly worked.
Kiro tore off and wrote tonnes of code and tests. It asked me to peck at approval requests (one thing I liked was the regexp-style "trust this tool to do this" requests). It spent half a day creating an app to do what I asked... and it was incomprehensible bullshit. I couldn't do anything with the project and I have not touched it since.
Replit was a bit more interactive, but pretty autonomous, and it got me to 90% of the solution and then stalled out - wouldn't correct some of the problems I identified to it. About 2 hrs with Cursor sorted that out though.
I did use Cursor to do it "AI assisted", and that took about the same amount of time. The advantage is that I really do know what's going on in the code base, but the Replit + Cursor solution is actually better, in the sense that it looks better and works better, because the agent did some bits more nicely than I did with Cursor - so I got those ideas for free.
Anyway:
Hand coding = a walk through the wilderness
Cursor = motocross scrambler bike up the mountain
Replit = a helicopter ride to somewhere higher up the mountain selected at random that you didn't know about but now you have to get to the peak by yourself buddy, good luck
Kiro = you are blindfolded in a container of some sort and it's moving.
> Instead of jumping straight into code, I found myself spending more time articulating what I actually wanted to build and high level software architectural choices.
I don't want to sound rude, but isn't this something that comes with experience after some years?
How can you even be a senior developer without "spending more time articulating what I actually wanted to build and high level software architectural choices"?