https://news.ycombinator.com/item?id=44627910
Basically environments/platforms that give all the knobs, levers, and throttles to humans while being tightly integrated with AI capabilities. This is hard work that goes far beyond a VSCode fork.
I would love to see interfaces other than chat for interacting with AI.
Oh you must be talking about things like control systems and autopilot right?
Because language models have mostly been failing in hilarious ways when left unattended. I JUST read something about repl.it ...
I know that we can modify CLAUDE.md and maintain that as well as docs. But it would be awesome if CC had something built in for teams to collaborate more effectively
Suggestions are welcome.
Perhaps it could be implemented as a tool? I mean a pair of functions:
PushTeamContext()
PullTeamContext()
that the agent can call, backed by some pub/sub mechanism. Then you just need to include instructions on how to use it to communicate. It seems very complicated, though, and I'm not sure we'd gain that much, to be honest.
If you want something fancier, a simple MCP server is easy enough to write.
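As a rough sketch of how that tool pair could work (assuming a shared store stands in for the pub/sub backend; all names here are hypothetical, not an existing CC feature):

    import json
    import time
    from pathlib import Path

    # Hypothetical shared location; in practice this would be a pub/sub topic
    # or a small internal service rather than a file on disk.
    TEAM_CONTEXT_FILE = Path("/tmp/team_context.jsonl")

    def push_team_context(author: str, note: str) -> None:
        """Append a context note so teammates' agents can pull it later."""
        entry = {"ts": time.time(), "author": author, "note": note}
        with TEAM_CONTEXT_FILE.open("a") as f:
            f.write(json.dumps(entry) + "\n")

    def pull_team_context(limit: int = 20) -> list[dict]:
        """Return the most recent context notes pushed by the team."""
        if not TEAM_CONTEXT_FILE.exists():
            return []
        lines = TEAM_CONTEXT_FILE.read_text().splitlines()
        return [json.loads(line) for line in lines[-limit:]]

Exposed through an MCP server (or just documented in CLAUDE.md), the agent could be told to pull before starting work and push a short summary when it finishes.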
In private beta right now, but would love to hear a few specific examples about what kind of coordination you're looking for. Email hi [at] nmn.gl
That might have been what they tested at IMO.
The problem with unattended AI in these situations is precisely the lack of context, awareness, intuition, intention, and communication skills.
If you want automation in your disaster recovery system, you want something that fails reliably and immediately. Non-determinism is not part of a good plan. "Maybe it will recover from the issue, or maybe it will delete the production database and beg for forgiveness later" isn't something you want to lean on.
Humans have deleted databases before and will again, I'm sure. And we have backups in place if that happens. And if you don't then you should fix that. But we should also fix the part of the system that allows a human to accidentally delete a database.
"But an AI could do that too!" No. It's not a person. It's an algorithm with lots of data that can do neat things, but until we can make sure it does one particular thing deterministically, there's no point in using it for critical systems. It's dangerous. You don't want a human operator coming into a fire and finding the AI system has already made the fire worse... and then having to respond to that mess on top of everything else.
An extreme example: nuclear reactors. You don't want untrained people walking into a fire with the expectation that they can manage the situation.
Less extreme example: financial systems. You don't want untrained people walking into a fire, losing your customers' funds, and being expected to manage the situation.
But there are plenty of active investigative steps you'd want to take in generating hypotheses for an outage. Weakly's piece strongly suggests AI tools not take these actions, but rather suggest them to operators. This is a waste of time, and time is the currency of incident resolution.
1) I can't remember the last time I wrote something meaningfully long with an actual pen/pencil. My handwriting is beyond horrible.
2) I can no longer find my way driving without a GPS. Reading a map? lol
That's a skill that depends on motor functions of your hands, so it makes sense that it degrades with lack of practice.
> I can no longer find my way driving without a GPS. Reading a map? lol
Pretty sure what that actually means in most cases is "I can go from A to B without GPS, but the route will be suboptimal, and I will have to pay more attention to street names".
If you ever had the joy of printing MapQuest directions or using a paper map, I'm sure these people can still do it; maybe it will take them longer. I'm good at reading mall maps tho.
The last time I dealt with integrals, by hand or otherwise, was before node.js was announced (just a point in time).
Sure, you can probably forget a mental skill from lack of practicing it, but in my personal experience it takes A LOT longer than for a motor skill.
Again, you're still writing code, but with a different tool.
I wonder if you’d make this kind of mistake writing by hand
Most people would still be able to. But we romanticize the usefulness of paper maps. I remember myself on the Paris circular highway (at the time 110 km/h, not 50 km/h like today), the map on the steering wheel, super dangerous. You say you’d miss GPS features on a paper map, but back then we had the same problems: it didn’t speak, didn’t have the blinking position, didn’t tell you which lane to take, and it simplified details to the point of losing you…
You won’t become less clever with AI: you already have YouTube for that. You’ll just become augmented.
A 1990s driver without a map is probably a lot more capable of muddling their way to the destination than a 2020s driver without their GPS.
That's the right analogy. Whether you think it matters how well people can navigate without GPS in a world of ubiquitous phones (and, to bring the analogy back, how well people will be able to program without an LLM after a generation or two of ubiquitous AI) is, of course, a judgment call.
Of course, then there's Lovable, which spits out the front-end I describe and is impressively good at it. I just want a starting point; then I get going, and if I get stuck I'll ask clarifying questions. For side projects where I have limited time, LLMs are perfect for me.
If AI tools continue to improve, there will be less and less need for humans to write code. But -- perhaps depending on the application -- I think there will still be need to review code, and thus still need to understand how to write code, even if you aren't doing the writing yourself.
I imagine the only way we will retain these skills is by deliberately choosing to do so. Perhaps not unlike choosing to read books even if not required to do so, or choosing to exercise even if not required to do so.
Maybe, but I don't think it's that easy.
I don't know what future we're looking at. I work in aerospace, and being around more safety-critical software, I find it hard to fathom just giving up software development to non-deterministic AI tools. But who knows? I still foresee humans being involved, but in what capacity? Planning and testing, but not coding? Why? I've never really seen coding being the bottleneck in aerospace anyway; code is written more slowly here than in many other industries due to protocols, checks and balances. I can see AI-assisted programming being a potentially splendid idea, but I'm not sold on AI replacing humans. Some seem to be determined to get there, though.
I like this zooming in and zooming out, mentally. At some point I can zoom out another level. I miss coding, even while I still code a lot.
I dare say there are more individuals who have soldered something today than there were 100 years ago.
People say the same thing about code but there's been a big conflation between "writing code" and "thinking about the problem". Way too often people are trying to get AI to "think about the problem" instead of simply writing the code.
For me, personally, the writing-the-code part goes pretty quick. I'm not convinced that's my bottleneck.
My issue with applying this reasoning to AI is that prior technologies addressed bottlenecks in distribution, whereas this more directly attacks the creative process itself. Stratechery has a great post on this, where he argues that AI is attempting to remove the "substantiation" bottleneck in idea generation.
Doing this for creative tasks is fine ONLY IF it does not inhibit your own creative development. Humans only have so much self-control/self-awareness
I also think that even with expertise, people relying too much on AI are going to erode their expertise
If you can lift heavy weights, but start to use machines to lift instead, your muscles will shrink and you won't be able to lift as much
The brain is a muscle; it must be exercised to keep it strong too.
assuming you were referencing "bicycle for the mind"
So if the printing press stunted our writing, what will the thinking press stunt?
https://gizmodo.com/microsoft-study-finds-relying-on-ai-kill...
It's being an executor for those who don't think but can make up rules and laws.
A better analogy than the printing press, would be synthesizers. Did their existence kill classical music? Does modern electronic music have less creativity put into it than pre-synth music? Or did it simply open up a new world for more people to express their creativity in new and different ways?
"Code" isn't the form our thinking must take. To say that we all will stunt our thinking by using natural language to write code, is to say we already stunted our thinking by using code and compilers to write assembly.
Importing an external library into your code is like using a player piano.
Heck, writing in a language you didn't personally invent is like using a player piano.
Using AI doesn't make someone "not a programmer" in any new way that hasn't already been goalpost-moved around before.
Do you actually believe that any arbitrary act of writing is necessarily equivalent in creative terms to flipping a switch on a machine you didn't build and listening to it play music you didn't write? Because that's frankly insane.
Importing a library someone else wrote basically is flipping a switch and getting software behavior you didn't write.
Frankly I don't see a difference in creative terms between writing an app that does <thing> that relies heavily on importing already-written libraries for a lot of the heavy lifting, and describing what you have in mind for <thing> to an LLM in sufficient detail that it is able to create a working version of whatever it is.
Actually, I can see an argument that both of those are also potentially equal, in creative terms, to writing the whole thing from scratch. If the author's goal was to write beautiful software, that's one thing, but if the author's goal is to create <thing>? Then the existence and characteristics of <thing> is the measure of their creativity, not the method of construction.
> If the author's goal was to write beautiful software, that's one thing, but if the author's goal is to create <thing>? Then the existence and characteristics of <thing> is the measure of their creativity, not the method of construction.
What you are missing is that the nature of a piece of art (for a very loose definition of 'art') made by humans is defined as much by the process of creating it (and by developing your skills as an artist to the point where that act of creation is possible) as by whatever ideas you had about it before you started working on it. Vastly more so, generally, if you go back to the beginning of your journey as an artist.
If you just use genai, you are not taking that journey, and the product of the creative process is not a product of your creative process. Therefore, said product is not descended from your initial idea in the same way it would have been if you'd done the work yourself.
You could hook both of those things up to servos and make a machine do it, but it's the notes being played that are where creativity comes in.
I've liked some AI generated music, and it even fooled me for a little while but only up to a point, because after a few minutes it just feels very "canned". I doubt that will change, because most good music is based on human emotion and experience, something an "AI" is not likely to understand in our lifetimes.
The concept that "every augmentation is an amputation" is best captured in Chapter 4, "THE GADGET LOVER: Narcissus as Narcosis." The chapter explains that any extension of ourselves is a form of "autoamputation" that numbs our senses.
Technology as "Autoamputation": The text introduces research that regards all extensions of ourselves as attempts by the body to maintain equilibrium against irritation. This process is described as a kind of self-amputation. The central nervous system protects itself from overstimulation by isolating or "amputating" the offending function. This theory explains "why man is impelled to extend various parts of his body by a kind of autoamputation".
The Wheel as an Example: The book uses the wheel as an example of this process. The pressure of new burdens led to the extension, or "'amputation,'" of the foot from the body into the form of the wheel. This amplification of a single function is made bearable only through a "numbness or blocking of perception".
etc
If you write it by hand you don't need to "learn it thoroughly", you wrote it
There is no way you understand code better by reading it than by creating it. Creating it is how you prove you understand it!
For beginners, I think this is a very important step in learning how to break down problems (into smaller components) and iterate.
Which theoretically could actually be a benefit someday: if your company does many similar customer deployments, you will eventually be more efficient. But if you are writing custom code meant just for your company... there may never be an efficiency increase.
I imagine people can start making code (probably already are) where functions/modules are just boxes in a UI and the code is not visible; test it with in/out, join it to something else.
When I'm tasked to make some CRUD UI I plan out the chunks of work to be done in order and I already feel the rote-ness of it, doing it over and over. I guess that is where AI can come in.
But I do enjoy the process of making something even like a POSh camera GUI/OS by hand..
What is different about LLM-created code is that compilers work. Reliably and universally. I can just outsource the job of writing the assembly to them and don't need to think about it again. (That is, unless you are in one of those niches that require hyper-optimized software. Compilers can't reliably give you that last 2x speed-up.)
LLMs, for their part, will never be reliable. Their entire goal is opposite to reliability. IMO, the losses are still way higher than the gains, and it's an open question whether this is an architectural limitation that will never change.
For me, refactoring is really the essence of coding. Getting the initial version of a solution that barely works is necessary but less interesting to me. What’s interesting is the process of shaping that v1 into something that’s elegant and fits into the existing architecture. Sanding down the rough edges, reducing misfit, etc. It’s often too nitpicky for an LLM to get right.
On the other hand I do a lot more fundamental coding than the median. I do quite a few game jams, and I am frequently the only one in the room who is not using a game engine.
Doing things like this, I have written so many GUI toolkits from scratch now that it's easy enough for me to make something anew in the middle of a jam.
For example https://nws92.itch.io/dodgy-rocket In my experience it would have been much harder to figure out how to style scrollbars to be transparent with in-theme markings using an existing toolkit than writing a toolkit from scratch. This of course changes as soon as you need a text entry field. I have made those as well, but they are subtle and quick to anger.
I do physics engines the same way, predominantly 2d, (I did a 3d physics game in a jam once but it has since departed to the Flash afterlife). They are one of those things that seem magical until you've done it a few times, then seem remarkably simple. I believe John Carmack experienced that with writing 3d engines where he once mentioned quickly writing several engines from scratch to test out some speculative ideas.
I'm not sure AI presents an inhibitor here any more than using an engine or a framework does. They both put some distance between the programmer and the result, and as a consequence the programmer starts thinking in terms of the interface through which they communicate instead of how the result is achieved.
On the other hand, I am currently using AI to help me write a DMA chaining process. I initially got the AI to write the entire thing. The final code will use none of that emitted output, but it was sufficient for me to see what actually needed to be done. I'm not sure if I could have done this on my own; AI certainly couldn't have done it on its own. Now that I have (almost (I hope)) done it once in collaboration with AI, I think I could write it from scratch myself should I need to do it again.
I think AI, Game Engines, and Frameworks all work against you if you are trying to do something abnormal. I'm a little amazed that Monument Valley got made using an engine. I feel like they must have fought the geometry all the way.
I think this jam game I made https://lerc.itch.io/gyralight would be a nightmare to try and implement in an engine. Similarly I'm not sure if an AI would manage the idea of what is happening here.
If these are the priors, why would I keep reading?
Why even ask this question?
AI requires a holistic revision. When the OSes catch up, we'll have some real fun.
The author is right to call out the differences in UX. Sad that design has always been given less attention.
When I first saw the title, my initial thought was that this might relate to AX, which I think complements the topic very well: https://x.com/gregisenberg/status/1947693459147526179
A simple UX change can make the difference between educating the users of your service and dumbing them down.
The reason I think that is that it often asks about things I already took great care to explicitly type out. I honestly don't think those extra questions add much to the actual searching it does.
I definitely sometimes ask really specialized questions, and in that case I just say "do the search" and ignore the questions, but a lot of the time it helps me determine what I am really asking.
I suspect people with excellent communication abilities might find less utility in the questions.
I think it'll be like driving: the automatic transmission, power brakes, and other tech made it more accessible, but in the process we forgot how to drive. That doesn't mean nobody owns a manual anymore, but it's not a growing % of all drivers.
That, combined with having to do it manually, has helped me learn how to do things on my own, compared to when I just copy-paste or use agents.
And the more concepts you can break things into, the better. I’ve now started projects by working with AI to make “phases” for testability, traceability, and overall understanding.
My de facto workflow has become using AI on my phone with pictures of screens and voiced questions, to try to force myself to use it right. When you can’t mindlessly copy-paste, even though it might feel annoying in the moment, the learning that happens from that process saves so much time later on hallucination-holes!
I've spent the last 15 years doing R&D on (non-programmer) domain-expert-augmenting ML applications and have never delivered an application that follows the principles the author outlines. The fact that I have such a different perspective indicates to me that the design space is probably massive and it's far too soon to say that any particular methodology is "backwards." I think the reality is we just don't know at this point what the future holds for AI tooling.
But I agree that the space is wide enough that different interpretations arise depending on where we stand.
However, I still find it good practice to keep humans (and their knowledge/retrieval) as much in the loop as possible.
* Sophisticated find and replace, i.e. highlighting a bunch of struct initialisations and saying "Convert all these to Y". (Regex was always a PITA for this, though it is more deterministic.)
* When in an agentic workflow, treating it as a higher level than ordinary code and not so much as a simulated human. I.e. the more you ask it to do at once, the less well it seems to do it. So instead of "Implement the feature" you'd want to say "Let's make a new file and create stub functions", "Let's complete stub function 1 and have it do x", "Complete stub function 2 by first calling stub function 1 and doing Y", etc. (see the sketch after this list).
* Finding something in an unfamiliar codebase or asking how something was done. "Hey copilot, where are all the app's routes defined?" Best part is you can ask a bunch of questions about how a project works, all without annoying some IRC greybeard.
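For instance, the intermediate artifact from the "stub functions first" step might look like this (a hypothetical feature; the names are invented for illustration):

    # Step 1 output: agreed-upon stubs only, no behaviour yet.
    def load_orders(path: str) -> list[dict]:
        """Stub: read raw order records from a CSV file."""
        raise NotImplementedError

    def total_by_customer(orders: list[dict]) -> dict[str, float]:
        """Stub: sum order amounts per customer, built on top of load_orders."""
        raise NotImplementedError

Each follow-up prompt then fills in exactly one stub, which keeps the model's task small and the diff easy to review.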
The fulcrum of Weakly's argument is that agents should stay in their lane, offering helpful Clippy-like suggestions and letting humans drive. But what exactly is the value in having humans grovel through logs to isolate anomalies and create hypotheses for incidents? AI tools are fundamentally better at this task than humans are, for the same reason that computers are better at playing chess.
What Weakly seems to be doing is laying out a bright line between advising engineers and actually performing actions --- any kind of action, other than suggestions (and only those suggestions the human driver would want, and wouldn't prefer to learn and upskill on their own). That's not the right line. There are actions AI tools shouldn't perform autonomously (I certainly wouldn't let one run a Terraform apply), but there are plenty of actions where it doesn't make sense to stop them.
The purpose of incident resolution is to resolve incidents.
Also:
> some people will want to work the way she spells out, especially earlier in their career
If you're going to be insulting by implying that only newbies should be cautious about AI preventing them from learning, be explicit about it.
I disagree with you that incident responders learn best by e.g. groveling through OpenSearch clusters themselves. In fact, I think the opposite thing is true: LLM agents do interesting things that humans don't think to do, and also can put more hypotheses on the table for incident responders to consider, faster, rather than the ordinary process of rabbitholing serially down individual hypotheses, 20-30 minutes at a time, never seeing the forest for the trees.
I think the same thing is probably true of things like "dumping complicated iproute2 routing table configurations" or "inspecting current DNS state". I know it to be the case for LVM2 debugging†!
Note that these are all active investigation steps, that involve the LLM agent actually doing stuff, but none of it is plausibly destructive.
† Albeit tediously, with me shuttling things to and from an LLM rather than an agent doing things; this sucks, but we haven't solved the security issues yet.
Consider, by way of example, the classic problem of teaching someone to find information. If someone asks "how do I X" and you answer "by doing Y", they have learned one thing (and will hopefully retain it). If someone asks "how do I X" and you answer "here's the search I did to find the answer of Y", they have now learned two things, and one of them reinforces a critical skill they should be using throughout their career.
I am not suggesting that incident response should be done entirely by hand, or that there's zero place for AI. AI is somewhat good at, for instance, looking at a huge amount of information at once and pointing towards things that might warrant a closer look. I'm nonetheless agreeing with the point that the human should be in the loop to a large degree.
That also partly addresses the fundamental security problems of letting AI run commands in production, though in practice I do think it likely that people will run commands presented to them without careful checking.
> none of it is plausibly destructive
In theory, you could have a safelist of ways to gather information non-destructively. In practice, it would not surprise me at all if people don't. I think it's very likely that many people will deploy AI tools in production without solving any of the security issues, and incidents will result.
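A rough sketch of what such a safelist might look like; the command names are examples, not a vetted list:

    import shlex
    import subprocess

    # Read-only diagnostic commands the agent may run verbatim; anything else is
    # refused and handed back to the human. A real safelist would also need to
    # vet subcommands and flags, which is exactly the work that tends to get skipped.
    ALLOWED = {"dmesg", "uptime", "df", "lvs", "vgs", "pvs"}

    def run_readonly(command: str) -> str:
        argv = shlex.split(command)
        if not argv or argv[0] not in ALLOWED:
            return f"refused: '{command}' is not on the safelist"
        result = subprocess.run(argv, capture_output=True, text=True, timeout=30)
        return result.stdout or result.stderr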
I am all for the concept of having a giant dashboard that collects and presents any non-destructive information rapidly. That tool is useful for a human, too. (Along with presenting the commands that were used to obtain that information.)
I don't see you materially disagreeing with me about anything. I read Weakly to be saying that AI incident response tools --- the main focus of her piece --- should operate with hands tied behind their back, delegating nondestructive active investigation steps back to human hands in order to create opportunities for learning. I think that's a bad line to draw. In fact, I think it's unlikely to help people learn --- seeing the results of investigative steps all lined up next to each other and synthesized is a powerful way to learn those techniques for yourself.
I think the point the article is making is to observe the patterns humans (hopefully good ones) follow to resolve issues and build paths to make that quicker.
So at first the AI does almost nothing; it observes that, in general, the human will search for specific logs. If it observes that behaviour enough, it then, on its own or through a ticket, builds a UI flow that enables that behaviour. So now it doesn’t search the log itself but offers a button to search the log with some prefilled params.
The human likely wanted to perform that action and it has now become easier.
This reinforces good behaviour if you don’t know the steps usually followed and doesn’t pigeonhole someone into an action plan if it is unrelated.
Is this much, much harder than just building an agent that does X? Yes, it is. But it’s a significantly better tool because it doesn’t have humans lose the ability to reason about the process. It just makes them more efficient.
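One way to picture the intermediate artifact: instead of running the search, the tool emits a suggested, prefilled action that the human triggers. A minimal sketch, with invented field names:

    from dataclasses import dataclass, field

    @dataclass
    class SuggestedAction:
        """An observed operator action offered back as a one-click, prefilled button."""
        label: str    # what the button says in the UI
        tool: str     # e.g. which log-search backend to call
        params: dict = field(default_factory=dict)

    # Learned from the pattern "during checkout incidents, operators search these logs".
    suggestion = SuggestedAction(
        label="Search checkout-service errors (last 15 min)",
        tool="log_search",
        params={"service": "checkout", "level": "error", "window": "15m"},
    )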
One intuitive way to think about this is that any human operator is prepared to bring a subset of investigative approaches to bear on a problem; they've had exposure to a tiny subset of all problems. Meanwhile, agents have exposure to a vast corpus of diagnostic case studies.
Further, agents can quickly operate on higher-order information: a human attempting to run down an anomaly first has to think about where to look for the anomaly, and then decide to do further investigation based on it. An AI agent can issue tool calls in parallel and quickly digest lots of information, spotting anomalies without any real intentionality or deliberation, which then get fed back into context where they're reasoned about naturally as if they were axioms available at the beginning of the incident.
As a simple example: you've got a corrupted DeviceMapper volume somewhere, you're on the host with it, all you know is you're seeing dmesg errors about it; you just dump a bunch of lvs/dmsetup output into a chat window. 5-10 seconds later the LLM is cross referencing lines and noticing block sizes aren't matching up. It just automatically (though lossily) spots stuff like this, in ways humans can't.
It's important to keep perspective: the value add here is that AI tools can quickly, by taking active diagnostic steps, surface several hypotheses about the cause of an incident. I'm not claiming they one-shot incidents, or that their hypotheses all tend to be good. Rather, it's just that if you're a skilled operator, having a menu of instantly generated hypotheses to start from, diligently documented, is well worth whatever the token cost is to generate it.
> Humans learn collectively and innovate collectively via copying, mimicry, and iteration on top of prior art. You know that quote about standing on the shoulders of giants? It turns out that it's not only a fun quote, but it's fundamentally how humans work.
Creativity is search. Social search. It's not coming from the brain itself, it comes from the encounter between brain and environment, and builds up over time in the social/cultural layer.
That is why I don't ask myself if LLMs really understand. As long as they search, generating ideas and validating them in the world, it does not matter.
It's also why I don't think substrate matters, only search does. But substrate might have to do with the search spaces we are afforded to explore.
I do wonder if you could write a prompt to force your LLM to always respond like this, and if that would already be a sort of dirty fix... I'm not so clever at prompting yet :')
MS Clippy was the AI tool we should all aspire to build
I was very happy to see AWS release Kiro. It was quite validating to see them release it and follow up with discussions on how this methodology of integrating AI with software development was effective for them.
However, I could not help but get caught up on this totally bonkers statement, which detracted from the point of the article:
> Also, innovation and problem solving? Basically the same thing. If you get good at problem solving, propagating learning, and integrating that learning into the collective knowledge of the group, then the infamous Innovator’s Dilemma disappears.
This is a fundamental misunderstanding of what the innovator's dilemma is about. It's not about the ability to be creative and solve problems, it is about organizational incentives. Over time, an incumbent player can become increasingly disincentivized from undercutting mature revenue streams. They struggle to diversify away from large, established, possibly dying markets in favor of smaller, unproven ones. This happens due to a defensive posture.
To quote Upton Sinclair, "it is difficult to get a man to understand something when his salary depends upon his not understanding it." There are lots of examples of this in the wild. One famous one that comes to mind is AT&T Bell Labs' invention of magnetic recording & answering machines that AT&T shelved for decades because they worried that if people had answering machines, they wouldn't need to call each other quite so often. That is, they successfully invented lots of things, but the parent organization sat on those inventions as long as humanly possible.
We are in the middle of a peer-vs-pair sort of abstraction. Is the peer reliable enough to be delegated the task? If not, the pair design pattern should be complementary to the human skill set. I sense that the frustration with AI agents comes from their not being fully reliable. That means a human in the loop is absolutely needed, and if there is a human, don't have the AI be good at what the human can do; instead, have it be a good assistant by doing the things the human would need. I agree on that part, though if reliability is ironed out, for most of my tasks I am happy for AI to do the whole thing. Other frustrations stem from memory, or the lack of it (in research), hallucinations and overconfidence, and lack of situational awareness (somehow situational awareness is what agents market themselves on). If these are fixed, treating agents as a pair vs. treating agents as a peer might tilt more towards the peer side.