I still have some parts of the old Rei-net forum archived on an external drive somewhere.
SirPugger's YouTube channel has loads of videos monitoring various bot farms.
The biggest downside was its inability to see (literally). Getting lists of interactable game objects, NPCs, etc. was fine when it decided to do something that didn't require any real-time input. Sailing, or anything that required it to react to what's on screen, was pretty much impossible without more tooling to manage the reacting part for it (e.g. a tool to navigate automatically to some location).
The only thing is you would need a description of the world map on each tick (i.e. where NPCs are, where objects are, where players are).
Gemini models are a little bit better at spatial reasoning, but we're still not there yet, because these models were not designed to do spatial reasoning; they were designed to process text.
In my development, I also use the ASCII matrix technique.
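Roughly like this - a toy sketch of the idea in Python (the symbols, names, and data layout are all made up for illustration, not from any real game):

```python
# Toy sketch: serialize the area around the player into a character
# grid the model can read each tick. Everything here is illustrative.
def render_ascii_map(width, height, player, npcs, objects):
    grid = [["." for _ in range(width)] for _ in range(height)]
    for x, y in objects:              # interactable objects -> 'o'
        grid[y][x] = "o"
    for x, y in npcs:                 # NPCs -> 'N'
        grid[y][x] = "N"
    px, py = player                   # the player -> '@'
    grid[py][px] = "@"
    return "\n".join("".join(row) for row in grid)

print(render_ascii_map(8, 4, player=(2, 1), npcs=[(5, 2)], objects=[(7, 0)]))
```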
It really seems to me that the first AI company getting to implement "spatial awareness" vector tokens and integrating them neatly with the other conventional text, image and sound tokens will be reaping huge rewards. Some are already partnering with robot companies, it's only a matter of time before one of those gets there.
As for 3D, I don't have experience, but it could be quite awful at that.
what a world!
People don’t appreciate what they have
Also out of nowhere an invasive species of spiders that was inside the seed starts replicating geometrically and within seconds wraps the whole forest with webs and asks for a ransom in order to produce the secret enzyme that can dissolve it. Trying to torch it will set the whole forest on fire, brute force is futile. Unfortunately, you assumed the process would only plagiarize the good bits, but seems like it also sometimes plagiarizes the bad bits too, oops.
Code is a project that has to be updated, fixed, etc.
So when something breaks, you have to ask the contractor again. It may not find the issue, or it may mess things up when it tries to fix it, making the project useless, etc.
It's more like a car. Every time something goes wrong you will pay for it; sometimes it will come back in even worse shape (no refunds though), sometimes it will cost you 100x because there is nothing you can do: you need it and you can't manage it on your own.
LLM outputs are akin to a mutant tree that can decide to randomly sprout a giant mushroom instead of a branch. And you won't have any idea why despite your input parameters being deterministic.
I am not talking about controllable/uncontrollable variables. That has no bearing on whether a process is deterministic in theory or not. If you can theoretically control all variables (even if you practically cannot), you have a deterministic process as you can reproduce the entire path: from input to output. LLMs are currently a black box. You have no access to the billions of parameters within the model, making it non-deterministic. The day we have tools where we can control all the billions of parameters within the model, then we can retrace the exact path taken, thereby making it deterministic.
> You haven't done a lot of gardening if you don't know plants
I grow "herbs".
> there's a biological explanation
Exactly. There is always an explanation for every phenomenon that occurs in this observable, physical World. There is a defined cause and effect. Even if it "feels random". That's not how it is with LLMs. Because in between your deterministic input parameters and the output that is generated, there is a black box: the model itself. You have no access to the billions of parameters within the models, which means you are not sure you can always reproduce the output. That black box is what causes non-determinism.
EDIT: just wanted to add - "attacked by parasites all the time" is why I said if you have control over the environment. Controlling the environment encompasses dealing with parasites as well. Think of a well-controlled environment like a lab.
A machine generating code you don't understand is not the way to learn a programming language. It's a way to create software without programming.
These tools can be used as learning assistants, but the vast majority of people don't use them as such. This will lead to a collective degradation of knowledge and skills, and the proliferation of shoddily built software with more issues than anyone relying on these tools will know how to fix. At least people who can actually program will be in demand to fix this mess for years to come.
Just like how we still need assembly and C programmers for the most critical use cases, we'll still need Python and Golang programmers for things that need to be more efficient than what was vibe coded.
But do you really need your $whatever to be super efficient, or is it good enough if it just works?
That's not what determinism means though. A human coding something, irrespective of whether the code is right or wrong, is deterministic. We have a well defined cause and effect pathway. If I write bad code, I will have a bug - deterministic. If I write good code, my code compiles - still deterministic. If the coder is sick, he can't write code - deterministic again. You can determine the cause from the effect.
Every behavior in the physical World has a cause and effect chain.
On the other hand, you cannot determine why a LLM hallucinated. There is no way to retrace the path taken from input parameters to generated output. At least as of now. Maybe it will change in the future where we have tools that can retrace the path taken.
People are inherently nondeterministic.
The code they (and AI) write, once written, executes deterministically.
very rarely :)
Agreed.
> People are inherently nondeterministic.
We are getting into the realm of philosophy here. I, for one, believe in the idea of living organisms having no free will (or limited will to be more precise; one could even go so far as to say "dependent will"). So one can philosophically explain that people are deterministic, via concepts of Karma and rebirth. Of course none of this can be proven. So your argument can be true too.
> The code they (and AI) writes, once written, executes deterministically.
Yes. Execution is deterministic. I am however talking only about determinism in terms of being able to know the entire path: input to output. Not just the output's characteristics (which are always going to be deterministic). It is the path from input to output that is not deterministic, due to the presence of a black box: the model.
If you consider a human programmer as a "black box", in the sense that you feed it a set of inputs—the problem that needs to be solved, vague requirements, etc.—and expect a functioning program as output that solves the problem, then that process is similarly nondeterministic as an LLM. Ensuring that the process is reliable in both scenarios boils down to creating detailed specifications, removing ambiguity, and iterating on the product until the acceptance tests pass.
Where I think there is a disconnect is that humans are far more capable at producing reliable software given a fuzzy set of inputs. First of all, they have an understanding of human psychology, and can actually reason about semantics in ways that a pattern matching and token generation tool cannot. And in the best case scenario of experienced programmers, they have an intuitive grasp of the problem domain, and know how to resolve ambiguities in meatspace. LLMs at their current stage can at best approximate these capabilities by integrating with other systems and data sources, so their nondeterminism is a much bigger problem. We can hope that the technology will continue to improve, as it clearly has in the past few years, but that progress is not guaranteed.
> Where I think there is a disconnect is that humans are far more capable at producing reliable software given a fuzzy set of inputs.
Yes true. Another thought that comes to my mind is I feel it might also have to do with us recognizing other humans as not as alien to us as LLMs are. So there is an inherent trust deficit when it comes to LLMs vs when it comes to humans. Inherent trust in human beings, despite being less capable, is what makes the difference. In everything else we inherently want proper determinism and trust is built on that. I am more forgiving if a child computes 2 + 1 = 4, and will find it in me to correct the child. I won't consider it a defect. But if a calculator computes 2 + 1 = 4 even once, I would immediately discard it and never trust it again.
> We can hope that the technology will continue to improve, as it clearly has in the past few years, but that progress is not guaranteed.
Agreed.
The investors who paid for the CEO who hired your project manager to hire you to figure that out, didn't.
I think in this analogy, vibe coders are project managers, who may indeed still benefit from understanding computers, but when they don't the odds aren't anywhere near as poor as a lottery. Ignorance still blows up in people's faces. I'd say the analogy here with humans would be a stereotypical PHB who can't tell what support the dev needs to do their job and then puts them on a PIP the moment any unclear requirement blows up in anyone's face.
I have no idea how an i386 works, let alone a modern cpu. Sure there are registers and different levels of cache before you get to memory.
My lack of knowledge of all this doesn’t prevent me from creating useful programs using higher abstraction layers like c.
There was a time when you had to know ‘as’, ‘ld’ and maybe even ‘ar’ to get an executable.
In the early days of g++, there was no guarantee the object code worked as intended. But it was fun working that out and filing the bug reports.
This new tool is just a different sort of transpiler and optimiser.
Treat it as such.
And, yes, I'm aware that most compilers are not entirely deterministic either, but LLMs are inherently nondeterministic. And I'm also aware that you can tweak LLMs to be more deterministic, but in practice they're never deployed like that.
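For what it's worth, the "tweaking" usually just means greedy decoding. A minimal sketch with an OpenAI-style client (the model name is a placeholder; even temperature 0 plus a seed only gives best-effort reproducibility):

```python
# Sketch: push an LLM toward determinism with greedy decoding.
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",    # placeholder model name
    temperature=0,          # always take the most likely token
    seed=42,                # best-effort reproducibility, not a guarantee
    messages=[{"role": "user", "content": "Summarize git stash in one line."}],
)
print(resp.choices[0].message.content)
```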
Besides, creating software via natural language is an entirely different exercise than using a structured language purposely built for that.
We're talking about two entirely different ways of creating software, and any comparison between them is completely absurd.
Meanwhile, 9front users have read at least the plan9 intro and know about nm, 1-9c, 1-9l and the like. Vibe coders will be put in their place sooner or later. It's just a matter of time.
They can function kind-of-the-same in the sense that they can both change things written in a higher level language into a lower level language.
100% different in every other way, but for coding in some circumstances if we treat it as a black box, LLMs can turn higher level pseudocode into lower level code (inaccurately), or even transpile.
Kind of like how email and the postal service can be kind of the same if you look at it from a certain angle.
But they're not the same at all, except somewhat by their end result, in that they are both ways of transmitting information. That similarity is so vague that comparing them doesn't make sense for any practical purpose. You might as well compare them to smoke signals at that point.
It's the same with LLMs and programming. They're both ways of producing software, but the process of doing that and even the end result is completely different. This entire argument that LLMs are just another level of abstraction is absurd. Low-Code/No-Code tools, traditional code generators, meta programming, etc., are another level of abstraction on top of programming. LLMs generate code via pattern matching and statistics. It couldn't be more different.
No, there wasn't: you could just run the shell script, or (a bit later) the makefile. But there were benefits to knowing as, ld and ar, and there still are today.
This is trivially true. The constraint on anything you do in your life is the time it takes to know something.
So the far more interesting question is: At what level do you want to solve problems – and is it likely that you need knowledge of as, ld and ar over anything else, that you could learn instead?
That post is only true in the most vacuous sense.
“A similar project” discovered where, on BITNET?
"A similar project" as in: this isn't the first piece of software ever written, and many previous examples can be found on the computer you're currently using. Skim through them until you find one with a source file structure you like, then ruthlessly cannibalise its build script.
Everyone else is deluding themselves. Even the 9front intro requires you to at least know the basics of nm and friends.
Assembly programmers from years gone by would likely be equally dismissive of the self-aggrandizing code block stitchers of today.
(on topic, RCT was coded entirely in assembly, quite the achievement)
Going to arcane websites and forums full of neckbeards who expect you to already understand everything isn't exactly a great way to learn.
The early Internet was unbelievably hostile to people trying to learn genuinely
(not a judgment, just mentioning in case the distinction is interesting to anyone)
Exciting when it works, but I think it's a much more exciting result for people with less experience, who may not know that the "works for me" demo is the dreaded "first 90%", and that even fairly small projects aren't done until the fifth-to-tenth 90%.
(That, and vibe coding in the sense of "no code review" is prone to balls of mud, so you need to be above average at project management to avoid that after a few sprint-equivalents of output.)
For real work, that phase is like starting from a template or a boilerplate repo. The real work begins after the basics are wired together.
> The park rating is climbing. Your flagship coaster is printing money. Guests are happy, for now. But you know what's coming: the inevitable cascade of breakdowns, the trash piling up by the exits, the queue times spiraling out of control.
HN second-chance pool shenanigans.
Genuinely interested.
And what would that reason be? You can git revert a git revert.
Of course if you give an agentic loop root access in yolo mode - then I am not sure how to help...
And the reason jj helps in that case is that for jj there is no such thing as an uncommitted change.
Why? What's the problem you see? The only problem I see is when you let these extra commits pollute the history reachable from any branch you care about.
Let's look at the following:
Internally, 'git stash' consists of two operations: one that makes an 'anonymous' commit of your files, and another that resets those files to whatever they were in HEAD. (That commit is anonymous in the sense that no branch points at it.)
The git libraries expose the two operations separately. And you can build something yourself that works similarly.
You can use these capabilities to build an undo/redo log in git, but without polluting any of the history you care about.
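A rough sketch of what that could look like, driving git from Python; `git stash create` does the anonymous-commit half without touching your files, and the `refs/undo-log/` namespace is just something I made up for illustration:

```python
import subprocess, time

def git(*args):
    """Run a git command, return trimmed stdout, raise on failure."""
    return subprocess.run(["git", *args], capture_output=True,
                          text=True, check=True).stdout.strip()

def snapshot():
    """Anonymously commit the working tree; no branch moves, no reset."""
    sha = git("stash", "create", "undo-log snapshot")
    if not sha:                       # empty output: nothing to snapshot
        return None
    # Park it under a private ref so gc keeps it and no branch sees it.
    git("update-ref", f"refs/undo-log/{int(time.time())}", sha)
    return sha

def restore(sha):
    """Re-apply a snapshot's files onto the working tree."""
    git("stash", "apply", sha)
```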
To be honest, I have no clue how Jujutsu does it. They might be using a totally different design.
The problem is git's index lets you write a bunch of unconnected code, then commit it separately. To different branches, even! This works great for stacking diffs but is terribly confusing if you don't know what you're doing.
You just build commits, and then later on you muck around with the mutable pointers that are branches.
"later on" makes it sound to a human like it takes any real amount of time or that it isn't basically instant and wrapped by up porcelean, and "muck around with" implies that there's anything more random or complicated to it then writing the sha to a file in the right place in the .git directory.
If it couldn't be undone with git, it couldn't be undone with jj either.
Then, `git notes` is better for signature metadata because adding signatures for the commit doesn't change the commit hash.
And then, you'd need to run a local Rekor log to use Sigstore attestations on every commit.
Sigstore.dev is SLSA.dev compliant.
Sigstore grants short-lived release attestation signing keys for CI builds on a build farm to sign artifacts with.
So, when jujutsu autocommits agent-generated code, what causes there to be an {{AGENT_ID}} in the commit message or git notes? And what stops a user from forging such attestations?
> you can manually stage against @-: [with jujutsu]
Stop spamming
Especially not away from git.
Given that other posts solved the problem by scripting this feature on top of git, I guess you're telling them their solution isn't relevant too.
Although git revert is not a destructive operation, so it's surprising that it caused any loss of data. Maybe they meant git reset --hard or something like that. Wild if Codex would run that.
(Which, it's not wrong or anything -- I did say "revert that change" -- it's just annoying. And telling `CLAUDE.md` to commit more often doesn't work consistently, because Claude is a dummy sometimes).
Maybe this is obvious to Claude users but how do you know your remaining context level? There is UI for this?
There might be an input that would produce that sort of effect; perhaps it looks like nonsense (like reading zipped data), but when the LLM attempts to interact with it, the outcome is close to consuming the whole context?
This approach means you can just kill the session and restart if you hit limits.
(If you hit context limits you probably also want to look into sub-agents to help prevent context bloat. For example any time you are running and debugging unit tests, it’s usually best to start with a subagent to handle the easy errors. )
I made a tool[1] that lets you just start a new session and injects the original session file path, so you can extract any arbitrary details of prior work from there using sub-agents.
[1] aichat tool https://github.com/pchalasani/claude-code-tools?tab=readme-o...
A linear puzzle game like that, I would just expect the AI to fly through first time, considering it has probably read 30 years of guides and walkthroughs.
The puzzles would probably be easy. Myst's puzzles are basically IQ tests, and LLMs ace traditional IQ tests: https://trackingai.org/home
On the other hand, navigating the environment, I think the models may fail spectacularly. From what we've seen from Claude Plays Pokemon, it would get in weird loops and try to interact with non-interactive elements of the environment.
You could take those, make the tools better, and repeat the experiment, and I'd love to see how much better the run would go.
I keep thinking about that when it comes to things like this - the Pokemon thing as well. The quality of the tooling around the AI is only going to become more and more impactful as time goes on. The more you can deterministically figure out on behalf of the AI to provide it with accurate ways of seeing and doing things, the better.
Ditto for humans, of course, that's the great thing about optimizing for AI. It's really just "if a human was using this, what would they need"? Think about it: The whole thing with the paths not being properly connected, a human would have to sit down and really think about it, draw/sketch the layout to visualize and understand what coordinates to do things in. And if you couldn't do that, you too would probably struggle for a while. But if the tool provided you with enough context to understand that a path wasn't connected properly and why, you'd be fine.
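The path-connectivity check itself is the easy, deterministic part a tool can own. A toy sketch (a bare set of (x, y) path tiles with 4-way adjacency, nothing like OpenRCT2's real data model):

```python
from collections import deque

def connected(path_tiles, start, goal):
    """BFS over path tiles; True if goal is reachable from start."""
    tiles = set(path_tiles)
    if start not in tiles or goal not in tiles:
        return False
    seen, queue = {start}, deque([start])
    while queue:
        x, y = queue.popleft()
        if (x, y) == goal:
            return True
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt in tiles and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```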
For this to work the way people expect you’d need to somehow feed this info back into fine tuning rather than just appending to context. Otherwise the model never actually “learns”, you’re just applying heavy handed fudge factors to existing weights through context.
1. Being systematic: having a system for adding, improving, and maintaining the knowledge base
2. Having feedback for that system
3. Implementing the feedback into a better system
I'm pretty happy I have an audit framework and documentation standards. I've refactored the whole knowledge base a few times. In the places where it's overly specific or too narrow in its scope of use for the retained knowledge, you just have to prune it.
Any garden has weeds when you lay down fertile soil.
Sometimes they aren't weeds though, and that's where having a person in the driver's seat is a boon.
not just make up bullshit about events
pretty heavy/slow JavaScript but pretty functional nonetheless...
Even if not for SEO, it's building quite a good reputation for this company; they've got a lot of open positions.
I'm a big fan of Transport Tycoon; I used to play it for hours as a kid. With Open Transport Tycoon it also might have been a good choice, but maybe not B2C?
Do you not think they’re charging enough or something?
It was interesting that the poster vibe-coded (I'm assuming) the CTL from scratch; Claude was probably pretty good at doing that, and that task could likely have been completed in an afternoon.
Pairing the CTL with the CLI makes sense, as that's the only way to gain feedback from the game. Claude can't easily do spatial recognition (yet).
A project like this would entirely depend on the game being open source. I've seen some very impressive applications of AI online with closed-source games and entire algorithms dedicated to visual reasoning.
I'm still trying to figure out how this guy was able to have AI learn to play Mario Kart nearly perfectly: https://www.youtube.com/watch?v=Doec5gxhT_U
I find his work to be very impressive.
I guess because RCT2 is more data-driven than visually challenging, this solution works well, but having an LLM try to play a racing game sounds like it would be disastrous.
Am I reading a Claude generated summary here?
> "This was surprising, but fits with Claude's playful personality and flexible disposition."
Is this sentence seriously about a computer? Have we gone so far that computers won't just do what we tell them to anymore?
There are fairly straightforward fixes, such as using subagents, or scripting a loop that feeds the model each item instead of a list of items (sketch below), since prompt compliance tends to drop the more stuff is in the context. But, yes, they will "get bored" and look for shortcuts.
Another frequent one is deciding to sample instead of working through every item.
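The "script a loop" fix is as blunt as it sounds, which is why it works; a sketch, where `call_model` is a stand-in for whatever LLM client you're using:

```python
# Feed one item per call instead of the whole list, so compliance
# can't decay across a long context and "sampling" isn't an option.
def process_all(items, call_model):
    results = []
    for item in items:
        prompt = f"Process exactly this one item, nothing else:\n{item}"
        results.append(call_model(prompt))
    return results
```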
I would take any descriptions like "comprehensive", "sophisticated" etc with a massive grain of salt. But the nuts and bolts of how it was done should be accurate.
i enjoy playing video games my own self. separately, i enjoy writing code for video games. i don't need ai for either of these things.
It's still a neat perspective on how to optimize for super-specific constraints.
You and I have _very_ different definitions for the word boring. A lot of effort goes into TAS runs.
This is a real console 0-star TAS: https://youtu.be/iUt840BUOYA
It's kind of like how people started watching Let's Plays and that turned into Twitch.
One of the coolest things recently is VTubers in mocap suits using AI performers to do single person improv performances with. It's wild and cool as hell. A single performer creating a vast fantasy world full of characters.
LLMs and agents playing Pokemon and StarCraft? Also a ton of fun.
Session transcript using Simon Willison's claude-code-transcripts
https://htmlpreview.github.io/?https://gist.githubuserconten...
Reddit post
https://www.reddit.com/r/ClaudeAI/comments/1q9fen5/claude_co...
OpenRCT2!!
Project repo
https://github.com/jaysobel/OpenRCT2
There are some potential solutions to this problem that come to mind. Use subagents to isolate the interesting bits about a screenshot and only feed that to the main agent with a summary. This will all still have a significantly higher token usage compared to a text based interface, but something like this could potentially keep the LLM out of the dumb zone a little longer.
You're totally right! I mean, aside from Anthropic launching "Cowork: Claude Code for the rest of your work" 5 days ago. :)
https://claude.com/blog/cowork-research-preview
https://news.ycombinator.com/item?id=46593022
More to the point though, you should be using Agents in Claude Code to limit context pollution. Agents run with their own context, and then only return salient details. Eg, I have an Agent to run "make" and return the return status and just the first error message if there is one. This means the hundreds/thousands of lines of compilation don't pollute the main Claude Code context, letting me get more builds in before I run out of context there.
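Conceptually, the tool that agent wraps boils down to something like this sketch (real builds need smarter error matching than a substring check):

```python
import subprocess

def run_make(target=None):
    """Run make; return only the exit status and the first error line."""
    args = ["make"] + ([target] if target else [])
    proc = subprocess.run(args, capture_output=True, text=True)
    first_error = next(
        (line for line in (proc.stdout + proc.stderr).splitlines()
         if "error" in line.lower()),
        None,
    )
    return {"status": proc.returncode, "first_error": first_error}
```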
> You're totally right! I mean, aside from Anthropic launching "Cowork: Claude Code for the rest of your work" 5 days ago. :)
Claude Cowork does not do "computer use" in the traditional sense. e.g. it cannot use your computer to drive the interface of Adobe Premiere. It is not taking screenshots of your computer desktop, like a traditional "Computer use" product does.
How hard would it be to use with OpenAI's offerings instead? Particularly, imo, OpenAI's better at "looking" at pictures than Claude.
The actual change to implement CC is: https://github.com/jaysobel/OpenRCT2/commit/5d49dc960fcfc133...
I find this a very interesting example of us humans interacting with AIs.
It is a curiosity, good for headlines, but the takeaway is that if you really need an actually good AI, you are still better off not using an LLM-powered solution.
And these are the same people that put countless engineers through gauntlets of bizarre interview questions and exotic puzzles to hire engineers.
But when it comes to C++, just vibe it, obviously.
1) The map is a grid
2) Turn based
I see no reason why AoE2 would be any different.
Worth noting that OpenAI Five was mostly deep reinforcement learning and massive distributed training; it didn't use image-to-text and an LLM reasoning about what it sees to make its "decisions". But that wouldn't be a good way to do an AI like that anyway.
Oh, and humans still play Dota. It's still a highly competitive community. So that wasn't destroyed at all, most teams now use AI to study tactics and strategy.
I’ve always found it crazy that my LLM has access to such terrible tools compared to mine.
It’s left with grepping for function signatures, sending diffs for patching, and running `cat` to read all the code at once.
I however, run an IDE and can run a simple refactoring tool to add a parameter to a function, I can “follow symbol” to see where something is defined, I can click and get all usages of a function shown at a glance, etc etc.
Is anyone working on making it so LLMs get better tools for actually writing/refactoring code? Or is there some "bitter lesson"-like thing that says effort is always better spent just increasing the context size and slurping up all the code at once?
An LLM can likely approach compiler-level knowledge just from being smart and understanding what it’s reading, but it costs a lot of context to do this. Giving the LLM access to what the compiler knows as an API seems like it’s a huge area for improvement.
OTOH for a giant monorepo, grep probably won’t work very well.
If you are willing to go language-specific, the tooling can be incredibly rich if you go through the effort. I’ve written some rust compiler drivers for domain-specific use cases, and you can hook into phases of the compiler where you have amazingly detailed context about every symbol in the code. All manner of type metadata, locations where values are dropped, everything is annotated with spans of source locations too. It seems like a worthy effort to index all of it and make it available behind a standard query interface the LLM can use. You can even write code this way, I think rustfmt hooks into the same pipeline to produce formatted code.
I’ve always wished there were richer tools available to do what my IDE already does, but without needing to use the UI. Make it a standard API or even just CLI, and free it from the dependency on my IDE. It’d be very worth looking into I think.
The use cases I have in mind are for codebases with many millions of lines of code, where just dumping it all into the context is unreasonably expensive. In these scenarios, it’d be beneficial to give the LLM a sort of SQL-like language it can use to prod at the code base in small chunks.
In fact I keep thinking of SQL as an example in my head, but maybe it’s best to take it literally: why don’t we have a SQL for source code? Why can’t I do “select function.name from functions where parameters contains …” or similar (with clever subselects, joins, etc) to get back whatever exists in the code?
It's something I always wanted in general, not just for LLMs. But LLMs could make excellent use of it if there's simply not enough context size to reasonably slurp up all the code.
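For a single Python file, the stdlib ast module already gets you a crude version of that query; a sketch of "select function.name from functions where parameters contains x":

```python
import ast

def functions_with_param(source: str, param_name: str):
    """Yield names of functions whose parameter list contains param_name."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            if any(a.arg == param_name for a in node.args.args):
                yield node.name

code = "def add(x, y): return x + y\ndef neg(y): return -y\n"
print(list(functions_with_param(code, "x")))  # -> ['add']
```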
> Added LSP (Language Server Protocol) tool for code intelligence features like go-to-definition, find references, and hover documentation
https://github.com/anthropics/claude-code/blob/main/CHANGELO...
I am so surprised that all of the AI tooling mostly revolves around VSC or its forks, and that JetBrains seems to not really have done anything revolutionary in the space.
With how good their refactoring and code inspection tools are, you'd really think they'd pass that context information off to AI models and be leaps and bounds ahead.
Go and Dart hardly get the love across their SDKs that Objective-C, Swift, C#, and VB get from their owners.
Same with IDE tooling, fully dependent on JetBrains and Microsoft.
Ripgrep is much faster than grep, but the output is no more concise, so tokens are still wasted.
I think codex uses ast-grep by default, if installed; Claude has to be instructed?
And you can use whatever interface the language servers already use to expose that functionality to e.g. VS Code?
With grep it's easy: it always shows everything that matches.
With grep you get lots of false positives, and for some languages you need a lot of extra rules to know what to grep for. (Eg in Python you might read `+` in the caller, but you actually need to grep for __add__ to find the definition.)
To provide it access to refactoring as a tool also risks confusing it via too many tools.
It's the same reason that waffling for a few minutes via speech to text with tangents and corrections and chaos is just about as good as a carefully written prompt for coding agents.
When I think about it, to get these tools to be most effective you have to be able to page things in and out of their context windows.
What was once a couple of queries is now gonna be dozens or hundreds or even more from the LLM
For code, that means querying the AST, and querying it in a way that allows you to limit the size of the results.
I wonder which SAST vendor Anthropic will buy.
I think from training it's still biased towards simple tooling.
But also, there is real power to simple tools: a small set of general-purpose tools beats a bunch of narrow, specific-use-case tools. It's easier for humans to use high-level tools, but LLMs can instantly compose the low-level tools for their use case and learn to generalize; it's like writing insane Perl one-liners is second nature for them compared to us.
If you watch the tool calls you'll see they write a ton of one-off small Python programs to test, validate, explore, etc...
If you think about it, any time you use a tool there is probably a 20-line Python program that is more fit to your use case; it's just that it would take you too long to write it, but for an LLM that's 0.5 seconds.
Hard disagree; this wastes enormous amounts of tokens, and massively pollutes the context window. In addition to being a waste of resources (compute, money, time), this also significantly decreases their output quality. Manually combining painfully rudimentary tools to achieve simple, obvious things -- over and over and over -- is *not* an effective use of a human mind or an expensive LLM.
Just like humans, LLMs benefit from automating the things they need to do repeatedly so that they can reserve their computational capacity for much more interesting problems.
I've written[1] custom MCP servers to provide narrowly focused API search and code indexing, build system wrappers that filter all spurious noise and present only the material warnings and errors, "edit file" hooks that speculatively trigger builds before the LLM even has to ask for it, and a litany of other similar tools.
Due to LLMs' annoying tendency to fall back on inefficient shell scripting, I also had to write a full bash syntax parser and shell-script-rewriting ruleset engine to allow me to silently and trivially rewrite their shell invocations to more optimal forms that use the other tools I've written, so that they don't have to do expensive, wasteful things like pipe build output through `head`/`tail`/`grep`/etc, which results in them invariably missing important information, and either wandering off into the weeds, or -- if they notice -- consuming a huge number of turns (and time) re-running the commands to get what they need.
Instead, they call build systems directly with arbitrary options, | filters, etc, and magically the command gets rewritten to something that will produce the ideal output they actually need, without eating more context and unnecessary turns.
LLMs benefit from an IDE just like humans do -- even if an "IDE" for them looks very different. The difference is night and day. They produce vastly better code, faster.
[1] And by "I've written", I mean I had an LLM do it.
However as parent comment said, it seems to always grep instead, unless explicitly said to use the LSP tool.
But they constantly ignore them and use their base CLI tools instead, it drives me batty. No matter what I put in AGENTS.md or similar, they always just ignore the more advanced tooling IME.
I used grep and simple ctags to program in vanilla vim for years. It can be more useful than you'd think. I do like the LSP in Neovim and use it a lot, but I don't need it.
It’s faster, too, as the model doesn’t need to scan for info, but again it really likes to try not to use it.
Of course I still use rg and fd to traverse things, cli tools are powerful. I just wish LLMs could be made to use more powerful tools reliably!
> Starting with version 2025.2, IntelliJ IDEA comes with an integrated MCP server, allowing external clients such as Claude Desktop, Cursor, Codex, VS Code, and others to access tools provided by the IDE. This provides users with the ability to control and interact with JetBrains IDEs without leaving their application of choice.
[1] https://www.jetbrains.com/help/idea/mcp-server.html#supporte...
- search all your code efficiently
- search all documentation for libraries
- access your database and get real data samples (not just abstract data types)
- allows you to select design components from your figma project and implements them for you
- allows Claude to see what is rendered in the browser
It's basically the IDE for your LLM client. It really closes the loop and has made Claude and myself so much more productive. Highly recommended, and cheap at $10/month.
PS: my personal opinion. I have zero affiliation with them.
So cat, ripgrep, etc are the right tools for them. They need a command line, not a GUI.
1: Maybe you'd argue that Nano Banana is pretty good. But would you say its prompt adherence is good enough to produce, say, a working Scratch program?
What is this? A LinkedIn post?
From the transcript: https://htmlpreview.github.io/?https://gist.githubuserconten... :)
I've been trying to locate the dev of this game for a long time, so I can thank them for an amazing experience.
If anyone knows their social or anything, please do share, including OP.
Also, nice work on CC in this. May actually be interested in Claude Code now.
When I read things like this, I wonder if it's just me not understanding this brave new world, or half of AI developers are delusional and really believe that they are dealing with a sentient being.
An LLM could potentially make events far more aimed at your character, and could actually respond to things happening in the world far more than what the game currently does. It could really create some cool emerging gameplay.
But isn't the criticism rather that there are too many (as you say: repetitive, not relevant) events? It's not like there are cool stories emerging from the underlying game mechanics anymore ("grand strategy"); instead, players have to click through these boring predetermined events again and again.
The other option is to have an AI play another AI which is working as an antagonist, trying to make the player fail. More global plagues! More scheming underlings! More questionable choices for relaxation! Bit of an arms race there.
Honestly I prefer Crusader Kings II, if for no other reason than that the UI is just so brilliantly, insanely obtuse while also being very good looking.