What is ChatGPT Agent?
Just give us the API and stop trying, OpenAI.
Can’t say much about the usage as I haven’t tried it yet.
The shitty code it comes up with helps me a lot, because fixing broken stuff and unraveling the first principles of why it's broken is how I learn best. It helps me think. When learning new areas, the goal is to grasp the subject matter enough to find out what's wrong with the generated code - and it's still a pretty safe bet that it will be wrong. Whenever I attempt to outsource the actual thinking (because of feeling lazy or even just to check the abilities of the model), the results are pretty bad and absolutely nowhere near the quality level of anything I'd want to sign my name under.
Of course, some people don't mind and end up wasting other people's time with their generated patches. It's not hard to find them around. Agentic tools tear down even more of the walls that might have made you stop for a moment and notice the sloppiness of that output.
The output is very problematic. It breaks itself all the time, makes the same mistakes multiple times, and I have to retrace my steps. I’m going to have it write tests so it can better tell what it’s breaking.
But being able to say “take this GTK app and add a web server and browser based mode” and it just kinda does it with minimal manual debugging is something remarkable. I don’t fully understand it, it is a new capability. I do robotics and I wish we had this for PCB design and mechanical CAD, but those will take much longer to solve. Still, I am eager to point Claude at my hand written python robotics stack from my last major project [1] and have it clean up and document what was a years-long chaotic prototyping process with results I was reasonably happy with.
The current systems have flaws, but if you look at where LLMs were five years ago and you see the potential value in fixing the flaws with agentic coding, it is easy to imagine that those flaws will be addressed. There will be higher level flaws and those will eventually be addressed, etc. Maybe not, but I’m quite curious to see where this goes, and what it means to work as a human engineer in times like these.
[1] https://github.com/sequoia-hope/acorn-precision-farming-rove...
Is for example Google’s crawl bot an agent?
Is there a prominent successful agent that I could test myself?
So many questions…
You can chain agents together in sequence to accomplish larger tasks.
Think of everything involved in booking travel. You have to set a budget, pick dates, choose a destination, etc. Each step can be defined as an agent, and then you chain them together into a tool that handles the entire task for you.
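That chaining is basically function composition over a shared state. A toy sketch (every name here is made up for illustration, not a real agent framework; in practice each step would call out to an LLM or the user rather than hardcode values):

```python
# Each "agent" is a step that takes the running trip state and fills in
# one piece; chaining them yields the full booking workflow.
def set_budget(trip):
    trip["budget"] = 1500  # a real agent would ask the user or an LLM
    return trip

def pick_dates(trip):
    trip["dates"] = ("2025-06-01", "2025-06-08")
    return trip

def choose_destination(trip):
    # a real agent might search for flights within trip["budget"]
    trip["destination"] = "Lisbon"
    return trip

def run_chain(steps, trip=None):
    # Feed each step's output into the next one.
    trip = trip or {}
    for step in steps:
        trip = step(trip)
    return trip

plan = run_chain([set_budget, pick_dates, choose_destination])
```
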
Arranging these in a workflow to automate processes is common, but not agentic.
With LLMs, this went through two phases of shittification: first, there was a window where the safety people were hopeful about LLMs because they weren’t agents, so everyone and their mother declared that they would create an agent out of an LLM explicitly because they heard it was dangerous.
This pleased the VCs.
Second, they failed to satisfy the original definition, so they changed the definition of “agent” to the thing that they made and declared victory. This pleased the VCs.
For instance, you could have an "agent" that can read/edit files on your computer by adding something like "to read a file, issue the command `read_file $path`" to your prompt; whenever the LLM finishes a line of output starting with `read_file`, the script running on your computer reads that file, pastes it into the prompt, and lets the LLM continue its autocomplete-on-steroids.
If you write enough tools and a complicated enough prompt, you end up with an LLM that can do stuff. By default, these tools usually require user confirmation before actually doing anything, but if you run the LLM in full agent mode, you're trusting the LLM not to do anything it shouldn't. `curl | bash` with LLMs, basically.
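The whole loop fits in a few lines. A minimal sketch, assuming a hypothetical `llm_complete(prompt)` function that stands in for a real model call and a single `read_file` tool as described above:

```python
def run_agent(llm_complete, prompt):
    # Call the model, execute any `read_file` tool call it emits,
    # paste the result back into the prompt, and repeat until the
    # model produces output with no tool call in it.
    while True:
        output = llm_complete(prompt)
        for line in output.splitlines():
            if line.startswith("read_file "):
                path = line.split(" ", 1)[1]
                try:
                    content = open(path).read()
                except OSError as e:
                    content = f"error: {e}"
                prompt += f"\n{line}\n--- contents of {path} ---\n{content}\n"
                break  # re-run the model with the extended prompt
        else:
            return output  # no tool call: the model is done
```

Real agent frameworks add structured tool schemas, confirmation prompts, and sandboxing, but the control flow is essentially this.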
An LLM with significant training plus file access, HTTP(S) API access, and access to some OS APIs can do a lot of work for you if you prompt it right. My experience with Claude/Copilot/etc. is that 75% of the time the LLM will fail to do what it should without me manually repairing its mistakes, but the other 25% of the time it does look rather sci-fi-ish.
With some tools you can tell your computer "take this directory, examine the EXIF data of each image, map the coordinates to the country and nearest town the picture was taken in, then make directories for each town and move the pictures to their corresponding locations". The LLM will type out shell commands (`ls /some/directory`), interpret the results as part of the prompt response that your computer sends back, and repeat that until its task has been completed. If you prepare a specific prompt and set of tools for the purpose of managing files, you could call that a "file management agent".
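What the agent improvises there with shell commands could equally be written as a deterministic script. A sketch of the core logic; `extract_gps` and `gps_to_town` are assumed stand-ins passed in by the caller, since real EXIF parsing needs a library like Pillow and reverse geocoding needs an external service:

```python
import shutil
from pathlib import Path

def sort_photos_by_town(photo_dir, extract_gps, gps_to_town):
    # extract_gps(path) -> (lat, lon) or None if the photo has no GPS tag;
    # gps_to_town(lat, lon) -> town name. Both are caller-supplied helpers.
    photo_dir = Path(photo_dir)
    moved = {}
    # Materialize the listing first so newly created subdirs aren't visited.
    for photo in sorted(photo_dir.iterdir()):
        if not photo.is_file():
            continue
        coords = extract_gps(photo)
        town = gps_to_town(*coords) if coords else "unknown"
        dest = photo_dir / town
        dest.mkdir(exist_ok=True)
        shutil.move(str(photo), str(dest / photo.name))
        moved.setdefault(town, []).append(photo.name)
    return moved
```

The interesting part of the agentic version is that nobody has to write this script: the LLM invents the equivalent shell commands on the fly.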
Generally, this works best for things you could do by hand in a couple of minutes, or maybe an hour if it's a big set of images, but that the computer can now probably take care of for you. That said, you're basically emitting as much CO2 as a drive to the store and back, so until we get more energy-efficient data centers I'm not too fond of using these tools for banal interactions like that.
Tried it once and it really sucked.
I think I've had it available with the separate website ("research preview"?) for months, but yeah, last few weeks it's been directly in ChatGPT.com, and I'm within the EU.