Show HN: I built an AI agent that turns ROS 2's turtlesim into a digital artist

https://github.com/Yutarop/turtlesim_agent

30•ponta17•8mo ago

I'm a grad student studying robotics, with a particular interest in the intersection of LLMs and mobile robots. Recently, I discovered how easily LangChain enables the creation of AI agents, and I wanted to explore how such agents could interact with simulated environments.

So, I built TurtleSim Agent, an AI agent that turns the classic ROS 2 turtlesim turtle into a creative artist.

With this agent, you can give plain English commands like “draw a triangle” or “make a red star,” and it will reason through the instructions and control the simulated turtle accordingly. I’ve included demo videos on GitHub. Behind the scenes, it uses an LLM to interpret the text, decide what actions are needed, and then call a set of modular tools (motion, pen control, math, etc.) to complete the task.
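The modular tools described above can be sketched in plain Python. This is a minimal illustration, not the project's code. It uses no ROS 2 or LangChain, and `SimTurtle`, `set_pen`, `move_forward`, `turn_left` and `polygon_turn_angle` are hypothetical names standing in for the pen-control, motion and math tools. The simulated turtle only records its pose and the segments it would draw.

```python
import math
from dataclasses import dataclass, field


@dataclass
class SimTurtle:
    """Hypothetical stand-in for the turtlesim turtle: tracks pose and drawn segments."""
    x: float = 5.5              # arbitrary start near the middle of the window
    y: float = 5.5
    theta: float = 0.0          # heading in radians
    pen_down: bool = True
    color: tuple = (255, 255, 255)
    segments: list = field(default_factory=list)

    # Pen-control tool: set RGB color and raise/lower the pen
    def set_pen(self, r: int, g: int, b: int, down: bool = True) -> str:
        self.color = (r, g, b)
        self.pen_down = down
        return f"pen {'down' if down else 'up'} color={self.color}"

    # Motion tool: move along the current heading, recording a segment if the pen is down
    def move_forward(self, distance: float) -> str:
        nx = self.x + distance * math.cos(self.theta)
        ny = self.y + distance * math.sin(self.theta)
        if self.pen_down:
            self.segments.append(((self.x, self.y), (nx, ny), self.color))
        self.x, self.y = nx, ny
        return f"at ({self.x:.2f}, {self.y:.2f})"

    # Motion tool: rotate counter-clockwise by the given number of degrees
    def turn_left(self, degrees: float) -> str:
        self.theta = (self.theta + math.radians(degrees)) % (2 * math.pi)
        return f"heading {math.degrees(self.theta):.1f} deg"


# Math tool: exterior turning angle for a regular polygon with n sides
def polygon_turn_angle(n_sides: int) -> float:
    return 360.0 / n_sides


if __name__ == "__main__":
    t = SimTurtle()
    t.set_pen(255, 0, 0)                    # red pen
    turn = polygon_turn_angle(3)            # 120 degrees for a triangle
    for _ in range(3):
        t.move_forward(2.0)
        t.turn_left(turn)
    print(len(t.segments), round(t.x, 6), round(t.y, 6))
```

Because each tool is an ordinary function with a small, well-defined effect, the LLM only has to pick the next call and its arguments. A five-pointed star, for example, is the same loop run five times with a 144-degree turn.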

If you're interested in LLM+robotics, ROS, or just want to see a turtle become a digital artist, I'd love for you to check it out:

GitHub: https://github.com/Yutarop/turtlesim_agent

Looking ahead, I’m also exploring frameworks like LangGraph and MCP (Model Context Protocol) to see whether they might be better suited for more complex planning and decision-making in robotics. If anyone here is familiar with these frameworks or working in this space, I’d love to connect or hear your thoughts.

Comments

dpflan•8mo ago

Forgive me for asking, but I'm always curious about the definition of “agent”. What is an “agent” exactly? Is it a static prompt that is sent along with user input to an LLM service, whose response is then handled, and then it’s done? Is an agent just a prompted LLM call? Or is it some entity that changes its own prompt as it continues to exist?

karmakaze•8mo ago

It depends on how you look at it. If the output is a drawing, then the agent is the thing doing the drawing on the user's behalf. At a finer level, the output is a series of commands, so the agent would be whatever generates those commands from the user's input. E.g., a web browser is a user agent that makes requests and renders the resources the user specifies.

ponta17•8mo ago

Thanks for the thoughtful question! The term “agent” definitely gets used in a lot of different ways, so I’ll clarify what I mean here.

In this project, an agent is an LLM-powered system that takes a high-level user instruction, reasons about what steps are needed to fulfill it, and then executes those steps using a set of tools. So it’s more than a single prompted LLM call — the agent maintains a kind of working state and can call external functions iteratively as it plans and acts.

Concretely, in turtlesim_agent, the agent receives an input like “draw a red triangle” and then:
1. Uses the LLM to interpret the intent,
2. Decides which tools to use (like move forward, turn, set pen color),
3. Calls those tools step by step until the task is done.
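The three-step loop described above can be sketched as follows. `scripted_llm` is a hypothetical stand-in that replays a fixed plan; a real agent would send the task and the history of tool results to a model and parse the tool call it returns. The tool names and the state dictionary are illustrative assumptions, not turtlesim_agent's actual interface.

```python
import math
from typing import Callable, Optional

ToolCall = tuple[str, dict]  # (tool name, keyword arguments)


def make_tools(state: dict) -> dict[str, Callable[..., str]]:
    """Hypothetical tools acting on a plain state dict instead of a real turtle."""
    def set_pen_color(color: str) -> str:
        state["color"] = color
        return f"pen color is {color}"

    def move_forward(distance: float) -> str:
        heading = math.radians(state["heading"])
        state["x"] += distance * math.cos(heading)
        state["y"] += distance * math.sin(heading)
        state["strokes"] += 1
        return f"now at ({state['x']:.2f}, {state['y']:.2f})"

    def turn_left(degrees: float) -> str:
        state["heading"] = (state["heading"] + degrees) % 360
        return f"heading {state['heading']:.0f} deg"

    return {"set_pen_color": set_pen_color, "move_forward": move_forward,
            "turn_left": turn_left}


def scripted_llm(task: str, history: list) -> Optional[ToolCall]:
    """Stand-in for the model: replays a fixed plan and ignores `task`.
    Returning None means the model considers the task complete."""
    plan: list[ToolCall] = [("set_pen_color", {"color": "red"})]
    for _ in range(3):
        plan += [("move_forward", {"distance": 2.0}), ("turn_left", {"degrees": 120.0})]
    return plan[len(history)] if len(history) < len(plan) else None


def run_agent(task: str, decide, tools: dict, max_steps: int = 20) -> list:
    """Ask for the next tool call, execute it, feed the result back, repeat."""
    history: list[tuple[ToolCall, str]] = []
    for _ in range(max_steps):
        call = decide(task, history)
        if call is None:                      # model says the task is finished
            break
        name, args = call
        tool = tools.get(name)
        observation = tool(**args) if tool else f"error: unknown tool {name}"
        history.append((call, observation))   # visible to the model on the next step
    return history


if __name__ == "__main__":
    state = {"x": 0.0, "y": 0.0, "heading": 0.0, "color": "black", "strokes": 0}
    for call, obs in run_agent("draw a red triangle", scripted_llm, make_tools(state)):
        print(call, "->", obs)
```

The key detail is that each observation is appended to `history` and passed back on the next iteration, which is what separates an agent loop from a single prompted call; `max_steps` guards against a model that never declares the task finished.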

Hope that clears it up a bit!

paxys•8mo ago

To put it more simply, "agent" is now just a generic term to describe any middleware that sits between user input and a base LLM.

latchkey•8mo ago

This really brings back memories. The first computer language I learned as a child was Logo. My grandfather gifted me a lesson from a local computer store where someone came out to his house and sat with me in front of his Apple II.

I was too young to understand the concepts around the math of steps or degrees. While the thought of programming on a computer was amazing (and I later became an engineer), I couldn't grasp Logo, got frustrated, and lost interest.

If I could have had something like this, I'm sure it would have made more sense to me earlier on. It makes me think about how much tools like this could speed up learning.

pj_mukh•8mo ago

Haha this is so incredibly cool.

One thing I might’ve missed: what are the “physics” of this universe? In the rainbow example, the turtle seems to teleport between arcs.

ponta17•8mo ago

Thanks! Great question.

TurtleSim itself doesn't simulate real-world physics — it allows instant position updates when needed. In this project, the goal was to create a digital turtle artist, not to replicate physical realism. So when the agent wants to draw something, it puts the pen down and moves physically (i.e., using velocity commands). But when it doesn't need to draw and just wants to move quickly to another position, it uses a teleport function I provided as a tool.

That's why in the rainbow example, you might see the turtle "jump" between arcs — it's skipping the movement to get to the next drawing point faster.
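The difference between drawing and teleporting can be made concrete with a little geometry. Under a constant velocity with forward speed v and yaw rate ω, the turtle traces a circle of radius r = v/ω, so a half-circle rainbow band of radius r takes π·r/v seconds. The sketch below is a hypothetical planner (`arc_command` and `rainbow_plan` are assumed names, not the project's tools) that alternates pen-up teleports with pen-down arcs. In turtlesim itself these steps would correspond to publishing `geometry_msgs/msg/Twist` on `/turtle1/cmd_vel` and calling the `/turtle1/teleport_absolute` and `/turtle1/set_pen` services; that wiring is left out here.

```python
import math
from dataclasses import dataclass


@dataclass
class ArcCommand:
    linear_x: float   # forward speed, turtlesim units per second
    angular_z: float  # yaw rate, rad/s (positive turns left)
    duration: float   # how long to keep this velocity, seconds


def arc_command(radius: float, sweep_rad: float, speed: float = 1.0) -> ArcCommand:
    """Constant velocity that traces an arc: radius = v / w, time = arc length / v."""
    if radius <= 0 or speed <= 0:
        raise ValueError("radius and speed must be positive")
    return ArcCommand(linear_x=speed, angular_z=speed / radius,
                      duration=sweep_rad * radius / speed)


def rainbow_plan(cx: float, cy: float, radii: list[float], speed: float = 1.0) -> list[tuple]:
    """One band per radius: lift the pen, teleport to the arc's right end facing up,
    lower the pen, then drive a half circle turning left around (cx, cy)."""
    steps: list[tuple] = []
    for r in radii:
        steps.append(("pen", False))                            # no line while jumping
        steps.append(("teleport", (cx + r, cy, math.pi / 2)))   # x, y, heading
        steps.append(("pen", True))
        steps.append(("arc", arc_command(r, math.pi, speed)))   # drawn "physically"
    return steps


if __name__ == "__main__":
    for kind, payload in rainbow_plan(5.5, 3.0, [2.0, 2.5, 3.0]):
        print(kind, payload)
```

Teleporting costs nothing in simulated time, while each drawn band takes π·r/v seconds, which is why skipping the pen-up travel makes the drawing noticeably faster.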

moffkalast•8mo ago

That's pretty cool, but I feel like the LLM integrations with ROS so far have largely missed the point when it comes to useful applications. Endless examples of models sending bare-bones Twist commands do a disservice to what LLMs are good at, and in terms of compute it's like swatting flies with a bazooka.

Getting the robot to move from point A to point B is largely a solved problem with traditional probabilistic methods, while the niches where I think LLMs are the best fit remain largely unaddressed, e.g.:

- a pipeline for turning natural language commands into high-level commands ("fetch me a beer" to [send Nav2 goal to kitchen, get fridge detection from YOLO, open fridge with MoveIt, detect beer with YOLO, etc.])

- using a VLM to add semantic information to map areas, e.g. have the robot turn around 4 times in a room and have the model determine what's there, so it can reference things by location and even know where the kitchen and fridge from the above example are

- system monitoring, where an LLM looks at ros2 doctor, htop, topic hz, etc. and determines if something's crashed or isn't behaving properly, and returns a debug report or attempts to fix it with terminal commands

- handling recovery behaviours in general, since a lot of times when robots get stuck the resolution is simple, you just need something to take in the current situational information, reason about it, and pick one of the possible ways to resolve it
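For the first idea in the list above, one way to approach it is to have the model emit a structured plan that is validated against a fixed catalogue of skills before anything moves. A minimal sketch, assuming the model has been prompted to reply with a JSON list; the skill names in `SKILLS` are hypothetical wrappers around existing capabilities such as a Nav2 goal, a YOLO detector or a MoveIt motion, and none of them are real APIs.

```python
import json

# Hypothetical skill catalogue. Each name would wrap an existing capability
# (a Nav2 goal, a YOLO detector node, a MoveIt motion, ...); the set lists required args.
SKILLS: dict[str, set[str]] = {
    "navigate_to": {"location"},
    "detect_object": {"label"},
    "open": {"target"},
    "pick": {"label"},
    "hand_over": set(),
}


def plan_errors(plan) -> list[str]:
    """Return every problem found in a parsed plan; an empty list means it may be executed."""
    if not isinstance(plan, list) or not plan:
        return ["plan must be a non-empty JSON list"]
    errors = []
    for i, step in enumerate(plan):
        if not isinstance(step, dict):
            errors.append(f"step {i}: not an object")
            continue
        name = step.get("skill")
        args = step.get("args", {})
        if name not in SKILLS:
            errors.append(f"step {i}: unknown skill {name!r}")
        elif not isinstance(args, dict):
            errors.append(f"step {i}: args must be an object")
        else:
            missing = SKILLS[name] - set(args)
            if missing:
                errors.append(f"step {i}: {name} missing {sorted(missing)}")
    return errors


# What a model might return for "fetch me a beer" (illustrative, not real output)
example_reply = """[
  {"skill": "navigate_to", "args": {"location": "kitchen"}},
  {"skill": "detect_object", "args": {"label": "fridge"}},
  {"skill": "open", "args": {"target": "fridge"}},
  {"skill": "detect_object", "args": {"label": "beer"}},
  {"skill": "pick", "args": {"label": "beer"}},
  {"skill": "navigate_to", "args": {"location": "user"}},
  {"skill": "hand_over"}
]"""

if __name__ == "__main__":
    print(plan_errors(json.loads(example_reply)))
```

Keeping the model on the planning side and the validated skills on the execution side means a malformed or hallucinated step is caught as text, and the error list can be fed back to the model to request a corrected plan.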

ponta17•8mo ago

Thanks a lot for the thoughtful feedback — I really appreciate it!

I think there might be a small misunderstanding regarding how the LLM is actually being used here (and in many agent-based setups). The LLM itself isn’t directly executing twist commands or handling motion; it’s acting as a decision-maker that chooses from a set of callable tools (Python functions) based on the task description and intermediate results.

In this case, yes — one of the tools happens to publish Twist commands, but that’s just one of many modular tools the LLM can invoke. Whether it’s controlling motion or running object detection, from the LLM’s point of view it’s simply choosing which function to call next. So the computational load really depends on what the tool does internally — not the LLM’s reasoning process itself.
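The point that the LLM is "simply choosing which function to call next" can be illustrated by showing what the model actually sees. This hand-rolled sketch is not LangChain's API, though frameworks like LangChain derive similar descriptions from function signatures and docstrings. It builds a JSON-style tool description from a Python function and dispatches a model's reply to it; `move_forward` is a hypothetical tool that returns a string instead of publishing a Twist.

```python
import inspect
import json

# Map Python annotations to JSON Schema type names; anything else falls back to "string".
_JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def tool_spec(fn) -> dict:
    """Build the description an LLM would see for `fn`: name, docstring, typed params."""
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def dispatch(tools: dict, reply: str):
    """Execute the call the model chose, e.g. '{"name": ..., "arguments": {...}}'.
    Producing this short reply is the model's whole job; the tool does the real work."""
    call = json.loads(reply)
    return tools[call["name"]](**call.get("arguments", {}))


def move_forward(distance: float, speed: float = 1.0) -> str:
    """Drive straight ahead by `distance` units at `speed` units per second."""
    # Hypothetical tool: a real one would publish a Twist for distance / speed seconds.
    return f"would publish linear.x={speed} for {distance / speed:.1f}s"


if __name__ == "__main__":
    print(json.dumps(tool_spec(move_forward), indent=2))
    print(dispatch({"move_forward": move_forward},
                   '{"name": "move_forward", "arguments": {"distance": 3.0}}'))
```

The model's output is the same few tokens whether the chosen tool publishes one Twist or runs an object detector; any heavy computation lives inside the tool.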

Of course, I agree with your broader point: we should push toward more meaningful high-level tasks where LLMs can orchestrate complex pipelines — and I think your examples (like fetch-a-beer or map annotation via VLMs) are spot-on.

My goal with this project was to explore that decision-making loop in a minimal, creative setting — kind of like a sandbox for LLM-agent behavior.

Actually, I’m currently working on something along those lines using a TurtleBot3. I’m planning to provide the agent with tools that let it scan obstacles via 3D LiDAR and recognize objects through image processing, so that it can make more context-aware decisions.

Really appreciate the push for deeper use cases — that’s definitely where I want to go next!
