Show HN: Only 1 LLM can fly a drone

https://github.com/kxzk/snapbench
69•beigebrucewayne•5h ago

Comments

bigfishrunning•2h ago
Why would you want an LLM to fly a drone? Seems like the wrong tool for the job -- it's like saying "Only one power drill can pound roofing nails". Maybe that's true, but just get a hammer
pavlov•2h ago
Yeah, it feels a bit like asking "which typewriter model is the best for swimming".
peterpost2•2h ago
Did you read his post?

He answers your question

macintux•2h ago
> Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that".

https://news.ycombinator.com/newsguidelines.html

philipwhiuk•2h ago
I disagree. The nearest justification is:

> to see what happens

ceejayoz•2h ago
Isn't that the epitome of the hacker spirit?

"Why?" "Because I can!"

munchler•2h ago
Because we're interested in AGI (emphasis on general), and LLMs are the closest thing to AGI that we have right now.
notepad0x90•2h ago
There are almost endless reasons why. It's like asking why you would want a self-driving car. Having a drone to transport things would be amazing, or to patrol an area. LLMs can be helpful with object identification, reacting to different events, and taking commands from users.

The first thought I had was those security guard robots that are popping up all over the place. If they were drones instead, and an LLM talked to people asking them to do or not do things, that would be an improvement.

Or a waiter drone that takes your order in a restaurant, flies to the kitchen, picks up a sealed and secured food container, flies it back to the table, opens it, and leaves. It would monitor for gestures and voice commands to respond to diners, take their feedback (or abuse), bring the food back if it isn't satisfactory, etc.

This is the type of stuff we used to see in futuristic movies. It's almost possible now. Glad to see this kind of tinkering.

lewispollard•2h ago
The point is that you don't need an LLM to pilot the thing, even if you want to integrate an LLM interface to take a request in natural language.
notepad0x90•2h ago
We don't need a lot of things, but new tech should also address what people want, not just needs. I don't know how to pilot drones, nor do I care to learn how, but I want to do things with drones; does that qualify as a need? Tech is there to do things for us that we're too lazy to do.
volkercraig•45m ago
I don't think you understand what an "LLM" is. They're text generators. We've had autopilots since the 1930s that rely on measurable things: PID loops, direct sensor input. You don't need the "language model" part to run an autopilot; that's just silly.
pixl97•27m ago
You seem to be talking past them and ignoring what they are actually saying.

LLMs are a higher level construct than PID loops. With things like autopilot I can give the controller a command like 'Go from A to B', and chain constructs like this to accomplish a task.

With an LLM I can give the drone/LLM system a complex command that I'd never be able to encode for a controller alone: "Fly a grid over my neighborhood, document the location of and take pictures of every flower garden."

And if an LLM is just a 'text generator', then it's a pretty damned spectacular one, as it can take free-form input and turn it into a set of useful commands.
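
Concretely, the split being argued for here could look something like this (a sketch only; `llm_complete`, `autopilot`, and the waypoint format are made up, not from snapbench):

    # Sketch: LLM as mission planner, classical autopilot as pilot.
    import json
    from dataclasses import dataclass

    @dataclass
    class Waypoint:
        x: float
        y: float
        z: float
        action: str  # e.g. "photo" or "hover"

    def plan_mission(instruction, llm_complete):
        """Ask the LLM to turn a free-form task into structured waypoints."""
        prompt = ("Convert this drone task into a JSON list of waypoints "
                  '[{"x": 0, "y": 0, "z": 0, "action": ""}]: ' + instruction)
        return [Waypoint(**wp) for wp in json.loads(llm_complete(prompt))]

    def fly(waypoints, autopilot):
        """The deterministic controller, not the LLM, executes each leg."""
        for wp in waypoints:
            autopilot.goto(wp.x, wp.y, wp.z)
            autopilot.perform(wp.action)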

infecto•3m ago
Maybe it's my confusion? Is this simulator just flying point A to point B? It seems like it's handling collisions while trying to locate and identify the targets. That seems quite a bit more complex than what you describe as having been solved since the 1930s.
laffOr•27m ago
There are two different things:

1. a drone that you can talk to and that flies on its own

2. a drone where the flying is controlled by an LLM

(2) is a specific instance of the larger concept of (1).

You make an argument that (1) should be addressed, which no one is denying in this thread - people are arguing that (2) is a bad way to do (1).

infecto•2h ago
That’s a pretty boring point for what looks like a fun project. Happy to see this project and know I am not the only one thinking about these kinds of applications.
laffOr•1h ago
You could have a program for flying (not LLM-based, though it could be an ANN) and an LLM for overseeing; the LLM could give the pilot program instructions as (x, y, z) directions. I mean, current autopilots are typically not LLMs, right?

You describe why it would be useful to have an LLM in a drone to interact with it, but you don't explain why the very same LLM should be doing the flying.

iso1631•1h ago
You want a self-driving car

You don't want an LLM to drive a car

There is more to "AI" than LLMs

dan-bailey•2h ago
When your only tool is a hammer, every problem begins to resemble a nail.
infecto•2h ago
What’s the right tool then?

This looks like a pretty fun project and, in my rough estimation, a fun hacker project.

bob1029•1h ago
The system prompt for the drone is hilarious to me. These models are horrible at spatial reasoning tasks:

https://github.com/kxzk/snapbench/blob/main/llm_drone/src/ma...

I've been working on integrating GPT-5.2 in Unity. It's fantastic at scripting but completely worthless at managing transforms for scene objects. Even with elaborate planning phases, it's going to make a complete jackass of itself in world space every time.

LLMs are also wildly unsuitable for real-time control problems, and they never will be suitable. A PID controller or dedicated pathfinding tool driven by the LLM will provide a radically superior result.
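
For reference, the deterministic half of that split is tiny; a generic single-axis PID sketch (not from the repo):

    # Minimal PID loop for one axis (e.g. altitude hold) -- the kind of
    # real-time control an LLM should delegate to rather than attempt itself.
    class PID:
        def __init__(self, kp, ki, kd):
            self.kp, self.ki, self.kd = kp, ki, kd
            self.integral = 0.0
            self.prev_error = 0.0

        def update(self, setpoint, measurement, dt):
            error = setpoint - measurement
            self.integral += error * dt
            derivative = (error - self.prev_error) / dt
            self.prev_error = error
            return self.kp * error + self.ki * self.integral + self.kd * derivative

    # e.g. thrust = pid.update(target_alt, current_alt, dt) at a few hundred Hz,
    # while the LLM only changes target_alt occasionally.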

ralusek•1h ago
Why would you want an LLM to identify plants and animals? Well, they're often better than bespoke image classification models at doing just that. Why would you want a language model to help diagnose a medical condition?

It would not surprise me at all if self-driving models are adopting a lot of the model architecture from LLMs/generative AI, and actually invoke LLMs in moments where they would otherwise have needed human intervention.

Imagine if there's a decision engine at the core of a self-driving model, and it gets a classification result of what to do next. Suddenly it gets 3 options back with 33.33% weight attached to each of them and very low confidence about which is the best choice. Maybe that's the kind of scenario that used to trigger self-driving to refuse to choose and defer to human intervention. If it can instead first defer judgment to an LLM, which could say "that's just a goat crossing the road, INVOKE: HONK_HORN," you could imagine how that might be useful. LLMs are clearly proving to be universal reasoning agents, and it's getting tiring to hear people continuously try to reduce them to "next word predictors."
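
Sketching that fallback (names and thresholds are made up): a fast classifier handles the normal case, and only low-confidence frames get escalated to the LLM.

    # Confidence-gated fallback: the classifier decides normally; ambiguous,
    # low-confidence cases are escalated to the (slower) LLM for judgment.
    def decide(frame, classifier, llm_choose, threshold=0.6):
        options = classifier(frame)  # e.g. {"brake": 0.34, "swerve": 0.33, "honk": 0.33}
        action, score = max(options.items(), key=lambda kv: kv[1])
        if score >= threshold:
            return action            # fast path, no LLM involved
        return llm_choose(frame, list(options))  # "that's just a goat, honk"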

avaer•1h ago
Using an LLM is the SOTA way to turn plain text instructions into embodied world behavior.

Charitably, I guess you can question why you would ever want to use text to command a machine in the world (simulated or not).

But I don't see how it's the wrong tool given the goal.

irl_zebra•6m ago
SOTA typically refers to achieving the best performance, not using the trendiest thing regardless of performance. There is some subtlety here. At some point an LLM might give the best performance in this task, but that day is not today, so an LLM is not SOTA, just trendy. It's kinda like rewriting something in Rust and calling it SOTA because that's the trend right now. Hope that makes sense.
infecto•1m ago
I don't think trendy is really the right word, and maybe it's not state of the art, but a lot of us in the industry are seeing emerging capabilities that might make it SOTA. Hope that makes sense.
smw1218•1h ago
It's a great feature to be able to tell my drone to do a task in English, like "a child is lost in the woods around here. Fly a search pattern to find her" or "film a cool panorama of this property. Be sure to get shots of the water feature by the pool." While LLMs are bad at flying, better navigation models likely can't be prompted in natural language yet.
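
A search pattern like that is itself a tiny deterministic routine the model could call as a tool rather than fly step by step; a rough lawnmower-sweep sketch:

    # Boustrophedon ("lawnmower") sweep over a rectangle, returned as waypoints.
    def lawnmower(x0, y0, width, height, spacing, altitude):
        waypoints, y, left_to_right = [], y0, True
        while y <= y0 + height:
            row = [(x0, y, altitude), (x0 + width, y, altitude)]
            waypoints.extend(row if left_to_right else row[::-1])
            y += spacing
            left_to_right = not left_to_right
        return waypoints
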
volkercraig•42m ago
What you're describing is still ultimately the "view" layer of a larger autopilot system; that's not what OP is doing. He's getting the text generator to drive the drone. An LLM can handle parsing input, but the wayfinding and driving would (in the real world) be delegated to a modern autopilot.
Mashimo•1h ago
> Why would you want an LLM to fly a drone?

We are on HACKER news. Using tools outside their intended scope is the ethos of a hacker.

antisthenes•2h ago
LLMs flying weaponized drones is exactly how it starts.
popcornricecake•55m ago
One day they'll fly to a drone factory, eliminate all the personnel, then start gently shooting at the machinery to create more weaponized drones and then it's all over before you know it!
accrual•1h ago
I think it's fascinating work even if LLMs aren't the ideal tool for this job right now.

There were some experiments with embodied LLMs on the front page recently (e.g. basic robot body + task) and SOTA models struggled with that too. And of course they would - what training data is there for embodying a random device with arbitrary controls and feedback? They have to lean on the "general" aspects of their intelligence, which are still improving.

With dedicated embodiment training and an even tighter/faster feedback loop, I don't see why an LLM couldn't successfully pilot a drone. I'm sure some will still fall off the rails, but software guardrails could help by preventing certain maneuvers.

fsiefken•1h ago
I am curious how these models would perform, and how much energy they'd take, detecting objects in semi-real time: SmolVLM2-500M, Moondream 0.5B/2B/2.5B, Qwen3-VL (3B): https://huggingface.co/collections/Qwen/qwen3-vl

I am sure this is already being worked on in Russia, Ukraine, and the Netherlands. A lot can go wrong with autonomous flying. One could load the VLM onto a high-end Android phone on the drone and have dual control.

avaer•1h ago
Gemini 3 is the only model I've found that can reason spatially. The results here match my experiments with putting LLM NPCs in simulated worlds.

I was surprised that most VLMs cannot reliably tell if a character is facing left or right; they will confidently lie no matter what you do (even Gemini 3 cannot do it reliably). I guess it's just not in the training data.

That said, Qwen3-VL models are smaller/faster and better "spatially grounded" in pixel space, because pixel coordinates are encoded in the tokens. So you can use them for detecting things in the scene and where they are (which you can project into 3D space if you are running a sim). But they are not good reasoning models, so don't ask them to think.

That means the best pipeline I've found at the moment is to tack a dumb detection prepass on before your action reasoning. This basically turns 3D sims into 1D text sims operating on labels -- which is something LLMs are good at.
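
Roughly what that prepass looks like (illustrative only; `vlm_detect` and `camera_to_world` stand in for whatever grounded detector and projection you use):

    # Stage 1: a pixel-grounded VLM labels the frame; stage 2: a reasoning LLM
    # plans over the resulting text lines instead of raw pixels.
    def detect_prepass(frame, vlm_detect, camera_to_world):
        lines = []
        for det in vlm_detect(frame):   # [{"label": "tree", "bbox": [x1, y1, x2, y2]}, ...]
            x1, y1, x2, y2 = det["bbox"]
            wx, wy, wz = camera_to_world((x1 + x2) / 2, (y1 + y2) / 2)
            lines.append(f'{det["label"]} at ({wx:.1f}, {wy:.1f}, {wz:.1f})')
        return "\n".join(lines)         # plain text the reasoning model acts on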

Krutonium•52m ago
Neuro-sama, the V-Tuber/AI, actually does a decent job of it. Vedal seems to have cooked and figured out how to make an LLM move reasonably well in VRChat.

Not perfectly; there's a lot of abuse of gravity (or the lack thereof), but yeah. Neuro has also piloted a robot dog in the past.

volkercraig•49m ago
I don't understand. Surely training an LSTM with sensor input is a more practical and reasonable approach than trying to get a text generator to speak commands to a drone.
encrux•22m ago
Very much depends on what you want to do.

The fact that a language model can "reason" (in the LLM-slang meaning of the term) about 3D space is an interesting property.

If you give a text description of a scene and ask a robot to perform a peg-in-hole task, modern models are able to solve it fairly easily using movement primitives. I implemented this on a UR robot arm back in 2023.

The next logical step is, instead of having the model output text (code representing movement primitives), to have it output tokens in action space. This is what models like pi0 are doing.
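
A toy sketch of that 2023-style setup (the primitive names and the `arm` interface are invented for illustration):

    # The LLM emits named movement primitives as text; a fixed library maps
    # each primitive to actual robot motions.
    def make_primitives(arm, target):
        return {
            "move_above": lambda: arm.move_to(target.x, target.y, target.z + 0.10),
            "descend":    lambda: arm.move_to(target.x, target.y, target.z),
            "insert":     lambda: arm.push(force_limit=5.0),
            "release":    lambda: arm.open_gripper(),
        }

    def execute_plan(plan_text, arm, target):
        primitives = make_primitives(arm, target)
        for step in plan_text.strip().splitlines():  # e.g. "move_above\ndescend\ninsert\nrelease"
            primitives[step.strip()]()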

eichin•29m ago
At least he's not feeding real drones to the coyotes... oh, there's a link in the readme https://github.com/kxzk/tello-bench
modeless•4m ago
This is what VLA models are for. They would work much better. They would need a bit of fine-tuning, but probably not much. There is lots of literature out there on using VLAs to control drones.

