Show HN: Droidrun – LLM Agent for Android

4•nodueck•3mo ago

Hi HN,

I'm Nikolai, software engineer and co-founder at DroidRun. We built DroidRun, an LLM-based agent that leverages the Android Accessibility Tree for precise control and understanding of UI elements. It works on real phones and emulators, and it's open source.

How it started:

Our co-founder Niels Schmidt (you’ll see him in the demos) coded a prototype and shared a quick video. It went viral, with about 50k views on X in under 2 hours. That pushed us to go all-in on DroidRun, and soon after we open-sourced it.

How it works:

Most agents rely on screenshots alone for context. We send the screenshot and also feed the Accessibility Tree into the LLM, which gives it structural, hierarchical, and spatial metadata about each UI element.

Here’s an example:

Screenshot of a real UI: https://imgur.com/a/ePRLpyv

And a matching accessibility JSON snippet:

  {
    "index": 3,
    "resourceId": "com.android.settings:id/search_action_bar",
    "className": "LinearLayout",
    "text": "search_action_bar",
    "bounds": "42, 149, 1038, 338",
    "children": [
      {
        "index": 4,
        "resourceId": "com.android.settings:id/search_bar_title",
        "className": "TextView",
        "text": "In Einstellungen suchen",
        "bounds": "189, 205, 768, 282",
        "children": []
      }
    ]
  }
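To make the "feed the tree into the LLM" step concrete, here is a minimal Python sketch that flattens a snippet like the one above into compact, indented text lines that could go into a prompt. This is not DroidRun's actual serializer: the helper names (`parse_bounds`, `flatten`, `to_prompt_lines`) and the line format are made up for illustration, and it assumes `bounds` is ordered left, top, right, bottom.

```python
import json
from typing import Iterator

# The accessibility snippet from the post, with plain "/" in resource IDs.
SNIPPET = """
{
  "index": 3,
  "resourceId": "com.android.settings:id/search_action_bar",
  "className": "LinearLayout",
  "text": "search_action_bar",
  "bounds": "42, 149, 1038, 338",
  "children": [
    {
      "index": 4,
      "resourceId": "com.android.settings:id/search_bar_title",
      "className": "TextView",
      "text": "In Einstellungen suchen",
      "bounds": "189, 205, 768, 282",
      "children": []
    }
  ]
}
"""


def parse_bounds(bounds: str) -> tuple[int, int, int, int]:
    """Parse "l, t, r, b" into four ints.

    Assumption: the order is left, top, right, bottom in pixels.
    """
    left, top, right, bottom = (int(part) for part in bounds.split(","))
    return left, top, right, bottom


def flatten(node: dict, depth: int = 0) -> Iterator[tuple[int, dict]]:
    """Yield (depth, node) pairs in depth-first, parent-before-child order."""
    yield depth, node
    for child in node.get("children", []):
        yield from flatten(child, depth + 1)


def to_prompt_lines(root: dict) -> list[str]:
    """Render one line per element; indentation mirrors the hierarchy."""
    lines = []
    for depth, node in flatten(root):
        left, top, right, bottom = parse_bounds(node["bounds"])
        lines.append(
            f'{"  " * depth}[{node["index"]}] {node["className"]} '
            f'"{node["text"]}" ({left},{top},{right},{bottom})'
        )
    return lines


tree = json.loads(SNIPPET)
prompt_lines = to_prompt_lines(tree)
print("\n".join(prompt_lines))
```

A flat, indented listing like this keeps the hierarchy visible while using far fewer tokens than raw JSON, which is one plausible reason to preprocess the tree before prompting.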

We also annotate UI regions in screenshots with numbers, then match them to nodes in the tree. Because the tree describes elements by structure and position rather than by pixels alone, the agent's understanding of what’s on screen generalizes across devices and screen sizes, including tablets.

Agents can then act with greater confidence and fewer hallucinations.
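As a rough illustration of the number-to-element matching, the Python sketch below resolves a number the model picked to a node and computes where to tap. It assumes the screenshot numbers correspond to the `index` field in the tree and that `bounds` is ordered left, top, right, bottom; both are guesses, not confirmed DroidRun behavior. The helpers `find_by_index`, `tap_point`, and `adb_tap_command` are hypothetical names; the only real tool referenced is the standard `adb shell input tap X Y` command, which the sketch builds but does not run.

```python
from typing import Optional

# Same structure as the accessibility snippet in the post.
TREE = {
    "index": 3,
    "className": "LinearLayout",
    "text": "search_action_bar",
    "bounds": "42, 149, 1038, 338",
    "children": [
        {
            "index": 4,
            "className": "TextView",
            "text": "In Einstellungen suchen",
            "bounds": "189, 205, 768, 282",
            "children": [],
        }
    ],
}


def find_by_index(node: dict, target: int) -> Optional[dict]:
    """Depth-first search for the node whose "index" equals target."""
    if node.get("index") == target:
        return node
    for child in node.get("children", []):
        found = find_by_index(child, target)
        if found is not None:
            return found
    return None


def tap_point(node: dict) -> tuple[int, int]:
    """Return the center of the node's bounds (assumed l, t, r, b)."""
    left, top, right, bottom = (int(p) for p in node["bounds"].split(","))
    return (left + right) // 2, (top + bottom) // 2


def adb_tap_command(x: int, y: int) -> list[str]:
    """Build (but do not execute) an adb command that taps at (x, y)."""
    return ["adb", "shell", "input", "tap", str(x), str(y)]


# Suppose the model answered "tap element 4".
target = find_by_index(TREE, 4)
x, y = tap_point(target)
command = adb_tap_command(x, y)
print(" ".join(command))
```

Tapping the center of the bounds is the simplest choice; a real agent would likely also check that the element is visible and clickable before acting.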

Current Status:

- Ranked #1 on AndroidWorld until recently (it became highly competitive)

- Supports real devices and emulators

- Strong performance on simple and complex UI tasks

- Gemini 2.5 Pro works best so far, but we’re iterating fast

What's next:

We’re working on a cloud platform where you can run prompts on Android devices without any setup. Think of an LLM controlling a phone in the cloud, ready to test your automations.

Looking for:

- Feedback from HN

- Collaborators who love Android, LLMs, and agents

- OSS contributors
