The core insight: every GUI application already describes itself as structured text through the accessibility layer (built for screen readers since 1997). Every major AI lab is taking screenshots of this and running vision models on it. DirectShell just reads the text directly.
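To make "reads the text directly" concrete, here is a minimal sketch of the idea (not DirectShell's code; it assumes Windows, the pywinauto library, and a running Notepad window) that walks the UI Automation tree and prints every control as plain text:

    # Minimal illustration of the idea, not DirectShell's code: read the
    # accessibility (UIA) tree of a running Notepad window as plain text.
    # Assumes Windows and pywinauto (pip install pywinauto).
    from pywinauto import Desktop

    # Attach to Notepad through the UI Automation backend.
    window = Desktop(backend="uia").window(title_re=".*Notepad.*")

    # Every control already exposes a machine-readable role and name -- the
    # same data a screen reader uses. No screenshot, no vision model.
    for ctrl in window.descendants():
        info = ctrl.element_info
        print(f"{info.control_type:<12} {info.name!r}")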
What it does:

- Reads every button, field, and menu item into a queryable SQLite DB, refreshed every 500 ms (see the sketch after this list)
- Generates multiple output formats: the full DB, an interactive-elements list, and LLM-optimized snapshots (50-200 tokens vs. 1,200-5,000 for a screenshot)
- Controls apps via 5 action types: click by element name, set text via the UIA ValuePattern, type character by character, send key combos, scroll
- Includes an MCP server so Claude/GPT can use it directly
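As a rough sketch of the query-then-act loop those pieces enable: the table and column names (elements, name, control_type, is_interactive) and the DB filename below are assumptions for illustration, not DirectShell's actual schema, and the click goes through pywinauto rather than DirectShell's own action layer.

    # Hypothetical consumer of the element snapshot: find an element by name,
    # then perform a "click by element name" action. Schema, filename, and the
    # click mechanism are assumptions, not DirectShell's real API.
    import sqlite3
    from pywinauto import Desktop

    con = sqlite3.connect("directshell_snapshot.db")
    row = con.execute(
        "SELECT name, control_type FROM elements "
        "WHERE is_interactive = 1 AND name LIKE ?",
        ("%Save%",),
    ).fetchone()

    if row:
        name, control_type = row
        # Resolve the same element through UIA and click it by name --
        # no coordinates, no pixel guessing.
        app_window = Desktop(backend="uia").window(title_re=".*Notepad.*")
        app_window.child_window(title=name, control_type=control_type).click_input()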
Day 1 demo: filled 360 Google Sheets cells in 90 seconds, read and replied to a Claude.ai conversation cross-app, wrote to Notepad instantly. No screenshots, no vision model, no coordinate guessing.
Limitations (honest): built in 8.5 hours, currently single-app scope, Chromium-based apps need a 4-phase activation hack before they expose their accessibility tree, and accessibility quality varies by app. License: AGPL-3.0.
Demo: https://youtu.be/nvZobyt0KBg
Full technical paper: in the repo under Dokumentation/ and at https://dev.to/tlrag/i-built-a-new-software-primitive-in-85-...