But it's a vision I can get behind, where basic tasks like transcription, computer use, in-app tool use, image understanding, etc., are local, secure, and private.
Apple GUIs have underlying accessibility annotations that, if surfaced, would make UI manipulation easy for LLMs.
"Back in the day" - 1990's - Apple had Virtual User, basically a lisp derivative that reported UI state as S-expressions (like a web DOM) and allowed scripts to manipulate settings and perform UI actions.
With such a curated DOM/model and selective UI inputs, they could manage privacy and safety, opening up LLM control to users who would otherwise never trust a machine.
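To make the idea concrete, here's a minimal sketch of what "curated" could mean: a UI tree serialized to S-expressions, with privacy-flagged subtrees redacted before anything reaches the model. Everything here (the `Node` type, the `:redacted` marker, the field names) is hypothetical illustration, not a real Apple API.

```python
# Hypothetical sketch: serializing a curated accessibility tree to
# S-expressions, redacting subtrees the user marked private.
# None of these names correspond to an actual Apple framework.

from dataclasses import dataclass, field

@dataclass
class Node:
    role: str                      # e.g. "window", "button", "text-field"
    label: str = ""
    private: bool = False          # curated flag: hide contents from the model
    children: list = field(default_factory=list)

def to_sexpr(node: Node) -> str:
    """Render the tree as an S-expression, dropping private contents."""
    if node.private:
        return f"({node.role} :redacted t)"
    label = f' :label "{node.label}"' if node.label else ""
    kids = "".join(" " + to_sexpr(c) for c in node.children)
    return f"({node.role}{label}{kids})"

ui = Node("window", "Settings", children=[
    Node("button", "Save"),
    Node("text-field", "Password", private=True),
])

print(to_sexpr(ui))
# → (window :label "Settings" (button :label "Save") (text-field :redacted t))
```

The point is that the redaction happens in the serializer, below the model: the LLM never sees the password field's contents, only that a redacted element exists there.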
I hope they're working on that approach and training models for it. It's one way they could distinguish the Apple platform as being more controllable, with safety and permissions built into the subsystems instead of giving the LLM full control over UI input.
brudgers•2d ago