We just released Holo1.5, a family of vision-language models tuned for computer-use agents—AI that can see and control real apps (web, desktop, mobile).
Holo1.5 is specialized for UI localization (pixel-accurate element detection) and UI QA (reasoning about screen content), enabling agents to reliably click, type, and navigate software. On benchmarks like ScreenSpot and WebClick it sets a new SOTA (up to 4.5% better than prior models), and the 7B model is fully open under Apache 2.0. The models handle 4K screens and are trained with supervised fine-tuning plus RL.
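
As a quick illustration of the localization task, here is a minimal sketch of asking the model for the click coordinates of a named UI element via Hugging Face transformers. The model ID, prompt wording, and output format shown here are assumptions; see the model cards linked below for the canonical usage.

    # Minimal UI-localization sketch (assumed API; check the model card for exact usage).
    from PIL import Image
    from transformers import AutoModelForImageTextToText, AutoProcessor

    MODEL_ID = "Hcompany/Holo1.5-7B"  # assumed ID from the HF collection below

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = AutoModelForImageTextToText.from_pretrained(MODEL_ID, device_map="auto")

    # Ask for the click target of a named element in a screenshot.
    messages = [{
        "role": "user",
        "content": [
            {"type": "image", "image": Image.open("screenshot.png")},
            {"type": "text", "text": "Give the click coordinates of the 'Sign in' button."},
        ],
    }]

    inputs = processor.apply_chat_template(
        messages, add_generation_prompt=True, tokenize=True,
        return_dict=True, return_tensors="pt",
    ).to(model.device)

    out = model.generate(**inputs, max_new_tokens=64)
    # Decode only the newly generated tokens: the model's coordinate answer.
    print(processor.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

The point is that the model returns pixel coordinates an agent can pass straight to a click action, which is what makes reliable UI automation possible.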
We see this as a step toward open, robust autonomous “software operators”—with clear applications in productivity and RPA, but also real security implications (phishing, CAPTCHA bypass, large-scale automation).
Blog: https://www.hcompany.ai/blog/holo-1-5
Models: https://huggingface.co/collections/Hcompany/holo15-68c1a5736...
We’re a European AI team—happy to answer questions about the models, benchmarks, or implications. AMA.