Frontier VLMs (GPT, Claude, Gemini) can describe what they see, but they can't reliably act on visual inputs. Ask them to detect objects, segment images, or chain visual steps, and they fail in surprisingly inconsistent ways. High-resolution images get downscaled to roughly 1024px before the model ever sees them. And the visual AI ecosystem is fragmented across separate APIs for image understanding, OCR, image generation, video generation, and so on.
We built Orion [1] to fix this.
Orion combines VLM reasoning with reliable computer-vision tools inside a unified chat-completions interface. You can chain visual steps, inspect results, and treat visual tasks the same way you treat text workflows. Here’s a quick demo [2].
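To give a concrete sense of the interface, here's a rough sketch of a single detection call, assuming an OpenAI-compatible chat-completions client. The base URL, model id, and message shape below are illustrative placeholders, not final API details:

    # Rough sketch of one detection call against an OpenAI-compatible
    # chat-completions endpoint. base_url and model are placeholders.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.vlm.run/v1",  # placeholder endpoint
        api_key="YOUR_API_KEY",
    )

    resp = client.chat.completions.create(
        model="orion",  # placeholder model id
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Detect all faces and return bounding boxes."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }],
    )
    print(resp.choices[0].message.content)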
What Orion can do today:

- Detect objects, faces, and people (with precise, visualized boxes)
- Segment objects or salient regions interactively
- Edit, remix, and re-imagine images and videos from prompts
- Summarize visual content (images or videos)
- Transform images: crop, rotate, upscale
- Transform videos: trim, sample, highlight scenes
- Parse and structure documents: pagination, layout, OCR, extraction
One unified “chat-completions”-like interface — no juggling multiple vision APIs. Check out the tours in the chat [3] or read the announcement [4].
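Chaining just means continuing the same conversation: a follow-up turn can transform the image using the previous step's detections, with no second vision API involved. Same caveat as above, the endpoint, model id, and prompts here are illustrative:

    # Sketch of chaining two visual steps in one conversation.
    # Endpoint and model id are placeholders, as above.
    from openai import OpenAI

    client = OpenAI(base_url="https://api.vlm.run/v1", api_key="YOUR_API_KEY")

    messages = [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Detect every person in this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/street.jpg"}},
        ],
    }]

    # Step 1: detection.
    step1 = client.chat.completions.create(model="orion", messages=messages)
    messages.append({"role": "assistant", "content": step1.choices[0].message.content})

    # Step 2: a transform that reuses step 1's detections instead of
    # calling a separate cropping/upscaling API.
    messages.append({
        "role": "user",
        "content": "Crop to the first person you found, then upscale 2x.",
    })
    step2 = client.chat.completions.create(model="orion", messages=messages)
    print(step2.choices[0].message.content)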
API access opens next week. Happy to answer any questions — otherwise, feel free to try the tours and break things!
[1] Learn more about Orion: https://vlm.run/orion
[2] Promo video: https://youtu.be/cPJN4iZz6QQ
[3] Chat: https://chat.vlm.run
[4] LinkedIn announcement: https://www.linkedin.com/posts/sudeeppillai_ai-computervisio...