Frontier VLMs (GPT, Claude, Gemini) can describe what they see, but they can't reliably act on visual inputs. Ask them to detect objects, segment images, or chain visual steps, and they fail in surprisingly inconsistent ways. High-resolution images get downscaled to roughly 1024px before the model ever sees them. And the visual AI ecosystem is fragmented across separate APIs for image understanding, OCR, image generation, video generation, and so on.
We built Orion [1] to fix this.
Orion combines VLM reasoning with reliable computer-vision tools inside a unified chat-completions interface. You can chain visual steps, inspect results, and treat visual tasks the same way you treat text workflows. Here’s a quick demo [2].
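To give a concrete sense of the interface, here's a rough sketch of a single detection call, assuming an OpenAI-compatible chat-completions client. The base URL, model id, and message shape below are illustrative placeholders, not final API details:

    # Rough sketch of one detection call against an OpenAI-compatible
    # chat-completions endpoint. base_url and model are placeholders.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.vlm.run/v1",  # placeholder endpoint
        api_key="YOUR_API_KEY",
    )

    resp = client.chat.completions.create(
        model="orion",  # placeholder model id
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Detect all faces and return bounding boxes."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }],
    )
    print(resp.choices[0].message.content)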
What Orion can do today:

- Detect objects, faces, and people (with precise, visualized boxes)
- Segment objects or salient regions interactively
- Edit, remix, and re-imagine images and videos from prompts
- Summarize visual content (images or videos)
- Transform images: crop, rotate, upscale
- Transform videos: trim, sample, highlight scenes
- Parse and structure documents: pagination, layout, OCR, extraction
One unified “chat-completions”-like interface — no juggling multiple vision APIs. Check out the tours in the chat [3] or read the announcement [4].
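Chaining just means continuing the same conversation: a follow-up turn can transform the image using the previous step's detections, with no second vision API involved. Same caveat as above, the endpoint, model id, and prompts here are illustrative:

    # Sketch of chaining two visual steps in one conversation.
    # Endpoint and model id are placeholders, as above.
    from openai import OpenAI

    client = OpenAI(base_url="https://api.vlm.run/v1", api_key="YOUR_API_KEY")

    messages = [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Detect every person in this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/street.jpg"}},
        ],
    }]

    # Step 1: detection.
    step1 = client.chat.completions.create(model="orion", messages=messages)
    messages.append({"role": "assistant", "content": step1.choices[0].message.content})

    # Step 2: a transform that reuses step 1's detections instead of
    # calling a separate cropping/upscaling API.
    messages.append({
        "role": "user",
        "content": "Crop to the first person you found, then upscale 2x.",
    })
    step2 = client.chat.completions.create(model="orion", messages=messages)
    print(step2.choices[0].message.content)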
API access opens next week. Happy to answer any questions — otherwise, feel free to try the tours and break things!
[1] Learn more about Orion: https://vlm.run/orion
[2] Promo video: https://youtu.be/cPJN4iZz6QQ
[3] Chat: https://chat.vlm.run
[4] LinkedIn announcement: https://www.linkedin.com/posts/sudeeppillai_ai-computervisio...