Why these specific models / versions?
Large general models have taken over in NLP, and (outside of embedded/low-latency applications) it seems like they are coming for CV next.
So you should soon be able to have a large generic model that can detect whatever you want.
It's already pretty much possible with open-vocabulary detectors like SAM3, where you could just prompt it with "Apple": https://ai.meta.com/research/sam3/
If you need something less restricted to existing labels (say, all the red apples or all cardboard signs), SAM3 is great, as the sibling comment says.
A quick note to say that this is also a task you can hand to things like Gemini.
So there's room for even better performance!
Sure, running models on the CPU is very much a thing in computer vision (the benchmarked YOLOv8n has only about 3.2M params). But this whole announcement feels more like OpenCV catching up to the modern world, not "The Biggest Leap in Years for Computer Vision".
Still great, and needing fewer libraries is a good thing, but maybe a bit oversold.
OpenCV 4.11: ~255 ms
OpenCV 5.0.0: ~185 ms
with the same code.
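For anyone who wants to reproduce this kind of comparison, here is a minimal timing sketch, not the script used for the numbers above. `median_latency_ms` and `workload` are hypothetical names; `workload` is a placeholder to swap for your own inference call, and you would run the same script once per installed OpenCV version.

```python
import statistics
import time


def median_latency_ms(fn, warmup=3, runs=20):
    """Median wall-clock time of fn() in milliseconds, after warm-up calls."""
    # Warm-up calls are not timed, so one-off setup costs don't skew the result.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Placeholder workload (an assumption): replace with your model's forward pass.
def workload():
    sum(i * i for i in range(10_000))


print(f"median latency: {median_latency_ms(workload):.2f} ms")
```

Taking the median rather than the mean keeps a single slow run from dominating. For scale, going from ~255 ms to ~185 ms is roughly a 1.38x speedup.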
leoncos•3d ago
serf•2d ago
Some SBC with an industrial camera doing pick-and-place or go/no-go operations on a conveyor belt against a single object type doesn't need a huge image-gen/LLM model governing it.
Have you even considered the kind of performance an OpenCV function can get with just mask-matching? Even with a fancy YOLO model, these answers come back in 1.5-50 ms; this is just a wholly different time scale.
nicolailolansen•1h ago
TZubiri•1h ago
taneq•13m ago
mirsadm•1h ago
regularfry•1h ago
We're not going to fit Nano Banana or anything like it on a device with 512MB RAM and a GPU old enough to be irrelevant, and again, API calls just aren't on the menu.
kryptiskt•37m ago
Like, the AI model tools already exist; all that would be accomplished if OpenCV pivoted would be to take it away from people who want to do low-level vision programming. It wouldn't add anything useful to the world, just destroy an excellent library.
wongarsu•35m ago
I might be on board with LLMs being the future of OCR (though many would disagree), but for general CV they are very inefficient for very limited benefit.
IanCal•16m ago
Also, if they are better, you can have a flow where a cheap model handles most inputs and only the marginal cases go to a more complex one (and you can chain several of these).
The YOLO models are really shockingly good for their cost, and they work well with not much training data.
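A rough sketch of that cheap-model-first flow, assuming each model is a callable returning a `(label, confidence)` pair. `run_cascade`, the 0.8 threshold, and the two stand-in models are all made up for illustration; a real setup would plug in something like a small YOLO model first and a larger model or API call last.

```python
def run_cascade(image, stages, threshold=0.8):
    """Try models cheapest-first; stop at the first confident answer.

    stages: list of (name, model) pairs, ordered from cheapest to most expensive.
    Returns (label, confidence, stage_name), or None if stages is empty.
    """
    result = None
    for name, model in stages:
        label, conf = model(image)
        result = (label, conf, name)
        if conf >= threshold:
            break  # Confident enough: skip the more expensive stages.
    # If no stage was confident, the last (most capable) stage's answer is kept.
    return result


# Stand-in models (hypothetical): strings play the role of images here.
def tiny_model(image):
    return ("apple", 0.95) if image == "clear" else ("apple", 0.55)


def big_model(image):
    return ("red apple", 0.90)


stages = [("tiny", tiny_model), ("big", big_model)]
print(run_cascade("clear", stages))   # handled by the cheap stage
print(run_cascade("blurry", stages))  # escalated to the expensive stage
```

The threshold is the knob that trades cost against accuracy: lower it and more inputs stop at the cheap stage.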
sebmellen•24m ago