OpenCV 5 Is Here: The Biggest Leap in Years for Computer Vision

https://opencv.org/opencv-5/
89•ternaus•3d ago

Comments

leoncos•3d ago
When I use Codex/Claude to complete a computer vision task, such as extracting assets from an image, OpenCV is their default solution. However, I believe that using YOLO and other methods is outdated. The best solution now is to directly use Nano Banana or other AI image models. A paper has proven that image generation models can perform most CV tasks well. I believe the new OpenCV should become a wrapper for VLM or AI image models.
serf•2d ago
do you realize how many edge or unconnected nodes do OpenCV work?

some SBC w/ an industrial camera that is doing pick-place or go/no-go operations on a conveyor belt against a singular object type doesn't need a huge image-gen/llm model governing it.

I mean, have you even considered the kind of performance an OpenCV function can get w/ just mask-matching? Even with a fancy YOLO model these answers come back in 1.5-50 ms; this is just a wholly different time scale.

nicolailolansen•1h ago
Once you can run a model like Nano Banana or another vision-LLM within the same compute and time constraints as an OpenCV or YOLO call, you can make that comparison. Until then, calling YOLO and OpenCV outdated is simply wrong. There's a time and place for big V-LLMs, just as there is a time and place for more "traditional" computer vision methods.
TZubiri•1h ago
I am confused, how can functions that output images help with functions that should take images as input?
taneq•13m ago
They’re multimodal LLMs trained for image generation. Turns out that if you want to generate images you gotta know what things look like.
mirsadm•1h ago
That is a very uninformed view. Real time CV is not going to be doing that anytime soon.
regularfry•1h ago
I've built hardware with a pi zero 2 + pi cam running a mildly fine-tuned YOLO doing local-only object detection as a USB-OTG device, in a use case where any off-device API calls would have been totally unacceptable, and where the object detection was part of the human interaction loop with a hard ceiling of 300ms on the total interaction time of which the object detection was only one process among many.

We're not going to fit Nano Banana or anything like it on a device with 512MB RAM and a GPU old enough to be irrelevant, and again, API calls just aren't on the menu.

kryptiskt•37m ago
If I want to identify and measure the size of round things in my orange sorter machine, I shouldn't have to resort to an unnecessarily complicated solution just because some AI bros can't understand that not everything needs to be an AI model.

Like, the AI model tools already exist. All that would be accomplished if OpenCV pivoted would be to take it away from people who want to do low-level vision programming. It wouldn't add anything useful to the world, just destroy an excellent library.

wongarsu•35m ago
I can get great results from a YOLO model with 30M to maybe 300M params. To get decent CV from an LLM, 8B params is the absolute minimum, and it's closer to 30B for interesting tasks.

I might be on board about LLMs being the future of OCR (though many would disagree), but for general CV they are very inefficient for very limited benefit

IanCal•16m ago
They can however be extremely useful for curating training data. Also things like SAM and the DINO (/grounding dino) models.

Also, if they are better, you can have a flow where a cheap model handles most inputs and only the marginal cases go to a more complex model (and you can chain several of these).

The yolo models are really shockingly good for their cost and how well they can work with not much training data as well.
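The cheap-model-first flow described above can be sketched in a few lines. Everything here is hypothetical: the two model functions are stand-ins that return a (label, confidence) pair, and the 0.8 escalation threshold is an assumed value that would need tuning on real data.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Prediction = Tuple[str, float]  # (label, confidence in [0, 1])


@dataclass
class Cascade:
    cheap: Callable[[dict], Prediction]
    expensive: Callable[[dict], Prediction]
    threshold: float = 0.8  # below this, the cheap answer is treated as marginal
    escalated: int = 0      # how many inputs needed the expensive model

    def predict(self, item: dict) -> Prediction:
        label, confidence = self.cheap(item)
        if confidence >= self.threshold:
            return label, confidence
        self.escalated += 1
        return self.expensive(item)


# Stand-in models for illustration only; real ones would run inference.
def small_model(item: dict) -> Prediction:
    return ("apple", item["clarity"])


def large_model(item: dict) -> Prediction:
    return ("apple", 0.99)


cascade = Cascade(small_model, large_model)
results = [cascade.predict({"clarity": c}) for c in (0.95, 0.40, 0.85)]
```

Tracking the escalation count is useful in practice, since the cost of the whole pipeline depends on how often the expensive path is taken.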

sebmellen•24m ago
Great, let me know when those models can run on-server and process/analyze streams of ID images with less than 100ms of latency. You’ll need to make sure you have a massive set of training data including all manner of slightly blurred and slightly distorted ID cards
hbcondo714•2d ago
> LLMs and VLMs, Running Inside OpenCV…Qwen 2.5, Gemma 3, PaliGemma, and the GPT-2 / GPT-4 family

Why these specific models / versions?

globalnode•1h ago
Does this mean I'm actually able to try object detection in OpenCV now? I know basic image processing techniques, and I know "in theory" how ML works, but I've never really seen a case where I can just say "here's an image, now detect all the apples". There's always: 1. find a model that has the knowledge, 2. hook it up to an inference engine, 3. do something useful. I always get stuck at 1.
fnands•1h ago
That seems to be the way things are going.

Large general models have taken over in NLP, and (outside of embedded/low latency applications) it seems like they are coming for CV next.

So you should soon be able to have a large generic model that can detect whatever you want.

It's already pretty much possible with open-vocabulary detectors like SAM3, where you could just prompt it with "Apple": https://ai.meta.com/research/sam3/

wongarsu•52m ago
YOLO has basically solved that for my use cases for a couple years now. If you want labels that are not in the pretrained labels it's also easy to fine-tune, provided you're willing to label 200 or so images

If you need something less restricted to existing labels (say wanting all the red apples, or all cardboard signs) SAM3 is great, as the sibling comment says

IanCal•14m ago
> provided you're willing to label 200 or so images

A quick note to say that this is also a task you can hand to things like gemini.

ftchd•1h ago
> One practical detail is worth knowing. The new engine is CPU-only at the moment, so if you select a non-CPU backend and target (for example CUDA or OpenVINO through setPreferableBackend and setPreferableTarget), you will want the classic engine.

So there's room for even better performance!

wongarsu•40m ago
It's certainly a choice to make your headline feature a new ONNX engine, feature a bunch of comparisons of how it's better than ONNX Runtime, and then casually mention on the side that the cool new much faster engine is CPU-only.

Sure, running models on the CPU is very much a thing in computer vision (the benchmarked YOLOv8n has about 3.2M params). But this whole announcement feels more like OpenCV catching up to the modern world, not "The Biggest Leap in Years for Computer Vision".

Still great, needing fewer libraries is a good thing, but maybe a bit oversold

nnevatie•27m ago
No one uses ONNXRuntime (nor the new engine in OpenCV 5) in production. For anything performance-sensitive, one would run models under TensorRT, as an example.
gunalx•5m ago
Production doesn't have to be performance-sensitive, so devex may still outweigh the performance differences in some scenarios.
arcanine•18m ago
They really improved the performance. I tested a YOLOv8 medium segmentation model on an 11th-gen Intel i7 CPU.

OpenCV 4.11: ~255 ms
OpenCV 5.0.0: ~185 ms

with the same code.
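To put the two figures above in perspective, and to show one way such numbers can be collected, here is a small timing sketch. The helper names are made up, and the run and warmup counts are arbitrary assumptions. Only the 255 ms and 185 ms values come from the comment.

```python
import statistics
import time


def median_latency_ms(fn, runs=20, warmup=3):
    """Median wall-clock time of fn() in milliseconds, after a few warmup calls."""
    for _ in range(warmup):
        fn()  # warm caches and lazy initialisation before measuring
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def speedup(before_ms, after_ms):
    """How many times faster the new timing is compared with the old one."""
    return before_ms / after_ms


ratio = speedup(255.0, 185.0)        # roughly 1.38x faster
time_saved = 1.0 - 185.0 / 255.0     # roughly 27% less time per inference
```

In a real comparison, fn would wrap the same forward pass under each OpenCV version, with identical input and model files.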

37m ago
moondream is a beast
