One oddity is that I haven't seen the claimed improvements beyond the 2025-01-09 tag - subsequent releases improve recall but degrade precision pretty significantly. It'd be amazing if object detection VLMs like this reported class confidences to better address this issue. That said, having a dedicated object detection API is very nice and absent from other models/wrappers AFAIK.
Looking forward to Moondream 3 post-inference optimizations. Congrats to the team. The founder Vik is a great follow on X if that's your thing.
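To illustrate the precision/recall point above: if a detector exposed per-box confidences, callers could pick their own operating point. A minimal sketch, assuming made-up detections and a hypothetical `precision_recall` helper (none of this is Moondream's API or real benchmark data):

```python
def precision_recall(detections, num_ground_truth, threshold):
    """Compute precision and recall after dropping low-confidence boxes.

    detections: list of (confidence, is_true_positive) pairs.
    num_ground_truth: number of objects actually present.
    """
    kept = [is_tp for conf, is_tp in detections if conf >= threshold]
    true_positives = sum(kept)
    # With nothing kept there are no false positives, so report precision as 1.0.
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / num_ground_truth if num_ground_truth else 0.0
    return precision, recall


# Hypothetical detections for one image that contains 4 real objects.
detections = [
    (0.95, True),
    (0.90, True),
    (0.70, False),
    (0.60, True),
    (0.40, False),
    (0.30, False),
]

for threshold in (0.25, 0.5, 0.8):
    p, r = precision_recall(detections, 4, threshold)
    print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f}")
```

Raising the threshold trades recall for precision. Without confidences, the caller is stuck with whatever trade-off a given release happens to make.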
Re: chart understanding, there are a lot of different types of charts out there but it does fairly well! We posted ChartQA benchmarks in the blog: it's on par with GPT5* and slightly better than Gemini 2.5 Flash.
* To be fair to GPT5, it's going to work well on many more types of charts/graphs than Moondream. To be fair to Moondream, GPT5 isn't really well suited to deploy in a lot of vision AI applications due to cost/latency.
I noticed that Moondream 2 was Apache 2 licensed but the 3 preview is currently BSL ("You can’t (without a deal): offer the model’s functionality to anyone outside your organization—e.g., an external API, or managed hosting for customers") - is that a permanent change to your licensing policies?
I don't even know what a "quantized version" is, but I was expecting answers about NVIDIA graphics cards and their memory. My computer has 24GB of memory, but I'll go for 64GB to run this locally on a new computer.
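On the "quantized version" question, a rough way to think about it: quantization stores each weight in fewer bits, so the memory needed just to hold the weights shrinks roughly in proportion. A back-of-the-envelope sketch, where the 9-billion parameter count is a placeholder and not a claim about any specific model, and where activations, KV cache, and runtime overhead are ignored:

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Hypothetical parameter count, chosen only for illustration.
NUM_PARAMS = 9_000_000_000

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

Actual requirements will be higher than this estimate, and whether the relevant budget is system RAM or GPU memory depends on where the model runs.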
Aeolun•4mo ago
conwayanderson•4mo ago
lawlessone•4mo ago
Seems like it could be somewhat useful for people with poor eyesight or blindness
conwayanderson•4mo ago
A couple of people got it running on a Raspberry Pi, though.
apwell23•4mo ago
simonw•4mo ago
Hoping someone will correct me if that's not the right mental model!