On one end of the evaluation spectrum, we have vibe checks, which are useful for building intuition but time-consuming to run across a dozen or more models. On the other end, we have large benchmarks, which are comprehensive but impractical for most users to run.
Vision AI Checkup is a new tool for evaluating VLMs. The site is made up of hand-crafted prompts focused on real-world problems: defect detection, spatial reasoning (e.g., how one object is positioned relative to another), colour understanding, and more.
Our prompts are especially focused on industrial tasks -- serial number reading, assembly line understanding, and more -- although we're excited to add more general prompts.
The tool lets you see how models perform across categories of prompts, and compare different models on a single prompt.
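To make the category view concrete, here is a minimal sketch of the kind of aggregation a per-category leaderboard implies: given per-prompt pass/fail results, compute each model's pass rate in each category. All names and data here are hypothetical illustrations, not the project's actual code.

```python
from collections import defaultdict

# Hypothetical results: (model, category, passed) triples, not real data.
results = [
    ("model-a", "defect-detection", True),
    ("model-a", "defect-detection", False),
    ("model-a", "spatial", True),
    ("model-b", "defect-detection", True),
    ("model-b", "spatial", False),
]

def pass_rates(results):
    """Compute the pass rate for each (model, category) pair."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for model, category, passed in results:
        totals[(model, category)] += 1
        passes[(model, category)] += passed
    return {key: passes[key] / totals[key] for key in totals}

for (model, category), rate in sorted(pass_rates(results).items()):
    print(f"{model:8s} {category:18s} {rate:.0%}")
```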
We have open-sourced the codebase, with instructions on how to add a prompt to the assessment: https://github.com/roboflow/vision-ai-checkup. You can also add new models.
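The repo's instructions are the authoritative contribution format; as a rough sketch of what a hand-crafted prompt assessment involves (the field names and grading rule below are illustrative assumptions, not the project's actual schema), an entry pairs an image with a question and an expected answer:

```python
from dataclasses import dataclass

@dataclass
class PromptCase:
    """One hand-crafted assessment: an image, a question, and a grading rule.

    Illustrative only; see the repo for the actual contribution format.
    """
    image_path: str
    question: str
    expected: str

    def grade(self, model_answer: str) -> bool:
        # Simple containment check; real grading may be stricter or fuzzier.
        return self.expected.lower() in model_answer.lower()

# Hypothetical example in the spirit of the industrial prompts above.
case = PromptCase(
    image_path="images/serial_plate.jpg",
    question="What is the serial number printed on the metal plate?",
    expected="SN-48291",
)

print(case.grade("The plate reads SN-48291."))  # True
```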
We'd love feedback, as well as ideas for areas where VLMs struggle that you'd like to see assessed!