Instead of training a detector for a fixed set of classes, you can type what you want to find and the system returns bounding boxes for matching objects.
Example prompts: "dented car bumper", "person wearing a red backpack", "cat scratching a couch", "broken window".
The model is open-vocabulary, so it can generalize beyond predefined categories.
I originally built this while experimenting with ways to bootstrap datasets for training YOLO models without manually labeling thousands of images. The detections can be exported as labels and used to train a traditional detector.
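The export step boils down to converting detector output (pixel corner coordinates) into YOLO's normalized center format. A minimal sketch of that conversion; the function name and box layout here are illustrative, not the project's actual code:

```python
def to_yolo_label(box, class_id, img_w, img_h):
    """Convert an (x1, y1, x2, y2) pixel box into a YOLO label line:
    class id, then center x/y and width/height, all normalized to [0, 1]."""
    x1, y1, x2, y2 = box
    cx = (x1 + x2) / 2 / img_w   # box center, as a fraction of image width
    cy = (y1 + y2) / 2 / img_h   # box center, as a fraction of image height
    w = (x2 - x1) / img_w        # box width, normalized
    h = (y2 - y1) / img_h        # box height, normalized
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"

# A 200x200 box in a 640x480 image, mapped to prompt/class 0:
print(to_yolo_label((100, 200, 300, 400), 0, 640, 480))
# → 0 0.312500 0.625000 0.312500 0.416667
```

Each prompt gets mapped to a class id, and one such line per detection is written to the image's `.txt` label file.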
There is a demo where you can upload an image and try different prompts.
Curious where people think prompt-based detection is actually useful in real workflows (dataset labeling, QA inspection, etc.).