I have built a small CLI tool for extracting feature vectors from images, videos and image folders. I work in an application area where people who are not deep learning experts want to work with image data and foundation models, but may not be familiar with Python and would rather analyze embeddings in R or other software. Dinotool makes this possible: it produces both local and global vector embeddings and writes them to Parquet files, which makes it quite easy to process large numbers of images or videos.
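To give an idea of what downstream analysis of such embeddings could look like, here is a minimal Python sketch. It is not taken from Dinotool itself: the helper names `load_embeddings` and `cosine_similarity_matrix` are hypothetical, and the assumed file layout (one row per image or frame, one numeric column per embedding dimension) is a guess that may differ from the actual output, so check the column names of your own files first.

```python
import numpy as np


def load_embeddings(path):
    """Read a Parquet file into (row labels, float matrix).

    Assumed layout (not confirmed by the tool): one row per image or frame,
    one numeric column per embedding dimension.
    """
    # Imported lazily so the similarity helper below works without pandas.
    # Reading Parquet also needs an engine such as pyarrow.
    import pandas as pd

    df = pd.read_parquet(path)
    numeric = df.select_dtypes(include="number")  # drop any non-numeric columns
    return list(numeric.index), numeric.to_numpy(dtype=np.float64)


def cosine_similarity_matrix(vectors):
    """Return the pairwise cosine similarity of the rows of `vectors`."""
    vectors = np.asarray(vectors, dtype=np.float64)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T


# Toy stand-in for three global embeddings of dimension 4.
toy = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
])
sim = cosine_similarity_matrix(toy)
```

With a real file, the toy array would be replaced by the matrix returned from `load_embeddings`. The same Parquet file can also be opened from R, for example with the `arrow` package, so nothing here ties the analysis to Python.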
The tool and its API are still at a stage where I would like to improve them further, and any feedback is welcome, especially suggestions for additional backbone models.