After coming across the post [1] below by Han Xiao of Jina AI, I implemented a sliding-window, mean n-gram histogram vector approach to fingerprinting embedding models, and it worked far better than I expected! Links to the Colab notebook [2] and a quick visualization [3] are below.
I had this idea a couple of years ago but couldn't get myself to work on it. Seeing the post got me thinking about it again, and I was pleasantly surprised by the results.
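For anyone wondering what "sliding-window mean n-gram histogram" could look like in practice, here is a rough sketch of one possible reading. It is not the notebook's code. It assumes each embedding is quantized into a few value bins, that n-grams are runs of n consecutive bin symbols, and that a model's fingerprint is the mean of the per-vector histograms. The function names, the bin count, and the [-1, 1] value range are all placeholders I made up for illustration.

```python
import numpy as np

def ngram_histogram(vec, n=2, bins=8, lo=-1.0, hi=1.0):
    """Normalized n-gram histogram of one embedding vector.

    Assumption: values are quantized into `bins` equal-width bins over
    [lo, hi] (out-of-range values fall into the first or last bin).
    """
    vec = np.asarray(vec, dtype=float)
    # Interior bin edges, so np.digitize returns symbols in 0..bins-1.
    edges = np.linspace(lo, hi, bins + 1)[1:-1]
    symbols = np.digitize(vec, edges)
    # Slide a window of width n over the symbol sequence.
    windows = np.lib.stride_tricks.sliding_window_view(symbols, n)
    # Encode each window as a single integer (base-`bins` digits).
    codes = windows @ (bins ** np.arange(n))
    hist = np.bincount(codes, minlength=bins ** n).astype(float)
    return hist / hist.sum()

def fingerprint(embeddings, **kwargs):
    """Mean histogram over many embeddings from the same model."""
    return np.mean([ngram_histogram(e, **kwargs) for e in embeddings], axis=0)

def cosine(a, b):
    """Cosine similarity between two fingerprints."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Under these assumptions, you would build one fingerprint per known model from a batch of its embeddings, then attribute an unknown batch to whichever known fingerprint it is closest to by cosine similarity.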
1 - https://jina.ai/news/identifying-embedding-models-from-raw-n...
2 - https://colab.research.google.com/drive/1CTFltQrHRTViYSs3JLr...
3 - https://www.youtube.com/watch?v=Iv5hmv70xs0