My runs showed the 260K model running at ~120 tok/s and the 15M model at 1.8 tok/s, which could probably be a bit higher if this weren't a single-threaded application. I had fun working on it as a weekend project. Check it out for yourselves: https://github.com/callbacked/psvita-llm