I'm working on implementing inference for NVIDIA's Parakeet TDT ASR model in the GGML framework. The performance compared to the MLX Python version surprised me: my GGML implementation is 1000x slower. Any help, comments, or suggestions are welcome. Thanks a lot!
jasonni•2h ago