It's just a concept, but I'm sure it's possible to build something like that with an ESP32 and an e-ink display. It uses fully local speech-to-text; for a real device I'd use server-based transcription.
It only works on desktop; unfortunately, I didn't have the patience to make it responsive on mobile.