This is correct but misstated: it's not the caches themselves that cost energy, but the MMUs that automatically load/fetch/store to cache on "page faults". TPUs don't have MMUs, and furthermore they are a push architecture (as opposed to pull).
If so, wild. That seems like overkill.
This is not the only way, though. TPUs are available to companies operating on GCP as an alternative to GPUs, at a different price/performance point. That is another way to get hands-on experience with TPUs.
jan_Sate•4h ago