If you’re actually deploying inference at scale, I’d love to hear what your "must-haves" are before we lock our specs. I'm specifically curious about the friction of moving away from Nvidia, whether it’s the software stack, physical interconnects, or thermal issues that usually kills new hardware for you.
I’m looking to talk to anyone and everyone who has dealt with inference silicon in production.
If you’re open to sharing some battle stories, my email is in my profile.