To make it easier for more engineers to learn about inference, I wrote a book that surveys the dozens of technologies that work together to make inference possible. It also introduces the primary techniques for inference optimization and discusses how those techniques apply across different modalities.
The book is completely free to download digitally. I'll have print copies with me at various conferences, and they'll be available to purchase once Amazon decides to approve my account.
I hope you find Inference Engineering useful! I'm around to answer any questions.