The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

How to Engineer AI Inference Systems with Philip Kiely - #766

Philip Kiely, head of AI education at Baseten, discusses AI inference engineering. He covers topics such as GPU programming, distributed systems, the difference between inference and model serving, and the role of batching, quantization, speculation, and KV cache reuse in…
