Eating some mooncake

2025-09-12

This episode discusses Kimi's serving architecture, the use of Mooncake to offload GPU memory, the prevalence of vLLM, and the evolving standard LLM stack.
