Pretrained

Eating some mooncake

This episode discusses Kimi's serving architecture, the use of Mooncake to offload GPU memory, the prevalence of vLLM, and the evolving standard LLM stack.
